Anurag Gupta

Using Gemini to list gemstones on Etsy for my mom

2026-04-19

My mom sells gemstones and crystals. She's good at sourcing them and terrible at writing product listings. Every stone needs a title, description, SEO keywords, healing properties (yes, buyers expect those), sizing info, a quality grade, and six clean product photos arranged in an Etsy-friendly grid. She was spending an hour per stone doing this manually, and she has hundreds of stones.

So I built her a tool. You photograph a stone, feed the image to Gemini on Vertex AI, and get back a full structured listing. Then a second pass generates a six-image product set from the original photo.

Structured output with JSON schema

The interesting technical problem is getting reliable structured data from a multimodal model. Gemini's API supports a response_schema parameter that constrains the model's output to valid JSON matching a provided schema. This isn't "please format as JSON" prompt engineering. The model's decoding is actually constrained by the schema during token generation, so it can't produce invalid structure.

import json

from vertexai.generative_models import GenerationConfig  # Vertex AI Python SDK

schema = {
    "type": "object",
    "required": ["title", "gemstone_type", "color", "transparency",
                  "cut_shape", "is_dyed", "quality_grade", "chakra",
                  "description", "seo_title", "seo_description"],
    "properties": {
        "title": {"type": "string", "maxLength": 140},
        "gemstone_type": {"type": "string"},
        "color": {"type": "string"},
        "transparency": {
            "type": "string",
            "enum": ["transparent", "translucent", "opaque"]
        },
        "cut_shape": {"type": "string"},
        "size_estimate": {"type": "string"},
        "is_dyed": {"type": "boolean"},
        "quality_grade": {
            "type": "string",
            "enum": ["AAA", "AA", "A", "B", "C"]
        },
        "chakra": {
            "type": "array",
            "items": {
                "type": "string",
                "enum": ["root", "sacral", "solar_plexus",
                         "heart", "throat", "third_eye", "crown"]
            }
        },
        "healing_properties": {"type": "string"},
        "description": {"type": "string"},
        "care_tip": {"type": "string"},
        "origin_story": {"type": "string"},
        "seo_title": {"type": "string", "maxLength": 70},
        "seo_description": {"type": "string", "maxLength": 160}
    }
}

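# model, image_part, and prompt are set up earlier in the script
response = model.generate_content(
    [image_part, prompt],
    generation_config=GenerationConfig(
        response_mime_type="application/json",  # ask for JSON, not prose
        response_schema=schema                  # constrain output to the schema
    )
)
# response.text is the raw JSON string; parse it into a dict
listing = json.loads(response.text)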

The enum constraints are the important part. quality_grade can only be one of five values. chakra can only contain valid chakra names. transparency is one of three options. Without these constraints, the model would invent categories ("semi-translucent," "near-opaque") that don't match the Etsy attribute filters.
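
Constrained decoding covers structure, but it's cheap to re-check the parsed listing locally before it goes anywhere near Etsy. Below is a minimal sketch of such a check, not part of the tool as described. check_listing and DEMO_SCHEMA are hypothetical names, and DEMO_SCHEMA is a trimmed-down stand-in for the full schema. The check only looks at required, enum, and maxLength. Whether every keyword (maxLength in particular) is enforced during decoding is worth verifying rather than assuming.

```python
from typing import Any

# Hypothetical, trimmed-down stand-in for the full listing schema.
DEMO_SCHEMA: dict[str, Any] = {
    "required": ["transparency", "chakra"],
    "properties": {
        "transparency": {"type": "string",
                         "enum": ["transparent", "translucent", "opaque"]},
        "chakra": {"type": "array",
                   "items": {"type": "string", "enum": ["root", "heart", "crown"]}},
        "seo_title": {"type": "string", "maxLength": 70},
    },
}


def check_listing(listing: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the listing passed."""
    problems = []
    for key in schema.get("required", []):
        if key not in listing:
            problems.append(f"missing required field: {key}")
    for key, rules in schema.get("properties", {}).items():
        if key not in listing:
            continue
        value = listing[key]
        # Array fields carry their enum on "items"; scalar fields carry it directly.
        if rules.get("type") == "array":
            allowed = rules.get("items", {}).get("enum")
            values = value if isinstance(value, list) else [value]
        else:
            allowed = rules.get("enum")
            values = [value]
        if allowed is not None:
            for v in values:
                if v not in allowed:
                    problems.append(f"{key}: {v!r} is not one of {allowed}")
        max_len = rules.get("maxLength")
        if max_len is not None and isinstance(value, str) and len(value) > max_len:
            problems.append(f"{key}: {len(value)} chars exceeds maxLength {max_len}")
    return problems
```

Calling check_listing(listing, schema) right after json.loads would turn a silent mismatch with Etsy's attribute filters into a list of readable complaints.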

The is_dyed boolean is there because my mom insists on honesty (she's right), and the model is surprisingly good at detecting treated stones from photos. Dyed agate has characteristic color saturation patterns that differ from natural coloring, and heat-treated amethyst (sold as citrine) has a specific orange-yellow that natural citrine doesn't. The model catches these about 80% of the time based on my testing with 50 known-treated and 50 natural stones.
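
A test like that can be scored in a few lines. The sketch below is an illustration rather than the script used for the numbers above. score_dye_detection is a hypothetical helper, and it assumes each stone is recorded as a pair of booleans: whether it is actually treated, and whether the model set is_dyed. Splitting the result into a detection rate and a false-alarm rate shows whether misses or false flags dominate.

```python
def score_dye_detection(results: list[tuple[bool, bool]]) -> dict[str, float]:
    """Score (actually_treated, model_said_dyed) pairs, one pair per stone."""
    treated = [pred for actual, pred in results if actual]
    natural = [pred for actual, pred in results if not actual]
    correct = sum(actual == pred for actual, pred in results)
    return {
        # Share of treated stones the model flagged as dyed.
        "detection_rate": sum(treated) / len(treated) if treated else 0.0,
        # Share of natural stones the model wrongly flagged as dyed.
        "false_alarm_rate": sum(natural) / len(natural) if natural else 0.0,
        # Share of all stones labelled correctly.
        "accuracy": correct / len(results) if results else 0.0,
    }
```

For a seller who cares about honesty, the false-alarm rate matters as much as the detection rate: labelling a natural stone as dyed undersells it, while missing a dyed one misleads the buyer.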

Marketplace-safe description generation

Healing properties are a minefield. Etsy will flag listings that make medical claims ("amethyst cures headaches"), but buyers expect metaphysical descriptions and filter by them. The prompt instructs the model to frame all healing references with hedging language: "traditionally associated with," "believed by crystal practitioners to," "historically used for." The schema doesn't enforce this (it's a free-text string), so I added a post-processing validation step that rejects descriptions containing banned phrases:

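import re

BANNED = ["cures", "heals", "treats", "medical", "therapy",
          "prescription", "diagnosis", "clinical"]
# Whole-word match, so "secures" or "retreats" don't trigger a rejection
BANNED_RE = re.compile(r"\b(?:" + "|".join(BANNED) + r")\b", re.IGNORECASE)

def validate_description(text: str) -> bool:
    return BANNED_RE.search(text) is None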

If validation fails, the tool re-calls Gemini with the rejection reason appended to the prompt. In practice this happens about 5% of the time, usually when the model is overly specific about alleged benefits.
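
The retry loop could look roughly like this. It's a sketch under assumptions, not the tool's code: generate is a hypothetical callable that wraps the Gemini call and returns the description text, the wording of the rejection message is made up, and max_attempts is an arbitrary cap so a stubborn prompt can't loop forever. The banned-word check here matches whole words only.

```python
import re
from typing import Callable, Optional

BANNED = ["cures", "heals", "treats", "medical", "therapy",
          "prescription", "diagnosis", "clinical"]
_BANNED_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, BANNED)) + r")\b", re.IGNORECASE
)


def find_banned(text: str) -> list[str]:
    """Return the banned words that appear in text, lowercased and sorted."""
    return sorted({m.group(1).lower() for m in _BANNED_RE.finditer(text)})


def generate_with_retries(
    generate: Callable[[str], str],  # hypothetical wrapper around the model call
    prompt: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Call generate until the text passes validation, or give up with None."""
    current_prompt = prompt
    for _ in range(max_attempts):
        text = generate(current_prompt)
        hits = find_banned(text)
        if not hits:
            return text
        # Append the rejection reason to the original prompt, not the last one,
        # so repeated failures don't pile up stale instructions.
        current_prompt = (
            f"{prompt}\n\nYour previous description was rejected because it "
            f"used these banned words: {', '.join(hits)}. "
            "Rewrite it without them."
        )
    return None
```

Returning None instead of raising leaves it to the caller to decide whether a stone that fails every attempt gets skipped or queued for a human to write by hand.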

Image grid generation

The image pipeline uses Pillow for cropping and compositing, with OpenCV and SciPy handling detail detection. Etsy's listing gallery performs best with a specific layout: the first image is a clean product shot on a neutral background, the second is a scale reference (hand holding the stone), the third through fifth are detail angles, and the sixth is a lifestyle or context shot. Since we're starting from a single photo, the tool generates these variants programmatically:

  1. Hero crop: Center-crop to 1:1 aspect ratio, white balance normalization, slight contrast bump (+10%)
  2. Detail crops: Four quadrant crops at 2x zoom, each centered on the highest-detail region detected via Laplacian variance
  3. Grid composite: Arrange the six images into a 2x3 grid at 2000x3000 pixels (Etsy recommends at least 2000 px on the shortest side)
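
The geometry behind the hero crop and the grid is plain arithmetic. Here is a small sketch of it in pure Python, with hypothetical helper names (center_square_box, grid_cell_boxes) and the assumption that the grid is 2 columns by 3 rows with no padding between cells. Boxes use Pillow's (left, upper, right, lower) convention so they can be passed straight to its crop and paste calls.

```python
Box = tuple[int, int, int, int]  # (left, upper, right, lower), as Pillow expects


def center_square_box(width: int, height: int) -> Box:
    """Largest centered 1:1 box that fits inside a width x height image."""
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return (left, upper, left + side, upper + side)


def grid_cell_boxes(
    cols: int = 2, rows: int = 3, canvas_w: int = 2000, canvas_h: int = 3000
) -> list[Box]:
    """Cell boxes for a cols x rows grid, ordered left to right, top to bottom."""
    cell_w, cell_h = canvas_w // cols, canvas_h // rows
    return [
        (c * cell_w, r * cell_h, (c + 1) * cell_w, (r + 1) * cell_h)
        for r in range(rows)
        for c in range(cols)
    ]
```

With Pillow, the hero crop would then be img.crop(center_square_box(*img.size)), and each tile can be resized to its cell size and placed with canvas.paste(tile, box[:2]). At the default sizes every cell is 1000x1000, so square crops drop in without distortion.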

The detail-region detection is simple: compute the Laplacian of the grayscale image, threshold, find connected components, sort by area, and crop around the largest components. This consistently picks the most textured/interesting parts of the stone rather than cropping randomly.

import cv2
import numpy as np
from scipy import ndimage

gray = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2GRAY)  # img: PIL image, RGB mode
laplacian = cv2.Laplacian(gray, cv2.CV_64F)
variance_map = ndimage.uniform_filter(laplacian**2, size=50)  # local detail energy
# Find top-4 peaks in variance map for detail crops
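
One simple way to do the peak-picking step is greedy selection with suppression: take the strongest point, blank out its neighborhood, repeat. The sketch below assumes that approach. top_k_peaks and the min_distance parameter are hypothetical, and the real script may well do this differently. Suppression matters because the four strongest pixels usually sit right next to each other, which would give four nearly identical crops.

```python
import numpy as np


def top_k_peaks(score_map, k: int = 4, min_distance: int = 100) -> list[tuple[int, int]]:
    """Pick up to k (row, col) peaks, each outside the windows of earlier picks."""
    work = np.array(score_map, dtype=float, copy=True)  # leave the caller's map alone
    peaks = []
    for _ in range(k):
        row, col = np.unravel_index(np.argmax(work), work.shape)
        if not np.isfinite(work[row, col]):
            break  # everything left has already been suppressed
        peaks.append((int(row), int(col)))
        # Blank out a square window around the pick so the next peak lands elsewhere.
        r0, r1 = max(0, row - min_distance), row + min_distance + 1
        c0, c1 = max(0, col - min_distance), col + min_distance + 1
        work[r0:r1, c0:c1] = -np.inf
    return peaks
```

Called as top_k_peaks(variance_map, k=4), this returns pixel coordinates that can serve as crop centers. A min_distance close to half the crop width keeps the detail crops from overlapping much.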

The backgrounds are neutral (#F5F5F5, slightly off-white) because it makes a real difference in how professional a small seller's shop looks next to the operations that have studio lighting.

Is the code elegant? Not especially. It's a Python script, maybe 400 lines, that calls the Vertex AI API and saves files to disk. But it cut my mom's listing time from an hour to about five minutes per stone, and she doesn't need to touch the code to use it.