# GEO Optimization Guide — Full Series

1. What Is GEO - AI Citation Strategy Beyond SEO
2. Each AI Cites Different Sources
3. On-Site GEO Technical Architecture - From Product DB to JSON-LD ← current article
4. Off-Site GEO - How to Win Over AI That Ignores Your Official Site
5. AEO - Why Coding Agents Read Documentation Differently

## Where Do You Build JSON-LD, and Where Does It Go?

In the previous article, we confirmed that each AI platform prefers different citation sources: Gemini favors official websites, ChatGPT leans on directories, and Perplexity gravitates toward community discussions. One thing they share: pages with structured data get cited more often across all platforms.

So the technical core of On-Site GEO boils down to one question: how do you transform product master DB data into JSON-LD and inject it into the HTML?

It sounds simple, but once you dig in, the tangles pile up fast. Product DB field names are cryptic abbreviations. The attributes AI needs do not exist in the DB. Sites built as SPAs cannot serve JSON-LD to crawlers. This article covers how to solve these problems with a structured approach.

## The Concentric Architecture of a GEO System

A GEO system expands outward through four layers.

| Layer | Components | Role |
| --- | --- | --- |
| Core | Product Master DB | SSOT (Single Source of Truth). The origin of all data |
| Channel | Website / Mobile App | JSON-LD injection, SSR rendering |
| API | Product Query API | Interface for AI agent integration |
| Agent | ChatGPT / Gemini / Perplexity | End consumer touchpoint |

Data flows from Core through Channel to Agent, changing shape at each layer: raw DB fields become structured JSON-LD, which becomes the citation source in AI answers.

The API layer is easy to overlook. You might think embedding JSON-LD is enough, but once you consider AI agent integrations like ChatGPT Plugins or MCP (Model Context Protocol), a separate API layer becomes necessary. Even if you do not need it right now, accounting for it in the design phase saves pain later.
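To make the API layer concrete, here is a minimal sketch of a product query handler. The in-memory store, the `queryProduct` name, and the sample record are all hypothetical; a real deployment would put this behind an HTTP endpoint or expose it as an MCP tool, backed by the product master DB.

```typescript
// Hypothetical product record shape; a real one would carry many more fields.
type ProductRecord = {
  id: string;
  name: string;
  gtin13: string;
  brand: string;
  price: number;
};

// Stand-in for the product master DB (the Core layer).
const PRODUCT_STORE = new Map<string, ProductRecord>([
  ["p1", { id: "p1", name: "Choco Stick Original", gtin13: "8801234567890", brand: "FoodCo", price: 1500 }],
]);

// Agent-facing handler: resolves a product ID to structured data that an
// AI agent can consume directly, mirroring the on-page JSON-LD.
function queryProduct(id: string) {
  const p = PRODUCT_STORE.get(id);
  if (!p) return { error: "not_found" };
  return {
    "@type": "Product",
    name: p.name,
    gtin13: p.gtin13,
    brand: { "@type": "Brand", name: p.brand },
    offers: { "@type": "Offer", price: p.price, priceCurrency: "KRW" },
  };
}
```

The point of the sketch is the shape of the response: the API layer returns the same Schema.org-structured fields as the website, so both channels stay consistent with the single source of truth.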
## The 3-Stage Data Pipeline

Instead of managing product descriptions as monolithic blobs, decompose them into individual fields. AI cites more accurately when data is structured at the field level. That is the core idea behind this pipeline.

### Stage 1: DB Refinement - Field Mapping

This stage maps existing product master DB fields to Schema.org fields. You are not creating new data, just organizing what already exists.

```text
DB Field               → Schema.org Field
─────────────────────────────────────────
PROD_NM                → name
BRND_CD (code lookup)  → brand.name
GTIN_13                → gtin13
PRC_AMT                → offers.price
STCK_YN                → offers.availability
IMG_URL                → image
CTG_NM                 → category
```

The field count runs around 15-18 depending on the industry. Since most values already exist in the DB, development effort is modest. The catch is converting code values to human-readable text: you need to transform `BRND_CD = P1042` into `brand.name = "FoodCo"` before AI can understand it.

The biggest stumbling block at this stage is GTIN. It is a GS1 standard identifier, and different variants of the same product (size, flavor) need different GTINs. If you lump "Choco Stick Original" and "Choco Stick Almond" under one master code, AI cannot tell them apart.

### Stage 2: LLM Extraction - AI-Generated Attributes

Some attributes AI needs for citation do not exist in the DB: target users, usage occasions, sentiment keywords. Having humans write these manually becomes impractical when you have thousands of SKUs. Instead, let an LLM read existing product descriptions, reviews, and category data to extract them automatically.

| Source | Field | Description | Example |
| --- | --- | --- | --- |
| DB | @type | Schema.org type | Product |
| DB | name | Product name | Gram 16 |
| DB | gtin13 | GS1 identifier | 8801056038800 |
| LLM | targetUser | Target user | Students, professionals |
| LLM | occasion | Usage occasion | Graduation gift, work use |
| LLM | sentiment | Sentiment keywords | Lightweight, sleek |
| LLM | nutrition | Nutrition info | Sugar-free |
| LLM | safety | Safety info | CAS 9002-88-4 |

LLM extraction fields vary by industry.
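Stages 1 and 2 can be sketched end to end as two small functions. The DB field names come from the mapping table above; the brand code table is a hypothetical stand-in for a join against a brand master, and the LLM call itself is abstracted away as an input.

```typescript
// Raw row shape, using the cryptic DB field names from the mapping table.
type RawRow = {
  PROD_NM: string;
  BRND_CD: string;
  GTIN_13: string;
  PRC_AMT: number;
  STCK_YN: "Y" | "N";
};

// Hypothetical code-to-name table; in practice this is a code table lookup.
const BRAND_NAMES: Record<string, string> = { P1042: "FoodCo" };

// Stage 1: pure field mapping. No new data, just renaming and code lookup.
function refine(row: RawRow) {
  return {
    "@type": "Product",
    name: row.PROD_NM,
    gtin13: row.GTIN_13,
    brand: { "@type": "Brand", name: BRAND_NAMES[row.BRND_CD] ?? row.BRND_CD },
    offers: {
      "@type": "Offer",
      price: row.PRC_AMT,
      priceCurrency: "KRW",
      availability:
        row.STCK_YN === "Y"
          ? "https://schema.org/InStock"
          : "https://schema.org/OutOfStock",
    },
  };
}

// Stage 2: merge LLM-extracted attributes on top of the refined record.
// The actual LLM call (prompting over descriptions/reviews) is out of scope.
function enrich(
  refined: ReturnType<typeof refine>,
  llmFields: { targetUser?: string; occasion?: string; sentiment?: string },
) {
  return { ...refined, ...llmFields };
}
```

Keeping the two stages as separate functions mirrors the pipeline: Stage 1 is deterministic and cheap to rerun, while Stage 2 output can be cached and reviewed before it ever reaches the JSON-LD.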
For food, nutrition facts and ingredients are key. For hotels, amenities and check-in times matter. For chemicals/B2B, it is material properties and certifications.

This stage adds 10-15 fields. Combined with Stage 1, each product ends up with 25-33 structured fields.

### Stage 3: JSON-LD Output - Automated Conversion and SSR Deployment

Fields from Stages 1 and 2 are converted into Schema.org-compliant JSON-LD and automatically injected into the HTML via SSR.

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Choco Stick Original",
  "gtin13": "8801234567890",
  "brand": { "@type": "Brand", "name": "FoodCo" },
  "description": "Chocolate-coated crispy stick snack. 200kcal per 46g serving.",
  "offers": {
    "@type": "Offer",
    "price": 1500,
    "priceCurrency": "KRW",
    "availability": "https://schema.org/InStock"
  },
  "nutrition": {
    "@type": "NutritionInformation",
    "calories": "200 calories",
    "servingSize": "1 pack (46g)"
  }
}
```

Once this JSON-LD sits inside the `<head>` tag, the Invisible GEO discussed in Part 1 is complete: invisible to users, but parsed directly by AI and search engines. See how the before and after compares for a food product in the demo.

**Demo - JSON-LD Before/After**

## Four Principles for Writing Descriptions

The most human-dependent part of the pipeline is product descriptions. Descriptions that AI cites well follow a pattern.

1. **Fact-based** - Include only objective information. AI ignores advertising copy like "industry-leading" or "customer satisfaction #1."
2. **100-300 characters** - The sweet spot for AI reference. Too short lacks context; too long buries the key points.
3. **Natural keywords** - As confirmed by Princeton/Georgia Tech research, keyword stuffing actually decreases AI visibility. Weave keywords into natural sentences.
4. **Unique per SKU** - Copy-pasting the same template with only the product name swapped gets flagged as duplicate content by AI. Each product needs its own description.
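Principles 1 and 2 are mechanical enough to lint automatically before descriptions enter the pipeline. A minimal sketch, where the banned-phrase list is illustrative rather than exhaustive and the length bounds follow the 100-300 character rule above:

```typescript
// Illustrative advertising-copy phrases that AI tends to ignore.
const AD_COPY = ["industry-leading", "customer satisfaction #1", "best in class"];

// Returns a list of problems; an empty array means the description passes.
function lintDescription(text: string): string[] {
  const problems: string[] = [];
  if (text.length < 100) problems.push("too short: under 100 characters");
  if (text.length > 300) problems.push("too long: over 300 characters");
  for (const phrase of AD_COPY) {
    if (text.toLowerCase().includes(phrase)) {
      problems.push(`advertising copy: "${phrase}"`);
    }
  }
  return problems;
}
```

Principle 4 (uniqueness per SKU) needs a corpus-level check instead, e.g. comparing descriptions pairwise for near-duplicates, so it does not fit a per-string lint like this.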
Here is a before-and-after for a meta description that applies the same principles:

```html
<!-- Before -->
<meta name="description" content="About Us"/>

<!-- After -->
<meta name="description" content="ChemCo is a global petrochemical company supplying PE/PP products to 50 countries with annual revenue of $11B. Leading ESG management and carbon neutrality by 2050."/>
```

## Why SSR Is Non-Negotiable

Even if you build the JSON-LD perfectly, it is useless if AI crawlers cannot read it. This is where SPAs (Single Page Applications) become a bottleneck.

SPAs require JavaScript execution in the browser to render content. The page looks fine to humans, but AI crawlers like GPTBot and Google-Extended mostly do not execute JS. Even if you put JSON-LD in the `<head>`, when the server sends an empty HTML shell, crawlers see nothing.

Switching to SSR (Server-Side Rendering) means the server sends fully rendered HTML, so crawlers can read JSON-LD immediately without JS execution. Here is how it looks with the Next.js App Router:

```tsx
// app/product/[id]/page.tsx
export default async function ProductPage({ params }) {
  const product = await fetchProduct(params.id);

  const jsonLd = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": product.name,
    "gtin13": product.gtin,
    "brand": { "@type": "Brand", "name": product.brand },
    "description": product.description,
    "image": product.imageUrl,
    "offers": {
      "@type": "Offer",
      "price": product.price,
      "priceCurrency": "KRW",
      "availability": "https://schema.org/InStock",
      "url": product.pageUrl
    }
  };

  return (
    <>
      <script
        type="application/ld+json"
        dangerouslySetInnerHTML={{ __html: JSON.stringify(jsonLd) }}
      />
      <ProductDetail product={product} />
    </>
  );
}
```

The server fetches DB data via `fetchProduct`, builds the JSON-LD object, and injects it as a `<script type="application/ld+json">` tag in the rendered HTML. If your site is still an SPA without SSR, fetching the page with curl will likely show no JSON-LD. That is exactly why SSR is non-negotiable.

Try building JSON-LD yourself with the interactive builder to get a hands-on feel for the structure.
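You can automate that curl-style check. The sketch below scans raw server-returned HTML for JSON-LD blocks, exactly as a crawler that does not execute JS would see the page. It is regex-based for brevity; a production check would use a real HTML parser.

```typescript
// Extract and parse every <script type="application/ld+json"> block from
// raw HTML, without executing any JavaScript on the page.
function extractJsonLd(html: string): unknown[] {
  const blocks: unknown[] = [];
  const re = /<script[^>]*type="application\/ld\+json"[^>]*>([\s\S]*?)<\/script>/gi;
  for (const match of html.matchAll(re)) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch {
      // Skip malformed JSON; it would be invisible to AI anyway.
    }
  }
  return blocks;
}
```

Run it against the HTML from a plain HTTP fetch (not a browser): an SSR page should yield at least one block, while an SPA shell typically yields none.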
**Demo - JSON-LD Builder**

Common issues encountered in practice:

| Symptom | Cause | Fix |
| --- | --- | --- |
| JSON-LD not crawled | robots.txt blocking | Set GPTBot and Google-Extended to Allow |
| AI not citing data | Schema.org type error | Validate with the Rich Results Test |
| Slow API response | No caching | Apply Redis caching and minimize fields |
| Server overload after SSR | DB query on every request | ISR + Redis caching |
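For the first row, the robots.txt fix might look like the fragment below. GPTBot and Google-Extended are the real crawler tokens; whether to allow the whole site or only product pages is a policy decision, so treat the paths as placeholders.

```text
User-agent: GPTBot
Allow: /

User-agent: Google-Extended
Allow: /
```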