GEO Optimization Guide — Full Series

  1. What Is GEO - AI Citation Strategy Beyond SEO
  2. Each AI Cites Different Sources
  3. On-Site GEO Technical Architecture - From Product DB to JSON-LD ← current article
  4. Off-Site GEO - How to Win Over AI That Ignores Your Official Site
  5. AEO - Why Coding Agents Read Documentation Differently

Where Do You Build JSON-LD and Where Does It Go

In the previous article, we confirmed that each AI platform prefers different citation sources. Gemini favors official websites, ChatGPT leans on directories, and Perplexity gravitates toward community discussions. One thing they share: pages with structured data get cited more often across all platforms.

So the technical core of On-Site GEO boils down to one question: how do you transform product master DB data into JSON-LD and inject it into the HTML <head>?

It sounds simple, but once you dig in, the tangles pile up fast. Product DB field names are cryptic abbreviations. The attributes AI needs do not exist in the DB. Sites built as SPAs cannot serve JSON-LD to crawlers. This article covers how to solve these problems with a structured approach.

The Concentric Architecture of a GEO System

A GEO system expands outward through four layers.

Layer     Components                      Role
Core      Product Master DB               SSOT (Single Source of Truth). The origin of all data
Channel   Website / Mobile App            JSON-LD injection, SSR rendering
API       Product Query API               Interface for AI agent integration
Agent     ChatGPT / Gemini / Perplexity   End consumer touchpoint

Data flows from Core through Channel to Agent. Its shape changes at each layer. Raw DB fields become structured JSON-LD, which becomes the citation source in AI answers.

The API layer is easy to overlook. You might think just embedding JSON-LD is enough, but once you consider AI agent integrations like ChatGPT Plugins or MCP (Model Context Protocol), a separate API layer becomes necessary. Even if you do not need it right now, accounting for it in the design phase saves pain later.
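As a sketch of how thin that API layer can start out: a product-query function that returns the same Schema.org-shaped object the Channel layer embeds, so both layers draw from one source of truth. The function name, the in-memory catalog, and the data are illustrative, not from the article; in practice this would wrap the product master DB and sit behind an HTTP route or an MCP tool.

```typescript
// Illustrative sketch of the API layer: a product-query function an HTTP
// route (or MCP tool) could wrap. Names and sample data are hypothetical.
type SchemaProduct = {
  "@context": "https://schema.org";
  "@type": "Product";
  name: string;
  gtin13: string;
};

// Stand-in for the product master DB (the Core layer).
const catalog = new Map<string, { name: string; gtin: string }>([
  ["12345", { name: "Choco Stick Original", gtin: "8801234567890" }],
]);

function getProduct(id: string): SchemaProduct | null {
  const row = catalog.get(id);
  if (!row) return null;
  // Same shape the Channel layer injects as JSON-LD, so agents and
  // crawlers see consistent data regardless of entry point.
  return {
    "@context": "https://schema.org",
    "@type": "Product",
    name: row.name,
    gtin13: row.gtin,
  };
}
```

Because the API returns the Schema.org shape directly, adding an agent integration later is mostly transport plumbing, not a new data model.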

The 3-Stage Data Pipeline

Instead of managing product descriptions as monolithic blobs, decompose them into individual fields. AI cites more accurately when data is field-level structured. That is the core idea behind this pipeline.

Stage 1: DB Refinement - Field Mapping

This stage maps existing product master DB fields to Schema.org fields. You are not creating new data – just organizing what already exists.

DB Field                →  Schema.org Field
───────────────────────────────────────────
PROD_NM                 →  name
BRND_CD (code lookup)   →  brand.name
GTIN_13                 →  gtin13
PRC_AMT                 →  offers.price
STCK_YN                 →  offers.availability
IMG_URL                 →  image
CTG_NM                  →  category

The field count runs around 15-18 depending on the industry. Since most values already exist in the DB, development effort is modest. The catch is converting code values to human-readable text. You need to transform BRND_CD = P1042 into brand.name = "FoodCo" for AI to understand it.

The biggest stumbling block at this stage is GTIN. It is a GS1 standard identifier, and different variants of the same product (size, flavor) need different GTINs. If you lump “Choco Stick Original” and “Choco Stick Almond” under one master code, AI cannot tell them apart.
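A minimal sketch of the Stage 1 mapping, including the code-to-text lookup. The DB field names and the P1042 → "FoodCo" example come from the mapping above; the function shape and the lookup table are assumptions.

```typescript
// Stage 1 sketch: map cryptic product-master fields to Schema.org fields.
// Field names follow the mapping table above; the function is illustrative.
type DbRow = {
  PROD_NM: string;
  BRND_CD: string;
  GTIN_13: string;
  PRC_AMT: number;
  STCK_YN: "Y" | "N";
  IMG_URL: string;
  CTG_NM: string;
};

// Code-to-text lookup: AI needs "FoodCo", not "P1042".
const brandLookup: Record<string, string> = { P1042: "FoodCo" };

function mapRowToSchema(row: DbRow) {
  return {
    "@type": "Product",
    name: row.PROD_NM,
    brand: { "@type": "Brand", name: brandLookup[row.BRND_CD] ?? row.BRND_CD },
    gtin13: row.GTIN_13,
    image: row.IMG_URL,
    category: row.CTG_NM,
    offers: {
      "@type": "Offer",
      price: row.PRC_AMT,
      priceCurrency: "KRW",
      // Y/N flag becomes the Schema.org availability URL.
      availability:
        row.STCK_YN === "Y"
          ? "https://schema.org/InStock"
          : "https://schema.org/OutOfStock",
    },
  };
}
```

Note the fallback to the raw code when the lookup misses: better to surface `P1042` in QA than to silently drop the brand.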

Stage 2: LLM Extraction - AI-Generated Attributes

Some attributes AI needs for citation do not exist in the DB. Target users, usage occasions, sentiment keywords. Having humans write these manually becomes impractical when you have thousands of SKUs.

Instead, let an LLM read existing product descriptions, reviews, and category data to extract them automatically.

Source   Field        Description          Example
DB       @type        Schema.org type      Product
DB       name         Product name         Gram 16
DB       gtin13       GS1 identifier       8801056038800
LLM      targetUser   Target user          Students, professionals
LLM      occasion     Usage occasion       Graduation gift, work use
LLM      sentiment    Sentiment keywords   Lightweight, sleek
LLM      nutrition    Nutrition info       Sugar-free
LLM      safety       Safety info          CAS 9002-88-4

LLM extraction fields vary by industry. For food, nutrition facts and ingredients are key. For hotels, amenities and check-in times matter. For chemicals/B2B, it is material properties and certifications.

This stage adds 10-15 fields. Combined with Stage 1, each product ends up with 25-33 structured fields.
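The LLM call itself is vendor-specific, but the part worth pinning down is validating what comes back before it enters the pipeline. A sketch under two assumptions: the LLM is instructed to return a flat JSON object of strings, and the allowlist mirrors the field table above.

```typescript
// Stage 2 sketch: validate LLM-extracted attributes against an allowlist
// before they reach the JSON-LD output. The allowlist mirrors the table
// above; the flat-JSON contract with the LLM is an assumption.
const LLM_FIELDS = new Set([
  "targetUser",
  "occasion",
  "sentiment",
  "nutrition",
  "safety",
]);

function parseExtraction(raw: string): Record<string, string> {
  const parsed = JSON.parse(raw) as Record<string, unknown>;
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(parsed)) {
    // Drop unknown keys and non-string values: hallucinated fields
    // must not leak into the structured data.
    if (LLM_FIELDS.has(key) && typeof value === "string") out[key] = value;
  }
  return out;
}
```

The allowlist is the safety net: whatever the model invents, only schema-known fields survive.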

Stage 3: JSON-LD Output - Automated Conversion and SSR Deployment

Fields from Stages 1 and 2 are converted into Schema.org-compliant JSON-LD and automatically injected into the HTML <head> via SSR.

{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Choco Stick Original",
  "gtin13": "8801234567890",
  "brand": {
    "@type": "Brand",
    "name": "FoodCo"
  },
  "description": "Chocolate-coated crispy stick snack. 200kcal per 46g serving.",
  "offers": {
    "@type": "Offer",
    "price": 1500,
    "priceCurrency": "KRW",
    "availability": "https://schema.org/InStock"
  },
  "nutrition": {
    "@type": "NutritionInformation",
    "calories": "200 calories",
    "servingSize": "1 pack (46g)"
  }
}

Once this JSON-LD sits inside the <head> tag, the Invisible GEO discussed in Part 1 is complete. Invisible to users, but parsed directly by AI and search engines.

See how the before and after compare for a food product in the demo.

Four Principles for Writing Descriptions

The most human-dependent part of the pipeline is product descriptions. Descriptions that AI cites well follow a pattern.

Fact-based - Include only objective information. AI ignores advertising copy like “industry-leading” or “customer satisfaction #1.”

100-300 characters - The sweet spot for AI reference. Too short lacks context; too long buries the key points.

Natural keywords - As confirmed by Princeton/Georgia Tech research, keyword stuffing actually decreases AI visibility. Weave keywords into natural sentences.

Unique per SKU - Copy-pasting the same template with only the product name swapped gets flagged as duplicate content by AI. Each product needs its own description.

<!-- Bad: vacuous boilerplate, no facts for AI to cite -->
<meta name="description" content="About Us"/>

<!-- Good: fact-based, natural language, proper length -->
<meta name="description"
  content="ChemCo is a global petrochemical company
  supplying PE/PP products to 50 countries with
  annual revenue of $11B. Leading ESG management
  and carbon neutrality by 2050."/>
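All four principles can be enforced mechanically at build time rather than by reviewer discipline. A sketch: the 100-300 character window comes from the guidelines above, while the banned-phrase list and function shape are illustrative.

```typescript
// Sketch of a build-time check for the four description principles.
// The 100-300 character window is from the guidelines above; the
// ad-copy phrase list is an illustrative starting point.
const AD_COPY = ["industry-leading", "customer satisfaction #1", "best in class"];

function checkDescription(desc: string, seen: Set<string>): string[] {
  const problems: string[] = [];
  // Principle: 100-300 characters.
  if (desc.length < 100 || desc.length > 300)
    problems.push("length outside 100-300 characters");
  // Principle: fact-based, no advertising copy.
  for (const phrase of AD_COPY)
    if (desc.toLowerCase().includes(phrase)) problems.push(`ad copy: "${phrase}"`);
  // Principle: unique per SKU.
  if (seen.has(desc)) problems.push("duplicate of another SKU's description");
  seen.add(desc);
  return problems;
}
```

Run it over the full catalog in CI and the duplicate check catches copy-pasted templates before AI does.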

Why SSR Is Non-Negotiable

Even if you build the JSON-LD perfectly, it is useless if AI crawlers cannot read it. This is where SPAs (Single Page Applications) become a bottleneck.

SPAs require JavaScript execution in the browser to render content. That looks fine to humans, but AI crawlers like GPTBot and Google-Extended mostly do not execute JS. If the server responds with an empty HTML shell, the JSON-LD your client-side code would eventually inject into the <head> never reaches the crawler.

Switching to SSR (Server-Side Rendering) means the server sends fully rendered HTML, so crawlers can read JSON-LD immediately without JS execution.

Here is how it looks with the Next.js App Router:

// app/product/[id]/page.tsx
// fetchProduct and ProductDetail are assumed to exist in your codebase;
// the import paths are illustrative.
import { fetchProduct } from "@/lib/products";
import ProductDetail from "@/components/ProductDetail";

export default async function ProductPage({
  params,
}: {
  params: { id: string };
}) {
  const product = await fetchProduct(params.id);

  // Built on the server, so it is present in the HTML the crawler receives.
  const jsonLd = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": product.name,
    "gtin13": product.gtin,
    "brand": { "@type": "Brand", "name": product.brand },
    "description": product.description,
    "image": product.imageUrl,
    "offers": {
      "@type": "Offer",
      "price": product.price,
      "priceCurrency": "KRW",
      "availability": "https://schema.org/InStock",
      "url": product.pageUrl
    }
  };

  return (
    <>
      <script
        type="application/ld+json"
        dangerouslySetInnerHTML={{
          __html: JSON.stringify(jsonLd)
        }}
      />
      <ProductDetail product={product} />
    </>
  );
}

The server fetches DB data via fetchProduct, builds the JSON-LD object, and injects it as a <script> tag. This HTML reaches crawlers as-is.

If SSR adoption feels too heavy, Google Tag Manager (GTM) can inject JSON-LD as a transitional approach. Less effective than full SSR, but viable when you cannot convert an SPA right away.

SSR Trade-offs

Aspect              Advantage                      Disadvantage               Mitigation
SEO optimization    Crawlers read without JS       Initial dev cost           SDK/shared module
Data reflection     Auto-updates on DB changes     Increased server load      Redis caching + ISR
Central management  Site-wide uniform deployment   Dev team dependency        Admin console for non-devs
Validation          Build-time schema validation   Legacy system migration    GTM hybrid fallback

Server load is largely mitigated by Redis caching and ISR (Incremental Static Regeneration). As long as product data has not changed, cached HTML is served directly.
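The caching idea can be sketched independently of Redis with an in-memory TTL cache in front of the DB query. A sketch, with the Map standing in for a Redis client (in production, `cache.get`/`cache.set` would be Redis calls, and Next.js ISR handles the HTML layer on top):

```typescript
// Sketch of the caching layer: a TTL cache in front of the DB query.
// The in-memory Map is a stand-in for Redis; names are illustrative.
type Entry<T> = { value: T; expires: number };
const cache = new Map<string, Entry<unknown>>();

async function cached<T>(
  key: string,
  ttlMs: number,
  fetcher: () => Promise<T>
): Promise<T> {
  const hit = cache.get(key) as Entry<T> | undefined;
  if (hit && hit.expires > Date.now()) return hit.value; // serve cached value
  const value = await fetcher(); // DB is hit only on a miss or expiry
  cache.set(key, { value, expires: Date.now() + ttlMs });
  return value;
}
```

With this in place, repeated crawler hits on the same product page resolve from cache, and the DB sees at most one query per TTL window per key.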

Data Freshness Drives Citations

Even well-structured data gets deprioritized when stale.

An analysis of pages with high Perplexity citation rates found that over three-quarters had been updated within the past month. ChatGPT Shopping refreshes feeds every 15 minutes (OpenAI). Pages untouched for over three months are likely to drop in AI citation rankings.

Freshness management guidelines:

  • Critical data (price, inventory, promotions): refresh within 24 hours
  • General data (descriptions, images): refresh within 7 days
  • Static data (brand info, company overview): monthly review

Keeping lastmod dates in sitemap.xml aligned with actual update timestamps, and using the IndexNow API to notify search engines of changes immediately, also makes a difference.

// next-sitemap.config.js
module.exports = {
  siteUrl: 'https://www.example.com',
  generateRobotsTxt: true,
  changefreq: 'daily',
  transform: async (config, path) => ({
    loc: path,
    changefreq: path.includes('/product/') ? 'daily' : 'weekly',
    priority: path.includes('/product/') ? 0.9 : 0.5,
    // In production, pull this from the page's real updated-at timestamp;
    // stamping every page with the build time misrepresents freshness.
    lastmod: new Date().toISOString(),
  }),
};
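For the IndexNow side, the protocol is a single JSON POST to api.indexnow.org carrying the host, your key, and the changed URLs. A sketch of building and sending that payload; the key value is hypothetical, and real keys must be verified via a key file hosted on your domain.

```typescript
// Sketch of an IndexNow notification. Per the IndexNow protocol, the
// payload is host + key + (optional) keyLocation + urlList; the key
// shown here is hypothetical and must match a key file on your domain.
function buildIndexNowPayload(host: string, key: string, urls: string[]) {
  return {
    host,
    key,
    keyLocation: `https://${host}/${key}.txt`,
    urlList: urls,
  };
}

async function notifyIndexNow(host: string, key: string, urls: string[]) {
  // One ping is enough: participating search engines share submissions.
  await fetch("https://api.indexnow.org/indexnow", {
    method: "POST",
    headers: { "Content-Type": "application/json; charset=utf-8" },
    body: JSON.stringify(buildIndexNowPayload(host, key, urls)),
  });
}
```

Call `notifyIndexNow` from the same job that updates product data, so the ping and the lastmod change stay in sync.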

Validation - If You Added It, Verify It

Inserting JSON-LD is not the finish line. You need to confirm crawlers can actually read it.

Google Rich Results Test - Enter your URL at search.google.com/test/rich-results to instantly check whether structured data is being recognized.

Crawler simulation with curl - Send requests with AI crawler User-Agents to verify JSON-LD is included in the HTML response.

# Request as GPTBot
curl -s -A "GPTBot" https://www.example.com/product/12345 | grep "application/ld+json"

# Extract JSON-LD from the HTML source (works when the script tag is on one line)
curl -s https://www.example.com/product/12345 \
  | grep -oP '<script type="application/ld\+json">.*?</script>'

If your site is still an SPA without SSR, curl results will likely show no JSON-LD. That is exactly why SSR is non-negotiable.

Try building JSON-LD yourself with the interactive builder to get a hands-on feel for the structure.

Common issues encountered in practice:

Symptom                     Cause                       Fix
JSON-LD not crawled         robots.txt blocking         Set GPTBot, Google-Extended to Allow
AI not citing data          Schema.org type error       Validate with Rich Results Test
Slow API response           No caching                  Apply Redis caching + minimize fields
Server overload after SSR   DB query on every request   ISR + Redis caching
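For the robots.txt fix in the first row, an explicit allow looks like this. Adjust to your own crawl policy (blocking AI crawlers is a legitimate business choice, but it forfeits citations); the PerplexityBot entry is an illustrative addition beyond the two crawlers named in the table.

```text
# robots.txt — allow AI crawlers to read product pages
User-agent: GPTBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: PerplexityBot
Allow: /
```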