Why Data Dictionaries Don’t Survive Six Months

I once worked with multiple vendors for over a year on a next-gen data warehouse project. Early on, we built an ambitious data dictionary. We aligned definitions of “revenue,” “cost,” and “net revenue” across vendors and meticulously filled in table mappings. Six months in, new tables weren’t being registered, existing definitions hadn’t caught up with schema changes, and cross-department interpretation differences sat untouched. “It’s not accurate anyway” became the team’s official stance.

Whether it’s called a catalog, a wiki, or a data dictionary, only the name changes: people fill it in enthusiastically at the start, then fall behind on maintenance and abandon it.

A few days ago, Karpathy posted an LLM knowledge base idea on X, then followed up with the full architecture in a GitHub Gist (Karpathy LLM Wiki). It was about building a personal knowledge base, but the problems looked similar to what I’ve been hitting while building the DataNexus catalog.

RAG Starts from Scratch Every Time

Most LLM document workflows use RAG. NotebookLM, ChatGPT file uploads, internal document search – they all chunk the source documents and run vector search over the chunks. It works well enough, but one thing bugs me. Yesterday the LLM synthesized an answer from five documents; ask the same thing today and it starts the same search over again. It remembers nothing.

This bothered me with DataNexus too. In Post 1, I designed a structure that injects context by feeding the ontology into a RAG Store. But if the ontology falls out of sync with the actual schema, NL2SQL generates wrong SQL. I haven’t solved the problem of keeping the RAG Store itself up to date.

Hand the Wiki to the LLM

Karpathy’s approach, put simply, is this: instead of searching raw sources every time like RAG, have the LLM manage a markdown wiki directly. Humans add the raw materials (Raw Sources), and the LLM reads them and organizes them into the wiki (Wiki). A configuration document (Schema) defines the wiki’s structure and rules, so the LLM doesn’t write however it pleases but maintains consistency.

When a new source comes in (Ingest), the LLM summarizes and updates 10-15 related pages. When you ask the wiki a question (Query), discoveries from the answer get written back. A periodic check (Lint) catches contradictions and stale information.

Karpathy himself works with Obsidian open alongside an LLM agent. He used the analogy of Obsidian as the IDE, the LLM as the programmer, the wiki as the codebase. If you’re a developer, that clicks immediately.

What caught my eye was the Schema. In Post 3, I wrote about the struggle of using DataHub’s Business Glossary as an ontology store with only 4 relationship types. Karpathy’s Schema serves a similar purpose – a rulebook telling the LLM “connect this term using only these relationship types.” Without it, the LLM organizes however it wants and makes a mess.
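To make that concrete, here is what a tiny slice of such a Schema might look like. This is my own hypothetical example, not Karpathy’s actual document; I’m borrowing SKOS-style relation names since that’s what the compatibility layer from Post 4 maps to:

```markdown
# Wiki Schema (hypothetical excerpt)

## Term pages
- One page per business term; the canonical name is the page title.
- Allowed relationship types: `broader`, `narrower`, `related`, `exactMatch`.
- Every relationship must point at an existing page; no dangling links.

## Lint rules
- Flag any term page missing a Definition section.
- Flag two pages that define the same metric with different formulas.
```

The point isn’t the specific rules; it’s that the rulebook exists as a document the LLM reads before every Ingest and Lint pass.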

The Real Bottleneck Is Maintenance

Writing a data dictionary or wiki isn’t hard. Put a few people on it at the start of a project and it gets done. The problem is what comes after. Once pages exceed a hundred, the time spent updating cross-references, refreshing summaries, and detecting contradictions grows noticeably. People start quietly stepping away.

This is what worries me about DataNexus. Term registration, relationship configuration, DozerDB sync – all built. But DW schemas keep changing after go-live. Phased releases add tables, new deduction items appear in the “net revenue” formula. The question is who reflects these changes in the catalog.

An LLM can update 15 files simultaneously. DataHub already emits MCL (Metadata Change Log) events, so a setup where the LLM receives these events, updates affected term pages, and refreshes cross-references is feasible. The SKOS compatibility layer rules from Post 4 would serve as the Schema.
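The routing step of that setup can be sketched as follows, assuming the catalog already maintains a mapping from dataset URN to the business terms built on top of each table. The event shape loosely follows DataHub’s MCL JSON, but the field handling here is simplified and the function name is mine:

```python
def affected_term_pages(event: dict, table_to_terms: dict[str, list[str]]) -> list[str]:
    """Map one Metadata Change Log event to the glossary-term wiki pages
    the LLM should revisit. `table_to_terms` is the catalog's own mapping
    from dataset URN to the business terms defined over that table."""
    # Only schema changes trigger a wiki update in this sketch;
    # ownership, tag, and lineage aspects are ignored.
    if event.get("aspectName") != "schemaMetadata":
        return []
    return sorted(table_to_terms.get(event.get("entityUrn", ""), []))
```

The LLM step would then take each returned term page plus the schema diff and rewrite the page, with the SKOS compatibility rules as guardrails.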

In Post 1, I wrote that “we need a pipeline that detects changes and automatically refreshes the RAG Store.” Back then it was vague. After seeing Karpathy’s Ingest/Query/Lint pattern, that pipeline finally has a sketch.

I don’t expect this to work right away. The LLM might create wrong relationships when auto-updating the ontology, and I don’t yet know how much domain expert review is needed. That’s something to figure out while building the metadata change detection pipeline.

What Humans Can’t Do

Karpathy pulled in Vannevar Bush’s 1945 Memex: an 80-year-old vision that kept failing because people couldn’t keep up with the management cost.

Karpathy said he’s spending more LLM tokens on organizing knowledge than writing code these days. Building DataNexus, I’m heading the same direction. Term definitions, mappings, reconciling interpretation gaps. People give up on this within six months. I want to see what happens when you hand it to an LLM.


Documenting the process of designing and building DataNexus. GitHub | LinkedIn