5. Automating Metadata Maintenance: Karpathy's LLM Wiki Architecture

RAG starts from scratch on every query. Karpathy proposes having the LLM maintain a wiki directly so that knowledge accumulates over time. DataNexus’s ontology catalog needs the same principle to keep from being abandoned.

April 5, 2026 · 4 min · Junho Lee
OpenDocuments: A Local RAG Platform That Unifies Fragmented Team Knowledge

An open-source platform that connects documents scattered across Notion, GitHub, and S3, then queries them with a local LLM. It runs entirely on-premises, with no external APIs.

April 1, 2026 · 2 min · Junho Lee
Using Korean Legal Data for AI Agent Development via Beopmang API

Parsing raw legal XML means broken table structures and half a day lost to preprocessing. Beopmang is an API that delivers pre-cleaned JSON, solving that problem upfront.

April 1, 2026 · 2 min · Junho Lee
OpenDocuments: A Self-Hosted RAG Platform Connecting Scattered Team Knowledge

A self-hosted RAG platform that runs on Ollama to unify team documents scattered across Notion, GitHub, and S3. Natural-language search works even in air-gapped environments, with no external APIs.

April 1, 2026 · 2 min · Junho Lee
The Unexpected Walls When Converting Web Pages with defuddle

I tried extracting web data for a RAG pipeline with defuddle and found that results vary wildly by site structure. On sites where semantic HTML has broken down, body text and ads get mixed together, and on dynamically rendered pages the content itself vanishes.

March 31, 2026 · 2 min · Junho Lee
Practical Agent Design Lessons from the OpenAI Guide

Letting an agent handle planning end to end causes it to loop and stall in production. The OpenAI guide addresses this with explicit workflow control.

March 31, 2026 · 2 min · Junho Lee
How Data Preprocessing Determines RAG Quality and Leveraging Markdown Conversion Tools

A look at the key features of a Python-based document conversion tool released by a global IT company, along with efficient approaches to handling data in LLM pipelines.

March 30, 2026 · 3 min · Junho Lee