## "What's Your VIP Criteria?"

This happened during a BI Agent project for a retail company. A business user was testing the Agent and asked, "Show me last month's VIP customer revenue." The system spit out a number, but the user didn't look happy. "Something's off. I think the VIP criteria are different from what our team uses."

Marketing's VIP and CRM's VIP were different. Same with revenue. Depending on whether you meant net revenue (순매출) or gross revenue (총매출), the difference could be hundreds of millions of won.

This wasn't the first time I'd seen this. I saw it when migrating a DW to the cloud, and again when building a next-gen analytics system with multiple vendors over a year-long project. Each vendor had different definitions of "revenue" and "cost," and we'd lose weeks trying to reconcile data. One misaligned term could push back the entire schedule. I've never worked on a DW/BI project where this problem didn't come up.

Enterprise data warehouses have tables and columns. What they don't have is context. "What does this column mean in business terms" isn't defined anywhere in a machine-readable format.

So I decided to build it myself.

## NL2SQL Is Not a Silver Bullet

There are plenty of NL2SQL tools out there now. Converting natural language to SQL is already possible. Real-world deployment is a different story. We connected a model with high benchmark scores to an actual DW, and the perceived accuracy dropped significantly. The environment integrated internal and external data – card companies, telcos, public data. The LLM couldn't handle this level of complexity, in both table structure and question difficulty.

Open up an enterprise DW's DDL and the reason becomes clear. Abbreviated names like T_CUST_MST.CUST_GRD_CD and T_ORD_DTL.SALE_AMT number in the thousands. This is a completely different world from benchmark DBs with columns named customer_name and order_date.
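To make the ambiguity concrete, here's a toy sketch. The table contents and amounts are invented; only the column names (SALE_AMT, plus the return/discount columns that show up in net-revenue formulas later in this post) echo the examples above. The point: one table, one question, two defensible "revenue" numbers.

```python
import sqlite3

# Toy illustration (not a real schema): "last month's revenue"
# answered under two different definitions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T_ORD_DTL (SALE_AMT INTEGER, RTN_AMT INTEGER, DC_AMT INTEGER)"
)
conn.executemany(
    "INSERT INTO T_ORD_DTL VALUES (?, ?, ?)",
    [(1000, 0, 50), (2000, 300, 0), (1500, 0, 150)],
)

# Gross revenue: just sum the sale amounts.
gross = conn.execute("SELECT SUM(SALE_AMT) FROM T_ORD_DTL").fetchone()[0]

# Net revenue: deduct returns and discounts -- a formula that lives
# in someone's head, not in the DDL.
net = conn.execute(
    "SELECT SUM(SALE_AMT) - SUM(RTN_AMT) - SUM(DC_AMT) FROM T_ORD_DTL"
).fetchone()[0]

print(gross, net)  # 4500 4000 -- same table, two different "revenues"
```

Scale the gap up to an enterprise DW and "hundreds of millions of won" of disagreement is not surprising.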
Even within the same company, naming conventions differ across business units, and the word "revenue" can point to different tables depending on which unit you ask. Derived metrics are even worse. "Net revenue" (순매출) isn't a single column. It's a formula like SUM(SALE_AMT) - SUM(RTN_AMT) - SUM(DC_AMT), and this formula isn't written in any DDL. It lives in someone's head, or at best, buried somewhere in an Excel spec document.

The bottleneck for NL2SQL isn't SQL generation capability. It's the lack of context.

## Choosing Ontology

No matter how much I optimized prompt engineering for conversational BI, DDL alone had clear limits. I also designed a multi-agent architecture, but the root problem was the same: there was simply no context to give the LLM. I decided to attach an ontology.

It's not as academic as it sounds. In practical terms, it looks like this:

```yaml
# Net revenue term definition
- term: Net Revenue (순매출)
  definition: Amount after deducting returns and discounts from gross revenue
  formula: SUM(SALE_AMT) - SUM(RTN_AMT) - SUM(DC_AMT)
  synonyms: [Net Sales, 순매출액, 넷세일즈]
  related_tables: [T_SALE_DTL, T_RTN_DTL]
  owner: Finance Team
```

You register these term definitions in a metadata catalog and auto-sync them to the NL2SQL engine's RAG Store. When the LLM encounters the word "net revenue," it knows which tables, which columns, and which formula to combine. The processing flow: user question → term detection → ontology lookup in the RAG Store → context injection into the prompt → SQL generation.

Our internal target for the before/after difference with the ontology is a 15-20 percentage-point improvement in EX (Execution Accuracy). The MVP goal is EX of 80% or higher, rising to 90%+ in the stabilization phase. Whether these numbers are realistic – we'll validate as we build.

## Why Pasting DDL Doesn't Work

Pasting an entire enterprise DW's DDL means tens to hundreds of thousands of tokens. In environments with hundreds or thousands of tables, you can't fit it all in the context window. Even if you could, having the LLM pick out the right table from all of that is a needle-in-a-haystack problem.

Security is another issue.
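Setting security aside for a moment, the register-and-retrieve flow described above can be sketched in a few lines. This is a minimal in-memory illustration under my own assumptions: the `GLOSSARY` structure and the function names are hypothetical, and a real RAG Store would use embedding-based retrieval rather than substring matching.

```python
# Hypothetical sketch: resolve business terms in a question and build
# the context block that gets injected into the NL2SQL prompt.
GLOSSARY = [
    {
        "term": "Net Revenue",
        "synonyms": ["net revenue", "net sales", "순매출"],
        "formula": "SUM(SALE_AMT) - SUM(RTN_AMT) - SUM(DC_AMT)",
        "related_tables": ["T_SALE_DTL", "T_RTN_DTL"],
    },
]


def resolve_terms(question: str) -> list[dict]:
    """Naive synonym match; stands in for vector search over the RAG Store."""
    q = question.lower()
    return [e for e in GLOSSARY if any(s.lower() in q for s in e["synonyms"])]


def build_context(question: str) -> str:
    """Turn matched term definitions into prompt context for SQL generation."""
    return "\n".join(
        f"{e['term']}: formula={e['formula']}, tables={e['related_tables']}"
        for e in resolve_terms(question)
    )


print(build_context("Show me last month's net revenue"))
```

The essential move is that only the handful of matched definitions reach the prompt, instead of the whole schema.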
You can't send an entire corporate schema to an external API. And even for the same "revenue," users from Group Company A and Group Company B should see different scopes. This is an environment that requires Row-level Security.

The biggest problem is sustainability. DDL changes. Business term definitions change. Even after launching a next-gen analytics system, phased releases keep coming, and the metadata changes every time. You need a pipeline that detects changes and automatically refreshes the RAG Store – not a one-time prompt.

That's when I knew I needed to build a separate platform.

## What DataNexus Is Trying to Do

It consists of four components:

- **Metadata Catalog** – Manages business term definitions, table metadata, and data lineage in one place. The source of truth for the ontology.
- **NL2SQL Engine** – Converts natural language to SQL, but injects context pulled from the ontology into the prompt. The accuracy gap compared to just throwing DDL at it is significant.
- **Document Knowledge Engine** – Searches unstructured data like annual reports and policy documents using GraphRAG + vector hybrid retrieval.
- **Graph DB** – Stores the ontology as a knowledge graph, with multi-DB isolation per group company.

The ontology defined in the catalog auto-syncs to NL2SQL and document search, so user questions get served with context. The open-source tools chosen for each component and the reasoning behind them will be covered in the next post.

## Why Now

General-purpose models are getting better fast, and simple planning or document generation will soon be a commodity. To differentiate, you need a system that can structure enterprise data context and inject it into models. No matter how smart the LLM gets, it doesn't know your company's definition of "net revenue" (순매출), because that knowledge only exists inside the enterprise. LLM research calls this the "Non-verifiable Domain".
Math and coding have auto-verifiable answers, but tacit enterprise knowledge, role-specific interpretations, and private operational data are hard to judge from the outside. The competitive advantage built on this kind of data is what AI strategy calls a "Data Moat".

I don't think this advantage is permanent. The data accumulation speed of DataNexus needs to outpace the generalization speed of general-purpose models. Here's how to build the Data Moat:

- **Ontology-based context** – A metadata catalog that gets thicker as domain experts refine terms
- **Role-specific interpretation** – Persona optimization that gives different answers to finance and marketing for the same question. Gets more personalized as usage patterns accumulate.
- **Temporal Knowledge Graph** – Distinguishes "VIP definition as of Q4 last year" from "VIP definition as of this year"
- **Private data assets** – Graph DB isolation per group company + Row-level Security. Each group company's data becomes an independent asset.

The goal is to ship the MVP by the first half of this year and start spinning the data accumulation loop.

## Purpose of This Blog

I'm documenting the decisions, struggles, and solutions encountered while building DataNexus. Topics I'll cover:

- How we selected the tech stack (including why candidates were eliminated)
- Limitations and workarounds when using the metadata catalog's Business Glossary as an ontology
- Why we added a SKOS compatibility layer
- User-Aware design and Row-level Security in the NL2SQL engine
- How to pre-validate an ontology with CQ (Competency Questions)
- Criteria for splitting deterministic vs. probabilistic routing in the Query Router
- The 79% Rule for splitting agent tasks

I'll focus on problems we actually hit and how we solved them (or haven't yet), rather than theory.

## Next Post

**DataNexus Tech Stack** – The process of deciding on this combination of four open-source tools: what was eliminated from the candidate list, and why.

Documenting the process of designing and building DataNexus.
GitHub | LinkedIn