Why Are We Suddenly Talking About External System Integration

Through Post 3, I wrote from a single perspective: NL2SQL accuracy. “How well can we inject business context into the LLM” was the criterion for every decision.

From here, a second perspective enters the picture. Platform.

If DataNexus ends up as an NL2SQL tool that only runs inside one client, external integration isn’t needed. The DozerDB graph works fine on its own. The problem is that Post 1 already laid out a bigger vision – multi-tenancy per group company, Data Moat, temporal knowledge graph. All of those words sit on the premise that DataNexus needs to become a platform where multiple organizations exchange ontologies, not just a standalone system.

Consider a retail conglomerate with department stores, hypermarkets, and an online mall, each defining “revenue” differently. To unify these, you need to export each subsidiary’s ontology in a common format and map them. A proprietary DataNexus-only format can’t do that.

The SKOS compatibility layer doesn’t directly improve NL2SQL accuracy. Instead, it helps in a different way.

  • Importing industry standard ontologies like FIBO (finance) or GPC (retail) means less time defining terms from scratch. Faster build means earlier context injection into the NL2SQL engine. In Post 1, I wrote “DataNexus’s data accumulation speed needs to outpace the generalization speed of general-purpose models” – leveraging standards is one way to do that.
  • If a client already uses Collibra or Alation, an inability to export in a standard format blocks adoption entirely. No matter how good the NL2SQL accuracy is, if it can’t coexist with existing infrastructure, it won’t get used in the field. Lesson learned from the retail project – field fit, not technology, drives adoption.

Post 4 isn’t about the NL2SQL engine’s internal performance. It’s about interface design for DataNexus to function as a platform. Different perspective, different problems to solve.

External System Integration Didn’t Work

In the previous post, the DataHub + DozerDB dual structure solved the internal ontology problem. For internal use, it was sufficient.

The problem was external integration. While exploring the finance domain, I found FIBO (Financial Industry Business Ontology) – an industry-standard term hierarchy with concepts like “Financial Product,” “Loan,” “Interest Rate” organized in layers. Retail has its counterparts. GS1’s GPC (Global Product Classification) has standardized product taxonomies like “Apparel -> Women’s Wear -> Dresses.” Healthcare has SNOMED CT, manufacturing has ISA-95. Every domain has thousands of pre-organized terms – if we could import these, there’d be no need to build an ontology from scratch.

I opened a FIBO file. It was in OWL format. When I tried to load it into the DozerDB graph, the structures simply didn’t match. The reverse direction was the same – to export the DataNexus ontology to a client’s existing system (Collibra, TopBraid, etc.), there was no standard format to use. It worked perfectly internally but became useless the moment you tried to take it outside.

The problems with no external compatibility stack up. Many large enterprises already use metadata management tools like Collibra or Alation. Adopting DataNexus doesn’t mean abandoning their existing term systems. If you can export in a standard format, coexistence is possible. If you can’t, you’re looking at manually migrating hundreds of terms. That alone eats months.

For retail conglomerates where department stores, hypermarkets, and online malls each define “revenue” differently, unifying or at least mapping terms at the group level requires a common format. Without one, each subsidiary operates in isolation.

Financial institutions face regulatory requirements to report data lineage and term definitions to supervisory authorities.

Then there’s vendor lock-in. If you use DataNexus but need to switch platforms later, standard format export means you can migrate. Without it, you’re trapped. This weighs heavily in adoption decisions.

Same Graph, Different Language

DozerDB uses the LPG (Labeled Property Graph) model.

  • Nodes (circles) get names and properties: Net Revenue {definition: "Gross Revenue - Returns - Discounts"}
  • Arrows between nodes, and properties on those arrows too: -[MANUFACTURES {since: "2024-01-01"}]->

The key point is that you can put information like “since when” and “confidence level” directly on the arrow itself. This is what we leveraged in the previous post when creating MANUFACTURES and STOCKS relationships.
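To make the LPG model concrete, here is a minimal sketch in plain Python dicts – this is illustrative only, not DozerDB’s actual API or storage format.

```python
# Minimal LPG sketch (plain Python dicts, not DozerDB's actual API):
# both nodes and relationships can carry arbitrary properties.
node = {
    "label": "Entity",
    "name": "Net Revenue",
    "properties": {"definition": "Gross Revenue - Returns - Discounts"},
}

edge = {
    "type": "MANUFACTURES",
    "from": "Supplier A",
    "to": "Product B",
    "properties": {"since": "2024-01-01", "confidence": 0.95},
}

# The point: metadata like "since" lives on the edge itself.
assert edge["properties"]["since"] == "2024-01-01"
```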

SKOS and other web standards use a completely different system. RDF (Resource Description Framework) – every piece of information is broken into three-word sentences.

  • Net Revenue -> broader -> Revenue (Net Revenue’s broader concept is Revenue)
  • Net Revenue -> prefLabel -> "Net Revenue"@en (The English name is “Net Revenue”)

Subject, predicate, object – these three parts form one unit, called a triple.

This is where they diverge. LPG can freely attach properties to relationships, but in RDF, the triple is the atomic unit, so you can’t directly put properties on a relationship. In exchange, RDF is URI-based, so the same concept can be referenced by the same address from anywhere in the world. For inter-system data exchange, RDF wins decisively.
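The contrast shows up clearly in code. A sketch of RDF’s shape, again in plain Python – a triple is a fixed three-slot statement with no room for edge metadata:

```python
# RDF sketch: every statement is a (subject, predicate, object) triple.
triples = [
    ("dnx:net-revenue", "skos:broader",   "dnx:revenue"),
    ("dnx:net-revenue", "skos:prefLabel", "Net Revenue"),
]

# There is no fourth slot: a property like since="2024-01-01" on a
# MANUFACTURES edge has nowhere to go without workarounds
# (reification, RDF-star).
for s, p, o in triples:
    assert len((s, p, o)) == 3
```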

So we needed both: LPG’s expressiveness internally, RDF’s compatibility externally.

OWL Is Overkill, RDFS Is Too Light

The RDF world has multiple standards.

OWL (Web Ontology Language) is the most powerful. Class inheritance, constraints, automated reasoning. Think of it as legal text – you can precisely specify every clause and exception, but you need a separate Reasoner engine and the learning curve is steep. FIBO uses OWL precisely because of the complexity of financial regulations.

What DataNexus is doing isn’t reasoning. It’s context provision – telling the NL2SQL engine “what is average transaction value, in which table, in which column.” OWL was overkill.

RDFS (RDF Schema) swings the other way, too lightweight. subClassOf works, but there are no standard properties for synonyms or term definitions.

SKOS (Simple Knowledge Organization System) landed in the middle. It’s a W3C standard originally built for library classification systems and thesauri (a thesaurus maps synonyms, near-synonyms, and hierarchical terms). What DataNexus does is basically business term dictionary management, so it’s not a stretch to use SKOS for this.

Here’s how SKOS concepts map to the DataNexus structure:

SKOS | DataNexus (DataHub + DozerDB) | In plain terms
skos:Concept | Glossary Term / Entity node | A single term
skos:broader | IsA relationship (broader concept) | “ATV is a type of sales metric”
skos:narrower | IsA reverse (narrower concept) | “Sales metrics includes ATV”
skos:related | RelatedTo family | “Related terms” *
skos:prefLabel | Term name (primary label) | Official name
skos:altLabel | Synonyms (translations, abbreviations) | “ATV” = “Average Transaction Value”
skos:definition | Term definition | What the term means
skos:ConceptScheme | Domain-specific term grouping | “Retail Terms”, “Finance Terms”

* Note: skos:related is bidirectional. “A related B” automatically implies “B related A.” DozerDB relationships like SELLS or SUPPLIED_BY are directional. Store A sells Product B, but Product B doesn’t sell Store A. This directional information is lost when exporting to SKOS. More on this later.

Overlaying SKOS on DozerDB

The one rule I set was: don’t touch the existing graph.

We’d already built MANUFACTURES, STOCKS, CALCULATED_FROM relationships in DozerDB and had queries running against them. Ripping that apart to comply with a standard would be the kind of mistake I’ve seen too many times on other projects.

So we overlaid SKOS metadata on existing nodes instead. Like placing a transparent film on top.

// Add SKOSConcept label and SKOS properties to existing Entity node
MATCH (net:Entity {name: 'Net Revenue'})
SET net:SKOSConcept
SET net.skos_prefLabel = 'Net Revenue'
SET net.skos_altLabel = ['Net Sales', '순매출액']
SET net.skos_definition = 'Amount after deducting returns and discounts from gross revenue'
SET net.skos_inScheme = 'finance-terms'

Same for the retail domain.

// Retail domain term example
MATCH (atv:Entity {name: 'ATV'})
SET atv:SKOSConcept
SET atv.skos_prefLabel = 'ATV'
SET atv.skos_altLabel = ['Average Transaction Value', '객단가', '객단']
SET atv.skos_definition = 'Total revenue divided by number of purchasing customers'
SET atv.skos_inScheme = 'retail-terms'

Existing Entity nodes remain untouched. Just a SKOSConcept label and skos_-prefixed properties added on top. No impact on existing Cypher queries.

For broader/narrower relationships, there were two approaches: create BROADER and NARROWER edges alongside IsA upfront, or convert existing IsA relationships to skos:broader at export time.

We chose the latter. Creating dual edges means every time IsA changes, BROADER needs to sync too. If sync drifts, data gets corrupted. The Source of Truth should be singular. Converting once at export time is simpler and safer.
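The export-time conversion can be sketched as a pure function over the IsA edges – hypothetical in-memory rows here, not DozerDB’s driver API. IsA stays the single source of truth; SKOS triples are derived, never stored.

```python
# Sketch of the export-time conversion (in-memory rows, not DozerDB's
# driver API): IsA edges remain the single source of truth and become
# skos:broader / skos:narrower triples only when exporting.
isa_edges = [
    ("ATV", "Sales Metrics"),        # ATV IsA Sales Metrics
    ("Net Sales", "Sales Metrics"),  # Net Sales IsA Sales Metrics
]

def to_skos_triples(edges):
    """Each (child, parent) IsA edge yields a broader and a narrower triple."""
    triples = []
    for child, parent in edges:
        triples.append((child, "skos:broader", parent))
        triples.append((parent, "skos:narrower", child))
    return triples

triples = to_skos_triples(isa_edges)
assert ("ATV", "skos:broader", "Sales Metrics") in triples
```

Because the triples are recomputed on every export, there is no second copy of the hierarchy that can drift out of sync.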

Import and Export

Once the overlay was in place, two things became possible that weren’t before.

Import – Pulling finance terms from FIBO, product taxonomy from GS1 GPC into DataNexus. FIBO is originally distributed in OWL, but there are derived SKOS versions. GPC is also SKOS-mappable. Product hierarchies like “Apparel -> Women’s Wear -> Dresses” can be imported directly as the backbone for a retail client’s ontology. OWL’s complex constraints get dropped, but what DataNexus needs is just term names, definitions, and hierarchical relationships. The SKOS subset is sufficient.

Export – Sending DataNexus terms to a client’s system. Extract nodes and relationships for a specific domain (e.g., retail-terms) from the DozerDB graph and convert to SKOS Turtle format.

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dnx:  <http://datanexus.ai/ontology/> .

dnx:atv a skos:Concept ;
  skos:prefLabel "ATV"@en ;
  skos:altLabel  "Average Transaction Value"@en, "객단가"@ko ;
  skos:definition "Total revenue divided by number of purchasing customers"@en ;
  skos:broader dnx:sales-metrics ;
  skos:inScheme dnx:retail-terms .

dnx:sales-metrics a skos:Concept ;
  skos:prefLabel "Sales Metrics"@en ;
  skos:narrower dnx:atv, dnx:net-sales, dnx:upt ;
  skos:inScheme dnx:retail-terms .

In the retail field, what one system calls “ATV” (Average Transaction Value), another calls “average spend per customer” (객단가). Put all these aliases in altLabel and the NL2SQL engine can find the same table regardless of which name is used in the question. This file can be loaded into any SKOS-compatible system – Collibra, TopBraid, you name it.
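The altLabel lookup described above can be sketched as a reverse index – an in-memory dict here for illustration; a real implementation would query the graph:

```python
# Sketch of the altLabel lookup (in-memory dict, not the actual graph
# query): every alias resolves to the same canonical concept id.
concepts = {
    "dnx:atv": {
        "prefLabel": "ATV",
        "altLabel": ["Average Transaction Value", "객단가", "객단"],
    },
}

# Build a reverse index from every label (preferred or alternate)
# to its concept id.
alias_index = {}
for cid, c in concepts.items():
    for label in [c["prefLabel"], *c["altLabel"]]:
        alias_index[label.lower()] = cid

# Whichever name appears in the user's question, the NL2SQL engine
# lands on the same concept.
assert alias_index["객단가"] == "dnx:atv"
assert alias_index["average transaction value"] == "dnx:atv"
```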

With import/export working, the problems mentioned earlier are solved. Say a retail conglomerate’s department store defines “revenue” as store-level POS totals, the online mall uses payment confirmation basis, and the hypermarket uses post-return basis. Each subsidiary exports their terms via SKOS from DataNexus, and headquarters can receive these and build a mapping table. “Department store revenue = Online mall confirmed revenue = Hypermarket net revenue” – this relationship gets standardized. Financial clients needing to report term definitions and data lineage to regulators can submit the SKOS Turtle file directly or convert to the required format. Without standards, all of this is manual work.

Schema.org and other RDFS/OWL-based standards are outside the scope of this SKOS layer. If needed, a separate converter can be built, but it’s not a priority right now.

Remaining Limitations

There are things SKOS can’t do, and I knew that going in.

SKOS has an extension called SKOS-XL that lets you attach metadata to labels themselves. You could record when “Net Revenue” was registered or who approved it. If multilingual label management gets complex, we might need it. Haven’t added it yet.

OWL-level reasoning is also outside SKOS scope. “If A is narrower than B, and B is narrower than C, then A is narrower than C” – that kind of automated inference. Not needed when the ontology is small, but things may change with thousands of terms.

The biggest limitation is custom relationship export. DozerDB’s domain-specific retail relationships like SELLS, STOCKS, SUPPLIED_BY have no SKOS standard equivalent. “Store A sells Product B” is a directional relationship, but lumping it into skos:related erases both direction and meaning. Extending with a custom namespace like dnx:sells preserves the information, but the receiving system needs to understand these custom relationships. Information loss vs. compatibility – it’s a tradeoff.

How Custom Relationships Are Actually Exported

I said lumping into skos:related erases meaning. So what do we actually do?

We define a DataNexus-specific namespace.

@prefix dnx: <http://datanexus.ai/ontology/relation/> .

dnx:atv-store a skos:Concept ;
  skos:prefLabel "ATV-Store Relationship"@en ;
  dnx:relationshipType "SELLS" ;
  dnx:direction "outgoing" ;
  dnx:confidence 0.95 ;
  dnx:validFrom "2024-01-01" .

Custom properties like dnx:relationshipType, dnx:direction, dnx:confidence preserve the directionality and metadata from DozerDB’s SELLS relationship. If the receiving system understands the dnx: namespace, it can restore the information without loss. If not, it falls back to skos:related. The information doesn’t disappear – it’s just only visible to systems that can read it.
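The degradation rule can be sketched as a small branch at export time – function and field names here are illustrative, not the actual exporter:

```python
# Sketch of the export fallback (illustrative names): emit dnx: custom
# properties only when the receiving system understands the namespace,
# otherwise degrade to plain skos:related.
def export_relationship(rel, target_supports_dnx):
    if target_supports_dnx:
        return [
            (rel["from"], "dnx:" + rel["type"].lower(), rel["to"]),
            (rel["from"], "dnx:direction", "outgoing"),
            (rel["from"], "dnx:confidence", rel["confidence"]),
        ]
    # Lossy fallback: direction and metadata are dropped.
    return [(rel["from"], "skos:related", rel["to"])]

rel = {"type": "SELLS", "from": "dnx:store-a",
       "to": "dnx:product-b", "confidence": 0.95}

rich = export_relationship(rel, target_supports_dnx=True)
lossy = export_relationship(rel, target_supports_dnx=False)
assert ("dnx:store-a", "dnx:sells", "dnx:product-b") in rich
assert lossy == [("dnx:store-a", "skos:related", "dnx:product-b")]
```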

In practice, we operate like this:

Export Target | Method | Information Preservation
SKOS-native systems (Collibra, TopBraid) | Standard skos: properties only | ~80% (direction, properties lost)
DataNexus-to-DataNexus (subsidiary to subsidiary) | Include dnx: custom namespace | ~95% (near-complete preservation)
Regulatory reporting | skos: + custom relationships as text in skos:note | ~85% (human-readable level)

On the DataHub side, we’ve also defined rules for handling unmapped properties at export time:

DozerDB Property | SKOS Export Handling
confidence | dnx:confidence (custom) or text in skos:note
since / valid_until | dnx:validFrom / dnx:validUntil or skos:historyNote
cardinality | dnx:cardinality (custom only, no SKOS equivalent)
operator (CALCULATED_FROM) | dnx:calculationOperator
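These rules amount to a lookup table plus a fallback. A sketch, with the fallback string format being illustrative rather than the actual exporter’s:

```python
# The property-mapping rules as a lookup dict (property names are from
# the post; the skos:note fallback format is illustrative).
PROPERTY_EXPORT = {
    "confidence":  "dnx:confidence",
    "since":       "dnx:validFrom",
    "valid_until": "dnx:validUntil",
    "cardinality": "dnx:cardinality",
    "operator":    "dnx:calculationOperator",
}

def export_property(name, value, target_supports_dnx):
    if target_supports_dnx and name in PROPERTY_EXPORT:
        return (PROPERTY_EXPORT[name], value)
    # No dnx: support: stash the value as human-readable text instead.
    return ("skos:note", f"{name}={value}")

assert export_property("confidence", 0.95, True) == ("dnx:confidence", 0.95)
assert export_property("confidence", 0.95, False) == ("skos:note", "confidence=0.95")
```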

I’m aware this isn’t clean. The dnx: namespace only means something inside the DataNexus ecosystem, and there’s no guarantee external systems will understand it. Filling standard gaps with custom extensions is basically making up a new non-standard. To do this properly, we’d need SKOS-XL or a formal Application Profile. That feels like overkill right now. We’ll add it if clients actually need it.

Roughly 80% is covered by SKOS standard, the remaining 20% by DozerDB custom properties. Not ideal, but better than pretending everything fits neatly into SKOS when it doesn’t.


Documenting the process of designing and building DataNexus. GitHub | LinkedIn