“Who put this Unknown record in here?”

This question comes up at every DW model review.

Open the product master table and you’ll find a record named “Unknown.” Same thing in the employee table. If you’ve spent your career in transactional systems, this is naturally puzzling. Dummy data in a master table?

A follow-up question usually arrives: “What’s this ‘Point-in-Time Sales Rep’ column in the order fact table? That didn’t exist in the source order table.” For someone new to DW models, this is equally baffling.

There’s no shortage of material explaining the differences between the two models using keywords — denormalization, star schema, snowflake. A quick search will surface them. The problem is that keywords alone don’t explain why the design works this way. You need to start with purpose.

Transactional Models Guard Integrity

The goal of an OLTP data model is clear: maintain data integrity through frequent inserts and updates.

This goal determines the model’s shape. Relationships between entities are strict. You can’t register an employee without a department, you can’t create an order without a product, and an order without a customer doesn’t exist. Every relationship has preconditions, and those conditions must be met at the moment data is created.

Normalization is how you enforce this. Reducing redundancy means there’s only one place to update, which minimizes the chance of breaking integrity. You register top-level masters (like code tables) first, then stack transactional data on top. The sequence cannot be broken.

Think of it this way: a grandfather must exist before a father can exist, and a father before a son. These are existence relationships. A person must exist before their actions can be recorded. These are behavioral relationships. OLTP models focus on reflecting all such relational constraints without exception.

DW Models Design Access Paths

DW data models solve a different problem: load all data without omission, and create paths for accessing analytical subjects.

“Access paths” is the key concept. Say the analytical subject is order performance. You need to be able to approach it from the employee dimension, the product dimension, or the customer dimension. Whichever path you take, the results should be the same, and performance should be comparable. The star schema expresses this structure most intuitively.

        [Employee]
            |
[Product] — Order Facts — [Customer]
            |
         [Job]

Order facts sit at the center, with dimension tables — the access paths — surrounding them.

Someone with deep OLTP experience might look at this model and think it’s just “denormalized OLTP.” Because the tool is the same (ERD), the output must be the same kind of thing, right? The tool is the same. The design starting point is not.

Point-in-Time Data: An Unfamiliar Concept

An OLTP order table has a “Sales Rep” column pointing to the current sales rep. A DW order fact table has a Point-in-Time Sales Rep — the sales rep assigned at the exact moment the order was placed.

Why is this needed? Say a product’s assigned sales rep changed from A to B this year. In OLTP, the current rep is B. That’s it. In the DW, it’s different. “I want to see last year’s performance under A and this year’s under B” is a natural requirement. Using historical data like product sales rep history or customer job history, the point-in-time data is constructed at the moment of loading into the order fact table.

In OLTP, when an employee leaves, you deactivate them in the master table and move on. In the DW, employees who only existed at a past point in time, and job codes that are no longer valid, must all remain in the master tables. Data needed for historical analysis cannot be missing.

Why Unknown Records Exist

There’s a common situation in DW projects. You want to analyze 10 years of order history, but product master management was sloppy and only recent products remain. The order records have product IDs, but the product table has no matching entries.

In OLTP, this situation simply cannot occur. The design prevents orders from being created without a corresponding product. The DW has a different stance. Historical data that already happened must be loaded as-is.

At this point, there are several options:

  • Replace the order’s product ID with an ID corresponding to “Unknown”
  • Add an extra analytical product ID column for dual management
  • Insert the unmapped product IDs into the product master at load time, filling remaining attributes with NULL or placeholder values

Regardless of the approach, one thing is common: a reference record called Unknown is pre-inserted into the product master table. Since the sales rep for that product is also unknown, an Unknown record goes into the employee table too. Relationships are formally satisfied — but at load time rather than at the time the data originated, through deliberate intervention.

OLTP modelers may find this uncomfortable. Satisfying relationships with artificial dummy data? Considering the purpose of a DW, it’s a rational decision. Analytical data cannot be dropped, and the structure must remain consistent regardless of which access path is used.

When Relationships Are Established Differs

Here’s the summary.

OLTP satisfies relationship conditions at the time data is generated. You can’t register an employee without a department, and you can’t place an order without a customer. If a relationship is violated, the data simply doesn’t get in.

DW aligns relationships at the time data is loaded. If the source has gaps, Unknown fills them. If point-in-time data is needed, it’s derived from history. Relationships are established through deliberate intervention during the loading process.

OLTPDW
PurposeTransaction processing, integrity guaranteeAnalytical data loading, access path design
When relationships are metAt data generationAt data loading
Missing dataNot allowedHandled with Unknown
History managementCurrent state focusPoint-in-time data generation
Design directionNormalization (minimize redundancy)Access-path-centric (analytical convenience)

Once you understand this difference, most of the “why did they do it this way?” questions about DW models resolve themselves.

Same Story in the Cloud Era

The previous post covered how physical constraints have changed in cloud DW environments. Storage got cheaper, columnar storage shifted join patterns, and we moved to the ELT paradigm.

Physical constraints have changed, but the difference in purpose between OLTP and DW remains. Whether you’re using BigQuery or Synapse, you still need to design access paths for analytical data. Situations requiring Unknown records and requirements for point-in-time data management don’t disappear just because the infrastructure changed.

What has changed is that history management can now be more aggressive. With lower storage costs, maintaining dimension history via SCD Type 2 is less burdensome. SCD type-specific design approaches will be covered later in this series.

The next post examines ERD notation differences. Even when the same relationship is drawn, the interpretation changes depending on whether it’s Crow’s Foot or IDEF1X. Without understanding the notation, two people looking at the same model will be having different conversations.