The Customer Table Has Both SSN and Business Registration Number

This is a structure you see all the time in transactional systems. A single customer table mixes individual customer attributes (social security number, date of birth) with corporate customer attributes (business registration number, CEO name). For individual customers, the business registration number is NULL; for corporate customers, the SSN is NULL. A single customer type code column differentiates them.

When data volumes are small, it’s not a big deal. When customers number in the tens of millions, the story changes. Columns needed only for individuals take up space in corporate rows, and columns needed only for corporates sit as NULLs in individual rows. As columns grow, the table gets wider and its meaning becomes muddier. You can’t tell from the DDL alone which columns belong to which customer type.

Super-sub typing is the method for sorting this out at the logical model stage.

Separating Common and Type-Specific Attributes

The principle behind super-sub types is simple. Common attributes go in the super type (Customer), and type-specific attributes go in the sub types (Individual Customer, Corporate Customer).

[Customer]                    ← Super type: CustomerID, CustomerName, Contact
  ├─ [Individual Customer]    ← Sub type: SSN, DateOfBirth
  └─ [Corporate Customer]     ← Sub type: BusinessRegNo, CEOName

A single CustomerID links the super type to its sub types. The Individual Customer table contains only attributes relevant to individual customers. The wide table full of NULLs disappears.

There’s another reason to split into sub types. Each sub type can independently form relationships with other entities. Perhaps only corporate customers have a relationship with credit limits, or only individual customers relate to membership tiers. When all relationships hang off a single super type, the meaning of each relationship becomes ambiguous. Splitting into sub types makes “which type does this relationship apply to?” immediately readable from the model.

Exclusive or Overlapping?

When designing sub types, there’s one thing you must determine first: whether a single instance belongs to exactly one sub type (Exclusive) or can belong to multiple sub types simultaneously (Inclusive).

Exclusive is overwhelmingly more common. A customer is either individual or corporate. An account is savings, time deposit, or fixed deposit — one of them. A product is physical or digital. A single type code handles the classification.

Inclusive is rare, but missing it means major rework later. Service products are a typical example — a single product might target both B2B and B2C simultaneously. Employee roles work similarly. When one person handles both sales and technical support, the “Employee Role” sub type becomes Inclusive.

At the design stage, always double-check: “Is this classification truly exclusive?” If you build the model assuming Exclusive and then overlapping cases surface, you’ll have to rework everything from the type code scheme to the relationship structure.

Options When Moving to a Physical Model

In the logical model, super-sub types are clean. The choices diverge when converting to a physical model.

Single table. Merge the super type and sub types into one table. You end up with the same wide table that was the original problem, but there are no joins, so queries are simple. When type-specific attributes are few, this is a pragmatic choice.

Separate tables. Create individual tables for the super type and each sub type. No NULLs and the structure is clear, but to see complete customer information, you need to join the super type with the sub type.

Sub types only. No super type table — just Individual Customer and Corporate Customer tables. Common attributes are duplicated in each. This fits when analysis is fully independent per sub type, but seeing all customers requires a UNION.

There’s no right answer. It depends on the number of sub types, the volume of type-specific attributes, and query patterns.

When It Feeds Into DW Dimension Design

In a DW, this choice directly ties into dimension design. The “access paths” perspective from Part 2 becomes the decision criteria.

Consider designing a customer dimension. If you split individual and corporate customers into separate dimensions, the fact table gets more FKs, and analysts have to choose which dimension to join every time. If you build a unified dimension, you get a wide table with many NULLs — but as discussed in Part 1 , cloud columnar storage makes the scan cost of NULL columns virtually zero.

The deciding factor is the analysis pattern. If individual customer revenue is analyzed by age group and region, while corporate customer revenue is analyzed by industry and revenue size, then the dimension attributes themselves differ — so splitting makes more sense. If most analysis treats all customers as a single axis, a unified dimension is more convenient.

A common compromise seen in practice is to maintain a unified dimension as the baseline, and add separate views or marts when sub-type-specific analysis is frequent. In cloud environments where storage costs are low, the overhead of redundant storage is minimal.

The next post compares the Inmon and Kimball approaches. We’ll dig deeper into what was briefly mentioned in Part 1.