A complete guide to slowly changing dimension design — when to use SCD Type 1 overwrite versus Type 2 history versus Type 3 attribute tracking, the practical engineering of SCD Type 2 in dbt, and the hybrid Type 6 pattern for complex historical analysis requirements.
Slowly changing dimensions (SCDs) are one of the foundational concepts in dimensional data modelling. They define how dimension tables handle attribute changes over time — when a customer moves to a new city, when a product changes its category, when an employee transfers to a different department. The design choice determines whether historical analysis is accurate, approximate, or impossible.
The Core Problem
A dimension table represents descriptive attributes used to filter, group, and label facts. A customer dimension has columns like city, region, and account_tier. A product dimension has product_category, supplier, and price_tier.
These attributes change. The question is: what happens in your dimension table when they do?
If you overwrite the old value with the new one, historical facts join to the new attribute — a sale made when the customer was in London now looks like it was made when they were in New York. If you preserve history, you can report "sales by customer location at the time of the sale" accurately. The SCD type you choose is this decision, formalised.
SCD Type 1: Overwrite
Type 1 is the simplest approach. When an attribute changes, update the dimension record in place. The old value is lost.
Use Type 1 when:
- History genuinely does not matter for this attribute (e.g., correcting a data entry error — a misspelled name should be corrected everywhere)
- The attribute is a derived or calculated field that should always reflect the current state
- Storage constraints make preserving history impractical (rarely relevant in cloud warehouses)
- The business never asks historical questions about this attribute
The risk: if the business later asks "what was the customer's tier when they made that purchase three years ago?" — and you used Type 1 — the answer is gone. Type 1 is irreversible in most implementations. Apply it only to attributes you are certain will never need historical tracking.
In dbt, Type 1 is the default snapshot behaviour when no unique_key conflict strategy is defined. It is also the default for most dimension tables built with incremental models that merge on the primary key.
SCD Type 2: Add a Row
Type 2 is the standard approach for attributes that require historical accuracy. When an attribute changes, the existing dimension record is closed and a new record is inserted with the new attribute values. Both records remain in the table.
The mechanism requires three additional columns:
**effective_date (or valid_from):** The date this record became active.
**expiry_date (or valid_to):** The date this record was replaced. Conventionally set to a far-future date (2099-12-31 or 9999-12-31) for the current record to simplify "as-of" queries.
**is_current:** Boolean flag indicating whether this is the active record. Redundant with expiry_date but useful for readability and simple joins.
A Type 2 customer dimension for a customer who moved from London to New York in March 2024 looks like:
customer_key | customer_id | city | effective_date | expiry_date | is_current
1001 | C-555 | London | 2020-01-15 | 2024-03-10 | false
1002 | C-555 | New York| 2024-03-11 | 2099-12-31 | true
Fact records written before March 2024 carry customer_key 1001. Fact records after carry 1002. Historical analysis is accurate at the time the sale occurred.
**Joining with Type 2 dimensions** requires care. A simple JOIN on customer_id returns multiple rows per customer. The correct join for current state is WHERE is_current = true. The correct join for point-in-time accuracy is on both customer_id AND the fact date falling between effective_date and expiry_date.
**SCD Type 2 in dbt** uses the snapshot macro. Snapshots track changes in a source table, insert new rows for changed records, and close previous rows by setting dbt_valid_to. A minimal dbt snapshot:
{% snapshot customer_snapshot %}
{{ config(
target_schema='snapshots',
unique_key='customer_id',
strategy='timestamp',
updated_at='updated_at',
) }}
select * from {{ source('crm', 'customers') }}
{% endsnapshot %}
dbt adds dbt_scd_id, dbt_valid_from, dbt_valid_to, and dbt_updated_at columns automatically. The snapshot runs on a schedule and detects new or changed rows by comparing updated_at timestamps (or row hashes under the check strategy).
SCD Type 3: Add a Column
Type 3 tracks the previous value of a specific attribute alongside the current value, without preserving full history.
Typical implementation: add a previous_city and a city_changed_date column alongside the current city column. When city changes, previous_city is set to the old city, city_changed_date is set to the change date, and city is updated.
Use Type 3 only when:
- You need to compare current versus one previous state for a specific attribute
- You know in advance which attributes need this treatment
- History beyond one change back is not required
Type 3 is rarely used in practice. It solves a narrow problem, handles only a single version of history, and does not generalise. Most implementations that start with Type 3 eventually require Type 2 when more than one change needs tracking. Consider it a design shortcut with a known expiry date.
SCD Type 6: The Hybrid
Type 6 is a hybrid of Types 1, 2, and 3 — it adds both a full Type 2 history and a current-value column denormalised onto every historical record. The naming comes from multiplying 1 x 2 x 3 = 6, which is a practitioner's joke, not an official standard.
A Type 6 customer dimension includes:
- All Type 2 columns (effective_date, expiry_date, is_current)
- A current_city column on every row, always set to the current value regardless of when the record was active
This allows two query patterns simultaneously:
- "What was the customer's city when they made this purchase?" — join via the date range, use city
- "What city is this customer in now, for all their historical purchases?" — join on customer_id + is_current, or use current_city on any historical row
Type 6 is useful when reporting requires both analytical question types without changing the join strategy. The cost is that the current_city column on historical records must be updated every time the current value changes — either via a full refresh or a targeted update. In large dimension tables with many historical records per entity, this update can be expensive.
Choosing the Right Type
Most dimension tables in a production environment use a mix:
**Type 1:** Correction attributes (name spelling corrections, data quality fixes), attributes with no analytical history value (internal IDs, system flags), attributes you can prove the business will never want historically.
**Type 2:** Any attribute used in analytical queries where historical accuracy matters — customer geography, customer segment, product category, employee department, account tier. Default to Type 2 unless you have a specific reason not to.
**Type 3:** Almost never. The scenarios where Type 3 is genuinely the right answer are narrow. If you are considering Type 3, reconsider Type 2.
**Type 6:** When the same dimension is used in both point-in-time historical analysis and current-state reporting, and you want a single dimension table to support both patterns without requiring analysts to change their join logic.
Common Implementation Mistakes
**Forgetting the far-future expiry date.** Using NULL for the current record's expiry_date makes date range joins complex. Using 2099-12-31 or 9999-12-31 simplifies queries considerably and is conventional.
**Using natural keys in fact tables instead of surrogate keys.** If your fact table joins on customer_id (natural key) instead of customer_key (surrogate), you cannot support Type 2 history — multiple dimension rows have the same customer_id. Fact tables must carry the surrogate key, not the natural key, to support SCD Type 2.
**Running dbt snapshots infrequently.** A dbt snapshot that runs daily captures daily granularity of changes. If a customer's tier changed and changed back within a single day, you miss both changes. Snapshot frequency must match the change frequency that matters for your analysis. For high-velocity attributes, snapshots may need to run hourly.
**Not documenting which SCD type applies to which columns.** Mixed Type 1 and Type 2 columns in the same dimension table are common and correct — but if undocumented, analysts join incorrectly and produce wrong results. Document SCD type per column in your data dictionary or dbt schema.yml.
Slowly changing dimension design is one of the places where upfront architectural decisions make the largest downstream difference. The choice between Type 1 and Type 2 cannot easily be reversed after data is in production — Type 1 loses history irrecoverably, and retrofitting Type 2 onto a live dimension table is a substantial engineering effort.
Our data architecture consulting practice designs dimensional models that handle historical accuracy requirements correctly from the start — contact us to discuss your data modelling requirements.
A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.
Book a Call →