dbt Snapshots: Tracking Historical Changes in Slowly Changing Dimensions

dbt snapshots capture point-in-time historical records for slowly changing dimension data — customer addresses, subscription statuses, account tiers — so you can answer questions about what something was at a specific point in time, not just what it is now. This guide covers when to use snapshots, the snapshot strategy options, and the most common implementation mistakes.

Most data in a warehouse represents current state. The customer table shows what the customer looks like now. The subscription table shows the current subscription status. But analytical questions often require historical state: what was this customer's subscription tier last quarter? What was the account owner when this deal was closed? What was the customer's country when they made their first purchase?

Without historical state tracking, answering these questions requires either an audit table (if the application built one) or a time-travel feature in the warehouse (if available). dbt snapshots provide a systematic way to capture historical records for slowly changing dimension data — recording each version of a row as it existed at a specific point in time.

What Snapshots Do

A snapshot model is a special dbt materialisation that:

1. Compares the current state of source data against the previously recorded snapshot

2. When it detects a changed row (based on a configured strategy), it closes the existing record by setting an end timestamp

3. Inserts a new record with the current state and a new start timestamp

The result is a type-2 slowly changing dimension (SCD2): each version of each record is preserved, with valid_from and valid_to timestamps that allow you to query the state at any point in time.

A customer who changed their country from US to UK has two records in the snapshot table:

- customer_id: 123, country: US, dbt_valid_from: 2023-01-01, dbt_valid_to: 2024-06-15

- customer_id: 123, country: UK, dbt_valid_from: 2024-06-15, dbt_valid_to: null

To find the customer's country as of 2024-01-01, query where dbt_valid_from <= '2024-01-01' AND (dbt_valid_to > '2024-01-01' OR dbt_valid_to IS NULL).

When to Use Snapshots

Snapshots are appropriate when:

**The source does not preserve history:** If the source system updates the record in place (a CRM contact's job title is overwritten when it changes), the previous value is lost. Snapshots capture each version before it changes.

**Historical state affects analytical accuracy:** Analysing sales rep performance requires knowing which sales rep owned each account when the deal was created — not the current owner. Cohort analysis requires knowing each customer's tier at the time of acquisition. Attribution analysis requires knowing which channel was attributed at conversion time.

**The entity changes slowly enough that snapshot overhead is manageable:** A user profile table that changes weekly is a good snapshot candidate. An event table that appends millions of rows per hour is not.

Do not use snapshots for:

- Tables that change very rarely (historical accuracy not required) — the complexity is not worth it

- Append-only tables — incremental models are the right pattern

- High-volume tables — snapshot overhead (comparing every row on every run) does not scale

Snapshot Strategies

### timestamp Strategy

The snapshot checks whether the row's updated_at timestamp has changed since the last snapshot run. If the timestamp is newer, the row is recorded as a new version.

{% snapshot customers_snapshot %}

{{

config(

target_schema='snapshots',

unique_key='customer_id',

strategy='timestamp',

updated_at='updated_at'

)

}}

select * from {{ source('app_db', 'customers') }}

{% endsnapshot %}

The timestamp strategy is the preferred approach when the source table has a reliable updated_at column. It is efficient — only rows with a changed timestamp are processed.

**Requirement:** The source must have a reliable updated_at timestamp that is updated whenever any field in the row changes. If updated_at is only updated for some changes (some applications only update timestamps for certain field edits), the snapshot will miss changes that do not update the timestamp.

### check Strategy

The snapshot checks whether any of the specified columns have changed values. If any configured column has a different value than the last snapshot, the row is recorded as a new version.

{% snapshot customers_snapshot %}

{{

config(

target_schema='snapshots',

unique_key='customer_id',

strategy='check',

check_cols=['country', 'plan_tier', 'account_owner_id']

)

}}

select * from {{ source('app_db', 'customers') }}

{% endsnapshot %}

The check strategy does not require an updated_at column — it compares column values directly. The downside: it must compare every row against every previous snapshot record, which is more expensive than the timestamp comparison.

check_cols='all' compares every column — useful when you want to track changes to any field but want to avoid listing them individually.

**When to use:** When the source does not have a reliable updated_at, or when you want to snapshot only specific columns rather than the entire row.

Key Configuration Options

**target_schema:** The schema where the snapshot table is created. Typically a dedicated snapshots schema, separate from the mart schema, since snapshot tables have a different structure (extra system columns) from mart models.

**unique_key:** The column (or expression) that uniquely identifies each row in the source. Must be truly unique in the source — duplicate unique_key values produce unreliable snapshot results.

**dbt_updated_at:** dbt adds system columns to snapshot tables: dbt_scd_id (a unique identifier for each snapshot record), dbt_updated_at (when dbt last updated this record), dbt_valid_from (when this version became current), and dbt_valid_to (when this version was superseded — null for the current record).

**Invalidate hard deletes:** When rows are deleted from the source, by default dbt does not update the snapshot. The deleted row's snapshot record remains open (dbt_valid_to remains null). The invalidate_hard_deletes option (available from dbt 1.0) detects rows deleted from the source and sets their dbt_valid_to to the snapshot run time.

Querying Snapshots

The pattern for querying the current state from a snapshot (equivalent to the source table):

select *

from {{ ref('customers_snapshot') }}

where dbt_valid_to is null

The pattern for querying historical state as of a specific date:

select *

from {{ ref('customers_snapshot') }}

where dbt_valid_from <= '2024-01-01'

and (dbt_valid_to > '2024-01-01' or dbt_valid_to is null)

The pattern for joining a fact table to the historical state of a dimension at the time of the fact:

select

o.order_id,

o.order_date,

o.revenue,

c.account_owner_id as owner_at_order_time,

c.plan_tier as tier_at_order_time

from {{ ref('fact_orders') }} o

join {{ ref('customers_snapshot') }} c

on o.customer_id = c.customer_id

and o.order_date >= c.dbt_valid_from

and (o.order_date < c.dbt_valid_to or c.dbt_valid_to is null)

This join pattern — joining on the natural key plus a date range condition — is the defining query pattern for SCD2 dimensions.

Common Snapshot Mistakes

**Snapshotting a large, frequently-changing table:** Snapshots run on every dbt execution. A table with millions of rows and frequent changes creates a large snapshot table that grows quickly and is expensive to query. Use incremental models for high-change-volume entities; reserve snapshots for genuinely slowly-changing dimensions.

**Using a non-unique unique_key:** If the source table has duplicate values on the configured unique_key, the snapshot produces unpredictable results — each run may capture different versions of the duplicate rows. Test unique_key uniqueness before and during snapshot deployment.

**Not handling soft deletes:** If the source marks deletions with an is_deleted flag rather than removing rows, the snapshot captures the is_deleted=true state as a new version — which is correct. But downstream models that filter out deleted records need to exclude snapshots where the current record has is_deleted = true.

**Inconsistent snapshot run frequency:** Snapshots capture state at the time they run. If a record changes and then changes back between two snapshot runs, the intermediate state is never captured. Snapshot run frequency should be appropriate for the change frequency of the data — daily for entities that change a few times per week; more frequent for rapidly changing entities.

Our data engineering consulting practice designs dbt transformation architectures including snapshot strategies — contact us to discuss historical state tracking for your data warehouse.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →