dbt snapshots capture the state of a source table at regular intervals, building a history of changes that a simple SELECT cannot reconstruct. For dimensions that change over time — customer segments, product categories, account statuses — snapshots are the mechanism for answering historical questions accurately: what segment was this customer in when they placed that order six months ago?
dbt snapshots capture the historical state of a mutable database table by recording changes over time. Each time dbt snapshot runs, it compares the current state of the source table against the previously captured state, identifies rows that have changed or been added, and appends new records with timestamps recording when each version of the row was valid. The result is a Type 2 slowly changing dimension (SCD2) table — a historical record of every state a row has been in, with valid-from and valid-to timestamps.
The Problem Snapshots Solve
Most source tables in a data warehouse only show the current state. A customers table shows each customer's current segment, status, and attributes. A query run today returns today's values. A query run six months ago would have returned different values — but without snapshots, those historical values are gone.
This becomes a problem for analytical accuracy. Consider a question like: "what was the average order value for customers in the 'Enterprise' segment at the time of purchase?" If customer segment assignments change over time — customers move between segments based on spend patterns — answering this question accurately requires knowing what segment each customer was in at the time of each order, not their current segment.
Without snapshots, the only answer to this question is based on current segment assignments applied retroactively — which will be wrong for customers who changed segments. With snapshots, the historical segment assignment at the time of each order can be joined to order data accurately.
Snapshot Configuration
Snapshots are defined in .sql files in the snapshots/ directory:
{% snapshot customers_snapshot %}
{{
config(
target_schema='snapshots',
unique_key='customer_id',
strategy='timestamp',
updated_at='updated_at',
)
}}
SELECT
customer_id,
customer_name,
segment,
status,
account_manager,
updated_at
FROM {{ source('crm', 'customers') }}
{% endsnapshot %}
Key configuration options:
**unique_key** — the column that uniquely identifies a row. When this key appears in a new snapshot run, dbt compares the row's other columns to detect changes.
**strategy** — how dbt detects changes:
- *timestamp* — compares the updated_at column value. If updated_at is newer than the snapshot record, the row has changed.
- *check* — compares a specified set of columns. If any column in the check_cols list has changed, the row is considered changed. Use when the source table has no reliable updated_at column.
**updated_at** — the timestamp column used by the timestamp strategy.
**target_schema** — the schema where the snapshot table is written. Typically a dedicated snapshots schema separate from other mart schemas.
Snapshot Table Structure
The snapshot table produced by dbt contains the source table's columns plus four metadata columns added by dbt:
**dbt_scd_id** — a surrogate key that uniquely identifies each record in the snapshot table (each version of each row).
**dbt_updated_at** — the timestamp of when dbt detected this version of the row.
**dbt_valid_from** — the timestamp from which this version of the row was valid. For new rows, this is the snapshot run time.
**dbt_valid_to** — the timestamp at which this version of the row became invalid. For the current version of a row, dbt_valid_to is NULL. For historical versions, dbt_valid_to is the timestamp when the next version was detected.
Querying Snapshots
To query the current state of all rows (equivalent to the source table):
SELECT * FROM customers_snapshot
WHERE dbt_valid_to IS NULL
To query the state at a specific point in time:
SELECT * FROM customers_snapshot
WHERE '2024-06-01' >= dbt_valid_from
AND ('2024-06-01' < dbt_valid_to OR dbt_valid_to IS NULL)
To join orders to the customer segment that was active at the time of the order:
SELECT
o.order_id,
o.order_date,
o.revenue,
c.segment as segment_at_order_time
FROM orders o
LEFT JOIN customers_snapshot c
ON o.customer_id = c.customer_id
AND o.order_date >= c.dbt_valid_from
AND (o.order_date < c.dbt_valid_to OR c.dbt_valid_to IS NULL)
This join pattern — matching on the entity key and the valid-from/valid-to window — is the standard pattern for using snapshot data in analytical queries.
Snapshot Run Frequency
Snapshot granularity is determined by how frequently dbt snapshot runs. A daily snapshot captures the state once per day; changes that happen and revert within a day are invisible. For most slowly changing dimensions — customer segments, account statuses, product categorisations — daily snapshots provide sufficient granularity.
For dimensions that change very rapidly or where intra-day change history is required, snapshots may not be the right mechanism. Consider change data capture (CDC) for near-real-time change history requirements.
Snapshot Limitations
**Initial history** — snapshots only capture history from the point when dbt snapshot first runs. They do not retroactively reconstruct history from the source table. If a snapshot is first run today, all rows will have dbt_valid_from equal to today; no prior history is recorded.
For sources that have their own created_at and updated_at history, it may be possible to reconstruct historical snapshots by querying the source table's change log or audit tables, but this requires source-system-specific work outside of dbt.
**Source table must not lose history** — the timestamp strategy requires that the source table's updated_at column reliably updates when a row changes. If updated_at does not update on every change, or if rows are hard-deleted from the source rather than soft-deleted, the snapshot will miss changes. Validate the source table's update behaviour before relying on timestamp-based snapshots for analytical accuracy.
**Hard deletes** — by default, hard-deleted source rows are not captured in snapshots — the row simply stops appearing in new snapshot runs, but its dbt_valid_to is never set. The invalidate_hard_deletes snapshot configuration option changes this behaviour: dbt sets dbt_valid_to for rows that disappear from the source, treating disappearance as a "deletion" event.
Our data architecture practice designs historical data capture strategies including dbt snapshots and SCD Type 2 implementations for enterprise analytics teams — contact us to discuss your historical data architecture.
A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.
Book a Call →