BlogData Engineering

dbt Incremental Models: When to Use Them and How to Get Them Right

Obed Tsimi
Obed Tsimi
Founder & Senior Tableau Architect
·January 3, 202711 min read

Incremental models in dbt process only new or changed data rather than rebuilding the entire table on every run. This guide covers when incremental models are appropriate, the three incremental strategies (append, delete-insert, merge), the unique_key pattern, late-arriving data handling, and the subtle bugs that trip up most implementations.

Most dbt models are table or view materializations: they rebuild the entire dataset on every run. For small to medium tables, this is fine. For tables with billions of rows, rebuilding from scratch on every run is prohibitively expensive and slow. Incremental models solve this by processing only the new or changed data since the last run. But incremental models introduce complexity that full rebuilds do not have — and getting that complexity wrong produces subtle bugs that are hard to detect.

When to Use Incremental Models

Incremental models are appropriate when:

- **The table is large enough that full rebuilds are expensive.** For most organisations, this threshold is around 50–100M rows depending on row width and warehouse pricing. Below that threshold, full rebuilds are fast and cheap enough that the additional complexity of incremental materialisation is not worth it.

- **New data arrives in an append-only or update-known-keys pattern.** Incremental models handle "new rows since yesterday" and "updates to specific records" well. They do not handle arbitrary changes to historical data (late-arriving corrections applied across many historical rows) without additional logic.

- **The full history needs to be preserved.** If you only need the current state of a slowly changing dimension, a snapshot model is more appropriate.

Do not default to incremental models for performance reasons before confirming that a full rebuild is actually too slow or too expensive. Full rebuilds are simpler and less error-prone.

The Three Incremental Strategies

dbt's incremental materialisation offers three strategies, configurable via the incremental_strategy parameter:

### append

The simplest strategy. On incremental runs, dbt inserts new rows matching the incremental filter; it never updates or deletes existing rows. For fact tables where events are immutable (a transaction never changes after it is created), append is correct and efficient.

Configuration:

{{ config(

materialized='incremental',

incremental_strategy='append'

) }}

The incremental filter must be defined in the model's SQL using the is_incremental() macro:

select *

from {{ ref('stg_events') }}

{% if is_incremental() %}

where event_timestamp > (select max(event_timestamp) from {{ this }})

{% endif %}

The is_incremental() block is only evaluated on incremental runs. On the first run (or after a full refresh), the filter is omitted and the entire source is processed.

**When to use:** Immutable event streams — clickstream, transactions, log data where records are never updated after creation.

### delete+insert

On incremental runs, dbt deletes the rows in the target table that match the unique_key, then inserts the new rows from the incremental run. This handles updates to existing records by replacing them.

Configuration:

{{ config(

materialized='incremental',

incremental_strategy='delete+insert',

unique_key='order_id'

) }}

**When to use:** Tables where individual records can be updated (order status changes, subscription state changes) and you want to replace the old version with the current version. The unique_key identifies which records to delete before inserting.

**Limitation:** delete+insert processes updates only for the records in the incremental window. Historical corrections (a record created 18 months ago that is updated today) are handled correctly only if the incremental filter captures the updated_at timestamp, not just the created_at.

### merge

The merge strategy uses a SQL MERGE statement (UPSERT — insert new records, update existing ones). It is more efficient than delete+insert because it processes the update in a single statement rather than two.

Configuration:

{{ config(

materialized='incremental',

incremental_strategy='merge',

unique_key='order_id'

) }}

**When to use:** Same use cases as delete+insert, but preferred where the data warehouse supports MERGE efficiently (Snowflake, BigQuery, Databricks). Redshift historically had limited MERGE support; check your warehouse's implementation.

The unique_key Pattern

For delete+insert and merge strategies, the unique_key tells dbt which column (or columns) identifies a unique record. On incremental runs, existing records matching the unique_key are updated; new records are inserted.

For single-column keys:

{{ config(unique_key='order_id') }}

For composite keys (when no single column uniquely identifies a record):

{{ config(unique_key=['order_id', 'line_item_id']) }}

**Critical requirement:** The unique_key must actually be unique in both the source and target. If it is not unique, merge and delete+insert will produce incorrect results — either updating the wrong record or creating duplicates. Test uniqueness on the unique_key columns before implementing incremental materialisation.

The Incremental Filter Pattern

The incremental filter defines which rows from the source are processed on each incremental run. The most common pattern uses a timestamp column:

{% if is_incremental() %}

where updated_at > (select max(updated_at) from {{ this }})

{% endif %}

This looks up the maximum updated_at value in the existing target table and selects only rows updated after that timestamp.

**Late-arriving data:** The max(updated_at) filter has a late-arriving data problem. If a record is created with a past timestamp (backdated data entry, delayed event delivery), it will have an updated_at value below the current maximum and will never be processed by incremental runs. For sources with known late-arriving data problems, use a lookback window:

{% if is_incremental() %}

where updated_at > (select max(updated_at) from {{ this }}) - interval '3 days'

{% endif %}

The lookback window reprocesses the last 3 days on every incremental run, at the cost of slightly higher processing volume but catching late-arriving data within the lookback period.

**Clock skew and timezone issues:** If the source system stores timestamps in a different timezone than the warehouse, the max(updated_at) comparison can miss records. Ensure timestamps are normalised to UTC in staging models before using them in incremental filters.

Full Refresh

Incremental models can be rebuilt from scratch with the dbt run --full-refresh flag. Full refresh drops the existing table and rebuilds it from the first run as a non-incremental model.

When to run full refresh:

- When the model's SQL logic changes in a way that requires reprocessing historical data

- When the incremental filter has failed and the table may be incomplete or inconsistent

- On a periodic schedule to prevent incremental drift accumulation

Establish a regular full refresh cadence for critical incremental models — monthly or quarterly is typical. Waiting for problems to appear before running a full refresh means the refresh happens under pressure during an incident.

Testing Incremental Models

Incremental models require more testing than full rebuild models because bugs can accumulate silently over many runs.

**Test unique_key uniqueness:** A not_null + unique test on the unique_key columns catches cases where the source is not producing the expected uniqueness guarantee.

**Test row count reasonableness:** After each incremental run, the row count should increase by approximately the expected new row volume. A row count that did not increase when new data was expected indicates the incremental filter is too restrictive. A row count that increased much more than expected indicates potential duplication.

**Periodic reconciliation tests:** Monthly or quarterly, compare the incremental model's aggregate outputs against a fresh full rebuild or against the source system's aggregates. Incremental drift — small discrepancies that accumulate over time — is detected by reconciliation and missed by run-level testing.

Common Incremental Model Bugs

**Duplicate records:** Caused by a unique_key that is not actually unique in the source, or by using the append strategy on a source where records can be updated. Undetected for long periods; causes silent overcounting in downstream aggregations.

**Missing records:** Caused by an incremental filter that is too restrictive — using created_at instead of updated_at for records that change after creation. Updated records are never reprocessed. Causes stale state in the model without any error signal.

**Schema evolution errors:** When the source schema gains a new column, the incremental model's schema does not automatically update on incremental runs (it would require a full rebuild). dbt's on_schema_change configuration handles this: setting on_schema_change='sync_all_columns' adds new columns from the source to the incremental model on the next run.

Our data engineering consulting practice designs and implements dbt model architectures including incremental strategies — contact us to discuss your transformation layer.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →