BlogData Architecture

Advanced dbt Patterns: Scaling Transformations Beyond the Basics

James Okafor
James Okafor
Senior Data Engineer
·August 14, 202713 min read

dbt is well-understood at the basic level — staging models, marts, tests, docs. The patterns that separate dbt projects that scale from those that accumulate technical debt are less commonly documented: how to manage incremental models correctly, design generic tests that catch real issues, structure macros without overcomplicating them, and operate dbt in a production environment reliably.

The basic dbt project — staging models, intermediate models, marts, tests, and docs — is well understood. What is less documented are the patterns that determine whether dbt projects scale cleanly or accumulate technical debt: how to model incremental loads without subtle correctness bugs, how to write generic tests that catch real issues without generating noise, how to use macros without building unmaintainable complexity, and how to operate dbt in production reliably.

Incremental Model Correctness

Incremental models process only new or changed records rather than rebuilding the full table on every run. They are essential for large tables where a full rebuild would take too long; they are also the most common source of correctness bugs in dbt projects.

The fundamental requirement for an incremental model is: the is_incremental() filter must correctly identify all new and changed records. For append-only sources (event tables, log tables where records are never updated), filtering on 'created_at > max(created_at)' in the target table is straightforward. For sources where existing records are updated (order status tables, user profile tables, any mutable entity), the filter must capture both new records and changed records.

Common correctness bugs:

**High-water mark on a non-monotonic column** — filtering on max(updated_at) assumes that updated_at is always set to the current timestamp on every update. If some records are updated without touching updated_at (or if the application has a bug that sometimes misses the update), those records will not be captured in the incremental run.

**Window aggregates over incomplete history** — if the incremental model computes window functions (rolling sums, lead/lag) that depend on records before the incremental cut-off, the window will have incorrect results for records at the boundary. The fix is to extend the lookback window to include records that might be affected by boundary effects, then deduplicate after processing.

**Missing hard deletes** — filtering on updated_at only captures insertions and updates. Records deleted from the source are not captured. If the target table should reflect source deletions, either use a full refresh periodically or implement soft delete handling (a separate process that identifies and marks deleted records).

The 'unique_key' configuration in incremental models determines the merge behaviour: when a record with the same unique_key already exists in the target, it is updated rather than inserted. Ensuring unique_key correctly identifies the business entity (not a surrogate or auto-generated ID) is critical for correct upsert behaviour.

Generic Test Design

dbt's built-in generic tests (not_null, unique, accepted_values, relationships) are the correct starting point, but they miss the class of errors that matter most in analytical contexts: business logic violations.

Effective generic test patterns beyond the defaults:

**Numeric range tests** — for metrics that should always be positive, always below a threshold, or always within a seasonal range, row-level range tests catch data that is technically valid (not null, correct type) but analytically wrong. Revenue per transaction that is negative indicates a refund that was not handled correctly; a percentage that is above 100 indicates a calculation error.

**Recency tests** — for tables that should be refreshed regularly, a test that fails if the maximum date in the table is more than N days in the past detects stale data. A mart table that has not been updated in 3 days when it should refresh daily indicates a pipeline failure that may not have triggered a visible error.

**Referential integrity tests** — the relationships generic test checks that foreign key values exist in the referenced table. Extending this to composite keys and to relationships across schemas requires custom generic tests, but they catch the join breakages that produce unexplained drops in mart record counts.

**Aggregate consistency tests** — verifying that a derived aggregate (total revenue in the mart) matches the source aggregate (sum of line items in the staging table). This is the most valuable test class for catching transformation logic errors, and the least commonly implemented.

Avoiding test noise is as important as writing good tests. A test that fails frequently due to expected conditions (late-arriving data, known source system quirks) generates alert fatigue and is eventually silenced or ignored. Tests should be written to fail only when something is genuinely wrong — which sometimes requires configuring warning thresholds rather than hard failures, or excluding known-bad records from the test scope.

Macro Design Principles

Macros in dbt abstract repeated SQL patterns into reusable Jinja functions. They are valuable; they are also the most common source of unmaintainable dbt code.

The appropriate uses for macros:

**Abstracting project-specific repeated patterns** — date truncation with project-specific timezone handling, standard column aliasing conventions, project-wide fiscal year calculation. These are patterns that change uniformly across the project and benefit from a single definition.

**Cross-database compatibility** — adapters for functions that differ between Snowflake, BigQuery, and Redshift (date arithmetic, string functions, approximate counting). dbt's built-in cross-database macros cover most common cases; project-level macros fill gaps.

**Dynamic SQL generation** — generating UNION ALL across a known list of sources, pivoting dynamic column lists, building filter conditions from variable lists.

Macros should not replace legible SQL with abstraction. A macro that generates a complex SELECT statement with many parameters is typically harder to understand, debug, and test than the equivalent SQL written directly. The test for whether a macro is appropriate: does it make the calling code clearer, or does it obscure what the model does?

Macros should be unit-testable via the dbt-unit-testing package or via manual testing of their rendered output. A macro with a bug that affects many models produces widespread failures that are difficult to diagnose without knowing which macro to inspect.

Production Operations

Running dbt reliably in production requires operational discipline beyond the dbt project itself:

**Environment separation** — development, staging, and production schemas should be separate. Development models run in developer-isolated schemas (dev_username); staging runs in a shared pre-production schema; production runs in the production schema with governance controls. The separation prevents development work from affecting production data.

**Orchestration integration** — dbt runs triggered by Airflow, Prefect, or Dagster should have explicit upstream dependencies. The dbt run should not start until upstream data sources have been loaded. Scheduling dbt at a fixed time without source dependency tracking produces failures when upstream sources are late.

**Artifact management** — the dbt manifest.json and catalog.json produced on each run contain project metadata. Storing these artifacts (in S3, GCS, or the dbt Cloud artifacts API) enables historical comparison: detecting which models changed between runs, tracking test failure trends over time, and powering metadata catalogue integrations.

**Partial failure handling** — when one model in a large project fails, dbt marks all dependent downstream models as skipped rather than running them with potentially incorrect upstream data. The orchestration should handle partial failure gracefully: alert on the failure, allow investigation and remediation, and rerun the failed node and its dependents without rerunning the entire project.

**Slim CI** — running the full project in CI on every pull request is expensive and slow. Slim CI runs only the modified models and their direct downstream dependents using 'dbt run --select state:modified+', comparing against the production manifest. This dramatically reduces CI run time while validating that changed models and their immediate dependents pass tests.

Our data architecture practice reviews and improves dbt projects for organisations scaling their analytics engineering practice — contact us to discuss dbt optimisation and governance for your project.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →