dbt Testing Strategies: Building a Data Quality Test Suite That Actually Works

How to build a dbt test suite that catches real data quality problems without becoming a maintenance burden — the four built-in test types, singular vs generic tests, when to use dbt-expectations and other packages, a tiered testing strategy by data criticality, and how to integrate test results into your data quality monitoring programme.

A dbt test suite that tests everything equally is a dbt test suite that tests nothing meaningfully. The goal of dbt testing is not coverage maximisation — it is risk reduction. The right strategy tests the right things at the right severity levels, integrates into your deployment pipeline without blocking it unnecessarily, and produces actionable signal when quality degrades. Here is how to build that.

The Four Built-In Test Types

dbt ships with four generic tests:

**not_null:** Asserts that no rows contain a null value for the specified column. Apply to any column that should always have a value — primary keys, foreign keys, required business fields. A null in an order_id column is always a data problem; test for it.

**unique:** Asserts that all values in the column are distinct. Apply to primary keys and natural keys. Uniqueness violations on a primary key break join cardinality and corrupt downstream aggregations silently.

**accepted_values:** Asserts that all values in the column appear in a defined list. Apply to categorical columns with known domains — status fields, type fields, categorical dimensions. Catches upstream system changes when new values are introduced without warning.

**relationships:** Asserts that every value in a foreign key column has a corresponding value in the referenced table's primary key column. Apply to all foreign key relationships in your dimensional model. Referential integrity violations produce null joins and silent undercounting in aggregations.

These four tests, applied systematically to every model's primary keys and critical foreign keys, catch a high proportion of common data quality failures with minimal configuration overhead. Start here before adding complexity.

Singular Tests

Generic tests cover structural constraints. Singular tests cover business logic. A singular test is a SQL query that returns the rows that fail the test — if the query returns any rows, the test fails.

Examples of what singular tests catch that generic tests cannot:

-- Orders with a ship_date before the order_date

select order_id, order_date, ship_date

from {{ ref('fact_orders') }}

where ship_date < order_date

-- Revenue rows with negative amounts (legitimate refunds handled separately)

select order_id, revenue

from {{ ref('fact_orders') }}

where revenue < 0 and order_type != 'refund'

-- Customer records with no associated orders after 30 days of existence

select c.customer_id

from {{ ref('dim_customers') }} c

left join {{ ref('fact_orders') }} o on c.customer_id = o.customer_id

where o.order_id is null

and c.created_date < current_date - interval '30 days'

Singular tests live in the 'tests/' directory. Name them descriptively: 'assert_ship_date_after_order_date.sql'. The file name becomes the test name in test results.

The most valuable singular tests encode the assumptions your data consumers rely on but that no generic test captures. Go through your critical reports and ask: what must be true in the data for this report to be correct? Then write tests for those assumptions.

The dbt-expectations Package

The dbt-expectations package (from Calogica) extends dbt's testing vocabulary with a broader set of generic tests modelled on Great Expectations. Useful additions:

**expect_column_values_to_be_between:** Asserts values fall within a numeric range. Apply to revenue columns (no negative values), age fields, percentage columns (0–100), and any column with known bounds.

**expect_column_proportion_of_unique_values_to_be_between:** Asserts that the proportion of unique values falls in a range. Useful for low-cardinality columns where uniqueness is expected to be partial — a country_code column should have low uniqueness but not a single value.

**expect_table_row_count_to_be_between:** Asserts that row count falls within expected bounds. Catches truncated loads (too few rows) and duplication events (too many rows). One of the most valuable tests in the package.

**expect_column_values_to_match_regex:** Asserts values conform to a pattern — email format, phone number format, postal code format. Catches upstream data entry issues before they reach consumers.

**expect_column_pair_values_a_to_be_greater_than_b:** Asserts a column-to-column relationship holds. Apply to any pair where order matters: end_date > start_date, close_date > open_date.

Add dbt-expectations to your packages.yml and prioritise the row count and range tests — they catch the failure modes (truncated loads, out-of-range values) that generic tests miss.

A Tiered Testing Strategy

Not all models are equally critical. A tiered strategy applies testing effort proportionally to business risk.

### Tier 1: Certified Production Sources

These are the tables business users trust for reporting. Any quality issue here produces a visible business impact.

Test everything:

- Primary key: not_null + unique on every model

- All foreign keys: relationships to referenced tables

- All required fields: not_null

- All categorical fields: accepted_values

- Row count bounds: expect_table_row_count_to_be_between with historical range

- Business logic: all relevant singular tests

- Severity: error — pipeline fails on any test failure

### Tier 2: Intermediate Transformation Models

These are the staging and intermediate models that feed Tier 1 outputs. Issues here will surface in Tier 1 tests, but catching them earlier speeds diagnosis.

Test structural constraints:

- Primary key: not_null + unique

- Critical foreign keys: relationships

- High-risk business logic singular tests

- Severity: warn on most tests, error on primary key tests

### Tier 3: Raw Source Models

Raw source models reflect upstream systems directly. Quality issues here are upstream problems, not transformation problems — but you still want visibility.

Test minimally:

- Primary key (if exists): not_null + unique

- Freshness: source freshness check in sources.yml

- Severity: warn — failures indicate upstream issues requiring investigation, not pipeline failures

The key principle: error severity is for tests that indicate a pipeline defect. Warn severity is for tests that indicate upstream or monitoring signal. Misconfiguring severity — making everything an error — produces alert fatigue and teams that learn to ignore test failures.

Severity Configuration

In schema.yml, configure severity per test:

models:

- name: fact_orders

columns:

- name: order_id

tests:

- not_null:

severity: error

- unique:

severity: error

- name: customer_id

tests:

- not_null:

severity: error

- relationships:

to: ref('dim_customers')

field: customer_id

severity: warn

- name: order_status

tests:

- accepted_values:

values: ['pending', 'confirmed', 'shipped', 'delivered', 'cancelled']

severity: warn

Source Freshness Testing

dbt's source freshness testing is separate from model tests but equally important. Define freshness expectations in sources.yml, specifying warn_after and error_after thresholds with a count and period — for example, warn after 6 hours, error after 24 hours. Set the loaded_at_field to the timestamp column that indicates when the row was ingested (typically _fivetran_synced for Fivetran-managed sources).

Run 'dbt source freshness' before 'dbt run' in your pipeline. A stale source should warn before transformations run on stale data. Source freshness failures are almost always upstream — Fivetran connector failures, API rate limits, source system outages — and catching them before transformation runs avoids building downstream artefacts on bad inputs.

Integrating Tests into CI/CD

The pipeline integration pattern:

1. Run 'dbt source freshness' — fail on error-severity freshness issues

2. Run 'dbt run --select state:modified+' — run modified models and their downstream dependents

3. Run 'dbt test --select state:modified+' — run tests for modified models

4. On PR merge to main: run 'dbt run' and 'dbt test' on the full DAG

The state:modified+ selector limits CI runtime significantly on large DAGs. Only the models changed in the PR and their downstream dependents are tested, not the entire model graph.

For critical production pipelines, run the full test suite after every production run. For development, run modified-only. The distinction keeps CI fast without sacrificing production safety.

Making Test Results Actionable

Running tests only produces value if failures are investigated. Two failure patterns to watch for:

**Persistent warnings:** Tests configured at warn severity that fail on every run become background noise. Either fix the underlying issue, raise severity to error, or remove the test. Persistent warnings that nobody acts on are worse than no tests — they train the team to ignore test output.

**New failures in production:** When a test that previously passed begins failing, treat it as an incident. Identify the source (upstream change? pipeline bug? schema change?) and resolve it before the next business day. Quality issues that persist for multiple days before resolution indicate a broken ownership model, not a testing problem.

Our data engineering consulting practice implements dbt testing frameworks as part of production analytics platforms — contact us to discuss your data quality and testing requirements.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →