dbt sources are YAML definitions that describe the raw database tables and views that dbt models build on. They provide a governed entry point into the raw data layer — with freshness assertions, data tests, and documentation — so that transformation models have a reliable, tested foundation rather than direct references to undocumented upstream tables.
A source declares a raw table that your dbt models read from: a table written to your warehouse by a data ingestion pipeline, an ELT tool, or operational database replication. Without sources, a dbt model references an upstream table with a hard-coded schema.table name. With sources, the model references the same table through the source() function, which gives dbt what it needs to provide documentation, testing, freshness monitoring, and lineage tracking, none of which a hard-coded reference supports.
What Sources Enable
A source definition looks like this:
```yaml
version: 2

sources:
  - name: salesforce
    database: raw
    schema: salesforce
    description: Raw Salesforce data loaded by Fivetran
    tables:
      - name: accounts
        description: One row per Salesforce account
        loaded_at_field: _fivetran_synced
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
        columns:
          - name: id
            description: Salesforce account ID
            tests:
              - not_null
              - unique
          - name: name
            description: Account name
            tests:
              - not_null
```
And in a staging model:
```sql
SELECT
    id as account_id,
    name as account_name,
    industry
FROM {{ source('salesforce', 'accounts') }}
```
The source() function compiles to the fully qualified table reference (raw.salesforce.accounts) and also registers the dependency in dbt's lineage graph, so the model appears downstream of the source in the dbt DAG. Direct schema.table references are invisible to dbt's lineage tracking.
Source Freshness Monitoring
The most operationally valuable feature of sources is freshness checking. The dbt source freshness command compares the maximum value of a timestamp column (loaded_at_field) against the current time and reports whether the source is within its configured freshness thresholds.
Run freshness checking with:
```shell
dbt source freshness
```
This command queries every source table that has a freshness block configured (normally together with loaded_at_field) and reports one of three states for each:
- **Pass**: the most recent load is within the warn_after threshold
- **Warn**: the most recent load is older than warn_after but still within error_after
- **Error**: the most recent load is older than error_after
Freshness checking is the first line of defence against silent data pipeline failures. An extract that stops refreshing will not produce obvious errors in downstream models — the models run successfully, they just process stale data. Without freshness monitoring, stale data propagates silently through the transformation layer into dashboards and reports. With freshness monitoring, the pipeline failure is detected at the source before it reaches downstream consumers.
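Freshness checking is typically run as a step in the dbt orchestration workflow, before the main dbt build. The command exits with a non-zero status when any source breaches its error_after threshold (a warning alone does not fail it), so the orchestrator can skip model execution when source data is stale.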
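The thresholds do not have to be repeated on every table. As a sketch of one possible layout, the variation on the salesforce definition below sets a default freshness block and loaded_at_field at the source level, tightens the thresholds for one table, and opts another table out with freshness: null. The opportunities and record_types tables are hypothetical additions for illustration, and recent dbt versions may prefer these keys nested under config, so check the layout against the version you run.

```yaml
version: 2

sources:
  - name: salesforce
    database: raw
    schema: salesforce
    # Defaults inherited by every table in this source
    loaded_at_field: _fivetran_synced
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: accounts          # uses the source-level defaults

      - name: opportunities     # hypothetical table with a tighter threshold
        freshness:
          warn_after: {count: 6, period: hour}
          error_after: {count: 12, period: hour}

      - name: record_types      # hypothetical slow-changing table
        freshness: null         # skip freshness checks for this table
```

Setting the defaults once keeps the per-table entries short and makes the exceptions easy to spot in review.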
Source Tests
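Source columns can be tested the same way as model columns, with the same not_null, unique, accepted_values, and relationships tests. Source tests run against the raw tables themselves, so they check the data before any transformation is applied, and they can be run on their own with dbt test --select "source:*". A test on a status column looks like this: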
```yaml
- name: status
  tests:
    - accepted_values:
        values: ['open', 'closed', 'pending', 'cancelled']
```
Source tests serve a different purpose than model tests. Model tests verify that transformation logic produces correct outputs. Source tests verify that raw inputs match expectations. A source test that fails indicates a problem with the upstream data pipeline or source system, not with the dbt transformation logic.
Testing sources separately from models also makes debugging faster: when a test fails, you know immediately whether the issue is in the source data (source test failure) or in the transformation logic (model test failure).
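The accepted_values test above covers a single column; the relationships test mentioned earlier works on sources too. The fragment below is a minimal sketch meant to be merged into the existing salesforce definition, not a standalone file. It assumes a hypothetical opportunities table whose account_id should always match an id in accounts, and a hypothetical amount column whose not_null check is downgraded to a warning so that it is reported without failing the run.

```yaml
version: 2

sources:
  - name: salesforce
    tables:
      - name: opportunities          # hypothetical table, for illustration
        columns:
          - name: account_id
            tests:
              - not_null
              # Every opportunity should point at a known account
              - relationships:
                  to: source('salesforce', 'accounts')
                  field: id
          - name: amount             # hypothetical column
            tests:
              - not_null:
                  config:
                    severity: warn   # report the failure, but do not fail the run
```

This uses the same tests: key as the rest of the article; newer dbt releases also accept data_tests: for the same purpose.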
Source Documentation
Sources support the same documentation features as models: description fields at the source, table, and column levels, rendered in the dbt docs site. Well-documented sources are the foundation for data discovery — they describe the origin, meaning, and quality characteristics of the raw data that feeds every downstream transformation.
Source documentation should include:
- The system the data comes from (Salesforce, Stripe, PostgreSQL operations database)
- The tool or process that loads the data (Fivetran, Airbyte, custom pipeline)
- The refresh cadence
- Known data quality issues or limitations
- Contact information for the team responsible for the data
This documentation is especially valuable for onboarding new analytics engineers and for debugging data quality issues in production.
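Most of the items in that list can live directly in the source definition. The following is a sketch only: description and loader are standard source properties, while the keys under meta are free-form and the names used here (owner, contact, known_issues) are assumptions, with placeholders left for values this article does not supply. Some dbt versions expect meta nested under config, so adjust to the version you run.

```yaml
version: 2

sources:
  - name: salesforce
    description: >
      Raw Salesforce CRM data, loaded by Fivetran. See the freshness
      thresholds on each table for the expected refresh cadence.
    loader: Fivetran                 # documentation-only property
    meta:                            # free-form keys; these names are assumptions
      owner: "<owning team>"
      contact: "<channel or email>"
      known_issues: "<short note or link>"
    tables:
      - name: accounts
        description: One row per Salesforce account
```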
Staging Models and the Source Convention
The standard dbt project convention is that staging models — the first layer of transformation — are the only models that reference sources directly. All other models reference staging models, not sources.
This convention produces a clean separation:
- Sources: raw, as-loaded data with source() references
- Staging: one model per source table, applying basic cleaning and renaming
- Marts: business logic built on staging models, never on sources directly
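The convention means that if a source schema changes (a column is renamed or retyped, for example), the fix is usually confined to the corresponding staging model. As long as the staging model keeps its output columns stable, downstream mart models do not need to be updated, because they never reference the source directly.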
Source Overrides for Testing
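Source references can be redirected to test data in CI in a couple of ways. First, the database and schema properties of a source accept Jinja, so they can read a var() or the active target and be pointed at a schema of test data by passing --vars on the command line. Second, unit tests (dbt 1.8 and later) can supply mock rows for a source() input. Note that the overrides property on a source does something different: it replaces the properties of a source defined in an installed package. These techniques are useful for CI environments where the raw tables may not contain appropriate test data, or where running dbt tests against production source tables is undesirable.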
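One way to point a source at test data with --vars, plus a unit test that mocks the source entirely, might look like the sketch below. The variable name salesforce_schema, the test name, and the staging model name stg_salesforce__accounts are all hypothetical; the mock rows are invented sample values, and the expected columns follow the staging query shown earlier.

```yaml
version: 2

sources:
  - name: salesforce
    database: raw
    # Defaults to the production schema; CI can pass a different value
    schema: "{{ var('salesforce_schema', 'salesforce') }}"
    tables:
      - name: accounts

unit_tests:
  - name: stg_accounts_renames_columns      # hypothetical test name
    model: stg_salesforce__accounts         # hypothetical staging model name
    given:
      - input: source('salesforce', 'accounts')
        rows:
          - {id: "001", name: "Acme", industry: "Retail"}
    expect:
      rows:
        - {account_id: "001", account_name: "Acme", industry: "Retail"}
```

With this in place, a CI job could run dbt build --vars '{"salesforce_schema": "salesforce_ci"}', where salesforce_ci is a hypothetical schema holding test data. The unit_tests block requires dbt 1.8 or later.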