dbt Project Structure: Organising Models, Tests, and Documentation

A well-structured dbt project is easier to navigate, easier to test, and easier to maintain as the team and model count grow. Most dbt projects that become hard to maintain share the same structural problems: models in flat directories with no naming conventions, business logic in staging models, tests missing or inconsistent, and documentation absent. This guide covers the standard project structure, naming conventions, and configuration patterns that keep dbt projects manageable at scale.

The standard folder structure

dbt recommends a specific folder structure that most mature projects follow with minor variations:

    models/
        staging/          -- one-to-one with source tables, minimal transformation
        intermediate/     -- multi-source joins, business logic not ready for end consumers
        marts/            -- final analytical models for specific business domains
            finance/
            marketing/
            operations/
    tests/                (or inline in schema.yml)
    macros/
    seeds/
    analyses/
    snapshots/
    docs/

This structure reflects the data transformation layers. Each folder's contents have a defined scope:

**staging/**: One model per source table. The only transformations are: renaming columns to business-friendly names, casting data types, and basic null handling. No joins. No business logic. No aggregations. Naming convention: stg_{source_name}__{table_name} (e.g., stg_stripe__payments, stg_salesforce__accounts).

**intermediate/**: Models that join or transform staging data but are not yet ready for direct business consumption. This layer holds row-level logic shared by multiple downstream marts, and joins that prepare data for a specific analytical use case without being the final mart model. Naming convention: int_{entity}_{verb} (e.g., int_orders_enriched, int_customers_with_lifetime_value).

**marts/**: The final analytical models consumed by BI tools and analytics teams. Organised by business domain (finance/, marketing/, operations/). Naming convention: fct_{entity} for event-grain tables, dim_{entity} for dimension tables, rpt_{name} for report-specific aggregates.

Naming conventions and why they matter

Consistent naming conventions make the project navigable for engineers who did not write the models, for analysts who reference model names in their BI tools, and for documentation that references specific models.

**Staging models (stg_)**: The stg_ prefix signals "this is a thin layer on top of a source, not business logic." If someone needs to know what Stripe's payments table looks like after basic cleaning, they look in stg_stripe__payments.

**Intermediate models (int_)**: The int_ prefix signals "this is a building block, not a final product." Intermediate models should not be queried directly by BI tools — they are inputs to mart models.

**Fact tables (fct_)**: Append-only or slowly growing event tables at a specific grain. Examples: fct_orders, fct_pageviews, fct_payments.

**Dimension tables (dim_)**: Entities with descriptive attributes. Examples: dim_customers, dim_products, dim_accounts.

**Report models (rpt_)**: Pre-aggregated summaries for specific reporting use cases. Examples: rpt_daily_revenue, rpt_weekly_cohorts. Use sparingly, since most BI tools can aggregate from fact tables efficiently.

Schema.yml: tests and documentation

Each model layer should have a corresponding schema.yml file that defines:

- Tests on every primary key (unique and not_null at minimum)

- Tests on every foreign key (relationships to the referenced dimension)

- Column descriptions for all significant columns

- Model descriptions stating what each model represents and its grain

The schema.yml file is co-located with the models it documents — staging schema.yml is in models/staging/, marts schema.yml is in models/marts/. This keeps documentation physically near the code it documents.

A complete staging model schema.yml entry includes: the model name, description, grain statement, and column definitions with tests and descriptions for every column.
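
As a minimal sketch of such an entry, the fragment below documents stg_stripe__payments in models/staging/schema.yml. The column names (payment_id, customer_id, status, amount) and the accepted status values are hypothetical, chosen only for illustration. The `tests:` key is used here; dbt 1.8 and later also accept `data_tests:` for the same purpose.

```yaml
version: 2

models:
  - name: stg_stripe__payments
    description: >
      Stripe payments after renaming, type casting, and basic null handling.
      Grain: one row per payment.
    columns:
      - name: payment_id            # hypothetical column name
        description: Primary key of the payment.
        tests:
          - unique
          - not_null
      - name: customer_id           # hypothetical column name
        description: Identifier of the customer who made the payment.
        tests:
          - not_null
      - name: status                # hypothetical column name and values
        description: Processing status reported by Stripe.
        tests:
          - accepted_values:
              values: ["pending", "succeeded", "failed"]
      - name: amount                # hypothetical column name
        description: Payment amount in the currency's major unit.
```

Keeping the grain statement inside the model description means it shows up in the generated docs site alongside the column descriptions.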

dbt_project.yml configuration

The dbt_project.yml file configures project-wide defaults. Important configurations:

**Model paths and materialisation defaults**: Set the default materialisation by folder: staging as views (lightweight, no storage cost), marts as tables (pre-materialised for query performance). Override per model for exceptions.

**Tags**: Apply tags at the folder level for selective test or run execution. Apply a staging tag to all staging models, a marts tag to all marts models. Use tags in run commands: dbt run --select tag:marts.

**Variables**: Define project-level variables for environment-specific configuration — schema prefixes for development vs production, date cutoffs for development data limiting, feature flags.
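
The folder-level defaults, tags, and variables described above might look like the following dbt_project.yml sketch. The project and profile name (analytics) and all three variable names are hypothetical placeholders, not values from any real project.

```yaml
# dbt_project.yml
name: analytics                # hypothetical project name
version: "1.0.0"
config-version: 2
profile: analytics             # hypothetical profile name from profiles.yml

model-paths: ["models"]

models:
  analytics:                   # must match the project name above
    staging:
      +materialized: view      # lightweight, no storage cost
      +tags: ["staging"]
    marts:
      +materialized: table     # built ahead of time for query performance
      +tags: ["marts"]

vars:
  schema_prefix: dev                  # hypothetical: prefix for development schemas
  dev_data_start_date: "2024-01-01"   # hypothetical: date cutoff to limit dev data
  enable_experimental_models: false   # hypothetical: feature flag
```

With this in place, dbt run --select tag:marts builds only the marts folder, and a model can read a variable with {{ var('dev_data_start_date') }}.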

**Schema and database overrides**: Override the built-in generate_schema_name and generate_database_name macros to control where models materialise in each environment. For example, development models can land in dev_{username}_{model_schema} and production models in prod_{model_schema}.

Sources configuration

Source tables (the raw tables from upstream systems) are defined in sources.yml files, typically in the staging directory. Source definitions enable three things: source() references, which make dependencies on raw tables explicit and distinct from ref() dependencies on other models; source freshness testing, which alerts when a source table is stale; and documentation of the source system's contract.

Each source should specify the database and schema, the table name, a description, and a freshness SLA: a loaded_at_field plus freshness thresholds (warn_after and error_after). Run dbt source freshness to check whether all sources are within their freshness windows.
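
A source definition along these lines could be sketched as follows. The database name (raw), the _loaded_at column, and the 12 and 24 hour thresholds are assumptions for illustration. Depending on the dbt version, loaded_at_field and freshness may need to sit under a config: block instead of at the top level of the source.

```yaml
# models/staging/stripe/sources.yml (path is illustrative)
version: 2

sources:
  - name: stripe
    database: raw                   # hypothetical raw database
    schema: stripe
    description: Raw Stripe tables loaded by the ingestion pipeline.
    loaded_at_field: _loaded_at     # hypothetical load-timestamp column
    freshness:
      warn_after: {count: 12, period: hour}    # assumed SLA
      error_after: {count: 24, period: hour}   # assumed SLA
    tables:
      - name: payments
        description: One row per Stripe payment as delivered by the source system.
```

A staging model would then select from {{ source('stripe', 'payments') }}, and dbt source freshness compares the latest _loaded_at value against the two thresholds.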

Snapshot configuration

Slowly changing dimensions that require Type 2 history are implemented as dbt snapshots in the snapshots/ directory. On each run, a snapshot compares the current state of the source table with what it has already recorded, inserts a new row for each changed record, and maintains dbt_valid_from and dbt_valid_to columns that mark the period during which each row version was current. Snapshots typically target their own schema (e.g., snapshots), separate from the mart models.
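
A snapshot with its own target schema might be configured as in this sketch, which assumes dbt 1.9 or later, where snapshots can be declared in YAML (earlier versions define them in SQL files with a snapshot block). The snapshot name, the salesforce source, and the id and updated_at columns are hypothetical.

```yaml
# snapshots/accounts_snapshot.yml (file name is illustrative)
snapshots:
  - name: accounts_snapshot                      # hypothetical snapshot name
    relation: source('salesforce', 'accounts')   # assumes this source is defined
    config:
      schema: snapshots            # separate schema from the mart models
      unique_key: id               # hypothetical primary key column
      strategy: timestamp          # detect changes via a last-modified column
      updated_at: updated_at       # hypothetical last-modified column
```

The timestamp strategy depends on the source reliably updating its last-modified column; where that is not the case, the check strategy with an explicit check_cols list is the usual alternative.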

Testing strategy

A minimal testing strategy for a maintainable project:

- Every primary key: unique + not_null

- Every foreign key: relationships test

- Status or type columns with known values: accepted_values

- At least one generic custom test per model verifying row count is non-zero
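
Applied to a mart model, this minimum could be sketched as below. The column names and status values are hypothetical, dim_customers is assumed to exist with a customer_id key, and row_count_non_zero stands in for a custom generic test the project would have to define itself (for example under tests/generic/); it is not a built-in dbt test.

```yaml
version: 2

models:
  - name: fct_orders
    description: >
      Orders fact table. Grain: one row per order.
    tests:
      - row_count_non_zero          # hypothetical project-defined generic test
    columns:
      - name: order_id              # primary key
        tests:
          - unique
          - not_null
      - name: customer_id           # foreign key to the customer dimension
        tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_id
      - name: order_status          # hypothetical column and values
        tests:
          - accepted_values:
              values: ["placed", "shipped", "completed", "returned"]
```

The argument layout shown is the long-standing syntax; it is worth checking against the dbt version in use, since key names for tests have changed across releases.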

Beyond this minimum, add business-logic-specific tests for the most critical models — revenue calculations, customer counts, active user definitions. These tests catch the regressions that matter most when business logic changes.

For the broader dbt context, see dbt best practices and dbt macros guide. Our data architecture consulting practice audits and refactors dbt projects for maintainability and correctness — book a free review if your dbt project has become hard to maintain.
