dbt Project Structure: How to Organise a dbt Project That Scales

How to structure a dbt project that remains navigable and maintainable as it grows — the staging/intermediate/mart layering convention, directory organisation, file naming conventions, how to use subdirectories for domain separation, configuration inheritance through dbt_project.yml, and the project structure decisions that matter most at different stages of growth.

A dbt project that starts well-structured and stays well-structured as it grows is the exception, not the rule. Most projects start small and accumulate models without a consistent organisational framework. At 20 models, a messy project is navigable. At 200 models, it becomes the barrier to onboarding new engineers and the reason changes take longer than they should.

The conventions in this guide are opinionated. They reflect what works at scale and have become the de facto community standard — but they are conventions, not requirements. If your team has a good reason to deviate, deviate.

## The Three-Layer Convention

The most important structural decision in a dbt project is adopting the three-layer staging/intermediate/marts convention. Every model should belong to one of these layers, and each layer has a clear responsibility.

### Staging Layer

Staging models map one-to-one to source tables: one staging model per source table. Their only responsibility is standardisation:

- Rename columns to consistent naming conventions (snake_case, prefix removal)

- Cast columns to the correct types (convert string dates to DATE, string amounts to NUMERIC)

- Apply basic cleaning (lower-case email addresses, trim whitespace from string fields)

- Add computed surrogate keys if the source does not have a natural primary key

- Filter out known-bad records that should never appear in downstream models (test records, internal users)

Staging models do NOT:

- Join to other tables

- Apply business logic

- Aggregate data

The staging model for Fivetran's Salesforce opportunity table is called stg_salesforce__opportunity. It contains the Salesforce fields, renamed and typed, and nothing else.
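A staging model should read its source table through dbt's `source()` function rather than a hard-coded table name, so each source table is declared once in the directory's sources file. A minimal sketch of _salesforce__sources.yml, assuming Fivetran loads Salesforce into a schema named salesforce (adjust to your connector's destination schema):

```yaml
# models/staging/salesforce/_salesforce__sources.yml
version: 2

sources:
  - name: salesforce
    # Assumption: the schema Fivetran writes the Salesforce tables into
    schema: salesforce
    tables:
      - name: opportunity
      - name: account
      - name: contact
```

stg_salesforce__opportunity then selects from `{{ source('salesforce', 'opportunity') }}`, which also records the source-to-staging lineage in dbt's DAG.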

### Intermediate Layer

Intermediate models exist to assemble and enrich entities from staging models. They handle the joins, enrichment, and business logic that staging models deliberately leave out.

Intermediate models are the right place for:

- Joining related staging models to produce a unified entity (int_customers__with_orders joins stg_salesforce__account with stg_stripe__customer)

- Applying business logic classifications (customer tier assignment, cohort grouping, product category mapping)

- Computing entity-level metrics that will be used in multiple downstream marts

- Deduplication that cannot be handled at the staging layer

Intermediate models should not be exposed directly to BI tools — they are building blocks for mart models.
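When an intermediate model is materialised as a view or table rather than ephemeral, one way to keep it away from BI tools is to build it into its own schema and grant read access only on marts. A sketch using dbt's grants config in dbt_project.yml; my_project and the bi_reader role are placeholders:

```yaml
# dbt_project.yml (excerpt)
models:
  my_project:
    intermediate:
      # Built into a separate schema; no grants are configured for it
      +schema: intermediate
    marts:
      # Only mart models are made readable by the (hypothetical) BI role
      +grants:
        select: ['bi_reader']
```

dbt applies the grants as part of building each mart, so access follows the layer convention instead of being managed by hand.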

### Marts Layer

Mart models are the final analytical outputs consumed by BI tools and analysts. They are designed around specific analytical use cases or business domains.

Mart models should:

- Be wide enough that analysts do not need joins to get commonly used attributes

- Be pre-aggregated to the grain that most analyses consume

- Have clearly documented grain (one row represents what?)

- Be stable: column names and grain do not change without a migration plan

Common mart patterns:

- fct_orders: One row per order, with customer attributes, product attributes, and order financial metrics

- fct_web_sessions: One row per session, with user attributes and session engagement metrics

- dim_customers: One row per customer, with current state of all customer attributes
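Documenting the grain is most useful when a test enforces it. A sketch of _finance__models.yml for two of the marts above; the key columns order_id and customer_id are assumed names:

```yaml
# models/marts/finance/_finance__models.yml
version: 2

models:
  - name: fct_orders
    description: "One row per order, with customer attributes, product attributes and order financial metrics."
    columns:
      - name: order_id          # assumed grain column
        description: "Primary key: exactly one row per order."
        tests:
          - unique
          - not_null
  - name: dim_customers
    description: "One row per customer, with the current state of all customer attributes."
    columns:
      - name: customer_id       # assumed grain column
        tests:
          - unique
          - not_null
```

`tests:` is the long-standing key; dbt 1.8 and later also accept `data_tests:`.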

## Directory Organisation

The directory structure should mirror the layer convention:

```text
models/
  staging/
    salesforce/
      stg_salesforce__opportunity.sql
      stg_salesforce__account.sql
      stg_salesforce__contact.sql
      _salesforce__sources.yml
      _salesforce__models.yml
    stripe/
      stg_stripe__customer.sql
      stg_stripe__invoice.sql
      _stripe__sources.yml
      _stripe__models.yml
  intermediate/
    finance/
      int_orders__with_customers.sql
      int_subscriptions__with_mrr.sql
    marketing/
      int_campaigns__with_spend.sql
  marts/
    finance/
      fct_orders.sql
      fct_subscriptions.sql
      dim_customers.sql
      _finance__models.yml
    marketing/
      fct_campaigns.sql
      dim_channels.sql
```

Key principles:

- Staging models grouped by source system (one directory per source)

- Intermediate and mart models grouped by business domain

- Schema YAML files live in the same directory as the models they document
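Because model configuration in dbt_project.yml is keyed by directory, the domain subdirectories can carry their own settings while inheriting the layer defaults. A sketch, assuming one schema and one tag per mart domain (the schema and tag names are illustrative):

```yaml
# dbt_project.yml (excerpt)
models:
  my_project:
    marts:
      +materialized: table      # inherited by every subdirectory below
      finance:
        +schema: finance        # overrides the schema for finance marts only
        +tags: ['finance']
      marketing:
        +schema: marketing
        +tags: ['marketing']
```

With tags in place, `dbt build --select tag:finance` builds only the finance marts.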

## File Naming Conventions

Consistent naming makes models findable and communicates their layer and type:

**Staging models:** stg_{source}__{table} (double underscore separates source from table)

- stg_salesforce__opportunity

- stg_stripe__invoice

- stg_google_analytics__sessions

**Intermediate models:** int_{entity}__{transformation} (describes the entity and what was done)

- int_customers__with_orders

- int_orders__with_revenue

**Mart fact models:** fct_{business_event} (describes the business event the fact represents)

- fct_orders

- fct_sessions

- fct_subscriptions

**Mart dimension models:** dim_{entity} (describes the entity)

- dim_customers

- dim_products

- dim_dates

**Mart report models (optional):** rpt_{use_case} (for specific report-oriented aggregations)

- rpt_executive_summary

- rpt_cohort_analysis

## dbt_project.yml Configuration

dbt_project.yml sets project-wide configuration. The most important setting is the default materialisation for each directory:

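```yaml
models:
  my_project:               # must match the `name:` at the top of dbt_project.yml
    staging:
      +materialized: view
      # With the default generate_schema_name macro this builds into <target_schema>_staging
      +schema: staging
    intermediate:
      +materialized: ephemeral
    marts:
      +materialized: table
      +schema: marts
```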

Staging models as views: they are thin wrappers over source tables; rebuilding them as tables wastes storage and adds rebuild time without benefit.

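Intermediate models as ephemeral: they are compiled inline (as CTEs) into the queries that reference them, not materialised as database objects. This reduces the number of objects in your schema. The trade-off is that an ephemeral model's SQL is re-executed inside every model that references it, so if an intermediate model is referenced by many downstream models and is expensive to compute, materialise it as a table so the work is done once per run (a view is queryable for debugging but is still recomputed on every query).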

Marts as tables: they are queried by BI tools and should be pre-materialised for fast query response.
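When a single intermediate model is expensive and widely referenced, the directory default can be overridden for that model alone. A sketch in a properties file; the file name _int_finance__models.yml is hypothetical, and the same override can be written as `{{ config(materialized='table') }}` at the top of the model's SQL file:

```yaml
# models/intermediate/finance/_int_finance__models.yml (hypothetical file name)
version: 2

models:
  - name: int_orders__with_customers
    config:
      # Overrides the ephemeral default from dbt_project.yml for this model only
      materialized: table
```

A config() block in the SQL file takes precedence over the properties file, which in turn takes precedence over dbt_project.yml.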

## Schema File Organisation

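Each directory should contain a schema YAML file documenting all models in that directory, named _{directory_name}__models.yml (for example, _finance__models.yml). Staging directories also hold a _{source}__sources.yml file declaring the source tables.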

Do not put all model documentation in a single monolithic schema.yml at the project root. At scale, a single file becomes a merge conflict nightmare and is too large to navigate.

## Project Growth Stages

**Small project (under 50 models):** The full three-layer convention is worth implementing from the start, even if some layers have only a few models. The discipline pays off as the project grows.

**Medium project (50–200 models):** Domain-based subdirectories within marts and intermediate become necessary. Without them, the directory becomes a flat list of 50+ files that is hard to navigate.

**Large project (200+ models):** Consider splitting the dbt project into multiple projects aligned to business domains (dbt project dependencies; cross-project refs were introduced in dbt 1.6 and are supported natively in dbt Cloud's Enterprise tier rather than in dbt Core). The commercial analytics project, the product analytics project, and the finance analytics project become separate dbt projects that reference each other through cross-project refs. This is a significant architectural change; make it when the single-project monorepo is genuinely causing problems, not preemptively.
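A minimal sketch of the wiring, assuming an environment where cross-project refs are supported and using hypothetical project names: the downstream finance project declares its upstream dependency in a dependencies.yml file at its project root.

```yaml
# finance_analytics/dependencies.yml (project names are hypothetical)
projects:
  # Must match the `name:` in the upstream project's dbt_project.yml
  - name: commercial_analytics
```

Only upstream models declared with `access: public` can then be referenced, using the two-argument form `{{ ref('commercial_analytics', 'dim_customers') }}`.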

## What Not to Optimise

**Premature ephemeral optimisation:** Making models ephemeral reduces schema object count but complicates debugging (ephemeral models cannot be queried directly). Only make models ephemeral if they are genuinely intermediate computation that should not be exposed. When in doubt, use view.

**Deep nesting:** More than three directory levels (models/staging/salesforce/crm/) creates navigation overhead without organisational benefit. Keep the hierarchy shallow.

**Over-granular model decomposition:** A model that is 50 lines of SQL does not need to be decomposed into 5 intermediate models for "separation of concerns." The cost of navigating more files is real; decompose only when a piece of logic is genuinely reusable or when a model is so complex that decomposition aids understanding.

