Modern data warehouses are organized in layers — raw ingested data, cleaned and standardized data, and business-ready analytical data — each serving a specific purpose in the transformation pipeline. This guide explains the layered architecture, the medallion architecture pattern, and how dbt implements it.
A data warehouse layer is a stage in the data transformation pipeline — a logically or physically distinct set of tables serving a specific purpose in moving raw source data toward analytics-ready output. Layered warehouse architecture separates raw data from transformed data from business-ready data, making pipelines easier to debug, maintain, and trust.
The best-known layering convention is the medallion architecture (popularized by Databricks): Bronze, Silver, and Gold. The names are memorable shorthand for a pattern that exists in various forms across modern data stacks. In dbt-based implementations, the same concept is expressed as staging, intermediate, and mart layers.
The Layers and What They Do
**Layer 1 — Raw / Bronze / Staging:** Exact copies of source system data, loaded with minimal transformation. The goal is fidelity: the raw layer should look as close to the source as possible, preserving all original fields and values. Transformations are limited to: standardizing column names to snake_case, casting obvious data types, adding metadata columns (loaded_at timestamp, source system identifier). No business logic. No filtering of bad data. No joins across sources.
The raw layer is the historical record of what source systems sent. If there is ever a dispute about what the source data contained, or a need to reprocess from scratch with different business logic, the raw layer provides the foundation. Raw layer data is typically append-only — each ingestion run adds new records rather than overwriting prior loads.
**Layer 2 — Cleaned / Silver / Intermediate:** Cleaned and standardized data with quality fixes applied and cross-source joins resolved. This layer:
- Deduplicates records that arrived multiple times from the source
- Standardizes categorical values (Salesforce stores "Mid-Market" and "midmarket" and "MidMkt" for the same segment — the cleaned layer standardizes to one value)
- Resolves entity references across sources (the customer in Salesforce and the account in NetSuite are joined and resolved to a common entity)
- Applies referential integrity enforcement (orphaned records without valid parents are flagged or excluded)
The silver layer is where data quality is enforced. Tests run at this layer catch quality issues before they propagate downstream. Not yet business-ready — no fact/dimension structure, no metric pre-calculation. But clean and consistent.
**Layer 3 — Business-Ready / Gold / Mart:** Dimensional models (star schemas) and pre-calculated aggregates designed for consumption by BI tools, analysts, and downstream applications. This layer:
- Implements fact and dimension tables in the star schema pattern
- Pre-calculates metrics that would be expensive or complex to compute at query time
- Applies business logic: fiscal calendar attribution, revenue recognition rules, customer segmentation
- Produces the exact structures Tableau workbooks connect to
The gold layer is the consumption layer. Business users and BI tools query it; it is maintained to a production quality standard with SLAs for freshness and reliability.
Why Layering Matters
**Debuggability.** When a metric is wrong in a dashboard, the layered architecture makes investigation systematic: is the problem in the raw layer (source sent bad data), the cleaned layer (cleaning step introduced an error), or the mart layer (business logic is wrong)? Without layers, all transformation logic is mixed together and the investigation has no structure.
**Reprocessability.** The raw layer preserves the original source data. If business rules change — a new revenue recognition policy, a revised customer segmentation definition — the mart layer can be rebuilt from the raw layer without re-ingesting from source systems. This is extremely valuable when source systems are slow, rate-limited, or have short change windows.
**Separating concerns.** Data engineers own the raw and cleaned layers; analytics engineers own the mart layer. Splitting responsibility by layer reduces coordination overhead — the analytics engineer does not need to understand EL pipeline mechanics; the data engineer does not need to understand fiscal calendar business logic.
**Testing by layer.** Tests at the raw layer verify ingestion completeness. Tests at the cleaned layer verify data quality assumptions. Tests at the mart layer verify business logic correctness. Each layer's tests are appropriate to that layer's purpose.
dbt Layer Implementation
In dbt, layers are implemented as model directories with naming conventions:
- staging/ (or raw/) — staging models: one per source table
- intermediate/ — intermediate models: cross-source joins and cleaning
- marts/ — mart models: dimension and fact tables, organized by business domain (marts/finance/, marts/sales/, marts/product/)
The dbt project structure makes the layer boundaries explicit and enforceable through ref() dependencies — a mart model cannot directly reference a raw source, only through the staging or intermediate layer.
Our data architecture services practice designs and implements layered data warehouse architectures — raw ingestion, cleaning, and business-ready mart layers — using dbt for the transformation framework. Contact us to discuss your data warehouse architecture.
A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.
Book a Call →