
What Is Medallion Architecture? Bronze, Silver, and Gold Layers Explained

James Okafor
Lead Data Engineer
March 29, 2028 · 10 min read

Medallion architecture organizes a data lakehouse into progressive quality layers: bronze (raw ingested data), silver (cleaned and validated), and gold (business-ready aggregations). This guide explains how each layer works, what belongs in each, and when the pattern adds value over simpler alternatives.

Medallion architecture is a data design pattern that organizes data in a lakehouse into three progressive quality layers: bronze (raw), silver (refined), and gold (business-ready). Popularized by Databricks as a pattern for Delta Lake-based lakehouses, it has become a widely adopted organizing principle for lakehouse data pipelines regardless of the underlying technology.

The name borrows from medal tiers: bronze, silver, and gold mark the progression from raw material to refined product, and each layer represents a step up in data quality, structure, and analytical readiness.

The Three Layers

Bronze — Raw Ingestion

The bronze layer contains data exactly as it arrived from the source systems, with minimal or no transformation. Every event, every record, every file is preserved in its original form. The only additions are ingestion metadata: when the data arrived, which source it came from, and a unique load identifier.
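As a rough illustration of that idea, the sketch below wraps each raw payload in ingestion metadata without touching the payload itself. The function name `to_bronze`, the underscore-prefixed field names, and the `orders_api` source label are all hypothetical choices made for this example, not part of any particular platform.

```python
import uuid
from datetime import datetime, timezone


def to_bronze(raw_payload: str, source: str, load_id: str) -> dict:
    """Wrap one raw payload in ingestion metadata without altering it."""
    return {
        "raw_payload": raw_payload,  # kept exactly as received
        "_source": source,           # which system the data came from
        "_load_id": load_id,         # identifies the ingestion batch
        "_ingested_at": datetime.now(timezone.utc).isoformat(),
    }


load_id = str(uuid.uuid4())  # one identifier shared by the whole batch
incoming = [
    '{"order_id": 1, "amount": "19.99"}',
    '{"order_id": 2, "amount": "oops"}',  # bad value, still landed as-is
]
bronze_rows = [to_bronze(p, source="orders_api", load_id=load_id) for p in incoming]
```

Note that the second payload carries an obviously bad amount and is stored anyway. Judging its quality is left to the silver layer.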

Bronze is the audit layer. If a downstream transformation introduces a bug, you can reprocess from bronze. If a data quality issue is discovered months later, the raw data is available for investigation. If a business rule changes and historical data needs to be recomputed, the raw records exist to reprocess against the new logic.

Bronze should be append-only or write-once. Records are never updated or deleted, with the narrow exception of compliance obligations such as GDPR right-to-erasure requests. This immutability is what makes bronze a reliable audit trail.

Schema enforcement at the bronze layer is light — enough to ensure that the data can be read (the file is valid Parquet, the JSON is well-formed), but not strict typing or business rule validation. The goal is to land all data, even data that has quality problems, so nothing is silently discarded.
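A minimal sketch of what "light" enforcement can look like for JSON input: the only question asked is whether the payload parses. The helper names `is_readable` and `land`, and the idea of keeping unreadable payloads in a separate reject list instead of dropping them, are assumptions made for this example.

```python
import json


def is_readable(raw_payload: str) -> bool:
    """Return True if the payload is well-formed JSON; check nothing else."""
    try:
        json.loads(raw_payload)
        return True
    except json.JSONDecodeError:
        return False


def land(payloads):
    """Split payloads into readable and unreadable; neither list is discarded."""
    landed, unreadable = [], []
    for payload in payloads:
        (landed if is_readable(payload) else unreadable).append(payload)
    return landed, unreadable


landed, unreadable = land(['{"amount": "oops"}', '{"amount": '])
```

Here the first payload lands even though its amount is not a number, because bronze does not validate types or business rules. Only the truncated second payload is set aside.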

Silver — Cleaned and Validated

Silver is the refinement layer. Data from bronze is transformed to be clean, typed correctly, deduplicated, and validated against business rules. Transformations applied at the silver layer include:

- Parsing string fields into appropriate types (for example, converting the string "2024-01-15" to a date type)

- Deduplicating records that arrived multiple times due to at-least-once delivery semantics

- Applying null handling — null values replaced with defaults, or flagged as invalid

- Standardizing formats — phone numbers, addresses, currency codes, status codes normalized to consistent representations

- Running data quality checks — records that fail quality thresholds are quarantined rather than silently discarded or propagated

- Joining reference data — enriching events with slowly-changing dimension data (e.g., joining device events with device metadata)
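Several of the transformations above can be sketched in a few lines. The example below covers type parsing, null handling, format standardization, deduplication, and quarantining. The record shape, the `"unknown"` default for a missing status, and the choice of `order_id` as the deduplication key are hypothetical and would differ in a real pipeline.

```python
from datetime import date
from decimal import Decimal, InvalidOperation


def clean(record: dict) -> dict:
    """Parse and normalize one record; raises if a field cannot be typed."""
    return {
        "order_id": int(record["order_id"]),
        "order_date": date.fromisoformat(record["order_date"]),
        "amount": Decimal(record["amount"]),
        # Null handling plus standardization: default, then trim and lowercase.
        "status": (record.get("status") or "unknown").strip().lower(),
    }


def to_silver(bronze_records):
    """Return (silver_rows, quarantined); duplicate order_ids are dropped."""
    silver, quarantined, seen = [], [], set()
    for record in bronze_records:
        try:
            row = clean(record)
        except (KeyError, ValueError, TypeError, InvalidOperation) as exc:
            # Failed records are kept aside with a reason, not discarded.
            quarantined.append({"record": record, "reason": repr(exc)})
            continue
        if row["order_id"] in seen:
            continue  # same record delivered more than once
        seen.add(row["order_id"])
        silver.append(row)
    return silver, quarantined


bronze = [
    {"order_id": "1", "order_date": "2024-01-15", "amount": "19.99", "status": " PAID "},
    {"order_id": "1", "order_date": "2024-01-15", "amount": "19.99", "status": " PAID "},
    {"order_id": "2", "order_date": "2024-01-16", "amount": "oops", "status": None},
    {"order_id": "3", "order_date": "2024-01-17", "amount": "5.00", "status": None},
]
silver, quarantined = to_silver(bronze)
```

With this input, orders 1 and 3 reach silver, the repeated order 1 is dropped, and order 2 is quarantined because its amount cannot be parsed.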

Silver tables are the stable, cleaned representation of source data. They are not yet aggregated or business-logic-enriched — that happens in gold — but they are trustworthy enough that a data analyst could query silver directly and get meaningful results.

Gold — Business-Ready

Gold is the serving layer. Data from silver is transformed to reflect business logic, domain-specific aggregations, and the structures optimized for analytical consumption. Gold contains:

- Fact tables and dimension tables structured for BI tool consumption

- Pre-aggregated summary tables that reflect business metrics (daily active users, monthly revenue by region, weekly cohort retention)

- Domain-specific calculations — revenue after refund deductions, customer lifetime value, marketing attribution

- Tables or views that match the shape of specific reports or dashboards

Gold is what data analysts and BI tools primarily query. It reflects how the business defines its metrics, not just what the source systems record. If the finance team defines "net revenue" as gross revenue minus refunds minus failed payment adjustments, that calculation lives in gold.
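To make the net revenue example concrete, here is a small sketch that applies that definition and aggregates it per day. The column names (`gross`, `refunds`, `failed_payment_adjustments`) and the daily grain are assumptions chosen for illustration. A real gold table would follow whatever the finance team has agreed on.

```python
from collections import defaultdict
from decimal import Decimal


def daily_net_revenue(silver_rows):
    """Aggregate net revenue per day: gross - refunds - failed payment adjustments."""
    totals = defaultdict(Decimal)  # Decimal() starts each day at 0
    for row in silver_rows:
        net = row["gross"] - row["refunds"] - row["failed_payment_adjustments"]
        totals[row["order_date"]] += net
    return dict(totals)


silver_rows = [
    {"order_date": "2024-01-15", "gross": Decimal("100.00"),
     "refunds": Decimal("10.00"), "failed_payment_adjustments": Decimal("5.00")},
    {"order_date": "2024-01-15", "gross": Decimal("50.00"),
     "refunds": Decimal("0.00"), "failed_payment_adjustments": Decimal("0.00")},
    {"order_date": "2024-01-16", "gross": Decimal("20.00"),
     "refunds": Decimal("20.00"), "failed_payment_adjustments": Decimal("0.00")},
]
gold = daily_net_revenue(silver_rows)
```

Keeping the definition in one gold-layer function (or one model) means every dashboard that reports net revenue reads the same number.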

What Makes Medallion Valuable

**Reprocessing from clean boundaries** — if a bug is discovered in a gold transformation, you reprocess gold from silver. If the bug is in the silver cleaning logic, you reprocess silver from bronze. You do not need to re-extract from the source system. This dramatically reduces the cost of fixing pipeline errors.
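One way to picture this is to treat each layer as a function of the layer below it. In the sketch below, a gold bug (a test account counted as revenue) is fixed by re-running only the gold step over the silver rows that already exist. The function names, the `qa-bot` account, and the in-memory lists standing in for stored tables are all hypothetical.

```python
def build_silver(bronze_rows):
    """Bronze -> silver: normalize user names and type the amount."""
    return [
        {"user": r["user"].strip().lower(), "amount_cents": int(r["amount_cents"])}
        for r in bronze_rows
    ]


def build_gold(silver_rows, exclude_users=frozenset()):
    """Silver -> gold: total cents per user, skipping excluded accounts."""
    totals = {}
    for r in silver_rows:
        if r["user"] in exclude_users:
            continue
        totals[r["user"]] = totals.get(r["user"], 0) + r["amount_cents"]
    return totals


bronze = [
    {"user": " Ana ", "amount_cents": "1000"},
    {"user": "ana", "amount_cents": "500"},
    {"user": "QA-Bot", "amount_cents": "9999"},
]
silver = build_silver(bronze)  # built once and kept

gold_buggy = build_gold(silver)  # wrongly includes the test account
gold_fixed = build_gold(silver, exclude_users={"qa-bot"})  # rerun gold only
```

Neither the bronze rows nor the source system are touched by the fix. Only the step that contained the bug is recomputed.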

**Separation of concerns** — each layer has a clear responsibility. Bronze developers focus on reliable ingestion. Silver developers focus on cleaning and validation. Gold developers focus on business logic. Teams and responsibilities are easier to separate.

**Progressive data quality** — data quality problems are caught at the earliest appropriate layer. Invalid records are quarantined in silver rather than allowed to propagate into gold and corrupt business metrics.

**Auditability** — because bronze preserves raw data, any gold metric can be traced back to source events. This is valuable for regulatory compliance and for debugging metric discrepancies.
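Tracing only works if each layer carries a reference to the layer below. The sketch below assumes every silver row keeps a hypothetical `bronze_id` pointing at the raw record it was derived from, so the rows behind one day's metric can be looked up. The field names and the helper `trace_to_bronze` are illustrative.

```python
def trace_to_bronze(metric_date, silver_rows, bronze_rows):
    """Return the raw bronze records behind one day's gold metric."""
    contributing = {
        r["bronze_id"] for r in silver_rows if r["order_date"] == metric_date
    }
    return [b for b in bronze_rows if b["bronze_id"] in contributing]


bronze = [
    {"bronze_id": "b1", "raw_payload": '{"order_id": 1, "date": "2024-01-15"}'},
    {"bronze_id": "b2", "raw_payload": '{"order_id": 2, "date": "2024-01-16"}'},
    {"bronze_id": "b3", "raw_payload": '{"order_id": 3, "date": "2024-01-15"}'},
]
silver = [
    {"bronze_id": "b1", "order_id": 1, "order_date": "2024-01-15"},
    {"bronze_id": "b2", "order_id": 2, "order_date": "2024-01-16"},
    {"bronze_id": "b3", "order_id": 3, "order_date": "2024-01-15"},
]
evidence = trace_to_bronze("2024-01-15", silver, bronze)
```

When a dashboard figure for a given day is questioned, `evidence` holds the untouched source payloads that fed it.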

Medallion vs Simpler Patterns

For small teams and simple pipelines, medallion may be over-engineered. A two-layer architecture (raw and serving) accomplishes most of the same goals with less operational complexity. The distinction between bronze and silver is less meaningful when source data is high quality and transformations are simple.

Medallion adds most value when:

- Source data quality is variable or unreliable (the bronze-to-silver cleaning step is substantial)

- Multiple gold-layer consumers need the same cleaned representation of source data (silver prevents duplicate cleaning logic)

- Regulatory or compliance requirements demand raw data preservation and audit trails

- Pipeline debugging is expensive, and the ability to reprocess from intermediate layers has clear value

- Data volumes are large enough that re-extracting from source systems is not practical

Medallion Outside of Databricks

Databricks popularized the pattern alongside Delta Lake, but it applies to any lakehouse or data warehouse environment:

In a Snowflake environment, raw staging tables serve as bronze, intermediate dbt models as silver, and mart models as gold. Many dbt project structures implicitly follow medallion — staging models clean source data, intermediate models apply business logic, mart models serve BI tools.

In a BigQuery environment using Dataform or dbt, the same layer separation applies with BigQuery datasets for each tier.

The technology matters less than the principle: separate raw landing from cleaning from business logic, and preserve raw data for reprocessing.

