BlogData Architecture

Data Warehouse Modeling: Star Schema, Snowflake Schema, and One Big Table

Austin Duncan
Austin Duncan
Managing Director & Principal Data Architect
·June 8, 202712 min read

The debate about data warehouse modeling — star schema versus snowflake schema versus wide denormalised tables — has been running for thirty years. The right answer depends on who is querying, with what tools, and against what data volumes. This guide explains the trade-offs clearly.

Data warehouse modeling is the design of how fact data and dimension data are stored and related within an analytical database. The choice of modeling approach shapes query performance, BI tool compatibility, maintenance burden, and the ease with which analysts can explore data independently. The debate between approaches has been running for three decades — and the right answer has changed as BI tools and analytical databases have evolved.

Star Schema: The Classic Approach

Ralph Kimball's dimensional modeling methodology — the source of the star schema — was designed in the era of relational databases and proprietary BI tools that required optimised data structures to deliver acceptable performance. The key insight: analytical queries typically aggregate fact data (revenue, quantity, events) across dimensional attributes (time, customer, product, geography).

A star schema separates these into:

**Fact tables**: Contain the quantitative measures (revenue, quantity, events) at a defined grain. Each row represents one transaction, one event, or one period. Fact tables have foreign keys to dimension tables but no descriptive attributes.

**Dimension tables**: Contain the descriptive attributes used for filtering, grouping, and labelling (customer name, product category, date attributes, geography hierarchy). Dimension tables are denormalised — a customer dimension contains region, country, segment, and lifecycle stage in the same table, not in separate tables joined together.

The "star" shape comes from one or more fact tables at the centre with dimension tables radiating outward. Queries join the fact table to the relevant dimensions.

Star schema advantages:

- Queries are simple: one fact table, join to a few dimensions. Most BI tools generate this pattern automatically from a data model.

- Query performance: joins between a large fact table and smaller dimension tables are well-optimised by both row-oriented and columnar databases.

- BI tool compatibility: Tableau, Power BI, and Looker all generate correct aggregations from star schemas without complex configuration.

- Analyst accessibility: the schema is understandable without data engineering expertise.

Star schema limitations:

- Dimension management: slowly changing dimensions (customer segment changes, product category reassignments) require a defined strategy. Kimball's Type 1 (overwrite), Type 2 (new row with effective dates), and Type 3 (add previous-value column) approaches each have appropriate use cases but add complexity.

- Wide dimensions: a customer dimension with 200 attributes is a maintenance challenge.

- Conformed dimensions: in a multi-mart environment, dimensions used by multiple fact tables (date, customer, geography) must be consistent. Maintaining conformed dimensions across teams is governance work.

Snowflake Schema

A snowflake schema normalises the dimension tables — splitting repeated or hierarchical dimension attributes into separate tables. A product dimension in a star schema might contain category, subcategory, brand, and manufacturer in one table. In a snowflake schema, category and subcategory might be separate tables joined to the product dimension.

Snowflake schemas were popular in the era when storage was expensive and disk space justified the complexity of additional joins. The join overhead versus a denormalised dimension table was acceptable on row-oriented databases when the alternative was storing millions of redundant strings.

In the current environment — columnar cloud warehouses where storage is cheap and join performance is strong — snowflake schemas provide limited benefit for analytical workloads. The normalisation that reduces storage redundancy also increases query complexity (more joins), reduces BI tool compatibility (some tools struggle with deeply normalised schemas), and makes the schema harder for non-engineers to understand.

Snowflake schemas remain appropriate in specific cases: when dimensions change frequently and normalisation reduces update overhead; when source system tables are imported directly rather than modeled; and in certain operational contexts where update performance matters more than query simplicity.

For most analytical purposes, a well-designed star schema with denormalised dimensions outperforms snowflake schema on both query performance and analyst accessibility.

One Big Table (OBT)

The "One Big Table" approach denormalises everything into a single wide table: fact measures and all dimension attributes in the same table. No joins required; every query is a SELECT with filters and aggregates on a single table.

OBT was considered an anti-pattern in the Kimball era — redundant storage of dimensional attributes multiplied by fact row count was prohibitively expensive. In columnar databases with compression, the storage overhead is substantially reduced. Columnar reads skip columns that are not queried, so a 300-column wide table is queried almost as efficiently as a narrow table for queries that use only a few columns.

OBT advantages in columnar databases:

- Zero join complexity: analysts can query without understanding table relationships

- BI tool compatibility: any BI tool can generate correct aggregations from a single table

- Query simplicity: simpler SQL means fewer opportunities for incorrect aggregation (duplicate rows from bad joins are a common error source in star schemas)

OBT limitations:

- Dimension update complexity: changing a customer attribute requires updating every fact row associated with that customer, not just one dimension row

- Multiple grains: mixing measures at different grains (order-level and line-level) in the same table produces incorrect aggregations if the distinction is not carefully managed

- Schema flexibility: adding a new dimension attribute requires an ALTER TABLE that affects the entire table

- Debugging: when something is wrong, there is no structured dimension table to inspect independently

The dbt ecosystem has shifted practice toward wide mart tables in the final presentation layer — a practical compromise. Staging and intermediate models preserve source fidelity and enable modular transformation; mart models denormalise for the BI layer. This is effectively OBT for the consumption layer with structured layers upstream.

Modern Practice: Layered Modeling with Mart Tables

Current data engineering best practice (as expressed in dbt conventions and the modern data stack) uses a three-layer approach:

**Staging**: One-to-one with source tables. Minimal transformation — renaming columns, casting types, standardising nulls. Exposes source data in a clean, consistent form.

**Intermediate**: Multi-source joins, business logic, grain definitions. Produces reusable building blocks — orders enriched with customer attributes, sessions enriched with user properties. Intermediate models are not exposed to BI consumers directly.

**Mart**: Final presentation layer. Wide, denormalised tables designed for specific analytical domains (finance mart, marketing mart, operations mart). BI tools connect to mart tables. The mart is effectively a star schema or OBT depending on the domain — whatever produces the cleanest BI tool experience for that domain's consumers.

This layered approach separates concerns cleanly: source fidelity in staging, business logic in intermediate, analytical optimisation in marts. Changes to source systems are absorbed in staging without propagating downstream. Business logic lives in intermediate models where it can be tested. Mart design can be optimised for consumption without affecting upstream layers.

Choosing for Your Context

For a new analytical environment built on a modern columnar warehouse (Snowflake, BigQuery, Redshift, Databricks):

- Use dimensional modeling principles (explicit grain, separate fact and dimension concerns) as the design foundation

- Implement mart tables as wide, denormalised tables rather than strictly normalised star schemas unless the BI tool specifically requires normalised schemas

- Use dbt for transformation with staging/intermediate/mart layers

- Avoid snowflake schema for analytical workloads — the normalisation benefits do not justify the join complexity at current storage costs

For a legacy environment on a row-oriented database serving a traditional BI tool that generates SQL natively from a semantic layer: the star schema remains appropriate. The BI tool's SQL generation is designed around this pattern.

Our data architecture practice designs warehouse modeling strategies for each stage of analytical maturity — contact us to discuss the right modeling approach for your environment.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →