The schema design of a data warehouse determines how fast analytical queries run, how easy models are to maintain, and how well BI tools can auto-generate SQL. This guide explains the three primary patterns — star schema, snowflake schema, and wide denormalized tables — with the trade-offs that determine when each is appropriate.
The schema design of a data warehouse determines how fast analytical queries run, how easily the models can be maintained, how well BI tools perform automatic SQL generation, and how intuitively business users understand the data structure. Choosing between the major schema patterns — star schema, snowflake schema, and wide denormalized tables — involves real trade-offs that depend on your analytical workload, team structure, and BI tool requirements.
The Relational Model Is Not the Right Starting Point
Before discussing warehouse schema patterns, it is worth stating clearly: the schema design principles for operational (OLTP) databases are not appropriate starting points for analytical (OLAP) warehouses.
Operational databases are normalized — data is split across many tables to minimize redundancy, enforce consistency, and optimize write performance. A customer's address is stored once; orders reference the customer by key rather than repeating the address on every order row. This normalization makes writes efficient and consistent, but it makes analytical queries complex — joining ten tables to answer a simple question.
Data warehouses are deliberately denormalized. The goal is to make reads fast and queries simple, at the cost of some redundancy and storage. Understanding this inversion is the starting point for warehouse schema design.
Star Schema
The star schema is the foundational pattern for dimensional data warehouse design. It organizes data into two types of tables:
**Fact tables** contain measurements — the events and transactions you want to analyze. A sales fact table has one row per order line item, with columns for quantity, unit price, discount, and foreign keys to dimension tables. Fact tables are tall (many rows) and moderately wide.
**Dimension tables** contain the descriptive attributes used to filter, slice, and group facts. A customer dimension has one row per customer with columns for name, region, segment, acquisition channel, and other attributes. Dimension tables are wide (many attribute columns) but relatively short.
The "star" name comes from the visual: the fact table at the center, dimension tables radiating outward like points of a star.
A simple order analytics star schema:
- fact_orders: order_id, customer_key, product_key, date_key, store_key, quantity, unit_price, discount, net_revenue
- dim_customers: customer_key, customer_name, region, segment, acquisition_channel, email
- dim_products: product_key, product_name, category, subcategory, brand, unit_cost
- dim_date: date_key, date, day_of_week, month, quarter, year, fiscal_period, is_holiday
- dim_stores: store_key, store_name, city, state, region, store_format
The query "revenue by customer segment and product category by quarter" joins two dimensions to the fact table — straightforward SQL that BI tools can generate automatically.
**Star schema strengths**: simple query patterns, predictable BI tool behavior, fast query performance due to fewer joins, easy to understand for business users.
**Star schema weaknesses**: dimension tables may contain redundant data (region in both dim_customers and dim_stores), updates to dimension attributes require careful handling (SCD management), and dimension tables can become very wide if many attributes are needed.
Snowflake Schema
The snowflake schema extends the star schema by normalizing dimension tables — splitting them into sub-dimensions to eliminate redundancy.
In a star schema, the customer dimension includes a region attribute. In a snowflake schema, region is extracted to its own dim_regions table, and dim_customers has a region_key foreign key rather than a region name. Similarly, if product has a brand hierarchy (product → subcategory → category → brand), each level might be its own table.
The result looks like a snowflake: multiple levels of dimension tables radiating from the fact table.
**Snowflake schema strengths**: eliminates redundancy in dimension tables, smaller dimension table storage, easier updates to hierarchical attributes (update region name once in dim_regions, not in every customer row).
**Snowflake schema weaknesses**: more joins required for every analytical query (joining through multiple dimension levels), more complex SQL, BI tools generate more complex SQL that can be slower, harder for business users to understand.
**When to use snowflake vs star**: the snowflake schema's benefits are primarily storage-related — relevant when dimension tables are very large or when hierarchical attributes change frequently. For most modern data warehouses with abundant storage, the simplicity of star schema outweighs the storage efficiency of snowflake. The dominant recommendation in modern analytical engineering is star schema. Snowflake schema is appropriate when dimension table size is a genuine constraint or when a strict dimensional modeling mandate requires it.
Wide Denormalized Tables (One Big Table)
At the opposite extreme from the normalized snowflake schema is the fully denormalized wide table — sometimes called "one big table" (OBT) or a wide flat table.
In this pattern, instead of maintaining separate fact and dimension tables, you join them all into a single wide table with every attribute needed for analytics. The "fact_orders" table becomes a 50-column table that includes not just order metrics but customer region, product category, date fields, store region, and every other attribute that might be filtered or grouped on.
Query patterns are maximally simple: no joins required, every column is directly available.
**Wide table strengths**: maximum query simplicity, minimal SQL for BI tools to generate, easiest for non-technical users to understand, fast queries because no joins.
**Wide table weaknesses**: significant redundancy (customer region repeated on every order row, not just in one customer row), large storage footprint, updates to attributes (if a customer changes region and you are maintaining current-state denormalization) require recomputing the wide table, and the table can become unwieldy as more attributes are added.
**When wide tables are appropriate**: for final serving layers consumed directly by BI tools, especially when BI users are not SQL-fluent and simplicity is paramount. Looker's LookML and Power BI's semantic layer partially address the join complexity of star schema, but wide tables eliminate it entirely. Many modern data teams use star schema in the warehouse transformation layer (dbt models) and optionally materialize wide serving tables from those models for specific BI consumers.
Schema Design in dbt
In a dbt-based analytics engineering workflow, schema design is expressed through model organization:
- **Staging models** (bronze/silver) clean and standardize source data — one model per source table, minimal business logic
- **Intermediate models** apply business logic and join sources — these represent the transformation from operational to analytical structure
- **Mart models** (gold) implement the final star schema or wide table structure for consumption — one model per business domain or report area
The dbt project structure forces schema decisions to be explicit and version-controlled. Changing from a normalized structure to star schema means rewriting mart models — a deliberate change with a clear paper trail.
Conformed Dimensions
Conformed dimensions are dimension tables shared across multiple fact tables. A dim_customers table used by both fact_orders and fact_support_tickets ensures that "customer segment" means the same thing in sales reports and support reports.
Conformed dimensions are a core principle of Ralph Kimball's dimensional modeling approach. Without them, different fact tables develop their own incompatible definitions of shared entities, making cross-domain analysis impossible or unreliable.
In a dbt project, conformed dimensions are models that are referenced by multiple mart models — a natural enforcement of the principle, since changing the dimension affects all consumers.
Our data architecture practice designs analytical data warehouse schemas from scratch and refactors existing models — contact us to discuss your data modeling requirements.
A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.
Book a Call →