Dimensional modeling is the data warehouse design methodology developed by Ralph Kimball — organising data into fact tables (measurements) and dimension tables (context) in patterns optimised for analytical queries. Despite being decades old, dimensional modeling remains the most effective framework for designing data warehouse schemas that business users can query intuitively and BI tools can query efficiently.
Dimensional modeling is a data warehouse design methodology developed by Ralph Kimball in the 1990s. Its central premise is that business intelligence databases should be designed around the analytical questions businesses ask, not around the operational processes that generate data. The result is a schema design that business users can navigate intuitively — "show me revenue by product and region for last quarter" — and that BI tools can query with efficient, simple SQL.
The Core Concepts
Dimensional modeling organises data into two types of tables:
**Fact tables** — store measurements (numeric, additive values that can be summed, averaged, or counted): revenue, quantity sold, cost, sessions, clicks. Each row in a fact table represents a measurement event at a specific granularity (one row per order line, one row per session, one row per transaction). Fact tables are the centre of analysis — they contain the numbers business users want to aggregate.
**Dimension tables** — store descriptive context for measurements: customer attributes (name, segment, region), product attributes (category, brand, SKU), date attributes (month, quarter, fiscal period, holiday flag). Dimension tables give meaning to fact table measurements — they answer "who, what, where, when, and why" questions about each measurement.
The relationship between fact and dimension tables: fact tables reference dimension tables via foreign keys. An order fact row references a customer dimension row, a product dimension row, a date dimension row, and a geography dimension row. Joining the fact table to dimension tables adds the descriptive context needed for grouped analysis.
Why Not Third Normal Form (3NF)
Operational databases are typically designed in third normal form — normalised to eliminate data redundancy and protect data integrity during transactional updates. 3NF is excellent for OLTP (online transaction processing) but poor for OLAP (online analytical processing):
**Query complexity** — answering "revenue by product category" in a 3NF schema might require joining 6-8 tables. In a dimensional model, the same query is a 2-table join: fact table to product dimension.
**Query performance** — fewer joins mean faster queries. Dimensional models are typically 1-3 table joins for most BI queries; 3NF schemas require more joins for equivalent queries.
**Understandability** — business users and BI developers understand "fact" and "dimension" readily. Navigating a 3NF schema requires understanding the operational data model, which has a steep learning curve for non-engineers.
The Four Steps of Dimensional Modeling
Kimball's recommended process for designing a dimensional model:
**Step 1: Select the business process** — choose the operational process you are modelling. Common business processes: order management, website sessions, customer support tickets, financial transactions. Each business process produces its own fact table and related dimensions.
**Step 2: Declare the grain** — the grain is the level of detail represented by each row in the fact table. "One row per order line item" is a grain declaration. "One row per daily summary by product and region" is a different grain. The grain must be declared before anything else; it determines what questions the fact table can answer.
Grains should be atomic (the finest granularity available from the source) whenever possible. Atomic grains support any level of aggregation; pre-aggregated grains cannot be drilled below their level of aggregation.
**Step 3: Identify the dimensions** — identify all of the context dimensions that describe each fact row at the declared grain. For an order line fact (one row per order line): customer, product, date, geography, sales channel, promotion. Each dimension becomes a dimension table.
**Step 4: Identify the facts** — identify all of the measurements associated with each fact row. For an order line: unit price, quantity, discount amount, extended price, cost, margin. Measurements should be additive (can be summed across any dimension) when possible; non-additive and semi-additive facts require careful handling.
Additive, Semi-Additive, and Non-Additive Facts
**Additive facts** — can be summed across all dimensions. Revenue is additive: sum revenue by customer, by product, by date, by region — the sums are all meaningful.
**Semi-additive facts** — can be summed across some dimensions but not all. Account balance is semi-additive: you can sum balances across accounts to get a total, but summing balance across dates gives a meaningless number (double-counting). Semi-additive facts require special handling in BI calculations.
**Non-additive facts** — cannot be meaningfully summed across any dimension. Percentages, ratios, and unit prices are non-additive. To calculate margin percentage at a rolled-up level, sum the underlying margin dollars and revenue dollars first, then compute the percentage.
Conformed Dimensions
A conformed dimension is a dimension table shared across multiple fact tables. A customer dimension conformed across order facts, support ticket facts, and marketing interaction facts means all three fact tables use the same customer keys and customer attributes.
Conformed dimensions enable cross-process analysis: you can join the order fact table to the support ticket fact table through the shared customer dimension and analyse the relationship between order history and support volume.
Conformed dimensions are the foundation of Kimball's data warehouse bus architecture — the enterprise data warehouse is built by assembling business process fact tables connected through shared, conformed dimensions.
Date Dimension
Every dimensional model includes a date dimension — a table with one row per calendar date, with columns for every date attribute analysts might need: day of week, month, quarter, year, fiscal period, holiday flag, day name, month name.
The date dimension eliminates date arithmetic from queries. "Revenue in Q3 FY2024" becomes a simple WHERE quarter = 'Q3' AND fiscal_year = 2024 rather than a complex date range calculation. Pre-computing all date attributes in the dimension table makes date-based analysis simpler and more accurate (especially for fiscal calendars and non-standard date logic).
A date dimension with daily granularity covering 20 years contains 7,300 rows — trivially small, loaded once, never updated.
Slowly Changing Dimensions (SCD)
Dimension attributes change over time: a customer changes their segment, a product changes its category, an account manager reassignment occurs. Slowly changing dimensions (SCD) describe strategies for handling these changes:
**Type 1** — overwrite the old value with the new value. No history retained. Use when the old value is never needed for historical analysis (correcting data errors, updating contact information that is not analytically significant).
**Type 2** — add a new row with a new surrogate key for the new attribute values. The old row is retained with a valid-to date. Use when historical analysis must reflect the attribute value at the time of each fact — what segment was the customer in when they placed this order?
**Type 3** — add a new column for the new value, retaining the previous value in an old-value column. Limited to tracking one previous version. Use rarely; Type 2 is almost always preferable.
Our data architecture practice designs dimensional models for enterprise analytics teams — contact us to discuss data warehouse design for your analytics programme.
A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.
Book a Call →