Dimensional Modeling: Facts, Dimensions, and Star Schema Design

The complete guide to dimensional modeling — star schemas, snowflake schemas, slowly changing dimensions, and the design decisions that determine whether your data warehouse delivers fast, intuitive analytics.

Dimensional modeling is the design methodology that determines whether your data warehouse delivers fast, intuitive analytics — or slow, confusing queries that analysts avoid. First articulated by Ralph Kimball in the 1990s, dimensional modeling has remained the dominant approach for analytical data design for thirty years because it solves the right problem: making complex business data fast to query and easy to understand.

Most data warehousing failures are not technology failures. They are design failures. Teams that choose the wrong database schema, the wrong granularity, the wrong handling of historical change — these teams build warehouses that work technically but fail analytically. Understanding dimensional modeling is not optional for anyone building analytical systems seriously.

The Core Structure: Facts and Dimensions

Every dimensional model is built from two types of tables:

**Fact tables** store the measurements of business events — orders, transactions, page views, sensor readings, support tickets. They are tall and narrow: many rows, relatively few columns. The columns in a fact table are either foreign keys to dimension tables or numeric measures. A fact table should have one row per event at the grain you care about.

**Dimension tables** store the context for those events — who, what, where, when, why. They are short and wide: fewer rows, many descriptive columns. A customer dimension might have columns for name, email, acquisition channel, geographic region, tier, segment, lifetime value band. Dimension tables are what make fact tables human-readable.

The **star schema** arranges these tables so that the fact table sits at the centre, with dimension tables radiating outward like points on a star. A query joining the fact table to two or three dimensions produces a result set that is both fast to compute (optimised for columnar storage and hash joins) and easy to understand.

Choosing the Right Grain

The grain of a fact table is the single most important decision in dimensional modeling. The grain defines what one row in the fact table represents.

If you are modeling orders, the grain might be: one row per order line item. Not one row per order (too coarse — you lose line-item detail). Not one row per product-order-customer combination (too fine — you create confusing duplication). One row per order line item is the natural grain for e-commerce order data.

Getting the grain wrong creates cascading problems:

- **Too coarse**: you cannot answer questions the business needs to answer because the detail has been aggregated away

- **Too fine**: queries that should be simple require complex deduplication logic; aggregates produce incorrect results because rows represent different things

The rule is: choose the lowest grain at which you can record a measurement, then decide whether you need higher-grain rollup tables as separate physical objects. Do not mix grains in a single fact table.

Fact Table Types

Dimensional modeling recognizes several fact table types for different measurement patterns:

**Transaction fact tables** record individual events at the moment they occur. One row per event. The most common type. Orders, payments, clicks, sensor readings.

**Periodic snapshot fact tables** record the state of something at regular intervals — daily account balances, weekly inventory levels, monthly subscription metrics. One row per entity per period. Useful for tracking things that change continuously where you care about the state at defined points.

**Accumulating snapshot fact tables** track the lifecycle of a process that has a defined beginning and end — an order that progresses through statuses, a loan application that moves through approval stages. One row per process instance, updated as the process advances. Date dimension foreign keys populate as each milestone is reached.

**Factless fact tables** have no numeric measures — they record that something happened without measuring it. Enrollment tables (student X enrolled in course Y), coverage tables (product X is sold in region Y). Less common but important for certain analytical patterns.

Dimension Design

Dimension tables require careful design to be analytically useful:

**Keep descriptive attributes in dimensions, not facts.** The customer tier should be in the customer dimension, not the order fact. The product category should be in the product dimension, not the sales fact. This allows filtering and grouping without joins back to lookup tables.

**Use surrogate keys.** Dimension tables should have a surrogate primary key — a system-generated integer — rather than relying on the natural key from the source system. Natural keys change, get reused, and carry business-system constraints. Surrogate keys are stable, controlled, and support slowly changing dimension patterns.

**Use meaningful, verbose column names.** Dimension columns are what users see in their BI tools. product_category_description is better than cat_desc. customer_acquisition_channel_name is better than acq_chnl. Dimension columns should be self-explanatory to a business user who has never seen the data model.

**Denormalize aggressively.** A common mistake is normalizing dimension tables — putting the product category in a separate category table, the customer region in a separate region table, creating a snowflake schema. Resist this. The joins are not free; they make queries more complex without meaningful storage savings in a modern columnar warehouse. Flatten the hierarchy into the dimension. Product category, subcategory, and department should all be columns in the product dimension.

The Date Dimension

Every dimensional model needs a date dimension. The date dimension is a special case: it has no natural fact table relationship that generates its rows — you populate it from a calendar. It should contain every date your data might reference, with columns for every temporal attribute you might filter or group by:

- Calendar attributes: day of week, week number, month name, quarter, year

- Business calendar attributes: is_business_day, fiscal quarter, fiscal year

- Relative attributes: days until end of quarter, days until end of fiscal year

- Holiday flags: is_holiday, holiday_name

The date dimension allows users to filter by "Q3 FY2025" or "weekdays only" or "months after the product launch" without writing complex date arithmetic. Pre-computing these attributes in the dimension is far more efficient than computing them at query time.

Slowly Changing Dimensions

Business reality changes. Customers move to new cities. Products change categories. Employees transfer to new departments. How you handle these changes determines whether your historical analysis is accurate.

**Type 1: Overwrite.** Simply update the dimension record. No history preserved. The customer's city in the dimension is whatever it is today, even if you want to analyze orders from when they lived somewhere else. Appropriate when history does not matter — product descriptions, contact email addresses.

**Type 2: Add a new row.** When the dimension value changes, expire the old row and add a new one. The old row retains the old attribute values with a valid-from and valid-to date range. Fact table rows pointing to the old surrogate key will always resolve to the correct historical attribute, even after the dimension changes. This is the standard approach for attributes where historical accuracy matters — customer geography, product pricing tier, employee department.

**Type 3: Add a new column.** Add a "previous value" column alongside the "current value" column. Allows tracking one prior state but no further history. Rarely the right choice; types 1 and 2 cover most cases.

**Type 4: History table.** Separate current and historical dimension records into different tables. Occasionally useful for very large dimensions with high change rates.

Most enterprise dimensional models use a combination of Type 1 (for attributes where history is irrelevant) and Type 2 (for attributes where historical accuracy is required). Identifying which attributes need Type 2 treatment is a critical design conversation with the business users.

Conformed Dimensions

In a data warehouse that serves multiple subject areas — sales, marketing, finance, supply chain — dimensions are used across multiple fact tables. The customer dimension is used in the sales fact and the support ticket fact and the contract renewal fact.

**Conformed dimensions** are dimensions that have the same definition across all fact tables that use them. The customer dimension is defined once and used everywhere. This means "customers" means the same thing in the sales dashboard and the customer success dashboard — the same keys, the same attributes, the same grain.

Conformed dimensions are what allow the sales VP and the customer success VP to have a conversation about the same customer without a reconciliation exercise. Without conformed dimensions, each subject area builds its own customer table, and they inevitably diverge.

Bridge Tables for Many-to-Many Relationships

Some relationships between facts and dimensions are many-to-many. An order might have multiple promotions applied. A patient might have multiple diagnoses. A transaction might be attributed to multiple campaigns.

The standard approach is a **bridge table** that sits between the fact table and the dimension. The bridge table has one row per combination of fact and dimension member. Queries against the bridge table require careful handling of double-counting — when you sum the fact measure through the bridge, you must weight it appropriately.

Bridge tables are more complex than standard dimension joins. Use them only where the many-to-many relationship is real and where the single-attribution alternative would mislead users. In many cases, picking a primary attribution (the first promotion, the primary diagnosis) and handling multi-attribution as a separate analysis is simpler and more reliable.

Dimensional Modeling in Modern Cloud Warehouses

Snowflake, BigQuery, Redshift, and Databricks Lakehouse have changed some of the physical implementation considerations but not the logical design principles. Dimensional modeling on a modern columnar warehouse looks different from dimensional modeling on a traditional RDBMS:

- **Storage is cheap; joins are fast.** Denormalization is even more appropriate.

- **Materialized views and clustering replace traditional indexes.** Partition on date, cluster on the highest-cardinality filter column.

- **Separation of compute and storage removes the need for aggregate tables** in most cases — query the base grain at full speed.

- **dbt enables dimensional model implementation as code** — version-controlled, tested, documented. See our guide on dbt project structure for how to organize dimensional models in dbt.

The logical design — facts, dimensions, grain decisions, SCD handling, conformed dimensions — is unchanged. The physical implementation is simpler and more maintainable.

Common Mistakes

**Combining multiple grains in one fact table.** One row might represent an order line item; another might represent an order-level discount. Mixing grains causes incorrect aggregation and confuses every user who touches the model.

**Using transaction tables as dimension tables.** The customer table from the source OLTP system is not a dimension. It lacks surrogate keys, has no SCD handling, may not contain the descriptive attributes analysts need, and will blow up your query performance when joined to a billion-row fact table.

**Modeling for the source system, not the analytics use case.** The dimensional model should reflect how the business thinks about its data, not how the source system stores it. If the business thinks about customers by segment and the source system stores segment in a separate table with a cryptic code, denormalize the segment description into the customer dimension.

**Premature aggregation.** Storing pre-aggregated daily totals instead of transaction-level data seems like an optimization; it is actually a loss of flexibility. Modern columnar warehouses can aggregate a billion rows fast. Build at the lowest grain; aggregate at query time or via materialized views when specific aggregations are proven to be hot.

Our data architecture consulting practice designs dimensional models for enterprises migrating to modern cloud warehouses — contact us to discuss your data warehouse architecture.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →