The fact table is the core measurement table in a data warehouse star schema — storing the quantitative events of the business alongside dimension keys that describe the context of each event. This guide explains fact table design, grain selection, measure types, and common patterns.
A fact table is the central measurement table in a data warehouse star schema. It stores the quantitative events of the business — orders placed, transactions processed, calls handled, pages viewed, products shipped — as rows, with each row representing one instance of the measured event at a defined granularity. Surrounding dimension tables provide the descriptive context for those measurements.
The relationship between fact tables and dimension tables defines the analytical query model. When Tableau asks "what was total revenue by product category and customer region for Q3?" the query joins the sales fact table to the product dimension (for category) and customer dimension (for region), filters by date dimension (for Q3), and aggregates the revenue measure. Every BI query against a dimensional model is some variation of this pattern.
Grain: The Foundation of Fact Table Design
The grain of a fact table is the precise definition of what one row represents. Every design decision that follows — which measures to include, which dimension foreign keys to include, how to handle multi-valued dimensions — depends on grain.
Grain options for a sales fact table:
- **Order line item:** One row per product on an order. Supports analysis of individual products per order, discounts per line, per-item profitability.
- **Order:** One row per order. Supports order-level analysis — number of orders, average order value, orders per customer — but cannot answer per-product-per-order questions.
- **Customer-day:** One row per customer per day, pre-aggregated. Supports daily customer behavior analysis efficiently but cannot answer individual order-level questions.
Choosing the lowest grain that satisfies the analytical requirements preserves the most analytical flexibility. Aggregating rows to a higher grain in the fact table is irreversible at query time — once the line-item detail is aggregated into an order-level fact, individual line-item analysis is unavailable without returning to the source. Most well-designed fact tables choose the grain at the natural business event level (the order line, the transaction, the session event) and let BI tools aggregate upward as needed.
Fact Table Column Types
**Dimension foreign keys:** Integer surrogate keys joining to each related dimension. Every fact event has a customer_key, product_key, date_key, location_key, and any other dimension relevant to the business event. These are the join columns that allow the fact to be analyzed across any combination of dimensional attributes.
**Degenerate dimensions:** Dimension attributes that have no separate dimension table because they are unique to each fact row. Order number, invoice number, transaction ID, session ID — these are identifiers that provide drill-through capability (linking to source system records) but do not support aggregated analysis across shared attribute values. They live as columns in the fact table rather than in a separate dimension.
**Additive measures:** Measures that can be summed across any dimension. Revenue, quantity, cost, shipping weight — sum these across any combination of customer, product, date, and location and the result is meaningful. Additive measures are the most analytically flexible.
**Semi-additive measures:** Measures that can be summed across some dimensions but not others. Account balance at a point in time is additive across accounts (total deposits across all accounts on a date) but not across dates (the sum of daily balances does not represent average or period-end balance). Semi-additive measures require specific aggregation logic (last or average over time dimensions).
**Non-additive measures:** Measures that cannot be summed at all. Ratios, percentages, and unit prices are non-additive — summing the price-per-unit across products does not produce a meaningful number. Non-additive measures are typically computed as derived calculations from additive components (revenue divided by quantity yields average price) rather than stored as pre-computed values.
Fact Table Types
**Transaction fact tables** store individual business events as they occur — one row per event, at the grain of the atomic transaction. This is the most common type. Sales transactions, clickstream events, support interactions, payment events are all naturally transactional. Transaction facts grow continuously as events occur.
**Periodic snapshot fact tables** capture the state of a process at regular time intervals, regardless of whether an event occurred. A daily account balance snapshot stores one row per account per day even on days with no transactions. Periodic snapshots enable period-over-period analysis of process state without reconstructing current state from transaction history.
**Accumulating snapshot fact tables** track the progress of a process with a defined lifecycle through its stages. An order fulfillment fact might have one row per order, with date keys for each stage (order_placed_date_key, order_picked_date_key, order_shipped_date_key, order_delivered_date_key) and lag measures between stages. As the order moves through stages, the row is updated. Accumulating snapshots are appropriate for processes where stakeholders need to see where in the pipeline each unit currently sits and how long it has spent in each stage.
Fact Table Performance
Fact tables are the largest tables in a dimensional model — typically orders of magnitude larger than their associated dimensions. For cloud data warehouses, query performance on fact tables depends primarily on:
**Partitioning:** Most fact tables are filtered by date on every query. Partitioning the fact table by the date key means that a query filtering to a 90-day period reads only 90 partitions rather than the full table. On BigQuery and Snowflake, partition pruning can reduce bytes scanned by 95% for date-filtered queries.
**Clustering/sort keys:** After partitioning by date, clustering on the most common filter dimensions (customer segment, product category, region) enables micro-partition pruning within date partitions.
**Aggregating when appropriate:** If the reporting layer never requires row-level fact table detail, pre-aggregating to the reporting grain in a summary table eliminates redundant per-query aggregation for high-traffic dashboards.
Our data architecture services designs dimensional models — fact table grain selection, measure type classification, and partitioning strategy — that produce analytics-ready warehouse layers for Tableau and other BI tools. Contact us to discuss your data warehouse requirements.
A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.
Book a Call →