What Is a Data Model? Structuring Data for Business and Analytical Use

A data model is the structured representation of data entities, their attributes, and the relationships between them — the blueprint that determines how data is organized in a database, warehouse, or application. This guide explains data modeling types, levels, and why modeling decisions have long-term analytical consequences.

A data model is a structured representation of data — defining what entities exist, what attributes describe each entity, and what relationships connect entities. Data models are the blueprints that determine how databases, data warehouses, and applications store and organize information. The modeling decisions made early in a data system's life have long-term consequences for analytical capability, query performance, and data quality.

The term is used across contexts. A logical data model describes business concepts and their relationships without specifying implementation. A physical data model specifies how those concepts are stored in a specific database technology — table structures, data types, indexes, partitioning. An analytical data model (the dimensional model used in data warehouses) is optimized for aggregation and filtering rather than transactional processing.

Data Modeling Types

**Relational modeling (normalized):** The standard approach for transactional databases. Entities are stored in tables; data is normalized — each fact stored once, relationships managed through foreign keys and joins. A sales order in a normalized model is stored across orders, order_line_items, customers, and products tables. Normalization minimizes redundancy and maintains data integrity under inserts and updates. The cost is join complexity for analytical queries — a report on order revenue by customer segment requires joining five tables.

**Dimensional modeling (denormalized for analytics):** The standard approach for data warehouses. Data is structured in star schemas — a central fact table with denormalized dimension tables. Customer attributes (name, segment, region) are copied into the customer dimension rather than stored in a normalized third-normal-form table. This redundancy is intentional — it enables BI tools to join fact tables to dimension tables in one step rather than traversing a normalized entity hierarchy. Ralph Kimball's dimensional modeling methodology defines this approach.

**Entity-relationship modeling:** A conceptual modeling technique that documents entities, their attributes, and the cardinality of relationships between them. ER diagrams are the design tool before physical implementation — used to communicate and agree on the data model before writing DDL.

**Graph data modeling:** Optimized for highly connected data where the relationships between entities are as important as the entities themselves. Nodes represent entities; edges represent relationships; properties describe both. Social network analysis, recommendation systems, and fraud detection use graph models because the traversal of relationship paths is the primary analytical operation.

**Document modeling (schema-on-read):** NoSQL document databases (MongoDB, Elasticsearch) allow flexible schemas — documents in the same collection can have different fields. The model is defined implicitly by the application code that writes and reads documents rather than enforced by the database. Schema-on-read flexibility comes at the cost of consistency guarantees and analytical query performance.

The Analytical Data Model in Practice

For organizations building data warehouses and BI layers, the analytical data model is the most important modeling concern. Analytical data models are built in the transformation layer — typically using dbt — on top of raw data loaded from source systems.

**Staging models:** Direct representations of source system tables, with field renaming and light cleaning but no business logic. One staging model per source table.

**Intermediate models:** Transformation logic applied to staging models — joining related staging tables, applying business rules, resolving entity references. Not exposed directly to BI tools; used as building blocks for mart models.

**Mart models (dimension and fact tables):** The final dimensional model — customer dimension, product dimension, date dimension, sales fact, support fact. These are what BI tools like Tableau query.

This layered approach keeps transformations testable and auditable. Each layer has documented dependencies, tests for data quality assumptions, and clear ownership.

Why Data Modeling Decisions Persist

Data modeling decisions are expensive to reverse. A dimensional model built at the wrong grain requires rebuilding both the fact table and any downstream models and reports that depend on it. A dimension designed without Type 2 SCD history cannot retroactively produce accurate historical attribution without reprocessing years of source data. A metric definition encoded in individual workbooks rather than a semantic layer becomes inconsistent across dozens of reports before anyone notices.

The investment in deliberate data modeling upfront — correct grain selection, SCD design, conformed dimensions, semantic layer metric definitions — pays compound interest over the life of the data warehouse. The cost of poor early modeling decisions is paid repeatedly as every downstream report and analysis built on the weak foundation requires workarounds.

Our data architecture services practice designs and implements analytical data models — from source system staging through dimensional mart design and semantic layer metric definition — that produce analytics-ready data structures for Tableau and BI tools. Contact us to discuss your data modeling requirements.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →