Data Vault is a data warehouse modeling methodology that separates raw historical data storage from business interpretation, enabling auditable, scalable enterprise data warehouses. This guide explains the Data Vault pattern — Hubs, Links, and Satellites — and when it is appropriate.
Data Vault is a data warehouse modeling methodology developed by Dan Linstedt. It structures the enterprise data warehouse into three object types (Hubs, Links, and Satellites), with specific rules governing how business keys, relationships, and descriptive attributes are stored. The result is a data warehouse that can load data from any source system without upfront decisions about how to interpret it: the raw historical record is stored first, and business rules are applied later, in the business vault and presentation layers.
Data Vault is distinguished from dimensional modeling (star schema) by its philosophy of separating raw data from business interpretation. A star schema fact table is already an interpreted view of the data: revenue is calculated, relationships are resolved, grain decisions are made. A Data Vault's raw vault stores exactly what source systems provided, with business rules applied in separate layers.
The Three Core Objects
**Hubs** store unique business keys, the identifiers that business processes use to identify core entities. A customer hub stores unique customer IDs; a product hub stores unique product SKUs. A hub contains the business key, a surrogate key (in Data Vault 2.0, a hash of the business key), the load date, and the record source. Nothing else: no descriptive attributes, no relationships. The hub is the claim that this business key exists in a source system.
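As a minimal in-memory sketch of the hub rules, assuming the Data Vault 2.0 convention of an MD5 hash key computed over a trimmed, upper-cased business key, a hub load only ever inserts business keys it has not seen before. The function and class names are illustrative and do not come from any library.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def hash_key(*business_key_parts: str) -> str:
    """Hash key: trim and upper-case each part, join with '||', then MD5."""
    normalized = "||".join(part.strip().upper() for part in business_key_parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class HubRow:
    hub_hash_key: str   # surrogate key derived from the business key
    business_key: str   # identifier as received from the source
    load_date: datetime
    record_source: str


def load_hub(hub: dict[str, HubRow], staged_keys: list[str],
             record_source: str, load_date: datetime) -> list[HubRow]:
    """Insert-only hub load: add unseen business keys, never update existing rows."""
    inserted = []
    for key in staged_keys:
        hk = hash_key(key)
        if hk not in hub:  # first sighting wins; repeats are ignored
            row = HubRow(hk, key, load_date, record_source)
            hub[hk] = row
            inserted.append(row)
    return inserted


# Usage: the billing system re-sends an existing customer in a different format.
hub_customer: dict[str, HubRow] = {}
t = datetime(2024, 1, 1, tzinfo=timezone.utc)
load_hub(hub_customer, ["C-001", "C-002"], "CRM", t)
new_rows = load_hub(hub_customer, ["c-001 ", "C-003"], "BILLING", t)
print([r.business_key for r in new_rows])  # ['C-003']
```

In a warehouse this would be an insert guarded by a "not exists" check against the hub table; the dictionary only stands in for that table. Normalizing before hashing is a design choice: it treats `C-001` and ` c-001` as the same customer, which may or may not match a given source's semantics.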
**Links** store relationships between Hubs. A sales link might relate a customer hub key, a product hub key, a date hub key, and a sales representative hub key — capturing the relationship that this customer purchased this product on this date from this rep. Links contain surrogate keys for each related hub, a link surrogate key, load date, and record source. Links are the audit trail of many-to-many relationships as they existed in source systems.
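A link can be sketched the same way. This version is simplified to three hubs (customer, product, sales rep), and it assumes the link hash key is an MD5 over the participating business keys in a fixed order; the column and function names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def hash_key(*parts: str) -> str:
    """Same convention as for hubs: trim, upper-case, join with '||', MD5."""
    normalized = "||".join(p.strip().upper() for p in parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class SalesLinkRow:
    link_hash_key: str      # hash over all participating business keys
    customer_hash_key: str  # hub keys of each participant
    product_hash_key: str
    rep_hash_key: str
    load_date: datetime
    record_source: str


def load_sales_link(link: dict[str, SalesLinkRow], staged: list[dict[str, str]],
                    record_source: str, load_date: datetime) -> int:
    """Insert each distinct customer/product/rep relationship once; return rows added."""
    added = 0
    for rec in staged:
        c, p, r = rec["customer_id"], rec["sku"], rec["rep_id"]
        lhk = hash_key(c, p, r)  # fixed part order keeps the key stable across loads
        if lhk not in link:
            link[lhk] = SalesLinkRow(lhk, hash_key(c), hash_key(p), hash_key(r),
                                     load_date, record_source)
            added += 1
    return added


# Usage: a repeated sale of the same product by the same rep adds no new link row.
sales_link: dict[str, SalesLinkRow] = {}
staged_sales = [
    {"customer_id": "C-001", "sku": "SKU-9", "rep_id": "R-7"},
    {"customer_id": "C-001", "sku": "SKU-9", "rep_id": "R-7"},
    {"customer_id": "C-002", "sku": "SKU-9", "rep_id": "R-7"},
]
print(load_sales_link(sales_link, staged_sales, "ERP",
                      datetime(2024, 1, 2, tzinfo=timezone.utc)))  # 2
```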
**Satellites** store descriptive attributes and context. A customer satellite stores customer name, address, email, phone, and demographic attributes alongside the customer hub key, load date, load end date, and record source. Satellites play the role of a Type 2 slowly changing dimension in Data Vault: each change to an attribute creates a new satellite row with a new load date, preserving the full history of attribute changes. (Because loads are insert-only, many implementations derive the load end date at query time rather than updating it in place.) Multiple satellites can hang off a single hub: a customer hub might have one satellite from the CRM system (name, phone, email) and another from the billing system (billing address, tax ID, payment terms), each loaded independently so the source systems stay decoupled.
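Satellite loads usually detect change with a hash over the descriptive attributes, often called a hashdiff in Data Vault 2.0 practice. The sketch below, with hypothetical names and a placeholder hub key `hk1`, appends a row only when a hub key's attributes differ from its most recent satellite row, which is how history accumulates without updates.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def hash_diff(attributes: dict[str, str]) -> str:
    """Hash over all descriptive attributes in sorted column order."""
    payload = "||".join(f"{name}={attributes[name].strip()}" for name in sorted(attributes))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


@dataclass
class SatRow:
    hub_hash_key: str
    load_date: datetime
    hashdiff: str
    attributes: dict[str, str]
    record_source: str


def load_satellite(sat: list[SatRow], staged: dict[str, dict[str, str]],
                   record_source: str, load_date: datetime) -> int:
    """Append a row only when a hub key's attributes changed; return rows added."""
    latest: dict[str, SatRow] = {}
    for row in sat:  # most recent existing row per hub key
        current = latest.get(row.hub_hash_key)
        if current is None or row.load_date > current.load_date:
            latest[row.hub_hash_key] = row
    added = 0
    for hub_hk, attrs in staged.items():
        hd = hash_diff(attrs)
        previous = latest.get(hub_hk)
        if previous is None or previous.hashdiff != hd:
            sat.append(SatRow(hub_hk, load_date, hd, dict(attrs), record_source))
            added += 1
    return added


# Usage: an unchanged re-send adds nothing; a changed email adds one row.
sat_customer_crm: list[SatRow] = []
jan = datetime(2024, 1, 1, tzinfo=timezone.utc)
feb = datetime(2024, 2, 1, tzinfo=timezone.utc)
load_satellite(sat_customer_crm, {"hk1": {"name": "Ada", "email": "ada@example.com"}}, "CRM", jan)
load_satellite(sat_customer_crm, {"hk1": {"email": "ada@example.com", "name": "Ada"}}, "CRM", feb)
load_satellite(sat_customer_crm, {"hk1": {"name": "Ada", "email": "ada@new.example"}}, "CRM", feb)
print(len(sat_customer_crm))  # 2
```

Sorting the column names before hashing means a source that reorders its fields does not produce spurious history rows.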
The Data Vault Layers
A complete Data Vault implementation typically has four layers:
**Staging:** Raw data loaded from source systems with minimal transformation. Timestamps and record sources added; no business rules applied.
**Raw Vault:** Hubs, Links, and Satellites loaded from staging. The raw vault is the system of record for the enterprise data history. Loads are insert-only — nothing is deleted or updated. Every source record is preserved exactly as received.
**Business Vault:** Derived Hubs, Links, and Satellites that apply business rules. A business vault satellite might calculate derived attributes — a customer segment derived from revenue tiers, or a standardized country code derived from the raw address. Business vault objects extend the raw vault without modifying it.
**Presentation Layer:** Dimensional models — star schemas — built on top of the vault for BI consumption. Tableau and Power BI query the presentation layer, not the vault directly. The presentation layer applies the business interpretation that the raw vault intentionally defers.
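Because the raw vault is insert-only, a satellite's load end date is commonly derived at query time rather than written back. This sketch (hypothetical names, in-memory rows standing in for a satellite table) derives each row's end date from the next load date for the same hub key, then builds an as-of snapshot: the kind of current or historical view a presentation-layer dimension is built from, and the kind of "what did the data say then" query an auditor asks.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SatRow:
    hub_hash_key: str
    load_date: datetime
    attributes: dict[str, str]


def with_end_dates(rows: list[SatRow]) -> list[tuple[SatRow, Optional[datetime]]]:
    """Pair each row with the next load date for the same hub key (None if latest).

    Nothing is written back, so the raw satellite stays insert-only.
    """
    ordered = sorted(rows, key=lambda r: (r.hub_hash_key, r.load_date))
    result = []
    for i, row in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        same_key = nxt is not None and nxt.hub_hash_key == row.hub_hash_key
        result.append((row, nxt.load_date if same_key else None))
    return result


def as_of(rows: list[SatRow], moment: datetime) -> dict[str, dict[str, str]]:
    """Attributes per hub key as they stood at `moment`."""
    snapshot = {}
    for row, end in with_end_dates(rows):
        if row.load_date <= moment and (end is None or moment < end):
            snapshot[row.hub_hash_key] = row.attributes
    return snapshot


jan = datetime(2024, 1, 1, tzinfo=timezone.utc)
feb = datetime(2024, 2, 1, tzinfo=timezone.utc)
rows = [
    SatRow("hk1", jan, {"city": "Paris"}),
    SatRow("hk1", feb, {"city": "Lyon"}),
    SatRow("hk2", jan, {"city": "Oslo"}),
]
print(as_of(rows, datetime(2024, 1, 15, tzinfo=timezone.utc)))
# {'hk1': {'city': 'Paris'}, 'hk2': {'city': 'Oslo'}}
```

In SQL, the same derivation is typically a `LEAD(load_date) OVER (PARTITION BY hub_hash_key ORDER BY load_date)` window function in a view over the satellite.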
When Data Vault Makes Sense
Data Vault adds structural complexity over star schema. It is appropriate when:
**Multiple source systems with inconsistent business keys:** When the same customer is represented in Salesforce, NetSuite, and a legacy CRM with different identifiers, Data Vault makes the integration problem explicit. Each source key lands in the hub as received, and a separate same-as link records which keys refer to the same real-world entity by mapping each duplicate key to a master key.
**Regulatory requirements for complete audit history:** The insert-only raw vault preserves every source record exactly as received, with load timestamps and record sources. For financial services, healthcare, and other regulated industries, where proving what the data said at a given time is an audit requirement, that complete historical record is exactly what is needed, and Data Vault provides it by design.
**Frequently changing source systems:** A star schema requires upfront decisions about grain, relationships, and business rules. When source systems change often, or the full scope of data integration requirements is not yet known, Data Vault's separation of raw storage from business rules lets new sources be onboarded as new hubs, links, and satellites without redesigning existing structures.
**When it does not make sense:** Data Vault is overengineered for small to mid-size data warehouses with stable, well-understood source systems and no regulatory audit requirements. The three-object model, multiple satellite management, and presentation layer build-out require more engineering investment than a well-designed star schema. Organizations that do not have the specific requirements Data Vault addresses — multiple inconsistent source systems, regulatory audit, frequent source changes — pay the structural cost without the benefits.
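The same-as link from the first criterion above can be illustrated with a small resolver. Each same-as row is assumed to pair a duplicate key with the key chosen as master; readable source-prefixed keys such as `SF:0031` stand in for hash keys for legibility, and all names are hypothetical.

```python
def resolve_master(same_as: list[tuple[str, str]], key: str) -> str:
    """Follow (duplicate, master) pairs until reaching a key that has no master."""
    master_of = dict(same_as)  # assumes each duplicate maps to a single master
    seen: set[str] = set()
    while key in master_of:
        if key in seen:
            raise ValueError(f"cycle in same-as link at {key!r}")
        seen.add(key)
        key = master_of[key]
    return key


def unify(hub_keys: list[str], same_as: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group every hub key under the master key it resolves to."""
    groups: dict[str, list[str]] = {}
    for k in hub_keys:
        groups.setdefault(resolve_master(same_as, k), []).append(k)
    return groups


# Usage: one customer known to Salesforce, NetSuite, and a legacy CRM.
hub_keys = ["SF:0031", "NS:88", "LEG:C-17", "SF:0099"]
same_as_link = [("NS:88", "SF:0031"), ("LEG:C-17", "NS:88")]
print(unify(hub_keys, same_as_link))
# {'SF:0031': ['SF:0031', 'NS:88', 'LEG:C-17'], 'SF:0099': ['SF:0099']}
```

The grouping is derived from the link rows, so the hub itself never changes: every source key stays on record exactly as it arrived.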
Data Vault 2.0 and dbt
Data Vault 2.0 (DV2) extended the original methodology with patterns for big data, NoSQL sources, and agile development, and replaced sequence-generated surrogate keys with hash keys so that hubs, links, and satellites can be loaded in parallel. dbt is widely used to implement Data Vault: its model structure maps naturally to Hubs, Links, and Satellites, and packages such as AutomateDV (formerly dbtvault) and Scalefree's datavault4dbt provide macros that generate Data Vault SQL from configuration.
A dbt-based Data Vault implementation generates Hubs, Links, and Satellites from staging models using macros that enforce the structural rules. Changes to source systems require configuration updates rather than SQL rewrites. The full historical load, incremental load, and business vault layer can be managed as version-controlled dbt models.
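To illustrate what "configuration updates rather than SQL rewrites" means, here is a hypothetical generator that renders an incremental dbt hub model from a small config object. It is not the API of AutomateDV (formerly dbtvault) or datavault4dbt, whose macros take their own parameters; it only emits standard dbt Jinja (`config`, `ref`, `is_incremental()`, `this`) and assumes the staging model already provides the hash key, `load_date`, and `record_source` columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HubConfig:
    source_model: str  # staging model that feeds the hub
    hash_key: str      # hash key column computed in staging
    business_key: str  # business key column


def render_hub_model(cfg: HubConfig) -> str:
    """Render an incremental, insert-only dbt hub model from configuration."""
    columns = ", ".join([cfg.hash_key, cfg.business_key, "load_date", "record_source"])
    lines = [
        "{{ config(materialized='incremental') }}",
        f"select {columns}",
        "from (",
        "    select *, row_number() over (",
        f"        partition by {cfg.hash_key} order by load_date) as rn",
        f"    from {{{{ ref('{cfg.source_model}') }}}}",
        ") ranked",
        "where rn = 1  -- keep the earliest sighting of each key in this batch",
        "{% if is_incremental() %}",
        f"  and {cfg.hash_key} not in (select {cfg.hash_key} from {{{{ this }}}})",
        "{% endif %}",
    ]
    return "\n".join(lines)


# Usage: onboarding a new source means adding a config entry, not writing SQL.
print(render_hub_model(HubConfig("stg_crm_customers", "customer_hk", "customer_id")))
```

The `is_incremental()` branch keeps later runs insert-only: keys already present in the hub (`{{ this }}`) are skipped, and nothing existing is updated.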
Our data architecture services practice designs enterprise data warehouse architectures — including Data Vault implementations for regulated industries and complex multi-source integration requirements. Contact us to discuss your data warehouse requirements.