An enterprise data warehouse that serves hundreds of users across dozens of business domains has different design requirements than a departmental analytics database. The decisions that determine long-term success — layered architecture, zone design, governance integration, multi-domain modelling — are architectural, not just technical.
An enterprise data warehouse that serves hundreds of analytical users, dozens of departments, and regulatory requirements across multiple business domains is architecturally different from a departmental analytics database. The decisions that matter — zone design, layered architecture, governance integration, multi-domain modelling — are made early and are expensive to reverse. Getting them right is worth more than any specific technical choice about which compute engine to use.
The Layered Architecture
Production enterprise data warehouses use a layered architecture: raw, staging, integration, and presentation layers, each with a distinct purpose and distinct ownership.
**Raw layer (landing zone)**: data arrives from source systems exactly as it was received, with no transformation. The raw layer is append-only — records are never updated or deleted, they are versioned by load timestamp. If a source system sends corrected data, the correction arrives as a new record alongside the original. The raw layer is the forensic record: when something goes wrong downstream, you can always trace back to what was actually received and when.
The raw layer is typically not directly accessible to analytical consumers. Its purpose is operational, not analytical — it is the immutable source of truth that allows the downstream layers to be rebuilt.
**Staging layer**: source data is cleaned and standardised for technical consistency — data types normalised, encodings unified, obviously malformed records quarantined. The staging layer still reflects source system structure: one table per source system entity. Business logic is not applied here. A staging layer table for CRM contacts looks like the CRM contact table, not the enterprise concept of "customer."
**Integration layer**: source data is integrated across systems and modelled around business concepts. Multiple CRM contacts, billing system accounts, and support system users are reconciled into a single enterprise customer entity. Business rules are applied: what defines an "active" customer, how is revenue recognised, what is the canonical customer key. This is where the dimensional model (star schema, fact and dimension tables) lives.
The integration layer is the most expensive to build and the most valuable. It is also where the political difficulty of enterprise data warehouses lives: agreeing on a single definition of "customer" across Sales, Finance, and Product requires cross-functional governance that is harder than the technical integration work.
**Presentation layer**: views, materialised views, or purpose-built data marts that optimise for specific consumption patterns. An executive dashboard may need pre-aggregated totals that would be slow to compute on demand from the integration layer. A regulatory report may need a denormalised view that joins tables the reporter should not need to understand structurally. The presentation layer adapts the integration layer for consumer needs without modifying the integration layer itself.
Zone Design: Who Accesses What
Enterprise data warehouses define data zones — access-controlled areas with different consumer permissions.
**Raw zone**: accessible only to the data engineering platform team. No analytical consumers. Contents are too sensitive (raw PII, unvalidated data) and too technical (source system artefacts, not business concepts) for general access.
**Staging zone**: accessible to data engineers and senior analytics engineers responsible for building the integration layer. Not accessible to analytical consumers.
**Integration zone**: accessible to analytics engineers, data analysts, and BI developers who build analytical models and dashboards. The integration layer's dimensional models are the primary consumer-facing assets for internal data teams.
**Presentation zone**: accessible to all analytical consumers including business users and self-serve analytics users. Tables and views in the presentation zone are documented, governed, and guaranteed to meet quality standards. They are the "certified" content — the enterprise's official analytical sources.
**Sandbox zone**: accessible to individual users and teams for exploratory analysis. No quality guarantees, no documentation requirements, automatic cleanup after 30 or 90 days. The sandbox exists to give analytical users space to experiment without polluting the governed zones.
Zone design is enforced through database-level access control — role-based permissions that restrict which schemas (zones) each role can read from. The governance team owns the role definitions; the platform team enforces them.
Multi-Domain Modelling
An enterprise warehouse serves multiple business domains: Sales, Finance, Operations, HR, Product, Marketing. Each domain has its own data model requirements, its own definitions, and its own reporting cadence.
The integration layer must navigate a tension: consistency (all domains using the same definitions of shared concepts like customer and product) and domain specificity (each domain's unique entities and business logic).
The resolution is conformed dimensions: shared dimension tables (dim_customer, dim_product, dim_date, dim_employee) that all domain fact tables reference using the same keys and the same attribute definitions. Conformed dimensions enable cross-domain analysis. If fact_sales and fact_support_tickets both reference dim_customer with the same customer keys, you can analyse the relationship between sales volume and support load per customer.
Domain-specific dimensions (dim_sales_territory, dim_product_lifecycle_stage) are owned by the relevant domain and are not required to be shared. They are documented and available, but other domains adopt them by agreement rather than by mandate.
The governance challenge is maintaining conformed dimension consistency as source systems evolve. A customer ID scheme change in the CRM system requires coordinated updates across every fact table that references the customer dimension. This is where the architectural investment in clear documentation and automated lineage pays off.
Governance Integration
An enterprise data warehouse without integrated governance is a liability. Uncontrolled data proliferation, unclear ownership, inconsistent definitions, and undocumented lineage create compliance risk and analytical unreliability.
Governance integration means:
**Certified content programmes**: a formal process for promoting content from the sandbox or development zones to the presentation zone. Certification requires documented owner, documented description, tested data quality, and compliance review. Only certified content is guaranteed by the platform team.
**Data stewardship**: for each domain, a named data steward who is responsible for the accuracy and timeliness of the domain's data products. The steward is the escalation point for quality issues and the approver for schema changes.
**Lineage tracking**: every table in the warehouse knows where its data came from. Modern data warehouse tooling (dbt, OpenLineage) generates this automatically. For regulatory environments (financial services, healthcare), lineage that can be audited at the column level — "this EBITDA figure in this report came from these source system fields through these transformations" — is often a compliance requirement.
**Change management**: schema changes that break consumers must be communicated in advance with sufficient lead time. Breaking changes follow a deprecation process: the old schema is maintained for a migration period while consumers update; then the old schema is removed. Unannounced breaking changes are one of the fastest ways to erode trust in the data platform.
Performance Architecture at Scale
An enterprise warehouse with hundreds of concurrent users has performance requirements that a departmental database does not. Query concurrency, storage organisation, and compute separation all become material concerns.
**Concurrency management**: most analytical queries are read-only and can run in parallel. But highly concurrent workloads against the same tables cause resource contention. Compute separation (Snowflake virtual warehouses, BigQuery slot reservations, Redshift workload management queues) allocates specific compute resources to different workload classes — executive reporting gets guaranteed capacity; exploratory analytics gets shared capacity.
**Partitioning and clustering**: large tables should be partitioned on the most common filter dimension (usually date) and clustered on the next most common filter dimension (usually a domain key like customer_id or product_id). Queries that filter on partition and cluster keys skip irrelevant data without scanning the full table.
**Materialised aggregations**: for frequently-queried aggregate patterns, materialise the aggregation. A fact table with 10 billion rows that is always queried at the monthly/region/category grain should have a pre-computed monthly/region/category aggregate table maintained by the pipeline. The aggregate table is orders of magnitude smaller and proportionally faster to query.
**Extract management for BI tools**: BI tools like Tableau have their own caching layer (extracts). Well-designed Tableau extracts materialise the joins and aggregations the dashboards need, rather than relying on live queries against the warehouse at dashboard load time. This shifts the query load from interactive time (when executives are waiting) to scheduled extract refresh time (when the warehouse is less loaded).
Our data architecture practice designs enterprise data warehouse architectures — contact us to discuss your warehouse architecture requirements.
A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.
Book a Call →