
Data Mesh Architecture: What It Is and Whether Your Organisation Needs It

Austin Duncan
Managing Director & Principal Data Architect
May 22, 2026 · 12 min read

Data mesh is an organisational and architectural approach that distributes data ownership to domain teams instead of centralising it in a data engineering function. Here is what it actually involves, who it is designed for, and the honest assessment of when it solves a real problem versus adding complexity.

The quick answer

Data mesh is an organisational and architectural approach that distributes data ownership to domain teams — the people closest to the data — instead of centralising data engineering in a single platform team. It emerged from a specific problem: centralised data teams at large organisations cannot keep pace with the volume of data product requests from the business, and the bottleneck degrades data quality and analytical value across the organisation. Data mesh solves this bottleneck problem. It adds significant organisational complexity in return. It is the right architecture for large organisations with the data maturity and engineering capacity to implement it. For most mid-market and early-stage data organisations, a well-governed centralised platform delivers the same outcomes with far less complexity.

What data mesh actually is

Data mesh was introduced by Zhamak Dehghani in a widely cited 2019 article and developed further in her 2022 book. The core insight: centralised data engineering teams become bottlenecks at scale. Business domains (Sales, Finance, Product, Operations) generate data, understand that data best, and have the most urgent need to use it. But in the centralised model, they must request data products from a central team that does not have full business context, operates on its own backlog, and cannot prioritise all domains simultaneously.

Data mesh redistributes this responsibility. Domain teams own, build, and maintain their own data products. A central platform team provides the infrastructure (a "self-serve data platform") that makes this practical without requiring every domain team to become infrastructure engineers. Federated governance standards define what data products must provide (quality SLAs, documentation, access interfaces) without dictating how they are built.

This is both an organisational change and an architectural change. Most failed data mesh implementations focus on the technical architecture (building data domains, federating the infrastructure) while underinvesting in the organisational change (giving domain teams genuine ownership with genuine accountability). Without the organisational change, data mesh is a more complex version of the centralised architecture it was intended to replace.

The four principles of data mesh

1. Domain-oriented decentralised data ownership

Data is owned by the domain team that generates and understands it — Sales owns sales data, Finance owns financial data, Product owns event and usage data. Domain teams are accountable for the quality, availability, and documentation of the data they expose. They are not data requesters waiting for a central team; they are data product owners.

This requires domain teams to have data engineering capability. Either domain teams include embedded data engineers, or data engineers are aligned to domains rather than to a central function. The organisational model varies, but the accountability is clear: the team that generates the data is accountable for making it usable.

2. Data as a product

Domain data is not a raw database dump or an internal pipeline output — it is a product, designed for consumption by other teams. A data product has a defined interface (how consumers access it), quality commitments (SLAs for freshness and accuracy), documentation (what the data contains and how to use it), and a named owner accountable for its quality.

This shift in framing — from "data we generate" to "data products we maintain" — changes the incentives. Domain teams are accountable not just for generating data but for maintaining it at the quality level that downstream consumers depend on.
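To make the four elements of a data product (interface, quality commitments, documentation, named owner) concrete, here is a minimal sketch of a data product contract as a Python dataclass. The class and field names (`DataProductContract`, `freshness_sla`, `min_accuracy` and so on) are hypothetical; in practice such descriptors often live in a catalogue entry or a configuration file next to the pipeline code rather than in application code.

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass(frozen=True)
class DataProductContract:
    """Hypothetical descriptor for a domain-owned data product.

    Fields mirror the elements of a data product: an interface,
    quality commitments, documentation and a named owner.
    """
    name: str                    # e.g. "sales.orders"
    owner: str                   # named, accountable owner (person or team)
    interface: str               # how consumers access it, e.g. a table name
    freshness_sla: timedelta     # maximum acceptable age of the latest data
    min_accuracy: float          # e.g. 0.99 = at most 1% of rows may fail checks
    documentation_url: str = ""  # where consumers learn what the data means
    columns: dict[str, str] = field(default_factory=dict)  # column -> description

    def is_documented(self) -> bool:
        # Documented = a doc link exists and no exposed column
        # has an empty description.
        return bool(self.documentation_url) and all(
            desc.strip() for desc in self.columns.values()
        )


orders = DataProductContract(
    name="sales.orders",
    owner="sales-data-team",
    interface="warehouse table sales.orders_v1",
    freshness_sla=timedelta(hours=6),
    min_accuracy=0.99,
    documentation_url="https://example.internal/catalog/sales.orders",
    columns={"order_id": "Unique order identifier", "amount": "Order value in GBP"},
)
print(orders.is_documented())  # True
```

Making the contract an explicit, machine-readable object is what later allows governance checks to run against it automatically instead of through manual review.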

3. Self-serve data infrastructure as a platform

For domain teams to own data products without becoming infrastructure teams, a central platform function must provide the infrastructure that makes this practical: a cloud data platform (storage, compute, query engine), data pipeline tooling, cataloguing and discovery, observability, and governance primitives. Domain teams use the platform to build their data products; the platform team maintains the infrastructure they build on.

The platform team's job changes from building data products to building the platform that enables domain teams to build data products. This is a significant cultural and skills shift — from hands-on data product delivery to enabling others to deliver.
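One way to picture the self-serve platform described above is as a set of calls that hand a domain team a ready-made, convention-compliant home for a new data product. The sketch below is entirely hypothetical: the function `provision_data_product`, the `s3://` path layout, the alert-channel naming and the default check names are illustrative assumptions, not features of any particular platform.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProvisionedProduct:
    # What a domain team receives from the (hypothetical) platform call.
    catalog_name: str
    storage_path: str
    alert_channel: str
    default_checks: List[str] = field(default_factory=list)


# Checks every new product starts with; the platform team owns this list.
DEFAULT_CHECKS = ["owner_declared", "freshness_sla_declared", "schema_documented"]


def provision_data_product(domain: str, product: str, bucket: str = "lakehouse") -> ProvisionedProduct:
    """Hypothetical self-serve call: the domain team supplies names only and
    never chooses storage layouts, catalogue names or alerting by hand."""
    domain, product = domain.strip().lower(), product.strip().lower()
    if not (domain.isidentifier() and product.isidentifier()):
        raise ValueError("domain and product must be simple identifiers, e.g. 'sales', 'orders'")
    return ProvisionedProduct(
        catalog_name=f"{domain}.{product}",
        storage_path=f"s3://{bucket}/{domain}/{product}/",
        alert_channel=f"#{domain}-data-alerts",
        default_checks=list(DEFAULT_CHECKS),  # copy so each product can extend its own list
    )


sales_orders = provision_data_product("Sales", "orders")
print(sales_orders.catalog_name)  # sales.orders
print(sales_orders.storage_path)  # s3://lakehouse/sales/orders/
```

The design point is that conventions are encoded once by the platform team, so every domain's products land in predictable places with the same baseline checks.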

4. Federated computational governance

Governance in a data mesh is not centralised control — it is federated standards enforced computationally. The central governance function defines what data products must provide (quality SLAs, data classification, access control interfaces, lineage documentation) but does not control how domain teams implement them. Compliance is verified programmatically against the standards, not through manual review.

This is the hardest principle to implement. Federated governance requires the governance function to operate at policy level rather than implementation level — setting the standards and trusting domain teams to meet them, with automated verification rather than manual oversight.
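"Verified programmatically" can be as simple as a policy function that the central governance function owns and every data product must pass. This is a minimal sketch under stated assumptions: the metadata fields, the classification labels and the function name `check_policies` are hypothetical. Note that the rules only check *what* a product declares, never *how* it was built.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProductMetadata:
    # Hypothetical metadata a domain team publishes with its data product.
    name: str
    owner: str = ""
    classification: str = ""                    # e.g. "internal", "confidential"
    freshness_sla_hours: Optional[float] = None
    has_lineage: bool = False
    access_interface: str = ""


# Policy set by the central governance function (labels are illustrative).
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential", "restricted"}


def check_policies(meta: ProductMetadata) -> List[str]:
    """Return every policy violation; an empty list means the product complies."""
    violations = []
    if not meta.owner:
        violations.append("missing named owner")
    if meta.classification not in ALLOWED_CLASSIFICATIONS:
        violations.append(f"invalid classification: {meta.classification!r}")
    if meta.freshness_sla_hours is None:
        violations.append("no freshness SLA declared")
    if not meta.has_lineage:
        violations.append("lineage not documented")
    if not meta.access_interface:
        violations.append("no access interface declared")
    return violations


compliant = ProductMetadata(
    name="finance.invoices", owner="finance-data", classification="confidential",
    freshness_sla_hours=24, has_lineage=True, access_interface="finance.invoices_v2",
)
draft = ProductMetadata(name="product.events", owner="product-analytics")
print(check_policies(compliant))   # []
print(len(check_policies(draft)))  # 4
```

A check like this would typically run at publish time or in CI and block any product whose violation list is non-empty; that wiring is an assumption here rather than something data mesh prescribes.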

Who data mesh is designed for

Data mesh addresses a specific problem: the centralised data engineering bottleneck at scale. It is the right solution for organisations that exhibit this pattern:

- **Large organisations with many distinct data domains** — 5+ business domains each generating significant data

- **Existing centralised data team is overwhelmed** — backlog is growing, stakeholder satisfaction is low, data freshness is inadequate

- **Domain teams have sufficient engineering maturity** — they can absorb data ownership responsibility without becoming pure infrastructure teams

- **Executive commitment to organisational change** — domain teams will push back on ownership; without executive mandate, the change does not take hold

Data mesh is **not** the right architecture for:

- **Mid-market organisations with limited engineering capacity** — data mesh requires distributed data engineering capability that most mid-market organisations do not have

- **Organisations at early data maturity** — if you do not have a functioning centralised data platform, distributing ownership before you have the basics is premature

- **Organisations whose data bottleneck is technical, not organisational** — if the bottleneck is pipeline quality or data model design rather than team bandwidth, data mesh does not solve it

- **Organisations that want faster time-to-value** — implementing data mesh correctly takes 18–36 months; a well-executed centralised platform delivers analytics value in 3–6 months

Data mesh vs centralised data platform: the comparison

| Dimension | Centralised platform | Data mesh |
|---|---|---|
| Data ownership | Central data engineering team | Domain teams |
| Accountability | Central team accountable for all data | Domain teams accountable for their data |
| Scale | Bottlenecks at high domain volume | Scales with domain team growth |
| Time to value | 3–6 months | 18–36 months to full implementation |
| Complexity | Lower | Significantly higher |
| Required capability | Senior data engineering team | Data engineering capability in every domain |
| Right for | Most mid-market, early-stage orgs | Large orgs with mature data practices |

The centralised model is not inferior to data mesh — it is appropriate for a different organisational context. Most organisations that are considering data mesh would be better served by building a well-governed centralised data platform first, reaching data maturity, and then evaluating whether the domain bottleneck problem justifies the transition.

Data mesh in practice: what implementation looks like

Organisations that successfully implement data mesh follow a consistent sequence:

Phase 1: Self-serve platform (6–12 months)

Build the infrastructure that domain teams will use. This is the prerequisite — without a self-serve platform, domain teams cannot own data products. The platform typically includes: a cloud data lakehouse (Delta Lake or Iceberg on object storage), a cataloguing layer (Microsoft Purview, Atlan, Alation), data quality tooling (dbt tests, Great Expectations), and a pipeline framework that domain teams can operate without deep infrastructure knowledge.
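The data quality tooling named above (dbt tests, Great Expectations) boils down to declarative checks that return failing rows. The sketch below uses neither tool: it is a plain-Python approximation, loosely modelled on dbt's built-in `unique`, `not_null` and `accepted_values` tests, so the idea is visible without a warehouse. The function names and the list-of-dicts row format are assumptions for illustration.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def not_null(rows: List[Row], column: str) -> List[Row]:
    # Failing rows: the column is missing or None.
    return [r for r in rows if r.get(column) is None]


def unique(rows: List[Row], column: str) -> List[Row]:
    # Failing rows: non-null values that appear more than once (nulls are ignored).
    counts = Counter(r.get(column) for r in rows if r.get(column) is not None)
    return [r for r in rows if r.get(column) is not None and counts[r.get(column)] > 1]


def accepted_values(rows: List[Row], column: str, allowed: Iterable[Any]) -> List[Row]:
    # Failing rows: the value is outside the allowed set.
    allowed_set = set(allowed)
    return [r for r in rows if r.get(column) not in allowed_set]


def run_checks(rows: List[Row], checks: List[tuple]) -> Dict[str, int]:
    """Run (label, check_fn, column, *extra_args) tuples and count failing rows."""
    return {label: len(fn(rows, column, *args)) for label, fn, column, *args in checks}


orders = [
    {"order_id": 1, "status": "paid", "customer_id": "C1"},
    {"order_id": 2, "status": "refunded", "customer_id": None},
    {"order_id": 2, "status": "pending", "customer_id": "C2"},
    {"order_id": 3, "status": "lost", "customer_id": "C3"},
]
checks = [
    ("order_id unique", unique, "order_id"),
    ("customer_id not null", not_null, "customer_id"),
    ("status accepted", accepted_values, "status", {"paid", "pending", "refunded"}),
]
print(run_checks(orders, checks))
# {'order_id unique': 2, 'customer_id not null': 1, 'status accepted': 1}
```

Because the checks are data rather than code paths, domain teams can attach them to their products without needing to understand the infrastructure that runs them.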

Phase 2: Pilot domain ownership (3–6 months)

Select one or two business domains — typically the most data-mature, most motivated — and transfer genuine data ownership to them. This means embedding data engineers in those domains, establishing the data product interface standards they must meet, and removing the central team from the delivery chain for those domains' data products. The pilot validates the organisational model before scaling it.

Phase 3: Governance framework (concurrent with Phase 2)

Define the federated governance standards: what does a data product require? Freshness SLA, quality checks, lineage documentation, access interface, data classification labels. Implement automated verification so compliance is checked programmatically, not manually reviewed. Establish the cross-domain governance committee that resolves conflicts between domain definitions.
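A freshness SLA is one of the easiest standards to verify automatically, and routing breaches to the owning team makes domain accountability concrete. The sketch below is illustrative only: the function names, the three status labels and the 80% early-warning threshold are assumptions, not part of any standard.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Optional, Tuple


def check_freshness(
    last_loaded_at: datetime,
    sla: timedelta,
    now: Optional[datetime] = None,
    warn_fraction: float = 0.8,
) -> str:
    """Classify freshness against a declared SLA: "ok", "warn" or "breach".

    "warn" means more than warn_fraction of the SLA window has elapsed.
    Timestamps must be timezone-aware so comparisons are unambiguous.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    age = now - last_loaded_at
    if age > sla:
        return "breach"
    if age > sla * warn_fraction:
        return "warn"
    return "ok"


def breaches_by_owner(
    products: Iterable[Tuple[str, str, datetime, timedelta]], now: datetime
) -> Dict[str, List[str]]:
    """Group SLA breaches by owning team; input is (name, owner, last_loaded_at, sla)."""
    breaches: Dict[str, List[str]] = {}
    for name, owner, loaded_at, sla in products:
        if check_freshness(loaded_at, sla, now=now) == "breach":
            breaches.setdefault(owner, []).append(name)
    return breaches


utc = timezone.utc
now = datetime(2026, 5, 22, 12, 0, tzinfo=utc)
products = [
    ("sales.orders", "sales-data", datetime(2026, 5, 22, 10, 0, tzinfo=utc), timedelta(hours=6)),
    ("finance.invoices", "finance-data", datetime(2026, 5, 22, 6, 30, tzinfo=utc), timedelta(hours=6)),
    ("product.events", "product-analytics", datetime(2026, 5, 22, 5, 0, tzinfo=utc), timedelta(hours=6)),
]
for name, _, loaded_at, sla in products:
    print(name, check_freshness(loaded_at, sla, now=now))  # ok, warn, breach
print(breaches_by_owner(products, now))  # {'product-analytics': ['product.events']}
```

Passing `now` explicitly keeps the check deterministic and testable; a scheduler would call it with the current time.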

Phase 4: Scale domain ownership (12–24 months)

Extend domain ownership progressively to additional domains, using the pilot experience to refine the onboarding process and governance framework. The central data engineering function shifts from delivery to enablement — helping domain teams adopt the platform, improving the self-serve infrastructure based on domain feedback.

The semantic layer in a data mesh

One of the most common questions about data mesh: if domain teams own their own data products, how do cross-domain analytics work? A query that joins Sales data with Finance data and Product data — where does the governance live?

The answer is the semantic layer: canonical definitions of cross-domain entities (Customer, Order, Revenue) that domain teams must implement consistently, even though each domain owns its own data. In practice, this requires a shared data product for each core business entity. A Customer data product, for example, is owned and maintained by the domain team closest to the customer record (typically CRM/Sales) and exposes a canonical interface that other domains join to. Cross-domain analytics are served from the semantic layer that sits above the domain data products, providing consistent metric definitions so that distributed ownership does not produce inconsistent cross-domain analysis.
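A toy version of this pattern: Finance owns invoices, Sales/CRM owns the canonical Customer product, and "revenue" is defined once in a shared registry that every consumer calls. Production semantic layers usually express metrics declaratively in dedicated tooling; this plain-Python sketch, including the names `METRICS` and `metric_by` and the sample data, is a hypothetical illustration of the principle only.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Canonical Customer data product, owned by the CRM/Sales domain (sample data).
customers: Dict[str, Row] = {
    "C1": {"customer_id": "C1", "segment": "enterprise"},
    "C2": {"customer_id": "C2", "segment": "smb"},
}

# Finance-owned invoices data product, keyed to the canonical customer_id.
invoices: List[Row] = [
    {"invoice_id": "I1", "customer_id": "C1", "amount": 1200.0, "status": "paid"},
    {"invoice_id": "I2", "customer_id": "C1", "amount": 300.0, "status": "void"},
    {"invoice_id": "I3", "customer_id": "C2", "amount": 450.0, "status": "paid"},
]


def revenue(rows: List[Row]) -> float:
    # The one agreed definition: only paid invoices count as revenue.
    return sum(r["amount"] for r in rows if r["status"] == "paid")


# Semantic-layer registry: consumers look metrics up by name instead of
# re-implementing them in their own queries.
METRICS: Dict[str, Callable[[List[Row]], float]] = {"revenue": revenue}


def metric_by(metric: str, dimension: str, facts: List[Row], entities: Dict[str, Row]) -> Dict[str, float]:
    """Join facts to the canonical Customer entity on customer_id, group by a
    customer attribute, then apply the registered metric to each group."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in facts:
        entity = entities.get(row["customer_id"])
        groups[entity[dimension] if entity else "unknown"].append(row)
    compute = METRICS[metric]
    return {key: compute(rows) for key, rows in groups.items()}


by_segment = metric_by("revenue", "segment", invoices, customers)
print(by_segment)         # {'enterprise': 1200.0, 'smb': 450.0}
print(revenue(invoices))  # 1650.0
```

Because the segment breakdown and the headline figure both go through the same `revenue` definition, they reconcile by construction; two domains that each wrote their own revenue logic would have no such guarantee.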

For more on how the semantic layer works, our article on what a semantic layer is covers the implementation in detail.

FAQs

Is data mesh just data marts with extra steps?

Data marts — copies of data prepared for specific business functions — are a much older pattern that data mesh superficially resembles (distributed, domain-oriented data). The difference is accountability: data marts are typically built by the central team for domain consumption, with the central team accountable for quality. Data mesh puts the accountability with the domain team. The architectural pattern may look similar; the organisational model is fundamentally different.

Do we need data mesh to support AI?

No. AI workloads need governed, high-quality data, not necessarily decentralised data. A well-governed centralised lakehouse with proper data quality standards serves AI requirements effectively. Data mesh can improve the quality and freshness of domain data if implemented well, but it is not a prerequisite for AI. The requirements for AI-ready data architecture are covered in our article on why your data architecture cannot support agentic AI.

What is the difference between data mesh and data fabric?

Data mesh is an organisational and architectural approach: decentralised ownership, data as a product, self-serve platform, federated governance. Data fabric is a technical architecture pattern: using metadata, AI, and integration technology to provide a unified access layer across disparate data sources. They are not competing approaches — a data fabric can be the technical implementation of a self-serve data platform in a data mesh. The terms are often confused because both address the challenge of working with distributed data, but they operate at different levels: mesh is an organisational design; fabric is a technical design.

We have read about data mesh but our data team has a backlog. Where do we start?

A data team with a backlog is experiencing the centralised bottleneck problem that data mesh addresses — but implementing data mesh is not the right immediate response. The right immediate response is: diagnose why the backlog is growing (is it prioritisation? engineering quality? inadequate capacity? unclear requirements?), fix the highest-impact bottleneck, and evaluate whether the problem is fundamentally about centralisation or about other issues. Most backlog problems can be resolved without data mesh. If you have resolved the operational bottlenecks and the problem is genuinely about scale and distributed domain ownership, then data mesh becomes relevant.

Our data architecture consulting practice designs data platform architectures for organisations at every maturity stage — from initial cloud data platform builds to data mesh transitions at enterprise scale. If your data team is experiencing bottlenecks or your organisation is evaluating data mesh, book a free 30-minute audit and we will tell you directly whether the problem you have is the problem data mesh solves.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.
