
Data Contracts: Formalising the Agreement Between Data Producers and Consumers

Austin Duncan
Managing Director & Principal Data Architect
August 4, 2027 · 12 min read

A data contract is a formal agreement between the team that produces a dataset and the teams that consume it — specifying schema, semantics, quality levels, and service level agreements. Data contracts shift the accountability for data quality upstream to the source, reduce the cost of breaking schema changes, and replace the informal, undocumented expectations that produce most data quality incidents in organisations where multiple teams share data.

Why Data Quality Problems Persist

In most data organisations, the root cause of data quality problems is not technical — it is organisational. Source system teams change schemas without notifying downstream consumers. Extract pipelines break silently. Business logic changes in the source application produce numerically different data without any schema change. Downstream teams discover problems when dashboards break or when a business user reports an anomaly.

The informal process for managing these problems is a combination of monitoring alerts, ad-hoc communication, and reactive debugging. Each incident is resolved; the next one is not prevented. The pattern repeats because the underlying accountability is not clear. Source teams do not know what consumers depend on; consumer teams do not know who to contact when data breaks.

Data contracts create explicit accountability. The producer team documents what they commit to providing; the consumer teams document what they depend on. Changes that would break the contract require a coordination process — not just a code change and deployment. Incidents become attributable: was the producer contract violated, or was the consumer relying on something outside the contract?

What a Data Contract Specifies

A well-formed data contract covers:

**Schema specification** — field names, data types, nullability, and any constraints (uniqueness, allowed values, referential integrity). The schema specification is the minimum contract; it is what most teams have, implicitly, through documentation that is rarely kept current.

**Semantic definitions** — what the fields mean in business terms. A field named 'revenue' in an orders table could mean gross revenue, revenue net of refunds, revenue net of discounts, or recognised revenue. Without a semantic definition, consumers build calculations on assumptions that may be wrong, and discrepancies between teams turn into arguments about whose number is correct rather than a check against an agreed definition.

**Quality SLAs** — the expected completeness, accuracy, and freshness of the data. A completeness SLA might specify that the orders table will contain at least 95% of expected records within 2 hours of close. A freshness SLA might specify that data will be available by 7:00 AM daily. Quality SLAs give consumers the information they need to assess whether a dataset is fit for their use case.

**Ownership and escalation** — the team or individual responsible for the dataset, and the process for raising concerns or requesting changes. Without an owner, consumer teams have no one to contact when data breaks.

**Change notification process** — how the producer will communicate planned breaking changes. At minimum, this means a defined notice period (for example, two weeks for schema changes and one sprint for semantic changes) and a coordination requirement: breaking changes need consumer sign-off before deployment.
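The five elements above can be captured in a single machine-readable structure. The following is a minimal sketch in Python, not a standard contract format: every class name, field name, team name, and address in it is hypothetical, and it assumes a two-week sprint when encoding the notice period for semantic changes.

```python
from dataclasses import dataclass
from datetime import time, timedelta


@dataclass(frozen=True)
class FieldSpec:
    name: str
    dtype: str               # logical type, e.g. "string" or "decimal(18,2)"
    nullable: bool = False
    unique: bool = False
    description: str = ""    # semantic definition in business terms


@dataclass(frozen=True)
class QualitySLA:
    min_completeness: float         # fraction of expected records, 0.0 to 1.0
    completeness_window: timedelta  # how long after close the target applies
    available_by: time              # daily freshness deadline


@dataclass(frozen=True)
class ChangePolicy:
    schema_notice: timedelta
    semantic_notice: timedelta
    requires_consumer_signoff: bool = True


@dataclass(frozen=True)
class DataContract:
    dataset: str
    version: str
    owner: str
    escalation_contact: str
    fields: tuple[FieldSpec, ...]
    quality: QualitySLA
    change_policy: ChangePolicy

    def field_names(self) -> list[str]:
        return [f.name for f in self.fields]


# Illustrative contract for an orders table; all values are made up.
orders_contract = DataContract(
    dataset="orders",
    version="1.0.0",
    owner="orders-platform-team",
    escalation_contact="orders-data@example.com",
    fields=(
        FieldSpec("order_id", "string", unique=True,
                  description="Unique order identifier."),
        FieldSpec("revenue", "decimal(18,2)",
                  description="Gross revenue before refunds and discounts."),
        FieldSpec("refunded_at", "timestamp", nullable=True,
                  description="When the order was refunded, if ever."),
    ),
    quality=QualitySLA(
        min_completeness=0.95,
        completeness_window=timedelta(hours=2),
        available_by=time(7, 0),
    ),
    change_policy=ChangePolicy(
        schema_notice=timedelta(weeks=2),
        semantic_notice=timedelta(weeks=2),  # assumes a two-week sprint
    ),
)
```

Keeping the semantic definition on the field itself, rather than in a separate document, means a change to the meaning of `revenue` shows up as a reviewable diff to the contract.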

Implementing Data Contracts Technically

Data contracts can be implemented at different levels of formality:

**Schema-as-code** is the minimum viable implementation. Schema definitions are written in YAML or JSON and version-controlled in Git, with automated validation that the actual schema of the table matches the contract definition. Tools like Great Expectations, dbt schema tests, or custom validation queries can validate schema contracts on every pipeline run.

**Schema registry** for event-driven architectures. A schema registry, such as Confluent Schema Registry for Apache Kafka, stores the registered schema for each topic and checks new schema versions against configured compatibility rules. Producers that use registry-aware serialisers cannot publish events that do not conform to a registered schema, so consumers can depend on that schema.

**Contract testing** using tools like dbt model contracts (added in dbt v1.5), which define enforced constraints on model outputs, or purpose-built contract testing frameworks that verify producer output against consumer expectations on each build.

**Data quality SLA monitoring** with alerting — automated checks that validate completeness, freshness, and key metric ranges at a defined cadence, alerting the responsible owner when SLAs are breached.
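As an illustration of the schema-as-code level, the sketch below compares a contract definition with the columns actually found in a table. It is a hand-rolled check under stated assumptions, not the API of Great Expectations or dbt: the JSON layout, the `validate_schema` function, and the `actual` dictionary (a stand-in for a query against the warehouse's column metadata) are all hypothetical.

```python
import json

# Hypothetical contract file contents; in practice this would be
# version-controlled alongside the pipeline code.
CONTRACT_JSON = """
{
  "dataset": "orders",
  "fields": {
    "order_id": {"type": "string", "nullable": false},
    "revenue": {"type": "decimal(18,2)", "nullable": false}
  }
}
"""


def validate_schema(contract_fields: dict, actual_columns: dict) -> list[str]:
    """Return a list of violations; an empty list means the table conforms."""
    violations = []
    for name, spec in contract_fields.items():
        if name not in actual_columns:
            violations.append(f"missing field: {name}")
            continue
        col = actual_columns[name]
        if col["type"] != spec["type"]:
            violations.append(
                f"type mismatch on {name}: "
                f"expected {spec['type']}, found {col['type']}"
            )
        if col["nullable"] and not spec["nullable"]:
            violations.append(
                f"{name} is nullable but the contract requires NOT NULL"
            )
    # Columns present in the table but absent from the contract are ignored:
    # this sketch treats additive changes as non-breaking.
    return violations


contract = json.loads(CONTRACT_JSON)

# Stand-in for the table's real column metadata.
actual = {
    "order_id": {"type": "string", "nullable": False},
    "revenue": {"type": "float", "nullable": True},
    "channel": {"type": "string", "nullable": True},
}

violations = validate_schema(contract["fields"], actual)
for violation in violations:
    print(violation)
```

Run on every pipeline execution, a check like this fails the run when the list is non-empty, so a breaking change is caught at the producer rather than in a downstream dashboard.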

The maturity model is progressive: start with schema-as-code and an ownership registry; add quality monitoring; formalise the change notification process; eventually move to schema registry enforcement for the highest-criticality data pipelines.
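The quality-monitoring step in that progression can start as a small scheduled check. The sketch below evaluates the two example SLAs mentioned earlier (at least 95% of expected records, data available by 7:00 AM). The function name and parameters are hypothetical, and it assumes the expected row count and the load timestamp are already known from the source system and the pipeline's run log.

```python
from datetime import date, datetime, time
from typing import Optional


def check_quality_sla(
    run_date: date,
    loaded_rows: int,
    expected_rows: int,
    loaded_at: Optional[datetime],
    min_completeness: float = 0.95,
    available_by: time = time(7, 0),
) -> list[str]:
    """Return a list of SLA breaches for one daily load; empty means healthy."""
    breaches = []

    # Completeness: share of expected records that actually arrived.
    completeness = loaded_rows / expected_rows if expected_rows else 1.0
    if completeness < min_completeness:
        breaches.append(
            f"completeness {completeness:.1%} is below "
            f"the {min_completeness:.0%} SLA"
        )

    # Freshness: the load must have finished by the daily deadline.
    deadline = datetime.combine(run_date, available_by)
    if loaded_at is None:
        breaches.append(f"no load recorded for {run_date.isoformat()}")
    elif loaded_at > deadline:
        breaches.append(
            f"data arrived at {loaded_at:%H:%M}, "
            f"after the {available_by:%H:%M} deadline"
        )
    return breaches


breaches = check_quality_sla(
    run_date=date(2027, 8, 4),
    loaded_rows=900,
    expected_rows=1000,
    loaded_at=datetime(2027, 8, 4, 7, 30),
)
for breach in breaches:
    print(breach)
```

In a real deployment the returned breaches would be routed to the owner named in the contract, which is what ties the alert back to an accountable team.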

Data Contracts and Organisational Change

The technical implementation is the simpler part. The harder part is the organisational change required to make data contracts real: source teams need to actually commit to contract terms; a process needs to exist for consumers to ratify those terms; and there needs to be a consequence when contracts are violated.

Without organisational follow-through, data contracts become documentation that is never updated and therefore never trusted. The documentation exists; the accountability does not.

The practical approach is to start with the highest-value, highest-pain data relationships: the datasets that break most often, that have the most downstream consumers, or that are used in the most business-critical decisions. Define contracts for those specific datasets, implement automated validation, and demonstrate that the contract is enforced. Trust builds from demonstrated reliability on specific datasets, not from declaring a comprehensive data contract programme.

