Data Observability: What It Is and How to Implement It

Data observability is the ability to understand the health of the data in your systems: detecting when something has broken, degraded, or drifted before users are affected. Here is what it means in practice, how it differs from data quality testing, and which tools are worth deploying.

The quick answer

Data observability is the ability to understand the health of your data at any point in time — detecting when data is wrong, late, missing, or drifted before users and downstream systems are affected. It is distinct from data quality testing (which validates data against predefined rules) and pipeline monitoring (which monitors whether pipelines ran successfully). Observability adds the ability to detect unexpected changes that no predefined rule anticipated.

The concept is borrowed from software engineering observability (the ability to understand internal system state from external outputs — logs, metrics, traces) and applied to data: instead of monitoring whether your application is running, you monitor whether your data is correct.

Why this is harder than it sounds

Pipeline monitoring tells you a job succeeded or failed. Data quality testing tells you the data meets your defined expectations. Neither tells you:

- That your users table row count dropped 40% but the pipeline still completed successfully

- That the distribution of values in a revenue column shifted significantly, suggesting a source system change

- That data from a specific customer segment is no longer arriving, without any pipeline error

- That a column is NULL at a rate three times higher than yesterday

These are the failures that produce incorrect analytics, wrong model predictions, and business decisions made on bad data — without any alert from pipeline monitoring or rule-based quality tests.

Data observability addresses this by learning what "normal" looks like for your data and alerting on anomalies, not just rule violations.

The five pillars of data observability

**Freshness** — is data up to date? Freshness monitoring tracks how recently a table was updated and alerts when data is older than expected. A table that should be updated hourly but was last updated six hours ago is a freshness failure — even if the pipeline scheduled to update it succeeded. Freshness failures often indicate upstream issues (source system downtime, API rate limits, pipeline dependency failures) that the pipeline scheduler does not see.
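A freshness check can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the function name `is_stale`, the `grace` parameter, and the timestamps are assumptions made up for the example. In practice the last-updated time would come from warehouse metadata or a `MAX(updated_at)` query.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_updated: datetime, expected_interval: timedelta,
             now: datetime, grace: timedelta = timedelta(0)) -> bool:
    """Return True when the table is older than its expected interval plus grace."""
    return now - last_updated > expected_interval + grace


# Hypothetical table that should refresh hourly but was last written six hours ago.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_updated = now - timedelta(hours=6)

print(is_stale(last_updated, timedelta(hours=1), now))  # True
```

The check looks at the data's own timestamp rather than the scheduler's job status, which is why it can fire even when the pipeline reports success.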

**Volume** — are the expected numbers of records present? Volume monitoring establishes normal row count ranges for tables and alerts on unexpected drops or spikes. A 40% drop in daily transaction rows is a data health failure; a 500% spike in a normally stable reference table is suspicious. Volume anomalies often signal source system issues, pipeline bugs, or unexpected upstream changes.
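One simple way to approximate volume monitoring is a z-score against recent daily row counts. The sketch below assumes a short history of counts is already available; `volume_anomaly`, the threshold of 3.0, and the sample numbers are illustrative choices, and commercial tools generally use more sophisticated models that account for seasonality.

```python
from statistics import mean, stdev


def volume_anomaly(history: list[int], today: int, z_threshold: float = 3.0) -> bool:
    """Flag today's row count when it sits far outside the historical spread."""
    mu = mean(history)
    sigma = stdev(history)  # needs at least two historical points
    if sigma == 0:
        # A perfectly stable history: any change at all is unexpected.
        return today != mu
    return abs(today - mu) / sigma > z_threshold


# Hypothetical daily row counts for a transactions table.
history = [1000, 1020, 980, 1010, 990, 1005, 995]

print(volume_anomaly(history, 600))   # True: a 40% drop
print(volume_anomaly(history, 1010))  # False: within normal variation
```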

**Schema** — have table structures changed unexpectedly? Schema monitoring detects column additions, removals, renames, and data type changes. Schema changes in source systems are among the most common causes of downstream data failures — a source field is renamed in a CRM update, and every downstream transformation that references the old field name breaks or silently returns NULLs.
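Schema monitoring amounts to comparing two snapshots of a table's structure. A minimal sketch, assuming each snapshot is a plain mapping of column name to type string (for example, read from `information_schema.columns`); the function name and sample columns are hypothetical.

```python
def diff_schema(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two column-name -> type snapshots and report what changed."""
    shared = set(previous) & set(current)
    return {
        "added": sorted(set(current) - set(previous)),
        "removed": sorted(set(previous) - set(current)),
        "type_changed": sorted(c for c in shared if previous[c] != current[c]),
    }


# Hypothetical snapshots taken before and after a CRM update.
previous = {"id": "int", "email": "varchar", "signup_date": "date"}
current = {"id": "bigint", "email_address": "varchar", "signup_date": "date"}

print(diff_schema(previous, current))
# {'added': ['email_address'], 'removed': ['email'], 'type_changed': ['id']}
```

Note that a rename shows up here as one removal plus one addition. Pairing them back into a single "rename" event requires extra heuristics that this sketch does not attempt.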

**Distribution** — has the statistical character of data changed? Distribution monitoring looks at the range, mean, null rate, and cardinality of columns and alerts when they deviate from established baselines. If the average order value in your transactions table shifts from $120 to $12 overnight, distribution monitoring catches it. Rule-based tests would not unless you had defined an explicit range check.
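The order-value example can be reproduced with a small column profile compared against a baseline. This is a sketch under simplifying assumptions: `profile`, `drifted_metrics`, and the 50% relative tolerance are invented for illustration, and real tools typically learn tolerances per metric rather than using one fixed value.

```python
from statistics import mean


def profile(values: list) -> dict[str, float]:
    """Summarise a column: null rate, mean of non-null values, distinct count."""
    if not values:
        raise ValueError("cannot profile an empty column")
    non_null = [v for v in values if v is not None]
    return {
        "null_rate": 1 - len(non_null) / len(values),
        "mean": mean(non_null) if non_null else 0.0,
        "distinct": float(len(set(non_null))),
    }


def drifted_metrics(baseline: dict[str, float], current: dict[str, float],
                    tolerance: float = 0.5) -> list[str]:
    """Return the metrics whose relative change from baseline exceeds tolerance."""
    flagged = []
    for name, base in baseline.items():
        denom = abs(base) if base != 0 else 1.0  # avoid dividing by a zero baseline
        if abs(current[name] - base) / denom > tolerance:
            flagged.append(name)
    return flagged


# Hypothetical order values: roughly $120 yesterday, roughly $12 today.
yesterday = [120.0, 118.0, 125.0, 117.0]
today = [12.0, 11.8, 12.5, 11.7]

print(drifted_metrics(profile(yesterday), profile(today)))  # ['mean']
```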

**Lineage** — when something breaks, where did it start? Lineage tells you which upstream sources feed a broken table and which downstream assets depend on it. When an anomaly is detected, lineage converts "something is wrong with this table" into "this specific upstream source changed and these downstream dashboards are affected." Impact analysis and root cause diagnosis become manageable rather than manual.
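At its core, lineage is a directed graph, and both questions (what is affected, where did it start) are graph traversals. The sketch below assumes lineage is available as a simple adjacency mapping; the table names and helper functions are hypothetical, and production lineage is usually extracted from query logs or dbt metadata rather than written by hand.

```python
from collections import deque


def reachable(graph: dict[str, list[str]], start: str) -> set[str]:
    """Breadth-first search: every node reachable from start, excluding start."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def invert(graph: dict[str, list[str]]) -> dict[str, list[str]]:
    """Flip edge direction so the same search walks upstream instead of downstream."""
    inverted: dict[str, list[str]] = {}
    for source, targets in graph.items():
        for target in targets:
            inverted.setdefault(target, []).append(source)
    return inverted


# Hypothetical lineage: each key feeds the tables in its list.
downstream = {
    "crm_contacts": ["stg_contacts"],
    "stg_contacts": ["dim_customers"],
    "dim_customers": ["fct_revenue", "revenue_dashboard"],
    "billing_events": ["fct_revenue"],
    "fct_revenue": ["revenue_dashboard"],
}

# Impact analysis: what depends on dim_customers?
print(sorted(reachable(downstream, "dim_customers")))
# ['fct_revenue', 'revenue_dashboard']

# Root cause candidates: what feeds dim_customers?
print(sorted(reachable(invert(downstream), "dim_customers")))
# ['crm_contacts', 'stg_contacts']
```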

Data observability vs data quality testing

Data quality testing (dbt tests, Great Expectations) validates data against explicitly defined rules: a column should not be null, values should be in a specified set, a foreign key should resolve. These tests are essential and should be the first line of defence. They are not a substitute for observability.

The limitation of rule-based testing: you can only write tests for failure modes you anticipate. Most production data quality failures are not anticipated. They are caused by source system changes, business process changes, and upstream data issues that no one modelled when writing the tests.

Observability tools (Monte Carlo, Bigeye, Metaplane, Datafold) learn normal patterns from historical data and alert on anomalies — detecting failures that were never explicitly defined as expected failure modes. The two approaches are complementary, not competing: dbt tests plus an observability tool provides defence-in-depth coverage.

When data observability matters most

**Large or complex data environments.** The more tables, pipelines, and dependencies you have, the less feasible it is to write explicit quality rules for every possible failure mode. Observability scales where explicit testing does not.

**Frequent upstream changes.** If source systems change schema, data volumes, or data characteristics regularly (SaaS platform updates, business process changes, external data feeds), observability detects the effects before they reach BI tools.

**High-cost data failures.** In environments where data drives operational decisions (fraud detection, inventory management, financial reporting), undetected quality failures are expensive. In these settings, a single prevented incident can be enough to justify the cost of an observability tool.

**Before deploying ML models.** ML models degrade when the data distribution they were trained on drifts from the data they are scoring. Monitoring feature distribution in production data and alerting when it deviates from training distribution (feature drift monitoring) is a specific observability use case that becomes critical as AI workloads move to production.
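One common way to quantify feature drift is the Population Stability Index (PSI), which compares binned shares of a feature between training and production data. The sketch below is a pure-Python illustration, not the method any specific tool uses; the bin count, the `1e-6` floor, and the sample data are assumptions. A frequently cited rule of thumb treats PSI above roughly 0.25 as a significant shift, but that is a convention rather than a guarantee.

```python
import math


def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between a training sample and a production sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # constant training feature: everything falls in one bin

    def shares(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # Floor each share so empty bins do not produce log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical feature values: production is the training data shifted by 0.5.
training = [i / 100 for i in range(100)]
production = [v + 0.5 for v in training]

print(psi(training, training))          # 0.0
print(psi(training, production) > 0.25)  # True
```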

**Regulated industries.** Financial services, healthcare, and insurance organisations subject to regulatory or audit requirements that touch data quality and integrity (BCBS 239, HIPAA, SOC 2) need documented evidence that data quality is monitored. Observability platforms can produce monitoring records and reports that support those audits.

Data observability tooling

**Monte Carlo** is widely regarded as the market-leading enterprise data observability platform. It applies ML-based anomaly detection across freshness, volume, schema, and distribution, and pairs it with lineage. It integrates with Snowflake, BigQuery, Databricks, Redshift, and most modern data stacks, and ingests dbt metadata, Airflow logs, and BI tool metadata to provide end-to-end lineage. It is strongest in large enterprise environments. Pricing is contract-based.

**Bigeye** offers similar ML-based monitoring with strong dbt integration and a focus on automated threshold learning. It suits organisations that want observability without significant configuration overhead: it requires less manual selection of what to monitor than Monte Carlo does.

**Metaplane** targets mid-market data teams with simpler environments. Lower implementation overhead than Monte Carlo; appropriate for organisations with tens rather than hundreds of data assets.

**Datafold** focuses on data diff capabilities — detecting row-level and column-level changes between data versions. Particularly useful for CI/CD workflows on data: when a code change is deployed, Datafold shows what changed in the data outputs, enabling data engineers to review data diffs the same way software engineers review code diffs.
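The idea of a data diff can be illustrated with a keyed comparison of two versions of a table. This sketch is not Datafold's API; `diff_rows`, the in-memory row dictionaries, and the sample data are assumptions chosen to show the concept. At warehouse scale the same comparison would be pushed down into SQL.

```python
def diff_rows(before: list[dict], after: list[dict], key: str) -> dict:
    """Compare two versions of a table, matched on a primary key column."""
    b = {row[key]: row for row in before}
    a = {row[key]: row for row in after}
    changed = {}
    for k in sorted(a.keys() & b.keys()):
        # Columns whose value differs between the two versions of this row.
        cols = sorted(c for c in a[k].keys() | b[k].keys() if a[k].get(c) != b[k].get(c))
        if cols:
            changed[k] = cols
    return {
        "added": sorted(a.keys() - b.keys()),
        "removed": sorted(b.keys() - a.keys()),
        "changed": changed,
    }


# Hypothetical model output before and after a code change.
before = [
    {"order_id": 1, "amount": 120.0, "status": "paid"},
    {"order_id": 2, "amount": 80.0, "status": "paid"},
    {"order_id": 3, "amount": 45.0, "status": "refunded"},
]
after = [
    {"order_id": 1, "amount": 120.0, "status": "paid"},
    {"order_id": 2, "amount": 8.0, "status": "paid"},
    {"order_id": 4, "amount": 60.0, "status": "paid"},
]

print(diff_rows(before, after, key="order_id"))
# {'added': [4], 'removed': [3], 'changed': {2: ['amount']}}
```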

**Great Expectations** (open-source) provides rule-based data validation rather than ML-based anomaly detection. It is a data quality testing tool that complements observability rather than replacing it. It has a strong community and extensive documentation, and it integrates with Airflow and dbt.

**dbt tests** are the baseline. Every dbt-based architecture should have comprehensive dbt tests before investing in observability tooling. dbt tests cover the known failure modes; observability covers the unknown ones.

Implementing data observability

**Start with dbt tests.** If your transformation layer does not have comprehensive dbt tests, that is the first priority. Get schema tests, referential integrity tests, and business logic assertions in place. Observability tools build on top of this foundation.

**Instrument the most critical tables first.** Not every table needs ML-based observability monitoring. Start with tables that feed critical dashboards, financial reports, or ML models — the tables where a failure has the most business impact.

**Establish baselines before relying on alerts.** ML-based anomaly detection tools need historical data to learn normal patterns. Deploy the tool, let it observe for 2–4 weeks to establish baselines, then enable alerting. Alerts enabled before the baseline has stabilised tend to be false positives, because the tool has not yet learned what normal variation looks like.

**Define alert severity and routing.** An alert that goes to everyone is ignored by everyone. Define which anomalies are P1 (immediate response required), P2 (investigate within the day), and P3 (review in the next sprint). Route P1 alerts to on-call channels; P2/P3 to team queues.
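Severity and routing rules are easiest to keep honest when they are written down as code or config. A minimal sketch follows; the classification rule (critical asset plus a 30% deviation), the channel names, and the function names are all hypothetical and would be replaced by whatever your team agrees on.

```python
from enum import Enum


class Severity(Enum):
    P1 = "immediate response required"
    P2 = "investigate within the day"
    P3 = "review in the next sprint"


# Hypothetical destinations: P1 pages on-call, everything else goes to a queue.
ROUTES = {
    Severity.P1: "on-call",
    Severity.P2: "team-queue",
    Severity.P3: "team-queue",
}


def classify(feeds_critical_asset: bool, deviation: float) -> Severity:
    """Illustrative rule: severity grows with business impact and anomaly size."""
    large = deviation >= 0.3  # assumed cut-off for a "large" deviation
    if feeds_critical_asset and large:
        return Severity.P1
    if feeds_critical_asset or large:
        return Severity.P2
    return Severity.P3


def route(severity: Severity) -> str:
    """Look up where an alert of this severity should be delivered."""
    return ROUTES[severity]


print(route(classify(feeds_critical_asset=True, deviation=0.4)))    # on-call
print(route(classify(feeds_critical_asset=False, deviation=0.05)))  # team-queue
```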

**Close the loop with lineage.** Observability alerts are most useful when they come with lineage context: which upstream sources feed this table, and which downstream assets depend on it. Deploy lineage alongside anomaly detection so that alerts are actionable, not just informative.

For the governance structures that make data observability sustainable — ownership, SLAs, response procedures — see data quality management. For how lineage fits in the broader architecture, see data lineage.

Frequently asked questions

Do we need an observability tool if we have dbt tests?

dbt tests cover the failure modes you anticipated when you wrote the tests. Observability tools catch failures you did not anticipate — and most production data quality failures are unanticipated. The two approaches are complementary. For simple environments, dbt tests alone may be sufficient. For complex environments with many data assets, frequent upstream changes, or high-cost failures, an observability layer adds material protection.

Our team is small. Is observability tooling worth the cost?

For small teams with simple environments (fewer than 20 tables, stable source systems, low business impact of data failure), the cost of enterprise observability tooling may not be justified. Start with comprehensive dbt tests. As the environment grows or the business impact of data failures increases, revisit.

How does observability relate to data governance?

Observability provides the monitoring signals; governance provides the response procedures and ownership model. Who is alerted when an anomaly is detected? Who owns the table that failed? What is the resolution SLA? Governance defines the answers; observability generates the triggers. One without the other is incomplete.

Our data architecture consulting practice designs data quality and observability architectures for mid-market and enterprise data platforms. If you are building observability into a data stack or evaluating tooling, book a free 30-minute audit and we will recommend the approach that fits your environment.
