The six dimensions of data quality, the metrics that operationalise each dimension, how to build a data quality score for your most important tables, and the programme-level KPIs that tell leadership whether data quality is improving or degrading.
"Data quality" is an outcome, not a measurement. To manage data quality — to improve it, to demonstrate that it is improving, and to report its status to stakeholders — you need specific, measurable metrics. This guide covers the six dimensions of data quality, the metrics that operationalise each dimension, and the programme-level KPIs that tell leadership whether data quality is improving over time.
The Six Dimensions of Data Quality
**Completeness:** Are all the values that should be present actually present? Completeness failures include null values in required fields, missing records that should exist (a customer in the CRM with no corresponding billing record), and incomplete records where partial information has been captured.
**Accuracy:** Do the values correctly represent the real-world entity or event they describe? Accuracy failures include incorrect values (a wrong birth year, a misspelled company name, an order amount that does not match the invoice), values that are outside the possible range (a negative stock quantity, an age of 300), and values that are inconsistent with related data (a delivery date before the order date).
**Consistency:** Are the same facts represented consistently across systems and over time? Consistency failures include the same customer having different names across CRM and billing, the same product having different category assignments across different data sources, and a metric computed differently in two different reports.
**Timeliness:** Is the data current enough for its intended use? Timeliness failures include stale data (the marketing database is 48 hours behind the product database for a same-day decision), delayed pipelines (the 6 AM refresh completed at 10 AM), and data with a freshness guarantee that is not being met.
**Validity:** Do the values conform to expected formats, ranges, and business rules? Validity failures include email addresses that fail format validation, phone numbers with the wrong digit count, postcodes that don't match the stated country, and status values outside the agreed set.
**Uniqueness:** Are entities uniquely represented where uniqueness is required? Uniqueness failures include duplicate customer records, duplicate transaction IDs (which can cause double-counting), and duplicate product records that create ambiguity in joins.
Metrics for Each Dimension
Completeness metrics:
- Null rate per column: percentage of rows where the column value is null. Track per column; alert when null rate increases significantly (more than 2x baseline or above defined threshold).
- Required field completeness: percentage of records where all required fields are populated. A CRM record with a name and email but no company is 80% complete if company is required.
- Record completeness: percentage of expected records that are present. If 1,000 orders were processed yesterday but only 970 exist in the warehouse, record completeness is 97%.
Accuracy metrics:
- Out-of-range rate: percentage of values outside the defined valid range (e.g., prices > $1M where max expected is $100K, ages > 120).
- Referential accuracy: percentage of foreign key values that exist in the referenced table. A 2% orphan rate in order-to-customer foreign keys indicates 2% of orders reference non-existent customers.
- Cross-system accuracy: percentage of values that match the authoritative source when checked against it (sampled or via automated comparison).
Consistency metrics:
- Cross-system consistency rate: for shared entities (customers appearing in both CRM and billing), the percentage where key attributes match (company name, email, address tier).
- Metric consistency: for metrics defined in multiple places, the percentage agreement between computations. If the BI tool and a manual export agree on revenue within 0.1%, consistency is high.
Timeliness metrics:
- Data freshness: time elapsed since the last successful data load. Compare against SLA.
- SLA compliance rate: percentage of pipeline runs that met the defined freshness SLA. Track weekly and monthly.
- Average latency: mean time from event occurrence in source system to data availability in the warehouse.
Validity metrics:
- Format validation pass rate: percentage of email addresses, phone numbers, and other formatted fields that pass validation rules.
- Business rule pass rate: percentage of records that satisfy defined business rules (delivery_date >= order_date, discount_pct between 0 and 100, status is in the approved list).
Uniqueness metrics:
- Duplicate rate: percentage of rows in a table that are duplicates by primary key or business key. For fact tables with natural keys (order_id), a non-zero duplicate rate is a data quality failure.
- Entity resolution rate: for tables that should have one record per entity (customer, product), the percentage of entities with exactly one record.
Building a Data Quality Score
A data quality score aggregates multiple dimension metrics into a single indicator for a table or domain. A simple approach:
For each critical table, define 5–10 specific tests across the quality dimensions. Each test produces a pass/fail result. The data quality score is the percentage of tests that pass.
A more nuanced approach weights tests by business impact:
- A null rate on the primary key (score: 0 if any nulls exist) is more critical than a null rate on an optional description field (score: proportional reduction)
- A 5% duplicate rate on a transaction fact table is a major failure; a 0.1% duplicate rate may be acceptable
Publish data quality scores by table and domain in a data quality dashboard. Track the trend over time — are scores improving, stable, or degrading? Report the scores by domain to data owners as part of the governance programme.
Programme-Level KPIs
These KPIs tell leadership whether the data quality programme is working:
**Data quality score trend:** Is the average quality score across critical tables improving quarter over quarter? A target of 98% pass rate across all critical table tests, trending upward, is a meaningful programme indicator.
**Incident rate:** How many data quality incidents (events where incorrect data reached end users, affected business decisions, or triggered user complaints) occurred per month? A declining incident rate indicates the programme is catching issues before they affect users.
**Mean time to detection (MTTD):** How quickly is a data quality issue detected after it occurs? Issues detected in minutes (automated testing catches the problem at pipeline run time) have low MTTD. Issues detected when a business user reports "the numbers look wrong" have high MTTD. A declining MTTD indicates improving monitoring.
**Mean time to resolution (MTTR):** How long does it take to resolve a data quality issue after detection? Issues resolved within hours indicate clear ownership and effective processes. Issues open for weeks indicate unclear ownership or insufficient engineering capacity. Track MTTR by domain to identify where the bottlenecks are.
**Test coverage rate:** What percentage of critical data assets have defined quality tests? A programme that has defined tests for 90% of critical tables is more mature than one covering 40%. Track coverage growth as the programme expands.
**User trust score:** In periodic user surveys (quarterly or semi-annual), how confident are business users in the data they use for decisions? Improving user trust is the ultimate outcome of a data quality programme. It is a lagging indicator — it improves after quality actually improves — but it is the measure that matters most to the business.
Our data engineering consulting practice designs and implements data quality programmes including metric frameworks and automated testing — contact us to discuss your data quality requirements.
A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.
Book a Call →