Master Data Management: What It Is, When You Need It, and How to Build It

Master data management (MDM) creates a single authoritative record for core business entities — Customer, Product, Supplier, Location — across all systems. Here is what it involves, what problems it solves, and what implementation actually looks like.

The quick answer

MDM solves one specific problem: the same real-world entity exists as multiple inconsistent records across several systems, and no single system is trusted as the authoritative source. MDM is not the right solution for general data quality problems or metric inconsistency; those are data governance and semantic layer problems. MDM is the right solution when entity duplication across systems is genuinely costing the business: the same customer receiving duplicate communications, product catalogues that cannot be reconciled across channels, or supplier data that cannot be consolidated for spend analytics.

What MDM actually is

Master data refers to the core business entities that appear across multiple systems: Customer (CRM, ERP, finance system, marketing platform), Product (product information management, ERP, e-commerce, warehouse), Supplier (procurement system, ERP, accounts payable), Location (CRM, logistics, property management). These entities are not transactions — they do not change frequently — but they are referenced by transactions constantly.

The problem MDM solves: each system creates and manages its own version of these entities, independently. A customer acquired through the website gets a customer record in the e-commerce system. The same customer calls the contact centre and gets a new record in the CRM. The same customer places an enterprise order and gets a third record in the ERP. Three records for the same customer, with different IDs, different data quality, and no link between them.

MDM creates a master record — a single authoritative representation of that customer — and establishes the reference that all systems point to. When you look up "Acme Corp" in any system, you get data that traces back to the same master record.

MDM styles: which approach fits your situation

MDM is not a single technology or approach — it is implemented in different styles depending on the use case, the systems involved, and the tolerance for centralisation.

**Registry style**: A central MDM hub maintains a registry of master IDs rather than full master records. Source systems continue to own their records but register them with the hub, which resolves duplicates, assigns the master ID, and maintains the cross-reference between master IDs and source system IDs. Source systems do not change their data model; they add a reference to the master ID. This style is the least disruptive to source systems but requires all systems to participate in the hub for it to be complete.

**Consolidation style**: All source system records are consolidated into a centralised master data store. The master store is authoritative for analytics and reporting; source systems operate independently. The master store is not fed back into source systems — it is a read-optimised view of the best available data across sources. This style is the fastest to implement and is common for analytics-driven MDM (unified customer analytics, consolidated product catalogue for reporting). The limitation: source system inconsistency is not resolved, only masked in the consolidated view.

**Federation style**: Each system retains its master data, but a federation layer defines the mapping between system IDs and provides a unified lookup interface. No central store. The federation layer resolves "Acme Corp in the CRM" to "Acme Corp in the ERP" on demand. This style has the lowest implementation cost and is appropriate when source system authority is clear and the primary requirement is cross-system lookup, not data quality improvement.
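A minimal sketch of a federation-style lookup, assuming each source system can be queried by its own local ID (simulated here with in-memory dictionaries). The `FederationLayer` class, its `links` structure and the system names are hypothetical. The point it illustrates is that the layer stores only the mapping between system IDs and fetches attributes from the owning systems at query time:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FederationLayer:
    """Hypothetical federation layer: it stores ID links only, never entity data."""

    # Each link groups one real-world entity's local IDs across systems,
    # e.g. {"crm": "C-001", "erp": "E-9001"}.
    links: list = field(default_factory=list)
    # Stand-ins for the live source systems: system -> {local_id: record}.
    sources: dict = field(default_factory=dict)

    def _find_link(self, system: str, local_id: str) -> Optional[dict]:
        for link in self.links:
            if link.get(system) == local_id:
                return link
        return None

    def translate(self, from_system: str, local_id: str, to_system: str) -> Optional[str]:
        """Resolve a local ID in one system to the equivalent ID in another."""
        link = self._find_link(from_system, local_id)
        return link.get(to_system) if link else None

    def lookup(self, system: str, local_id: str) -> dict:
        """Fetch the entity's current record from every system that holds it."""
        link = self._find_link(system, local_id)
        if link is None:
            return {}
        return {
            other: self.sources[other][other_id]
            for other, other_id in link.items()
            if other_id in self.sources.get(other, {})
        }


fed = FederationLayer(
    links=[{"crm": "C-001", "erp": "E-9001"}],
    sources={
        "crm": {"C-001": {"name": "Acme Corp", "owner": "j.smith"}},
        "erp": {"E-9001": {"name": "ACME Corporation", "credit_limit": 50000}},
    },
)
print(fed.translate("crm", "C-001", "erp"))  # E-9001
print(fed.lookup("crm", "C-001")["erp"]["name"])  # ACME Corporation
```

Because nothing is copied or corrected, the CRM and ERP still disagree on the company name; federation gives cross-system lookup, not data quality improvement.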

**Coexistence style**: A centralised hub manages master records. Source systems synchronise changes back and forth — the hub updates source systems when master data changes, and source systems propagate updates back to the hub. This is the most complete style and the most complex. Bidirectional synchronisation requires conflict resolution rules and tight integration with source systems. Appropriate when the goal is genuine master data consistency across all systems, not just a consolidated view for analytics.
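Conflict resolution rules are where most of the coexistence complexity sits. A minimal sketch, assuming two illustrative rules (an attribute with a designated owning system can only be changed by that system, and otherwise the most recent change wins) and assuming every update carries a timestamp. The `AUTHORITY` and `SUBSCRIBERS` tables, the system names and `apply_update` are hypothetical, not any platform's behaviour. Accepted changes return the list of systems to push to, which is the hub-to-source half of the synchronisation:

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical ownership rules: the only system allowed to change each attribute.
AUTHORITY = {"billing_address": "erp", "email": "crm"}
# Hypothetical systems that receive master data changes from the hub.
SUBSCRIBERS = ["crm", "erp", "marketing"]


@dataclass
class Update:
    source: str
    attribute: str
    value: str
    at: datetime


def apply_update(master: dict, last_changed: dict, update: Update) -> list:
    """Apply one incoming change to the hub's master record if the rules allow it.

    Returns the systems the accepted change should be pushed to;
    an empty list means the change was rejected.
    """
    owner = AUTHORITY.get(update.attribute)
    if owner is not None and update.source != owner:
        return []  # only the owning system may change an owned attribute
    previous = last_changed.get(update.attribute)
    if previous is not None and update.at <= previous:
        return []  # stale or out-of-order change: the newest change wins
    master[update.attribute] = update.value
    last_changed[update.attribute] = update.at
    # Push to every subscriber except the system that made the change.
    return [s for s in SUBSCRIBERS if s != update.source]


master = {"email": "ops@acme.example", "billing_address": "1 Main St"}
last_changed = {}
t0 = datetime(2024, 5, 1, 9, 0)
print(apply_update(master, last_changed, Update("marketing", "email", "x@acme.example", t0)))  # []
print(apply_update(master, last_changed, Update("erp", "billing_address", "2 High St", t0)))  # ['crm', 'marketing']
```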

For most mid-market MDM programmes, consolidation style for analytics is the fastest path to value: ingest records from source systems, deduplicate and match to create master records, maintain the master in the data platform, and use it as the golden source for reporting. The bidirectional coexistence model is typically reserved for organisations with regulatory requirements for consistent master data across systems.
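Turning a group of matched source records into one master record needs survivorship rules, which decide which value survives for each attribute. A minimal sketch of two common rules, assuming each source record carries its system name and a last-updated date: prefer the most trusted system that has a non-empty value, and break ties by recency. The `SOURCE_PRIORITY` ranking, field names and sample data are hypothetical:

```python
from datetime import date

# Hypothetical trust ranking: lower number = more trusted source.
SOURCE_PRIORITY = {"erp": 0, "crm": 1, "ecommerce": 2}


def build_golden_record(cluster: list, attributes: list) -> dict:
    """Choose one surviving value per attribute from a cluster of matched records."""
    golden = {}
    for attr in attributes:
        # Empty values never survive.
        candidates = [r for r in cluster if r.get(attr) not in (None, "")]
        if not candidates:
            golden[attr] = None
            continue
        # Most trusted source first; ties go to the most recently updated record.
        best = min(
            candidates,
            key=lambda r: (SOURCE_PRIORITY.get(r["system"], 99), -r["updated"].toordinal()),
        )
        golden[attr] = best[attr]
    return golden


cluster = [
    {"system": "ecommerce", "updated": date(2024, 3, 1), "name": "Acme", "email": "buyer@acme.example", "phone": ""},
    {"system": "crm", "updated": date(2024, 1, 15), "name": "Acme Corp", "email": "sales@acme.example", "phone": "555-0100"},
    {"system": "erp", "updated": date(2023, 11, 2), "name": "ACME Corporation", "email": None, "phone": ""},
]
golden = build_golden_record(cluster, ["name", "email", "phone"])
print(golden)
# {'name': 'ACME Corporation', 'email': 'sales@acme.example', 'phone': '555-0100'}
```

A single global ranking keeps the sketch short; real programmes usually set priority per attribute (for example, trusting the ERP for billing details and the CRM for contact email).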

Customer MDM: the most common use case

Customer MDM is the most common MDM implementation because the business impact of customer record fragmentation is immediately visible: duplicate communications to the same customer, inability to calculate true customer lifetime value, sales reps who do not know they are calling an existing customer, and marketing campaigns that treat the same person as multiple independent prospects.

The core technical challenge in Customer MDM is entity resolution: given records from multiple systems that may or may not represent the same real customer, determine which records belong together and which are genuinely distinct.

Entity resolution approaches:

**Rule-based matching**: Define deterministic rules (exact match on email address, or exact match on phone number plus company name) that link records across systems. Fast to implement, easy to audit, but requires careful rule design to avoid false positives (merging records that are actually different people) and false negatives (missing matches when data quality is inconsistent).
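A minimal sketch of the two deterministic rules named above, assuming records are plain dictionaries with `email`, `phone` and `company` fields (hypothetical names). The normalisation steps are illustrative assumptions, but some normalisation is usually needed: without it, `Jane.Doe@Acme.com` and `jane.doe@acme.com` fail an exact match, which is the false-negative risk described above:

```python
import re
from typing import Optional

# Hypothetical legal suffixes ignored when comparing company names.
LEGAL_SUFFIXES = {"inc", "ltd", "llc", "corp", "plc"}


def normalise_email(value: Optional[str]) -> str:
    return (value or "").strip().lower()


def normalise_phone(value: Optional[str]) -> str:
    # Keep digits only, so "+1 (555) 010-0100" and "15550100100" compare equal.
    return re.sub(r"\D", "", value or "")


def normalise_company(value: Optional[str]) -> str:
    # Lowercase, strip punctuation, drop common legal suffixes.
    words = re.sub(r"[^\w\s]", "", (value or "").lower()).split()
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)


def rule_match(a: dict, b: dict) -> Optional[str]:
    """Return the first deterministic rule that links a and b, or None."""
    email_a, email_b = normalise_email(a.get("email")), normalise_email(b.get("email"))
    # Rule 1: exact email match (blank emails never match).
    if email_a and email_a == email_b:
        return "exact_email"
    phone_a, phone_b = normalise_phone(a.get("phone")), normalise_phone(b.get("phone"))
    company_a = normalise_company(a.get("company"))
    company_b = normalise_company(b.get("company"))
    # Rule 2: exact phone number plus exact company name (both non-blank).
    if phone_a and phone_a == phone_b and company_a and company_a == company_b:
        return "phone_and_company"
    return None


crm = {"email": "Jane.Doe@Acme.com", "phone": "+1 (555) 010-0100", "company": "Acme Corp"}
erp = {"email": "jane.doe@acme.com ", "phone": "", "company": "ACME"}
web = {"email": "", "phone": "15550100100", "company": "Acme Corp."}
print(rule_match(crm, erp))  # exact_email
print(rule_match(crm, web))  # phone_and_company
print(rule_match(erp, web))  # None
```

The guards against blank values matter: without them, every pair of records with an empty email would "match". Stripping legal suffixes such as `corp` and `ltd` pushes the other way, recovering matches at some risk of linking genuinely different companies.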

**Probabilistic matching**: Score the likelihood that two records represent the same entity based on weighted similarity across multiple fields. A record with the same name (high weight), similar email (medium weight), and same company (high weight) gets a high match score. Records above a threshold are merged; those in a middle range are flagged for human review. More tolerant of data quality variation than rule-based matching, but requires calibration and ongoing monitoring.
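A minimal sketch of weighted similarity scoring with a review band, using the weighting from the example above (high for name and company, medium for email). The weights, the 0.85 and 0.60 thresholds and the use of Python's `difflib.SequenceMatcher` as the string-similarity measure are all illustrative assumptions. Formal probabilistic record linkage, such as the Fellegi-Sunter model that Splink implements, estimates match weights from the data rather than setting them by hand:

```python
from difflib import SequenceMatcher

# Illustrative weights (sum to 1.0): name and company high, email medium.
WEIGHTS = {"name": 0.4, "company": 0.4, "email": 0.2}
AUTO_MERGE = 0.85  # score at or above this: merge automatically
REVIEW = 0.60      # score from REVIEW up to AUTO_MERGE: queue for human review


def similarity(a, b) -> float:
    """Similarity in [0, 1] between two strings; 0.0 if either is missing."""
    a, b = (a or "").strip().lower(), (b or "").strip().lower()
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def match_score(r1: dict, r2: dict) -> float:
    """Weighted sum of per-field similarities, in [0, 1]."""
    return sum(w * similarity(r1.get(f), r2.get(f)) for f, w in WEIGHTS.items())


def classify(r1: dict, r2: dict) -> str:
    score = match_score(r1, r2)
    if score >= AUTO_MERGE:
        return "merge"
    if score >= REVIEW:
        return "review"
    return "distinct"


crm = {"name": "Jane Doe", "company": "Acme Corp", "email": "jane.doe@acme.com"}
erp = {"name": "Jane Doe", "company": "Acme Corp", "email": "jdoe@acme.com"}
web = {"name": "Jane Doe", "company": "Acme Corp", "email": None}
other = {"name": "John Smith", "company": "Globex Ltd", "email": "jsmith@globex.com"}
print(classify(crm, erp))    # merge
print(classify(crm, web))    # review
print(classify(crm, other))  # distinct
```

The `crm`/`web` pair shows why the review band exists: a missing email caps the score at 0.8, so a probable match is routed to a person instead of being merged or discarded automatically.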

**ML-based matching**: Trained models that learn matching patterns from examples. Most effective when you have labelled training data (known matches and non-matches) and the matching logic is complex enough that rules are difficult to maintain. The major MDM platforms (Informatica MDM, Reltio, Stibo Systems) include ML-based matching as a core capability.

For most organisations, a combination of rule-based matching for high-confidence scenarios and probabilistic matching with a human review queue for ambiguous cases provides the right balance of automation and accuracy.
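Whichever approach produces the pairwise decisions, matched pairs still have to be grouped into entities: if CRM record A matches ERP record B and B matches web record C, all three belong to one master record even though A and C were never compared. A minimal sketch using a union-find (disjoint-set) structure, assuming record IDs are prefixed with their system name (a hypothetical convention) and that only auto-merged or steward-confirmed pairs are passed in:

```python
def cluster_matches(record_ids: list, matched_pairs: list) -> list:
    """Group records into entities: the transitive closure of the matched pairs."""
    parent = {rid: rid for rid in record_ids}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two groups

    groups = {}
    for rid in record_ids:
        groups.setdefault(find(rid), []).append(rid)
    return sorted(groups.values())


records = ["crm:C-001", "erp:E-9001", "web:W-77", "crm:C-002"]
pairs = [("crm:C-001", "erp:E-9001"), ("erp:E-9001", "web:W-77")]
clusters = cluster_matches(records, pairs)
# Hypothetical master ID format. Numbering clusters in sort order is for
# illustration only; a real programme must keep master IDs stable across reruns.
master_ids = {f"M-{i:04d}": members for i, members in enumerate(clusters, start=1)}
print(master_ids)
# {'M-0001': ['crm:C-001', 'erp:E-9001', 'web:W-77'], 'M-0002': ['crm:C-002']}
```

Transitive grouping is also why false positives are expensive: one wrong link can chain two separate customers into a single master record, which is a good reason to keep ambiguous pairs in the review queue until they are confirmed.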

Product MDM

Product MDM is the second most common MDM domain, particularly for manufacturers, retailers, and distributors with complex product catalogues. The business problem: products are managed in multiple systems (product information management, ERP, e-commerce, logistics) with different data models, different product IDs, and different levels of attribute completeness. As a result, stock availability cannot be communicated accurately across channels, a product recall cannot be traced through the full supply chain, and product reporting cannot be consolidated.

Product MDM maintains: a canonical product hierarchy (category structure, product families, variants), a master product ID that maps to all system-specific IDs, and a governed set of product attributes (dimensions, materials, certifications, descriptions) that are maintained to a defined quality standard.
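A minimal sketch of the first two structures, assuming each master product carries its hierarchy path and a map of system-specific IDs. The `MasterProduct` class, field names, SKUs and sales rows are hypothetical. The reverse index lets an ID from any system resolve to the same master product, which is what allows sales recorded in the ERP and the e-commerce platform to roll up to one product family:

```python
from dataclasses import dataclass, field


@dataclass
class MasterProduct:
    master_id: str
    name: str
    hierarchy: tuple  # (category, family, variant), most general level first
    system_ids: dict = field(default_factory=dict)  # system -> that system's product ID


def build_xref(products: list) -> dict:
    """Reverse index: (system, system-specific ID) -> master product ID."""
    xref = {}
    for p in products:
        for system, local_id in p.system_ids.items():
            key = (system, local_id)
            if key in xref and xref[key] != p.master_id:
                raise ValueError(f"{key} is mapped to two master products")
            xref[key] = p.master_id
    return xref


def sales_by_family(products: list, sales_rows: list) -> dict:
    """Roll up sales from any system to the product-family level via the master ID."""
    xref = build_xref(products)
    family_of = {p.master_id: " > ".join(p.hierarchy[:2]) for p in products}
    totals = {}
    for system, local_id, amount in sales_rows:
        master_id = xref.get((system, local_id))
        # Unmapped IDs are kept visible rather than silently dropped.
        key = family_of[master_id] if master_id else "UNMAPPED"
        totals[key] = totals.get(key, 0.0) + amount
    return totals


products = [
    MasterProduct("P-100", "Trail Shoe Blue 42", ("Footwear", "Trail Shoe", "Blue 42"),
                  {"erp": "100234", "ecommerce": "TS-BLU-42"}),
    MasterProduct("P-101", "Trail Shoe Red 42", ("Footwear", "Trail Shoe", "Red 42"),
                  {"erp": "100235", "ecommerce": "TS-RED-42"}),
]
sales = [("erp", "100234", 120.0), ("ecommerce", "TS-RED-42", 95.0), ("ecommerce", "TS-GRN-40", 80.0)]
print(sales_by_family(products, sales))
# {'Footwear > Trail Shoe': 215.0, 'UNMAPPED': 80.0}
```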

The challenge in Product MDM is often authorship rather than deduplication. Unlike customer records, which are created by automated processes in multiple systems, product records are typically created by people (buyers, product managers, data entry teams) with inconsistent practices. The MDM programme therefore needs to include the process governance that controls how products are created and modified in the master, not just the technical deduplication logic.
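Part of that process governance can be enforced in code: a new or modified product is checked against the agreed standard before it is accepted into the master, and a rejected item goes back to its author with reasons. A minimal sketch, assuming a hypothetical standard (a list of required attributes, a controlled category list, and dimensions recorded as positive numbers of millimetres); a real standard would come from the product data owners:

```python
# Hypothetical product data standard agreed with the product data owners.
REQUIRED = ("name", "category", "length_mm", "width_mm", "height_mm", "material")
ALLOWED_CATEGORIES = {"Footwear", "Apparel", "Accessories"}
DIMENSIONS = ("length_mm", "width_mm", "height_mm")


def validate_product(item: dict) -> list:
    """Return the problems found; an empty list means the item may enter the master."""
    problems = [f"missing {attr}" for attr in REQUIRED if item.get(attr) in (None, "")]
    category = item.get("category")
    if category and category not in ALLOWED_CATEGORIES:
        problems.append(f"unknown category {category!r}")
    for dim in DIMENSIONS:
        value = item.get(dim)
        if value in (None, ""):
            continue  # already reported as missing
        if isinstance(value, bool) or not isinstance(value, (int, float)) or value <= 0:
            problems.append(f"{dim} must be a positive number of millimetres")
    return problems


draft = {"name": "Trail Shoe Blue 42", "category": "Shoes",
         "length_mm": 310, "width_mm": "12cm", "height_mm": 120}
print(validate_product(draft))
# ['missing material', "unknown category 'Shoes'", 'width_mm must be a positive number of millimetres']
```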

MDM and the data architecture

MDM fits into the data architecture as a master data layer that sits between source systems and analytics:

**Source systems** generate and maintain operational records for their domain (CRM maintains customer interaction records, ERP maintains order records).

**MDM hub** maintains master records: canonical entity definitions, resolved duplicates, master IDs, and governed attributes. In the consolidation style, this lives in the data platform (Snowflake, Databricks, BigQuery) as a set of Gold layer tables maintained by MDM processes. In the coexistence style, it is a dedicated MDM platform that synchronises with source systems.

**Data platform** uses master IDs to join across source system data. When the analytics pipeline ingests orders from the ERP and customer interactions from the CRM, the master customer ID is the join key that links them correctly — even if the ERP and CRM use different customer IDs internally.
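A minimal sketch of that join, assuming MDM has produced a crosswalk mapping each (system, local customer ID) pair to a master customer ID. Table and field names are hypothetical, and Python lists stand in for warehouse tables; on a data platform this would normally be a SQL join through the crosswalk table:

```python
from collections import defaultdict

# Hypothetical crosswalk produced by MDM: (system, local ID) -> master customer ID.
CROSSWALK = {
    ("erp", "E-9001"): "M-0001",
    ("crm", "C-001"): "M-0001",
    ("crm", "C-002"): "M-0002",
}

erp_orders = [
    {"customer_id": "E-9001", "amount": 1200.0},
    {"customer_id": "E-9001", "amount": 300.0},
]
crm_interactions = [
    {"customer_id": "C-001", "type": "support_call"},
    {"customer_id": "C-002", "type": "demo"},
]


def combine_by_master_id(orders: list, interactions: list) -> dict:
    """Join ERP revenue and CRM activity on the master customer ID."""
    view = defaultdict(lambda: {"revenue": 0.0, "interactions": 0})
    for order in orders:
        master_id = CROSSWALK.get(("erp", order["customer_id"]))
        if master_id:  # unmapped rows are skipped here; in practice, count and monitor them
            view[master_id]["revenue"] += order["amount"]
    for event in interactions:
        master_id = CROSSWALK.get(("crm", event["customer_id"]))
        if master_id:
            view[master_id]["interactions"] += 1
    return dict(view)


result = combine_by_master_id(erp_orders, crm_interactions)
print(result)
# {'M-0001': {'revenue': 1500.0, 'interactions': 1}, 'M-0002': {'revenue': 0.0, 'interactions': 1}}
```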

**Semantic layer** uses master entity definitions as the canonical Customer, Product, and Supplier dimensions in the data warehouse model.

The relationship between MDM and data governance is close but distinct: data governance defines the policies, ownership, and standards; MDM is the operational system that implements those standards for master data specifically.

The build vs buy decision

Building MDM from scratch — entity resolution logic, deduplication workflows, master record management, survivorship rules — is a significant engineering investment. Most organisations evaluating MDM should consider purpose-built MDM platforms before committing to a custom build.

**Commercial MDM platforms** (Informatica MDM, Reltio, Stibo Systems, Semarchy): mature, feature-rich, with built-in entity resolution, survivorship rules, stewardship workflows, and integration connectors. Enterprise pricing — expect $200,000–$500,000+/year for a full implementation. Appropriate for large organisations with complex multi-domain MDM requirements.

**Lightweight MDM on the data platform**: For organisations with a primary analytics-driven use case (consolidated customer view for reporting, not bidirectional operational MDM), building a consolidation-style MDM programme using dbt, Spark, and a record linkage library (Splink is open-source and excellent for large-scale deduplication) on the existing data platform is significantly cheaper and faster. Appropriate when the coexistence model is not required.

**Data catalogue MDM features**: Platforms like Atlan, Alation, and Microsoft Purview offer MDM-adjacent capabilities (reference data management, glossary terms, linked entities). Appropriate for governance-oriented use cases but not for high-volume entity resolution or operational MDM.

When MDM is not the right solution

MDM is specifically for entity duplication across systems. It is not the right solution for:

**Metric inconsistency**: Different reports showing different revenue figures is a semantic layer problem, not an MDM problem. The data is not duplicated — it is calculated differently. Fix this with canonical metric definitions in the semantic layer.

**General data quality problems**: Null values, out-of-range amounts, invalid formats. These are data quality problems fixed by validation at ingestion and dbt tests in transformation, not by MDM.

**Missing data integration**: Source systems that need to exchange data but do not have integration. This is an integration architecture problem, not an MDM problem.

**Slow analytics on customer data**: Performance problems with customer analytics queries are a data model and platform performance problem. MDM may produce a better customer dimension table, but the performance fix is indexing, partitioning, and query optimisation.

If your data problem is one of these, MDM will add cost and complexity without solving it.

FAQs

How long does an MDM implementation take?

A consolidation-style Customer MDM programme — ingesting customer records from 3–5 source systems, running entity resolution, building a master customer table, and integrating it into the analytics layer — typically takes 12–20 weeks for a mid-market organisation. A full coexistence-style MDM programme using an enterprise MDM platform, with bidirectional synchronisation across 4+ source systems, typically takes 12–24 months. The timeline is driven primarily by source system complexity and the coexistence synchronisation requirements.

What is the difference between MDM and a Customer Data Platform (CDP)?

A CDP (Customer Data Platform) unifies customer data from multiple sources to build a comprehensive customer profile for marketing activation — segmentation, personalisation, journey analytics. An MDM hub resolves entity duplication and maintains authoritative master records for operational and analytical consistency. They solve related but different problems: a CDP is a marketing tool that creates profiles; MDM is a data governance tool that resolves identity. Some CDPs include basic MDM capabilities (deduplication, identity resolution), but CDPs are optimised for marketing use cases and MDM is optimised for enterprise data consistency.

Do we need MDM before we can build AI on our customer data?

Not necessarily, but unresolved entity duplication in customer data creates specific AI risks: a model trained on customer behaviour data where the same customer appears as multiple independent records can learn incorrect patterns, because each customer's behaviour is split across several apparently separate customers. If your customer data has significant duplication across systems (a common problem in organisations that have grown through acquisition), resolving that duplication before training customer-level AI models typically produces materially better model performance. For AI use cases that are not customer-level (revenue forecasting, inventory optimisation), MDM is less critical.
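A small illustration with hypothetical data: when one customer exists as three unlinked records, any per-customer feature (here, order count) is split across them, so a model sees several low-activity customers instead of one repeat buyer, and the customer count itself is overstated:

```python
from collections import Counter

# Hypothetical orders, each tagged with the source-system customer ID that placed it.
orders = ["crm:C-001", "erp:E-9001", "web:W-77", "crm:C-002"]
# Hypothetical entity resolution output: the first three IDs are one person.
master_of = {
    "crm:C-001": "M-0001",
    "erp:E-9001": "M-0001",
    "web:W-77": "M-0001",
    "crm:C-002": "M-0002",
}

before = Counter(orders)                       # orders per source record
after = Counter(master_of[c] for c in orders)  # orders per resolved customer

print(len(before), max(before.values()))  # 4 1: four "customers", none a repeat buyer
print(len(after), after["M-0001"])        # 2 3: two customers, one with three orders
```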

We have Salesforce as our CRM. Is that sufficient for Customer MDM?

Salesforce maintains a customer record within Salesforce. It is not an MDM system — it does not resolve customer records across other systems (ERP, marketing automation, billing) or maintain a canonical master ID that other systems reference. Salesforce is a source system for Customer MDM, not the MDM hub itself. If your customer data exists only in Salesforce (no other systems have customer records), a formal MDM programme may not be necessary. If customer records exist in multiple systems, Salesforce is one of the sources to be resolved, not the solution.

Our data architecture consulting practice designs and implements MDM programmes — from analytics-focused consolidation builds on Snowflake and Databricks to full enterprise MDM platform implementations. If your organisation has customer, product, or supplier data fragmentation that is affecting analytics accuracy, book a free 30-minute audit and we will tell you which MDM approach fits your situation.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.
