
What Is Master Data Management? Creating a Single Source of Truth for Core Entities

James Okafor
Senior Data Engineer
June 19, 2028 · 11 min read

Master data management (MDM) is the practice of creating and maintaining a single, authoritative record for core business entities — customers, products, locations, and accounts — across all systems in an organization. This guide explains why MDM matters, how it is implemented, and what it costs to get wrong.

Master data management — MDM — is the practice of creating and maintaining a single, authoritative record for the core entities that appear across multiple systems in an organization. Those entities are typically customers, products, locations, accounts, and employees — the fundamental nouns of the business that every operational system references but no single system owns definitively.

The problem MDM solves is entity fragmentation. A mid-market enterprise running Salesforce, NetSuite, a data warehouse, a billing system, and a customer support platform will have the same customer represented differently in each. The Salesforce record has the official company name and a parent account relationship. NetSuite has the billing entity with a slightly different name. The billing system has a customer ID that does not map cleanly to either. The support platform has tickets filed under the trading name, not the legal entity name. None of these systems are wrong; they each capture what they need for their operational purpose. But when you try to answer "what is the total lifetime value of this customer across all products?" you are reconciling four different representations of the same entity.

Why Fragmentation Is Expensive

**Analytics accuracy.** If customer records across systems are not reconciled, cross-system analytics are wrong by construction. Revenue per customer, churn rate, product adoption — all of these metrics require knowing which records in system A correspond to which records in system B. Organizations without MDM produce these metrics through manual mapping exercises that are incomplete, stale the moment they are produced, and not reproducible.

**Operational friction.** Every team that works across systems — sales operations reconciling CRM and ERP, finance reconciling billing and contracts, customer success reconciling support and sales — is doing manual entity reconciliation as part of their job. That work is not value-creating; it is the cost of not having solved the underlying problem.

**Compliance exposure.** GDPR data subject access requests require knowing every system where a person's data is held. Without entity resolution across systems, the answer to "what data do we hold about this customer?" requires manual investigation across every system, risks incompleteness, and cannot be systematically audited.

**Migration and integration projects.** Every system integration project — merging two CRMs after an acquisition, migrating from one ERP to another — requires entity reconciliation as a prerequisite. Organizations that have solved MDM enter these projects with a clean entity map; organizations that have not spend 40–60% of project time on the entity reconciliation work that MDM would have done systematically.

MDM Architectures

**Registry style:** A central MDM system maintains a golden record that consists of pointers to source system records rather than copies of their data. Nothing is duplicated; the registry holds cross-reference IDs that link the representations of an entity across systems, and systems query it to resolve questions such as "which NetSuite customer is this Salesforce account?" The registry does not replace source system records; it resolves the relationship between them. This style takes the least implementation effort and does not change data in source systems.
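A registry of this kind can be sketched in a few lines of Python. This is an illustration only, not how any particular MDM product works: the class name `EntityRegistry`, the `link` and `resolve` methods, and the sample IDs are all hypothetical. The point it shows is that the registry stores identifiers and nothing else.

```python
from collections import defaultdict


class EntityRegistry:
    """Hypothetical cross-reference registry: stores IDs only, never attributes."""

    def __init__(self):
        # (system, source_id) -> master_id
        self._to_master = {}
        # master_id -> system -> set of source_ids (a system may hold duplicates)
        self._to_sources = defaultdict(lambda: defaultdict(set))

    def link(self, master_id, system, source_id):
        """Record that a source system record belongs to a master entity."""
        self._to_master[(system, source_id)] = master_id
        self._to_sources[master_id][system].add(source_id)

    def resolve(self, from_system, source_id, to_system):
        """Return the IDs in to_system for the same entity, or [] if unknown."""
        master_id = self._to_master.get((from_system, source_id))
        if master_id is None:
            return []
        return sorted(self._to_sources[master_id].get(to_system, set()))


# Illustrative IDs; real values would come from the source systems.
registry = EntityRegistry()
registry.link("M-1", "salesforce", "001A")
registry.link("M-1", "netsuite", "NS-77")

# "Which NetSuite customer is this Salesforce account?"
netsuite_ids = registry.resolve("salesforce", "001A", "netsuite")
print(netsuite_ids)
```

The sketch keeps everything in memory and does not handle re-linking a record to a different master; a real registry would persist the links and version them.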

**Consolidation style:** Source system data is pulled into the MDM system, where records are matched and merged into a golden record that serves as the single authoritative source. Downstream systems query the MDM system, rather than the source systems, for master data. This style takes more implementation effort and produces a cleaner golden record, but it requires ongoing synchronization between the source systems and the MDM hub.
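The merge step is where most of the design decisions sit. One common option, assumed here for illustration rather than prescribed by the styles above, is attribute-level source priority: for each attribute, take the first non-empty value from an ordered list of trusted systems. The priority table, field names, and sample records below are hypothetical.

```python
# Hypothetical rule: which systems to trust, in order, for each attribute.
SOURCE_PRIORITY = {
    "name": ["salesforce", "netsuite", "support"],
    "billing_address": ["netsuite", "salesforce"],
}


def build_golden_record(records, priority=SOURCE_PRIORITY):
    """Merge already-matched records (system -> attributes) into one record.

    For each attribute, the first non-empty value in priority order wins.
    Attributes with no value in any listed system are left out.
    """
    golden = {}
    for attribute, systems in priority.items():
        for system in systems:
            value = records.get(system, {}).get(attribute)
            if value:
                golden[attribute] = value
                break
    return golden


# Records already matched to the same real-world customer (sample data).
matched = {
    "salesforce": {"name": "Acme Corporation", "billing_address": ""},
    "netsuite": {"name": "Acme Corp.", "billing_address": "1 Main St"},
}
golden = build_golden_record(matched)
print(golden)
```

Here the name comes from Salesforce and the billing address from NetSuite, because Salesforce has no address to offer. Other merge rules, such as most recently updated value wins, fit the same shape.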

**Coexistence style:** The golden record is maintained in the MDM system and is also pushed back to the source systems to keep them in sync. Because source systems are updated with golden record values, fragmentation is reduced at the operational level rather than only resolved analytically. This is the most complete solution and the most complex to operate.

**Data warehouse MDM:** For analytics-focused organizations that lack the budget for, or the complexity to warrant, a dedicated MDM platform, a practical intermediate approach is to implement entity resolution within the data warehouse transformation layer, using dbt models that match and deduplicate entities. It does not solve operational fragmentation, but it produces consistent entity resolution for analytical use cases.

The Matching Problem

The core technical challenge in MDM is entity matching — determining that two records in different systems represent the same real-world entity when there is no reliable shared key.

Matching approaches run from deterministic to probabilistic. Deterministic matching applies fixed rules: for example, if the email addresses match and the company names are within an edit distance of 2, the two records are treated as the same entity. Probabilistic matching scores similarity across multiple attributes (company name, domain, address, phone) to produce a match confidence score; records above a threshold are matched, and those below it are flagged for manual review.
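Both approaches can be sketched with the Python standard library. The deterministic rule follows the example above (matching email, company name within edit distance 2). The probabilistic scorer is a simple weighted average, not a statistically fitted model; the weights, the 0.8 threshold, the field names, and the sample records are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def edit_distance(a, b):
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def deterministic_match(a, b):
    """Fixed rule: same email and company names within edit distance 2."""
    same_email = a["email"].lower() == b["email"].lower()
    return same_email and edit_distance(a["name"].lower(), b["name"].lower()) <= 2


def _similar(x, y):
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def _same(x, y):
    # Missing values never count as agreement.
    return 1.0 if x and y and x == y else 0.0


def _digits(phone):
    return "".join(ch for ch in phone if ch.isdigit())


# Assumed weights; in practice these would be tuned on reviewed matches.
WEIGHTS = {"name": 0.4, "domain": 0.3, "address": 0.2, "phone": 0.1}


def match_score(a, b):
    """Weighted similarity in [0, 1] across name, domain, address, phone."""
    return (
        WEIGHTS["name"] * _similar(a["name"], b["name"])
        + WEIGHTS["domain"] * _same(a["domain"].lower(), b["domain"].lower())
        + WEIGHTS["address"] * _similar(a["address"], b["address"])
        + WEIGHTS["phone"] * _same(_digits(a["phone"]), _digits(b["phone"]))
    )


def classify(score, threshold=0.8):
    """At or above the threshold is a match; anything below goes to review."""
    return "match" if score >= threshold else "review"


# Sample records; field names and values are illustrative.
crm = {"name": "Acme Corporation", "email": "ap@acme.com", "domain": "acme.com",
       "address": "1 Main St, Springfield", "phone": "+1 (555) 010-0000"}
erp = {"name": "Acme Corporation.", "email": "AP@acme.com", "domain": "acme.com",
       "address": "1 Main Street, Springfield", "phone": "15550100000"}
other = {"name": "Globex Ltd", "email": "x@globex.com", "domain": "globex.com",
         "address": "9 Elm Rd, Shelbyville", "phone": "15550109999"}

print(deterministic_match(crm, erp), classify(match_score(crm, erp)))
print(deterministic_match(crm, other), classify(match_score(crm, other)))
```

Comparing every record against every other record does not scale; real implementations usually restrict comparisons to candidate groups first, for example records that share a domain.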

Matching quality depends on data quality in source systems. Organizations with low data quality in CRM (inconsistent naming conventions, missing fields, duplicate records within a single system) face significantly harder matching problems — the underlying data quality must improve in parallel with MDM implementation.

MDM Tools and Implementation

Dedicated MDM platforms (Informatica MDM, Stibo Systems STEP, Semarchy xDM, Reltio) provide matching engines, golden record management, workflow for match review and resolution, data stewardship interfaces, and integration connectors to common source systems. They are appropriate for large enterprises with significant entity volumes, complex matching requirements, and multiple domains (customer plus product plus location).

For smaller organizations or analytics-first implementations, building entity resolution in the transformation layer using dbt is a viable lower-investment approach. dbt models can implement matching logic, produce deduplicated entity tables, and maintain cross-reference tables linking source IDs to a canonical entity ID — providing the analytical foundation without the operational overhead of a dedicated MDM platform.
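The cross-reference table itself is a small graph problem: matched pairs are edges, and every connected group of source records should share one canonical entity ID. In a dbt project this logic would live in SQL models; the sketch below uses Python only to show the idea, with a union-find structure. The function name `build_xref`, the key format, and the sample pairs are hypothetical.

```python
def build_xref(source_keys, matched_pairs):
    """Map each (system, source_id) key to a canonical entity ID.

    Records connected by any chain of matches end up with the same ID.
    The canonical ID is derived from the smallest key in each group, so the
    result does not depend on the order of matched_pairs.
    """
    parent = {key: key for key in source_keys}

    def find(key):
        # Walk to the group's root, shortening the path as we go.
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            smaller, larger = sorted((root_a, root_b))
            parent[larger] = smaller  # the smaller key stays the root

    return {key: "{}:{}".format(*find(key)) for key in source_keys}


# Sample source keys and match results (illustrative).
keys = [
    ("salesforce", "001A"),
    ("salesforce", "001B"),
    ("netsuite", "NS-77"),
    ("billing", "C-9"),
]
pairs = [
    (("salesforce", "001A"), ("netsuite", "NS-77")),
    (("netsuite", "NS-77"), ("billing", "C-9")),
]
xref = build_xref(keys, pairs)
for key, entity_id in sorted(xref.items()):
    print(key, "->", entity_id)
```

Deriving the canonical ID from a source key keeps the sketch short, but the ID can change when a new record joins a group. A longer-lived implementation would persist a surrogate ID per group so that downstream joins stay stable.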

The right approach depends on four factors: whether the fragmentation problem is primarily analytical or operational, the entity volume and matching complexity, the available investment, and whether the organization needs to push golden records back to source systems or only resolve entities for analytics.

Our data architecture services practice designs entity resolution and master data management implementations appropriate for the organization's scale, complexity, and investment. Contact us to discuss your data architecture requirements.
