A synthesis of Bain and McKinsey research on what is actually stopping enterprise AI from delivering. Eight structural problems — and what data leaders are doing about each one.
The quick answer
Enterprise AI is not failing because the models are wrong. It is failing because the data underneath them is not ready. Bain and McKinsey research from 2025 and 2026 identifies eight structural problems that data leaders are grappling with — problems that sit below the model layer, inside the data architecture, governance frameworks, and organisational structures that most enterprises have not materially changed since they built their first data warehouse. This report names each one, explains what is driving it, and points to what leading organisations are doing differently.
Why AI initiatives keep stalling
The pattern is consistent across sectors. An enterprise invests in a large language model deployment or an agentic AI programme. The proof of concept performs well. The business case is approved. And then — somewhere between proof of concept and production — the initiative slows to a crawl.
The delay is almost never the model. It is the data. Either the data required to support the use case is not available in the right format, not clean enough to trust, not governed in a way that satisfies legal and compliance, or not accessible to the AI system at the speed and granularity it requires.
Data leaders already know this. The challenge is that fixing these problems requires authority, resourcing, and executive support that the data function often cannot secure on its own — precisely because the problems predate the current AI agenda and have been tolerated for years.
The eight problems below represent the current state of that gap.
Problem 1: Business units do not trust the numbers
The most common data problem in enterprise organisations is not missing data — it is disputed data. Multiple systems contain different versions of the same metric. Revenue in the CRM does not match revenue in the ERP. Customer count in the marketing platform does not reconcile with customer count in the data warehouse. When the CEO asks for one number, three teams send three different figures.
This is not a new problem. But AI makes it significantly worse, because AI systems will confidently act on whichever version of the data they have access to. When a human analyst sees two conflicting numbers, they pause and investigate. An AI system does not pause. It uses the data available to it and produces outputs — recommendations, automated decisions, generated reports — that propagate the inconsistency at machine speed and scale.
Bain's research indicates that BI teams in affected organisations spend over 40% of their time reconciling data rather than analysing it. The cost is not only in analyst hours — it is in executive trust. Once a board has seen AI-generated outputs contradicted by manual analysis, the confidence deficit takes months to close.
The fix requires a semantic layer: a governed set of canonical definitions for every business entity and metric that any system — including AI — is authorised to use. This is a technical and organisational change. The technical build is straightforward. Getting business units to relinquish their local definitions is not.
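To make the semantic-layer idea concrete, here is a minimal illustrative sketch of a canonical metric registry that both BI tools and AI systems would resolve definitions through. The metric name, owner, and SQL fragment are hypothetical examples, not a real product's API; the point is that a metric can be registered exactly once, so consumers cannot diverge on what it means.

```python
# Minimal sketch of a semantic-layer registry. All names and SQL
# fragments below are hypothetical illustrations.

from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    name: str          # canonical metric name used by every consumer
    owner: str         # accountable business owner for the definition
    sql: str           # the single authorised computation
    description: str


class SemanticLayer:
    def __init__(self):
        self._metrics: dict[str, MetricDefinition] = {}

    def register(self, metric: MetricDefinition) -> None:
        # Reject duplicates: one metric, one meaning, one owner.
        if metric.name in self._metrics:
            raise ValueError(f"'{metric.name}' already has a canonical definition")
        self._metrics[metric.name] = metric

    def resolve(self, name: str) -> MetricDefinition:
        # Dashboards and AI agents both resolve metrics here, so they
        # cannot quietly use different versions of "revenue".
        return self._metrics[name]


layer = SemanticLayer()
layer.register(MetricDefinition(
    name="net_revenue",
    owner="finance",
    sql="SUM(invoice_amount) - SUM(refund_amount)",
    description="Invoiced revenue net of refunds, recognised at invoice date",
))

print(layer.resolve("net_revenue").sql)
```

The technically interesting part is the rejection of duplicate registrations: the organisational fight described above is precisely the fight over who gets to call `register` for a contested metric.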
Problem 2: Nobody owns the data
Fragmented data ownership is the governance problem that looks like a technology problem. The symptoms are familiar — inconsistent definitions, ungoverned integrations, data quality issues that nobody is accountable for fixing — but the root cause is that no individual or function has clear authority over the data assets that span the organisation.
In most enterprises, data ownership is informal at best. Individual teams own the systems that produce their data. The data team owns the pipelines and the warehouse. But the data itself — the definitions, the quality standards, the governance policies — belongs to nobody. When a definition conflict arises, there is no authority to resolve it. When a data quality issue appears, there is no accountable owner to fix it.
AI deployments expose this gap immediately. The AI system needs canonical definitions. It needs someone to be responsible when those definitions are wrong. It needs an escalation path when it encounters data that contradicts its training. Without that ownership structure, the AI governance model has no foundation.
Data leaders are addressing this through formal data product ownership — assigning explicit owners to data domains with accountability for quality, definition, and access standards. The shift is from owning pipelines to owning data products that other systems — including AI — consume.
Problem 3: The data infrastructure was built for dashboards, not AI
The average enterprise data architecture was designed to answer one kind of question: what happened? Dashboards retrieve historical data on a schedule, aggregate it, and display it to human analysts who interpret it and decide what to do.
AI requires a different kind of infrastructure. It needs data that is current — not hours or days old, but seconds or minutes old. It needs data that is semantically consistent across systems, not just within a single dashboard. It needs data that it can access programmatically at any hour, at any volume, in any query pattern — not just the patterns the warehouse was pre-optimised for.
The structural mismatch is not fixable with configuration. Batch-oriented data warehouses were not designed for streaming access. Reporting schemas were not designed for open-ended AI queries. The infrastructure investment required to close this gap is significant — streaming pipelines, event-driven architectures, semantic layers, real-time quality checks — and most enterprises have not started it.
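One of the real-time quality checks mentioned above can be sketched as a freshness gate: a consumer, human or AI, refuses to act on data staler than its SLA. The SLA thresholds below are hypothetical; the contrast is that a dashboard tolerates hours-old data while an autonomous agent may need minutes or seconds.

```python
# Illustrative freshness gate. SLA values are hypothetical examples.

from datetime import datetime, timedelta, timezone


class StaleDataError(Exception):
    """Raised when data is older than the consumer's freshness SLA."""


def assert_fresh(last_updated: datetime, max_age: timedelta) -> None:
    # Compare the data's last-update timestamp against the SLA.
    age = datetime.now(timezone.utc) - last_updated
    if age > max_age:
        raise StaleDataError(f"data is {age} old; SLA is {max_age}")


# A reporting dashboard might tolerate hours-old data; an agent acting
# autonomously might require data no more than minutes old.
dashboard_sla = timedelta(hours=6)
agent_sla = timedelta(minutes=2)

recent = datetime.now(timezone.utc) - timedelta(seconds=30)
assert_fresh(recent, agent_sla)  # 30-second-old data is within the agent SLA
```

In a batch-oriented warehouse, most tables would fail the agent-grade check most of the time, which is the structural mismatch in miniature.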
McKinsey's research identifies infrastructure readiness as the leading reason that AI use cases stall between proof of concept and production. The PoC ran on clean, pre-prepared data. Production requires connecting to the actual enterprise data landscape — which is where the gaps appear.
For a detailed breakdown of the specific architectural changes required, why your data architecture cannot support agentic AI covers the five structural gaps and what agentic-ready architecture looks like.
Problem 4: The data team has the wrong skills for what AI now requires
Most enterprise data teams were built to support analytics. The skills that made them effective — SQL, business intelligence tooling, data modelling for reporting, dashboard development — are not the skills required to build and maintain AI-ready data infrastructure.
AI engineering requires a different discipline: streaming data pipelines, vector databases, embedding models, MLOps platforms, prompt engineering, feature stores, and the ability to instrument and monitor AI systems in production. These are not skills that analysts typically have. Retooling is slow. Hiring is competitive. And the teams already have a full workload supporting the existing BI environment.
The talent gap is structural, not just a hiring backlog. Data leaders who are closing it fastest are doing so through a combination of targeted upskilling for senior engineers, strategic use of external specialists for the build phase, and deliberate separation of AI engineering responsibilities from BI support responsibilities — so the AI work has dedicated capacity rather than competing with dashboard requests.
Problem 5: Governance frameworks designed for humans cannot govern machines
Enterprise data governance was built around human data users. A human analyst requests access to a sensitive dataset. A data steward reviews the request, assesses the need, and approves or declines. The access is logged. The analyst uses the data within the boundaries of the approved request.
This model does not scale to AI. An AI system making tens of thousands of data reads per hour cannot be governed through human approval workflows. The volume alone makes it impossible. The non-determinism of AI query patterns makes it worse — the governance team cannot anticipate what the AI system will ask for, so pre-approving access is difficult.
The result is one of two failure modes. Either the AI system is given broad data access that bypasses governance controls — creating compliance, audit, and security exposure. Or it is given such restricted access that it cannot function effectively, which kills the use case.
Governance frameworks designed for agentic AI require a fundamentally different model: policy-based access control that evaluates agent permissions programmatically at query time, audit trails designed for high-volume machine activity rather than human activity, and anomaly detection that identifies when an agent is behaving outside its intended scope.
Building this is an infrastructure and policy design challenge that most governance teams have not yet faced.
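As a rough illustration of what policy-based access control evaluated at query time can look like, here is a minimal sketch. The agent IDs, table names, and policy fields are hypothetical, and a production system would evaluate far richer policies; the two properties to note are that every decision is made programmatically, with no human in the loop, and that every decision, allow or deny, lands in an append-only audit trail sized for machine volume.

```python
# Illustrative policy-based access control for AI agents. All agent
# IDs, tables, and limits are hypothetical examples.

from dataclasses import dataclass, field


@dataclass
class Policy:
    agent_id: str
    allowed_tables: set[str]
    max_rows_per_query: int


@dataclass
class AccessController:
    policies: dict[str, Policy]
    audit_log: list[dict] = field(default_factory=list)

    def authorise(self, agent_id: str, table: str, row_limit: int) -> bool:
        policy = self.policies.get(agent_id)
        allowed = (
            policy is not None
            and table in policy.allowed_tables
            and row_limit <= policy.max_rows_per_query
        )
        # Log every decision, allow or deny: the audit trail is an
        # append-only record, not a human approval queue.
        self.audit_log.append(
            {"agent": agent_id, "table": table, "rows": row_limit, "allowed": allowed}
        )
        return allowed


ctl = AccessController(policies={
    "pricing-agent": Policy("pricing-agent", {"orders", "price_book"}, 10_000),
})

print(ctl.authorise("pricing-agent", "orders", 500))   # True: within policy
print(ctl.authorise("pricing-agent", "payroll", 500))  # False: table not granted
```

Anomaly detection would then run over `audit_log` rather than over human access requests, flagging agents whose query pattern drifts outside their historical scope.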
Problem 6: Cloud data spend is growing faster than the value it delivers
Enterprise cloud data cost is the commercial problem that most data leaders are being asked to solve without the tools to solve it. Cloud data spend scales with data volume, compute usage, and query activity — and all three are growing faster than the business value being extracted from the investment.
The root causes are architectural. Data that should be deleted or archived is retained indefinitely because no one is accountable for the retention policy. Queries that should run on pre-aggregated summary tables are scanning raw data because the optimisation work was never done. Compute resources are provisioned for peak demand and left running during off-peak periods. Data pipelines replicate the same data to multiple destinations because the architecture grew tactically rather than by design.
AI makes the cost problem worse before it makes it better. Training jobs and inference workloads add significant compute demand. Vector databases require storage and retrieval infrastructure that has no equivalent in the traditional data warehouse world. And AI use cases often require the full-fidelity, high-frequency data that is most expensive to store and query.
Data leaders are addressing this through cost attribution — making visible which teams and use cases are generating which cloud costs — and through architectural consolidation: reducing the number of data copies, moving from wide-open query access to pre-optimised data products, and enforcing retention policies that were previously unenforced.
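The cost attribution step can be sketched in a few lines: tag each query with its owning team and use case, then roll spend up by that pair. The per-gigabyte rate and the query records below are hypothetical; real attribution would draw on the cloud provider's billing export.

```python
# Illustrative cost attribution by team and use case. The rate and
# query records are hypothetical examples.

from collections import defaultdict

COST_PER_GB_SCANNED = 0.005  # hypothetical $/GB rate


def attribute(queries):
    # Roll scanned-data cost up to (team, use_case) pairs.
    totals = defaultdict(float)
    for q in queries:
        totals[(q["team"], q["use_case"])] += q["gb_scanned"] * COST_PER_GB_SCANNED
    return {k: round(v, 2) for k, v in totals.items()}


queries = [
    {"team": "marketing", "use_case": "churn_model", "gb_scanned": 1200},
    {"team": "marketing", "use_case": "churn_model", "gb_scanned": 800},
    {"team": "finance", "use_case": "month_end", "gb_scanned": 300},
]
print(attribute(queries))
# {('marketing', 'churn_model'): 10.0, ('finance', 'month_end'): 1.5}
```

The output is the conversation-starter: once a use case has a dollar figure attached, the retention and optimisation work described above has an owner with a reason to fund it.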
For the specific financial signals that indicate an architecture-driven cost problem, 7 signs your data architecture is costing you money identifies the most common patterns.
Problem 7: The data strategy is not tied to commercial outcomes
Data functions that struggle for budget and executive support almost always have the same underlying problem: their roadmap is not expressed in commercial terms. The team is delivering technically — pipelines are running, warehouses are populated, dashboards are live — but the connection between that work and the organisation's commercial performance is not visible to the executives who control the budget.
This is a presentation problem and a prioritisation problem simultaneously. On presentation: most data teams speak to executives in data terms — ingestion rates, query performance, coverage breadth — rather than in business terms: the specific commercial decisions that became faster, better, or cheaper as a result of the data investment. On prioritisation: when the data roadmap is built bottom-up by engineering capacity rather than top-down by commercial impact, the highest-leverage use cases do not get built first.
The fix requires making the commercial case for every item on the data roadmap — not as a retrospective justification, but as an upfront sequencing criterion. The use cases with the clearest commercial outcome and the lowest data readiness gap get built first. Everything else waits.
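The sequencing criterion above can be made mechanical. In this illustrative sketch, each candidate use case carries a hypothetical 1-to-5 rating for commercial clarity and for the size of its data readiness gap; the roadmap is ordered by clearest commercial outcome first, smallest readiness gap as the tiebreaker.

```python
# Illustrative roadmap sequencing. Use-case names and 1-5 scores are
# hypothetical examples, not a standard scoring framework.

use_cases = [
    {"name": "churn prediction", "commercial_clarity": 5, "readiness_gap": 2},
    {"name": "agentic procurement", "commercial_clarity": 4, "readiness_gap": 5},
    {"name": "report summarisation", "commercial_clarity": 2, "readiness_gap": 1},
]

# Highest commercial clarity first; among equals, smallest readiness
# gap first. Everything below the cut line waits.
ranked = sorted(
    use_cases,
    key=lambda u: (-u["commercial_clarity"], u["readiness_gap"]),
)
print([u["name"] for u in ranked])
# ['churn prediction', 'agentic procurement', 'report summarisation']
```

The scoring itself is a judgment call; the value of writing it down is that the sequencing argument becomes explicit and contestable rather than implicit in engineering capacity.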
The CFO is the executive best placed to make this reframe stick, because they own the commercial metrics that the data strategy needs to connect to. How to get CFO buy-in for your AI data strategy covers the specific case to make and the asks that unlock cross-functional authority.
Problem 8: Legacy architecture cannot support autonomous AI agents
Agentic AI — AI systems that take autonomous actions rather than producing outputs for human review — imposes requirements on data infrastructure that are categorically different from anything enterprises have built before: real-time data access, complete and queryable data lineage, governance frameworks designed for machine-to-machine interaction, and query infrastructure that can handle non-deterministic access patterns at high volume.
Most enterprises are deploying agentic AI pilots on data architecture that was built for scheduled batch processing, human-interpreted dashboards, and pre-defined reporting queries. The pilots work because they operate on small, curated data sets. Production deployments fail — or produce quietly incorrect results — because they encounter the actual enterprise data landscape.
This is not a theoretical risk. It is the most common reason that agentic AI programmes move from "impressive pilot" to "unexplained delay." The delay is almost always data infrastructure. And the infrastructure gap is almost always discovered after the commercial case has been approved and the programme has started, because data readiness assessments are not typically part of the AI programme scoping process.
For a detailed technical breakdown, why your data architecture cannot support agentic AI covers each structural gap and the architectural properties required to close them.
What leading organisations are doing differently
The enterprises making genuine progress on AI share a pattern. They conducted a data readiness assessment before committing to a production AI programme — not after. They scoped the first agentic use case to the data domain where they had the highest existing governance maturity. They built the semantic layer and governance infrastructure in parallel with the AI development rather than treating it as a prerequisite that had to be complete before AI work could begin. And they got a commercial co-sponsor — almost always in finance or the CFO's office — who had the authority to enforce data standards across business units that did not report to the data function.
None of these moves require a complete data transformation before AI can begin. They require sequencing the AI programme to start where the data is already close to ready, delivering a visible commercial outcome from that starting point, and using that outcome to build the investment case for the broader infrastructure work.
Frequently Asked Questions
Is this a new set of problems or have they always existed?
Most of these problems have existed for years — data quality issues, governance gaps, cost inefficiency. What has changed is the consequence. When data problems affected dashboards and reports, they slowed decision-making. When data problems affect AI systems, they introduce automated errors at scale. The problems are the same. The cost of leaving them unresolved is categorically higher.
Which of the eight problems should data leaders prioritise?
It depends on your organisation's specific AI agenda. For organisations deploying agentic AI, problems 3 and 8 — infrastructure readiness and the agentic architecture gap — are the technical prerequisites that block everything else. For organisations where AI is stalling at executive approval, problems 7 and 2 — commercial alignment and ownership — are the bottlenecks. Conducting a readiness assessment against your specific highest-priority use cases is the fastest way to identify which problems are blockers for you specifically.
How long does it take to address these problems structurally?
The governance and commercial alignment problems — 2, 5, 7 — can move significantly in three to six months with the right executive sponsorship. The infrastructure problems — 3, 6, 8 — are 12- to 18-month programmes for most enterprises starting from a batch-oriented architecture. The talent problem — 4 — is a 12- to 24-month investment. The good news is that you do not need to solve all eight before deploying AI. You need to solve the specific blockers for your specific use cases, in the order that matches your commercial priorities.
How do I make the case internally for addressing these problems?
The most effective framing is not "we have data problems" — it is "our AI programme has a defined technical prerequisite that we have not yet funded." Most enterprises have approved AI investment. The data readiness work is the enabling infrastructure for that investment. Presenting it as a dependency of an already-approved initiative, with a specific cost estimate and a defined commercial outcome, is more effective than presenting it as a standalone data quality programme.
What does a data architecture audit actually assess?
A structured data architecture audit evaluates your current environment against the specific requirements of your target AI use cases — data freshness, semantic consistency, governance maturity, lineage completeness, and query infrastructure readiness. It identifies which gaps are blockers for your first production deployment and which can be addressed incrementally. It produces a prioritised remediation plan and a commercial business case for the infrastructure investment required.
Our data architecture consulting team runs these assessments as a standard engagement entry point. If your AI programme is stalling and you want to understand why, the assessment gives you a specific diagnosis — not a generic data maturity framework, but an honest evaluation of whether your data foundation can support the specific AI outcomes your business is trying to deliver.
A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.
Book a Call →