A data infrastructure assessment is the evidence-based foundation for a data strategy. Without it, organisations make infrastructure investment decisions based on symptoms (dashboards are slow, data is stale) without understanding whether the symptoms indicate infrastructure problems, architecture problems, or governance problems. This guide covers how to conduct a rigorous assessment.
A data infrastructure assessment is a structured, evidence-based evaluation of the current state of an organisation's data infrastructure — the systems, pipelines, storage, and tooling that support analytical work. Its purpose is to produce a factual baseline that distinguishes between infrastructure problems, architecture problems, and governance problems before investment decisions are made.
Without an assessment, organisations invest based on symptoms: "dashboards are slow" leads to a server upgrade when the actual problem is an unoptimised extract. "We need a data warehouse" leads to a $200K platform purchase when a well-structured existing database would have been sufficient. "Our data quality is bad" leads to a data governance programme when the actual problem is a single poorly maintained source system.
What an Assessment Measures
A data infrastructure assessment has four measurement domains:
**Performance**: What is the actual performance of current systems? Dashboard load times under normal load. Extract refresh durations and failure rates. Query execution times against the warehouse. Backgrounder job completion rates. These are not perceived performance ("dashboards feel slow") but measured performance with specific numbers.
**Reliability**: How often do systems fail, and how quickly do they recover? Pipeline failure rates by pipeline. Extract failure rates and the lag between failure and detection. Mean time to detect and mean time to resolve for data incidents. Uptime percentage for BI tools and data warehouses.
**Utilisation**: What is actually being used? Dashboard view counts and unique user counts by workbook. Data source usage frequency. User licence utilisation rates. Compute utilisation patterns (when are resources maxed, when are they idle?). Storage utilisation and growth rate.
**Coverage**: Are there analytical questions the organisation needs to answer that the current infrastructure cannot support? Source systems with data that is not integrated. Metrics that are requested repeatedly but answered only via manual analysis. Time-to-answer for common analytical questions.
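As a minimal sketch of what "measured performance with specific numbers" can look like, the Python below summarises dashboard load-time samples and computes mean time to detect and mean time to resolve from incident timestamps. The field names (`failed`, `detected`, `resolved`) and all sample values are hypothetical, and measuring time to resolve from the moment of detection is an assumption; some teams measure it from the moment of failure instead.

```python
from datetime import datetime
from statistics import mean, median, quantiles


def load_time_summary(samples_s):
    """Return the median and 95th-percentile load time in seconds."""
    # quantiles(n=20) returns 19 cut points; the last is the 95th percentile.
    p95 = quantiles(samples_s, n=20, method="inclusive")[-1]
    return {"median_s": median(samples_s), "p95_s": p95}


def mean_hours(incidents, start_key, end_key):
    """Mean elapsed hours between two timestamps across all incidents."""
    deltas = [
        (incident[end_key] - incident[start_key]).total_seconds() / 3600
        for incident in incidents
    ]
    return mean(deltas)


# Illustrative values only, not real measurements.
load_times_s = [2.0, 3.0, 4.0, 5.0, 6.0]
incidents = [
    {
        "failed": datetime(2024, 1, 1, 2, 0),
        "detected": datetime(2024, 1, 1, 8, 0),
        "resolved": datetime(2024, 1, 1, 10, 0),
    },
    {
        "failed": datetime(2024, 1, 5, 2, 0),
        "detected": datetime(2024, 1, 5, 4, 0),
        "resolved": datetime(2024, 1, 5, 10, 0),
    },
]

summary = load_time_summary(load_times_s)
mttd_h = mean_hours(incidents, "failed", "detected")    # mean time to detect
mttr_h = mean_hours(incidents, "detected", "resolved")  # mean time to resolve
print(summary, mttd_h, mttr_h)
```

Reporting a high percentile alongside the median helps because a handful of very slow loads can dominate user perception while leaving the median untouched.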
Data Collection Methods
Each measurement domain requires specific data collection methods:
**BI tool usage data**: Tableau Admin Views and the Tableau REST API provide workbook view counts, last-access dates, user activity, and published data source usage. Collecting this data requires API access, so arrange that access early and collect the data at the start of the assessment.
**Pipeline reliability data**: Orchestrator job history (Airflow, Prefect, Dagster) provides job run records with success/failure status and duration. In the absence of a formal orchestrator, cron job logs or email failure notifications may be the only available record.
**Data warehouse performance data**: Query history tables in Snowflake, BigQuery, and Redshift record all query executions with duration, bytes scanned, and execution status. Analysing the query history identifies the most expensive queries, the most frequently run queries, and the performance distribution.
**Stakeholder interviews**: Performance and coverage gaps are often only visible through structured conversations with data consumers. The CFO's team may know that a reconciliation they need takes 3 days of manual work because the analytics environment cannot produce it. A gap like this never appears in a query log. Schedule 45-minute interviews with each of the primary data consumer groups.
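Once job history and query history have been exported, the first pass of analysis can be quite simple. The sketch below assumes the records have already been loaded into plain Python dictionaries; the keys (`pipeline`, `status`, `query_text`, `duration_s`) are hypothetical and would need mapping to whatever columns your orchestrator and warehouse actually expose.

```python
from collections import defaultdict


def failure_rate_by_pipeline(runs):
    """Map each pipeline name to the fraction of its runs that did not succeed."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for run in runs:
        totals[run["pipeline"]] += 1
        if run["status"] != "success":
            failures[run["pipeline"]] += 1
    return {name: failures[name] / totals[name] for name in totals}


def most_expensive_queries(history, top_n=3):
    """Rank query texts by total execution seconds, highest first."""
    agg = defaultdict(lambda: {"runs": 0, "total_s": 0.0})
    for row in history:
        entry = agg[row["query_text"]]
        entry["runs"] += 1
        entry["total_s"] += row["duration_s"]
    ranked = sorted(agg.items(), key=lambda kv: kv[1]["total_s"], reverse=True)
    return ranked[:top_n]


# Illustrative records only.
runs = [
    {"pipeline": "orders_load", "status": "success"},
    {"pipeline": "orders_load", "status": "failed"},
    {"pipeline": "orders_load", "status": "success"},
    {"pipeline": "orders_load", "status": "success"},
    {"pipeline": "crm_sync", "status": "success"},
]
history = [
    {"query_text": "SELECT * FROM sales", "duration_s": 120.0},
    {"query_text": "SELECT * FROM sales", "duration_s": 100.0},
    {"query_text": "SELECT COUNT(*) FROM users", "duration_s": 2.0},
]

rates = failure_rate_by_pipeline(runs)
top = most_expensive_queries(history, top_n=1)
print(rates)
print(top)
```

Ranking by total duration rather than single-run duration surfaces queries that are individually modest but run often enough to dominate warehouse cost.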
Analysis Framework
After data collection, the analysis framework classifies findings into three categories:
**Infrastructure findings**: Problems with the underlying systems and hardware, such as server capacity, storage, network bandwidth, and licence limits. Infrastructure findings typically require infrastructure investment to resolve.
**Architecture findings**: Problems with how data is structured, integrated, and modelled, such as missing source integrations, poor data modelling, excessive duplication, and missing governance layers. Architecture findings require architectural work: redesigning pipelines, rebuilding data models, establishing governance structures.
**Governance findings**: Problems with how the analytical environment is operated and managed — undocumented data sources, uncertified content, unclear ownership, missing access controls, unmaintained content. Governance findings often require minimal infrastructure investment but significant operational change.
Most data infrastructure problems are architecture or governance problems misidentified as infrastructure problems. The assessment distinguishes between them.
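One way to keep the three categories explicit during analysis is to record every finding with its category and supporting evidence, then tally the results. The following is a sketch only: the `Finding` structure, its field names, and the sample findings are hypothetical illustrations, not a prescribed template.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    INFRASTRUCTURE = "infrastructure"
    ARCHITECTURE = "architecture"
    GOVERNANCE = "governance"


@dataclass(frozen=True)
class Finding:
    summary: str
    category: Category
    evidence: str  # the measurement or interview note that supports the finding


def count_by_category(findings):
    """Return a count for every category, including categories with no findings."""
    counts = Counter(f.category for f in findings)
    return {category: counts.get(category, 0) for category in Category}


# Hypothetical findings for illustration.
findings = [
    Finding("Server capacity exhausted during refresh window",
            Category.INFRASTRUCTURE, "compute utilisation export"),
    Finding("ERP source not integrated",
            Category.ARCHITECTURE, "finance stakeholder interview"),
    Finding("Same data model duplicated across workbooks",
            Category.ARCHITECTURE, "published data source inventory"),
    Finding("Published data sources have no named owner",
            Category.GOVERNANCE, "BI tool metadata export"),
]

counts = count_by_category(findings)
for category, n in counts.items():
    print(f"{category.value}: {n}")
```

Requiring an evidence field on every finding makes it harder for a symptom to be recorded as a finding without a measurement behind it.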
Presenting Assessment Findings
Assessment findings should be presented with three components:
**Current state with evidence**: What the measurement shows, with specific numbers. "The top 10 dashboards have average load times of 24 seconds. This compares to the industry benchmark of 5 seconds for acceptable interactive analytics performance." Not "dashboards are slow."
**Impact quantification**: What the current state costs the organisation. "Finance spends an estimated 40 hours per month on manual reconciliation tasks that would be eliminated by a correctly integrated ERP source." Not "there are efficiency opportunities."
**Prioritised recommendations**: Specific changes with estimated effort and projected impact, in priority order. "Priority 1: Optimise the three largest extracts. Action: filter each extract to the date range used by its connected workbooks. Estimated effort: 2 days. Projected benefit: extract refresh time reduced from 6 hours to 45 minutes, dashboard load times reduced from 24 seconds to under 5 seconds."
The prioritised recommendation list is the output that drives investment decisions. It converts an assessment from an analytical exercise into an action plan.
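The ordering itself can be made reproducible. The sketch below ranks recommendations by estimated hours saved per month for each day of effort. That ratio is an assumed prioritisation rule chosen for illustration, not the only reasonable one, and every figure apart from the 2-day effort estimate and the 40 hours per month quoted above is a hypothetical placeholder.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    action: str
    effort_days: float
    hours_saved_per_month: float

    @property
    def benefit_per_effort_day(self):
        """Hours saved per month for each day of effort invested."""
        return self.hours_saved_per_month / self.effort_days


def prioritise(recommendations):
    """Order recommendations by benefit per effort day, highest first."""
    return sorted(
        recommendations,
        key=lambda r: r.benefit_per_effort_day,
        reverse=True,
    )


# Effort and savings figures are placeholders unless quoted in the text above.
candidates = [
    Recommendation("Integrate the ERP source", effort_days=20, hours_saved_per_month=40),
    Recommendation("Document data source ownership", effort_days=5, hours_saved_per_month=5),
    Recommendation("Optimise the three largest extracts", effort_days=2, hours_saved_per_month=20),
]

ranked = prioritise(candidates)
for position, rec in enumerate(ranked, start=1):
    print(f"Priority {position}: {rec.action} "
          f"({rec.benefit_per_effort_day:.1f} hours saved per month per effort day)")
```

A single ratio will not capture dependencies between recommendations or risk, so treat the computed order as a starting point for discussion rather than a final ranking.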
Our data architecture practice conducts data infrastructure assessments as the foundation for data strategy engagements — contact us to discuss an assessment for your organisation.