BlogData Architecture

Migrating Your Data Platform to the Cloud: A Practical Approach

Austin Duncan
Austin Duncan
Managing Director & Principal Data Architect
·February 28, 202712 min read

How to migrate an on-premise data warehouse or BI environment to the cloud without disrupting analytical operations — the migration approach selection, the parallel run strategy, how to handle legacy ETL, and the common failure modes that turn data cloud migrations into multi-year projects.

Migrating a data warehouse or analytics environment to the cloud is one of the highest-risk data platform initiatives an organisation can undertake. The risk comes not from the cloud itself — cloud data warehouses are mature and reliable — but from the migration process: moving data that business operations depend on while maintaining continuity, validating that the migrated data is correct, and retiring the old system cleanly.

Most data cloud migrations take 2–3x longer than initially estimated. Understanding why, and designing the migration process to avoid the common failure modes, is what separates migrations that complete on schedule from ones that linger for years.

Choosing the Migration Approach

Three approaches, each with different risk and complexity profiles:

### Big Bang Migration

The entire system migrates on a single cutover date. The old system runs until day X; the new system goes live on day X+1; the old system is retired.

**When it works:** Small environments (under 50 data sources, under 20 dashboards), short migration timelines (under 3 months), where the business can tolerate a planned outage for the cutover.

**Why it fails:** The scope expands during migration (you discover sources you did not know existed); the validation takes longer than expected; the cutover date becomes a forcing function that drives compromises (migrating content that is not fully validated because the deadline cannot slip).

### Phased Migration

Migrate in prioritised batches: the most critical business data first, then secondary sources, then the long tail of rarely-used legacy content.

The approach:

1. Phase 1: Core financial and operational data (highest business value, highest scrutiny)

2. Phase 2: Commercial analytics (CRM, revenue, pipeline)

3. Phase 3: Product analytics and marketing data

4. Phase 4: Legacy and low-usage content

Each phase runs as a parallel migration — the new platform carries the migrated sources while the old platform continues running everything until each phase's content is validated and cut over.

**Why this is the right approach for most organisations:** It reduces the risk of each individual migration, allows the team to build expertise on the new platform before migrating the most complex sources, and provides business continuity throughout.

### Lift-and-Shift vs. Rebuild

Within the phased approach, two options for each source:

**Lift-and-shift:** Migrate the existing ETL and data models as-is to the new platform. Faster, but perpetuates technical debt and may not take advantage of the new platform's capabilities.

**Rebuild:** Redesign the ingestion and transformation layer for the new platform. Slower, but produces a better outcome and eliminates the legacy debt.

For most migrations, the pragmatic approach is lift-and-shift for the first phase (get the critical data onto the new platform quickly) and then rebuild progressively (replace lifted ETL with managed connectors and dbt models) as part of a post-migration modernisation programme.

The Parallel Run Strategy

The most effective validation approach for data migrations is the parallel run: both the old and new systems produce the same outputs simultaneously, and the outputs are compared systematically.

How it works:

1. Migrate a source to the new platform

2. Run both old and new pipelines in parallel for 2–4 weeks

3. Compare outputs at the report level: do the migrated dashboards produce the same numbers as the original?

4. Investigate and resolve discrepancies

5. When validation passes, cut over users to the new system

The parallel run catches the subtle data migration issues that automated testing alone does not catch: business logic that was implemented differently in the old ETL vs the new dbt model, timezone handling differences, rounding differences in aggregations, and calculation differences that only surface for specific data segments.

**Parallel run duration:** 2 weeks minimum for stable data. 4–6 weeks for financial reporting cycles where month-end and quarter-end results need to match exactly. Some organisations run parallel for a full financial quarter before cutting over critical financial reporting.

Common Failure Modes

### Undiscovered Data Sources

Legacy data environments routinely have sources that are not in any documentation: an analyst who built a personal Excel-to-database integration 5 years ago, a developer who added a database write to an application without telling the data team, a vendor who started sending data feeds 3 years ago and nobody remembers why.

**Mitigation:** Before writing a single line of migration code, conduct a full source audit. Query the warehouse's lineage tracking (if it exists), interview every team that uses the data platform, review every ETL job and scheduled task in the system, and document every source. Budget for discovering at least 30% more sources than your initial inventory shows.

### Calculation Discrepancies That Take Weeks to Investigate

Migrated numbers that do not match the original are not always evidence of a migration bug — sometimes they reveal that the original calculation was wrong. A revenue calculation that produces different results after migration may be showing correctly migrated data that surfaces a bug in the original ETL.

These investigations take significant time and require deep knowledge of both the source system and the business logic. Budget investigation time explicitly — for a large financial reporting migration, 30–50% of the migration effort is validation and discrepancy investigation.

### BI Tool Migration Complexity

Migrating the data warehouse is the first half of the job. Migrating the BI layer — Tableau workbooks, Power BI reports, Looker dashboards — to use the new data sources is the second half, and it is often where the timeline slips.

BI content that was connected to the old data source needs to be updated to connect to the new source, verify that calculated fields still work correctly with the new schema, and be tested by the content's primary users. For large BI environments with hundreds of workbooks, this work is substantial.

**Mitigation:** Prioritise BI migration alongside data migration, not as a sequential phase after. Identify the BI content dependencies for each data source being migrated and migrate both together.

### The Legacy System That Will Not Die

The target state is the old system retired. The failure mode is the new system live but the old system still running — because one dashboard still uses it, or one analyst prefers it, or one edge case was never fully migrated.

Old systems that are not retired continue to cost money, continue to require maintenance, and create confusion about which system is authoritative. Establish a hard retirement date for each component of the old system before the migration begins. The retirement date creates accountability for completing the migration rather than leaving legacy systems running indefinitely.

Cloud Provider and Warehouse Selection

If the migration involves selecting a new cloud data warehouse, the selection should happen before the migration design, not during it.

The migration effort is largely independent of which cloud warehouse you choose — the approach (phased, parallel run, rebuild vs lift-and-shift) applies to Snowflake, BigQuery, and Redshift equally. The warehouse selection should be driven by the organisation's technical requirements, existing cloud relationships, and the criteria in the Snowflake vs BigQuery vs Databricks comparison guide.

Avoid selecting the warehouse based on a vendor demo or benchmark results for workloads that do not match yours. Select based on: how well the warehouse fits your query patterns, which cloud provider your other infrastructure lives in, and the total cost of ownership over 3 years at your expected data volume.

Post-Migration: The Modernisation Phase

The migration gets you to the cloud. The modernisation is where you capture the actual value of the new platform.

Most migrations initially lift-and-shift legacy ETL and data models. Post-migration, the programme should incrementally replace:

- Legacy ETL with managed connectors (Fivetran/Airbyte)

- Hand-rolled SQL pipelines with tested dbt models

- Undocumented tribal knowledge with documented, governed data models

- Point-to-point data connections between applications with a proper data warehouse architecture

The modernisation programme is a continuous improvement initiative, not a one-time project. The most effective approach is to prioritise modernisation work on the highest-usage data assets first — the ones that would benefit most from improved reliability, documentation, and testing.

Our data architecture consulting practice manages data cloud migrations — contact us to discuss migration planning and execution for your environment.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →