BlogData Architecture

What Is a Data Warehouse Migration? Planning and Executing the Move

Austin Duncan
Austin Duncan
Project Manager & Data Strategist
·April 26, 202812 min read

A data warehouse migration moves an organization's analytical data environment from one platform to another — from on-premise to cloud, or between cloud warehouses. This guide explains the key phases of a warehouse migration, the common failure points, and the architectural decisions that determine whether the migration delivers its intended value.

A data warehouse migration moves an organization's analytical data environment from one platform to another: from an on-premise warehouse (Teradata, Netezza, SQL Server Analysis Services) to a cloud warehouse (Snowflake, BigQuery, Redshift), or between cloud platforms. Most migrations are driven by one of three motivations: cost reduction (cloud storage and compute is cheaper than on-premise hardware), capability gap (the legacy platform cannot support required workloads), or strategic consolidation (reducing the number of platforms).

Migrations are among the highest-risk and highest-effort projects in data engineering. The combination of large data volumes, complex transformation logic, many dependent downstream systems, and often poorly documented legacy environments makes them genuinely difficult. Understanding the phases and common failure points is prerequisite to planning one successfully.

The Phases of a Warehouse Migration

Phase 1: Assessment and Discovery

Before committing to a migration, the first phase documents what exists: every table (schema, row counts, data types), every transformation pipeline (ETL jobs, stored procedures, views), every downstream consumer (reports, dashboards, data extracts, downstream databases, API calls), and every connection (data sources, credentials, access patterns).

In legacy warehouses, this documentation often does not exist or is severely outdated. Modern tooling helps: the Snowflake, Redshift, and BigQuery migration tools include automated discovery capabilities. Third-party profiling tools extract schema metadata, view definitions, and stored procedure code from source systems.

The output of assessment is a migration inventory: a complete list of objects to migrate, their dependencies, and their priority. Identifying dependencies before migration is critical — migrating a table without migrating its source views or its dependent reports is a common cause of migration failure.

Phase 2: Architecture Design

The legacy architecture does not automatically map to the target architecture. On-premise warehouses were often designed around hardware constraints (compress data to fit on expensive storage, avoid table scans) that do not apply to cloud warehouses. Migrating a Teradata schema column-for-column to Snowflake often produces a warehouse that runs slowly and costs more than the legacy system.

Architecture design during migration is an opportunity to redesign:

- Normalize the schema — if the legacy warehouse has denormalized wide tables for performance reasons that no longer apply, a star schema or ELT-friendly structure may be more appropriate

- Redesign ETL logic — legacy ETL jobs written in Informatica or SSIS may be better implemented as dbt models

- Evaluate partitioning and clustering — cloud warehouses have different partitioning semantics than legacy systems; copying the partitioning strategy directly is often wrong

- Plan for SCD handling — if the legacy system implemented SCDs in custom ETL logic, this is a chance to use a standard dbt snapshot approach

Phase 3: Iterative Migration

Migrating everything at once is the highest-risk approach. The recommended pattern is iterative migration by workload:

- Identify the lowest-risk workload (least critical to the business, smallest number of dependencies, simplest transformation logic)

- Migrate that workload to the new platform while the legacy system remains operational

- Validate results (data quality comparison between legacy and new)

- Run both in parallel for a defined period to ensure consistency

- Decommission the legacy workload for that scope

- Repeat for the next workload

The parallel run period is where most migration issues are caught. A table that produces different results on the new platform than the old is caught during parallel run, not by an analyst who notices a wrong number in a report three months after migration.

Phase 4: Validation

Validation is the most under-resourced phase of most migrations. The business must sign off that the data in the new platform is correct before the old system is decommissioned. This requires:

- Row count comparison at each table

- Aggregate comparison for key metrics (total revenue, total users, aggregate values that business stakeholders can verify)

- Spot-check comparison for specific records

- Regression testing for key reports — running the same report in both systems and comparing output

- Data type and precision validation (floating point precision differences, date handling differences between platforms)

Automated validation tooling (like dbt tests configured to run on both source and target, or custom comparison queries) reduces the manual effort of validation and enables re-running validation after data loads.

Phase 5: Cutover

The final cutover from legacy to new platform requires:

- Freeze on the legacy system (no new development, no schema changes)

- Final data sync to ensure the target is current

- Switching all downstream consumers (BI tools, pipelines, APIs) to the new system

- Monitoring for the first hours and days after cutover

Cutover planning should include a rollback procedure: if critical issues are discovered immediately after cutover, what is the path back to the legacy system? This is not always feasible for large migrations, but for medium-scale migrations, having a documented rollback plan reduces risk.

Common Migration Failure Points

**Undocumented dependencies** — a report that was assumed to be decommissioned was actually still used by the finance team for the quarterly close. Discovered after the legacy system is shut down.

**Transformation logic differences** — a Teradata OREPLACE() or Netezza-specific function has subtly different behavior than the Snowflake equivalent, producing different results for edge cases that are only discovered in production.

**Timezone handling** — different warehouses have different default timestamp behavior. A migration from Oracle (which stores timestamps in UTC by default) to a warehouse configured for local timezone can silently shift all historical timestamps.

**Performance regression** — the migrated queries run slower on the new platform than expected because the schema was copied directly without optimizing for the new platform's characteristics.

**Missing historical data** — the legacy system had a data retention policy that purged records older than 3 years. The migration only captured what was there; historical analyses expected to go back 5 years now cannot.

**Testing shortcuts** — parallel run is skipped or shortened to meet a deadline. Issues that would have been caught are discovered by users.

Cloud-to-Cloud Migrations

Migrations between cloud warehouses (Redshift to Snowflake, BigQuery to Snowflake, Snowflake to Databricks) have different characteristics than on-premise-to-cloud:

- Data is already in cloud storage; the movement cost is lower

- Modern ETL/ELT tools (Fivetran, dbt) are already in use and are largely platform-agnostic

- SQL dialects differ (BigQuery uses ARRAY_AGG, Snowflake uses LISTAGG in some contexts) but the differences are narrower than Teradata vs cloud

- The primary effort is in reconfiguring ingestion pipelines, rewriting any warehouse-specific SQL, and re-testing all reports

Our data architecture practice plans and executes data warehouse migrations including on-premise to cloud and cloud-to-cloud — contact us to discuss your migration requirements and timeline.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →