BlogCloud Engineering

Data Warehouse Migration: How to Plan It, Execute It, and Not Break Production

Austin Duncan
Austin Duncan
Managing Director & Principal Data Architect
·June 15, 202612 min read

Migrating a data warehouse — from on-premise to cloud, between cloud platforms, or from a legacy warehouse to a modern one — is one of the most operationally complex data projects an organisation runs. Here is the methodology that prevents the common failures.

The quick answer

A data warehouse migration — whether from on-premise SQL Server to Snowflake, from a legacy Oracle data warehouse to BigQuery, or from one cloud warehouse to another — is one of the most operationally complex projects a data team runs. The technical work (schema translation, data movement, query rewriting) is manageable. What breaks most migrations: inadequate assessment before starting, underestimating the complexity of dependent BI and reporting layers, insufficient parallel operation before cutover, and not having a rollback plan.

The migrations that go well are the ones that treat migration as a programme, not a project. There are phases, gates, parallel operation periods, and a deliberate cutover — not a "we moved the data over the weekend and told users it was live on Monday."

Phase 1: Assessment (4–8 weeks)

The assessment phase must happen before any migration work begins. Its purpose is to understand what you are migrating, what dependencies it has, what the target state should look like, and how long it will actually take. Organisations that skip or rush the assessment underestimate scope, encounter surprises mid-migration, and blow timelines.

**Source environment inventory**: catalogue every table, schema, and database in the source warehouse. How many tables? What sizes? What are the most-queried tables? What is the total data volume? What is the rate of data change (important for the data movement strategy)? For Tableau-specific migrations from a Tableau Server environment, see Tableau Cloud migration cost for comparable assessment scope.

**Query and workload analysis**: identify the most resource-intensive queries, the most frequently executed query patterns, and any queries that rely on platform-specific syntax (stored procedures, platform-specific functions, proprietary SQL extensions). These are migration risks — SQL that runs in Oracle or SQL Server may not run in Snowflake or BigQuery without rewriting.

**Dependency mapping**: what reads from this warehouse? BI tools (Tableau, Power BI, Looker), application databases (reporting queries from operational systems), ETL pipelines (which may both write to and read from the warehouse), APIs (microservices that query the warehouse), export jobs (scheduled reports delivered via email or to file systems). Every dependency must be inventoried and planned for — migration without dependency mapping produces a migrated warehouse with broken BI tools and failing pipelines.

**Data quality assessment**: what is the current state of data quality in the source warehouse? Schemas with null constraints that are not enforced, foreign keys that have referential integrity violations, data type inconsistencies between tables. These quality issues do not disappear in the target warehouse — they must be addressed as part of migration or they become the target warehouse's inherited technical debt.

**Target architecture design**: what should the target state look like? This is not just "the same schema in a different warehouse." Migration is the opportunity to rationalise the schema — remove deprecated tables, consolidate redundant structures, modernise the data model. The target architecture should be designed by a data architect, not by a database administrator running schema export scripts.

Phase 2: Infrastructure and foundation build (4–6 weeks)

Before migrating data, build the target environment infrastructure.

**Warehouse provisioning**: set up the target cloud warehouse (Snowflake account, BigQuery dataset configuration, or equivalent). Configure virtual warehouses (Snowflake) or reservation slots (BigQuery). Set up dev, staging, and production environments — migrations require at least a dev environment for development and testing and a production environment for cutover.

**Access and security configuration**: set up identity and access management for the new warehouse. Integrate with your organisation's identity provider (Azure AD, Okta, Google Workspace). Define service account roles for ingestion pipelines, transformation jobs, and BI tool connections. Apply column-level masking for PII fields if required by compliance requirements.

**Monitoring and cost governance**: before loading data, configure cost governance (Snowflake Resource Monitors, BigQuery budget alerts). Unconstrained cloud warehouse compute with large data loads can produce significant unexpected costs.

**Ingestion pipeline setup**: configure the ingestion pipeline for the initial load and ongoing CDC (Change Data Capture) or incremental sync. For historical data migration, bulk-load tools (Snowflake COPY INTO, BigQuery bq load, Data Migration Service for BigQuery) handle the initial load. For ongoing sync, the same ingestion tools used post-migration (Fivetran, ADF, Airbyte) handle incremental updates.

Phase 3: Schema translation and SQL rewriting (4–8 weeks, parallel with phase 2)

This is often the most technically demanding phase — and the one most commonly underestimated.

**DDL translation**: translate CREATE TABLE statements from source to target syntax. Data type mapping between platforms is almost never one-to-one: Oracle VARCHAR2 to Snowflake VARCHAR, SQL Server DATETIME to Snowflake TIMESTAMP_NTZ, Oracle NUMBER to Snowflake NUMBER or FLOAT depending on precision requirements. Automated translation tools (like AWS Schema Conversion Tool or Snowflake's conversion tool for SQL Server/Oracle) handle 70–80% of the work; the remaining 20–30% requires manual review for platform-specific constructs.

**Stored procedure and function rewriting**: this is where most migration SQL complexity lives. Stored procedures written for SQL Server T-SQL or Oracle PL/SQL do not run in Snowflake or BigQuery. Each procedure must be evaluated: can the logic be replicated in the target platform's procedural SQL? Should it be moved to the application layer or to the dbt transformation layer? Can the procedure be eliminated if the transformation layer design is modernised?

**ETL and pipeline SQL rewriting**: SQL in ETL pipelines (SSIS, Informatica, ADF activities) must be reviewed for platform-specific syntax and rewritten for the target platform. Test each rewritten query against the target warehouse before declaring it migrated.

**Query regression testing**: for the most critical queries (the ones that feed production reports), run the original and rewritten versions against representative data and validate that results match. Differences in NULL handling, date arithmetic, string functions, and aggregation behaviour between platforms are subtle and produce wrong results that look correct without side-by-side comparison.

Phase 4: Data migration and validation (4–8 weeks)

**Initial data load**: load historical data from source to target. For large datasets (terabytes), this is a bulk load operation — exporting from source to cloud storage (S3, ADLS Gen2, GCS) and loading from there into the target warehouse. Not a live replication. Plan for the time this takes: loading 10TB into Snowflake from S3 with parallel COPY INTO typically takes hours, not minutes.

**Data validation**: for every migrated table, validate that the target contains the correct data:

- Row count match: target row count equals source row count (accounting for any deduplication applied in migration)

- Aggregate validation: SUM of key numeric columns matches between source and target for each time period (daily, weekly, monthly)

- Null rate comparison: null rates for key columns should match between source and target

- Key distribution: the distribution of foreign key values should match, confirming no records were dropped or duplicated

Automated validation scripts running these checks across all tables provide the audit evidence needed before cutover.

**CDC and incremental validation**: after the initial load, configure ongoing data sync from the source (CDC or incremental extraction) to keep the target current during the parallel operation period. Validate that incremental loads are correctly capturing changes.

Phase 5: BI and application migration (6–12 weeks, parallel with phase 4)

BI and application migration is frequently the longest phase and the one that most commonly delays cutover. Plan for it accordingly.

**BI tool migration**: update Tableau, Power BI, or Looker connections from source warehouse to target warehouse. For Tableau, this means updating data source connections and validating that published data sources and workbooks produce correct outputs after the connection change. For Power BI, update dataset connections and validate query results. For Looker, update LookML database connection and validate explores.

Run every critical dashboard against both source and target warehouse simultaneously and compare outputs. Any discrepancy is a query translation issue that must be resolved before cutover. Do not assume that "connection changed, workbook still opens" means the data is correct.

**Application migration**: update any application connection strings that point to the source warehouse. Test applications against the target warehouse in a staging environment before production cutover.

**Performance testing**: validate that critical queries and dashboard loads meet performance requirements on the target warehouse. If performance is not acceptable, this is the phase to optimise — adding clustering keys, materialising intermediate tables, adjusting warehouse sizing — before the production cutover.

Phase 6: Parallel operation and cutover (4–6 weeks)

Parallel operation means running source and target warehouses simultaneously, with production users accessing the source warehouse while the target warehouse is running and being validated. This phase catches issues that did not surface in testing.

**Parallel operation period**: the minimum recommended parallel operation period is 4 weeks — two full monthly close cycles for financial data, at least two reporting cycles for operational data. During parallel operation, the data team actively monitors and compares source and target outputs.

**Cutover planning**: define the cutover window precisely — the day and time when production systems switch from source to target. This is typically a low-traffic window (weekend, end of month after reporting closes). Define the rollback procedure: if the cutover is executed and critical failures emerge, what is the process to roll back to the source warehouse?

**Cutover execution**: at the defined window, update all production connections to the target warehouse, validate that production systems are operating correctly, and monitor for failures. Keep the source warehouse available (read-only, no new writes) for at least 2–4 weeks post-cutover as a fallback reference.

Common failure modes

**Skipping the assessment**: underestimating scope, missing dependencies, discovering stored procedure complexity after the migration is underway. Non-negotiable: do the assessment.

**Lift-and-shift without modernisation**: migrating the schema exactly as it is in the source without redesigning for the target platform's characteristics. A normalised OLTP-style schema migrated verbatim to Snowflake performs poorly and carries the same architectural debt into the new environment. Migration is the opportunity to modernise.

**Cutting BI migration time**: BI and reporting layer migration is consistently allocated insufficient time. Every dashboard must be validated against the new warehouse, not just reconnected. This takes time.

**No parallel operation**: cutting directly from source to target without a parallel operation period produces surprises in production that could have been caught in validation. Always run parallel operation.

**Not planning rollback**: if production cutover fails and the rollback procedure is "restore from backup and spend a week reconnecting everything," the migration has a material risk of extended outage. Define the rollback procedure before the cutover.

For the cost structure of cloud data migrations, see cloud data migration guide. For the assessment scope that feeds a migration plan, see data architecture consulting cost.

Our cloud engineering and data architecture consulting practices manage data warehouse migrations from assessment through cutover. If you are planning a warehouse migration or evaluating whether migration is the right investment for your situation, book a free 30-minute audit.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →