Data Platform Migration: How to Move from Legacy Systems to Cloud Without Breaking Analytics

A practical guide to data platform migrations — from on-premise data warehouses to cloud, from one cloud warehouse to another, from custom ETL to modern ELT. The migration phases, risk patterns, and the sequencing decisions that determine whether a migration succeeds or produces months of unstable analytics.

Data platform migrations are among the most disruptive projects a data team undertakes. They involve replacing infrastructure that the rest of the organisation depends on, migrating data that may have years of history, and rewriting pipelines and analytical workbooks that no one has touched since they were first built. When they go well, the organisation gains a modern, maintainable platform. When they go badly, analytics is unstable for months and trust in the data team erodes.

This guide covers the three most common migration scenarios — on-premise to cloud, warehouse to warehouse, and custom ETL to modern ELT — with the sequencing patterns and risk approaches that determine whether a migration succeeds.

The Three Migration Types

**On-premise to cloud warehouse migration:** Moving from an on-premise data warehouse (Teradata, Netezza, Oracle DW, SQL Server Analysis Services) or a self-managed database to a cloud warehouse (Snowflake, BigQuery, Redshift, Databricks). The challenges are: schema conversion (SQL dialects differ), data volume movement (petabyte-scale migrations take weeks), and the cultural shift from infrastructure-managed to cloud-managed.

**Cloud warehouse to cloud warehouse migration:** Moving between cloud platforms — Redshift to Snowflake, BigQuery to Snowflake, Snowflake to Databricks, or any other combination. Usually driven by cost, performance, or ecosystem reasons. Technically simpler than on-premise migrations (SQL is closer) but still requires schema conversion, data movement, and pipeline rewrites.

**Custom ETL to modern ELT migration:** Replacing custom-built data pipelines (Python scripts, stored procedures, bespoke ETL tools) with dbt + a cloud warehouse. The existing pipelines often have undocumented business logic and produce outputs that analysts depend on. The challenge is reverse-engineering the business logic well enough to reproduce it correctly in dbt.

Phase 1: Assessment

Before writing any migration code, understand what you are migrating. Skipping or rushing the assessment phase is the most common cause of migration failure — teams discover undocumented dependencies, critical pipelines, and business logic embedded in places they did not expect.

**Data inventory:** What tables exist? What is the row count, size, and age of each? Which tables have active downstream consumers (dashboards, reports, applications)? The Tableau Metadata API, Looker's content API, or manual stakeholder interviews identify which tables the business actually depends on.

**Pipeline inventory:** What processes write data to which tables? Are there scheduled jobs, triggers, stored procedures, or application writes? Document every source that populates each table. Legacy environments frequently have data written by multiple systems — some documented, some not.

**Business logic audit:** Where is business logic implemented? In stored procedures? In ETL transformation code? In Tableau calculated fields? In Excel spreadsheets that load into the warehouse? Business logic that lives outside the warehouse is the hardest to migrate — it is often undocumented and owned by individuals who may not be available during the migration.

**Dependencies:** Which dashboards, reports, applications, and data science models depend on which tables? These are the migration's success criteria. If the Salesforce revenue dashboard gives different numbers on the new platform than the old, the migration is not complete regardless of how technically correct the new platform is.

Phase 2: Design

**Schema mapping:** Document the translation from the source schema to the target schema. This includes column type mappings (Teradata BYTEINT to Snowflake NUMBER, Oracle DATE to Snowflake TIMESTAMP_NTZ), naming convention changes, and schema hierarchy changes (database/schema/table vs project/dataset/table in BigQuery).

**Validation strategy:** Define what "correct" means for each critical table. A validation query that computes row counts, null counts, and key business metrics on both old and new systems — and compares the results — is the core migration quality gate. Define acceptance thresholds: row count difference within 0.1%, revenue totals within 0.01%, etc.

**Cutover strategy:** Big bang vs parallel operation. Big bang cuts over everything simultaneously — simpler to manage but high risk if problems are discovered post-cutover. Parallel operation runs both old and new platforms simultaneously for a validation period, with the old platform remaining authoritative until validation passes. Parallel operation is operationally expensive (double the infrastructure cost, double the maintenance) but dramatically reduces risk.

For business-critical data platforms, parallel operation is almost always the right choice. The cost of running both platforms for 4–8 weeks is small compared to the cost of analytics instability if a big bang cutover encounters unexpected problems.

**Migration sequencing:** Not all tables are equal. Prioritise:

1. High-confidence, low-criticality tables first (internal operational tables, reference data, low-volume lookup tables)

2. Medium-criticality tables second (operational dashboards, department reports)

3. Business-critical tables last (revenue reporting, executive dashboards, external-facing data products)

This sequencing allows you to work out migration problems on lower-stakes tables before migrating the ones that cannot tolerate errors.

Phase 3: Execution

**Data movement:** For historical data, the most common approach is:

- Export from source to intermediate storage (S3, GCS, or Azure Blob)

- Load from intermediate storage to target using the warehouse's bulk load mechanism (Snowflake COPY INTO, BigQuery bq load, Redshift COPY)

For large migrations, cloud provider data transfer tools reduce time: AWS Database Migration Service, Google Database Migration Service, and Azure Database Migration Service handle ongoing replication (useful for parallel operation where source continues to receive writes).

**SQL conversion:** Expect 15–30% of SQL queries and stored procedures to require manual review after automated conversion. SQL dialects differ in date functions, string functions, aggregation window functions, and conditional logic. Automated conversion tools (Snowflake's SQL translation service, Google's BigQuery Migration Service) handle common patterns but require human review for edge cases.

**Pipeline rewrites:** ETL pipelines that extract from source, transform, and load to target need to be replicated in the new stack. If the migration includes moving from custom ETL to dbt, this is the opportunity to rewrite transformations as version-controlled, tested dbt models. Accept that this will take longer than simply porting existing code — it is also where most of the long-term value comes from.

Phase 4: Validation

Run validation queries comparing old and new platform outputs on the same data. The core validation set:

**Row count comparison per table:** If the new platform has fewer rows than the old, something was missed in migration. If it has more, there may be duplicate loading.

**Business metric comparison:** Sum revenue by month, count orders by status, compute key ratios — compare old and new platform results. Discrepancies surface logic errors in the migration.

**Referential integrity:** Do foreign keys in the new platform resolve correctly? A customer_id in fct_orders that does not exist in dim_customers indicates either a migration gap or different data quality standards between platforms.

**BI tool smoke testing:** Open the top 20 most-used dashboards on the new platform. Does every number look reasonable? Does anything look obviously wrong? This is a sanity check, not a full audit, but it catches category-level errors before cutover.

Phase 5: Cutover and Decommission

**Cutover communication:** Notify stakeholders of the cutover date with at least two weeks' notice. Identify the specific dashboards and data products that will change URLs, require re-authentication, or have temporarily different data during the cutover window.

**Cutover window:** Execute cutover during a low-activity period — weekend, end of month (after month-close reporting), or holiday period. A cutover that requires 4 hours of data team availability should not happen during business hours.

**Rollback plan:** Define the conditions under which you will roll back to the old platform. Set a time limit — if the new platform is not validated within 48 hours of cutover, automatically revert. Having a rollback plan that the team trusts reduces the pressure on the cutover and allows faster decision-making if problems occur.

**Decommission timing:** Do not decommission the old platform immediately after cutover. Run both for 30–90 days post-cutover — old platform in read-only mode — to allow late-discovered issues to be investigated against the source of truth. Decommission after the first full reporting cycle (typically month-end close) completes successfully on the new platform.

Common Migration Failure Patterns

**Migrating business logic without understanding it.** Stored procedures and custom ETL code often contain business rule exceptions, data quality fixes, and edge cases that are not obvious from reading the code. Not understanding these before migration produces a new platform that is technically correct but produces different numbers.

**Underestimating stakeholder impact.** BI tools need to be reconnected to new data sources. Report subscriptions need to be recreated. Embedded analytics in applications need connection string updates. Each of these touches a person who may resist the change. Plan stakeholder management as a migration work stream, not an afterthought.

**No validation gate.** Migrations that proceed without formal validation — comparing outputs between old and new platforms — discover problems post-cutover when the old platform is gone. Define validation acceptance criteria upfront and treat them as a hard prerequisite to cutover.

Our data architecture consulting practice has managed cloud data platform migrations from assessment through decommission — contact us to discuss your migration requirements.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →