Cloud Data Migration: Planning, Costs, and How to Avoid the Common Mistakes

Moving your data infrastructure to the cloud is a multi-month programme that most organisations underestimate. Here is how to plan it, what it costs, the phases that cannot be skipped, and the mistakes that push timelines from 6 months to 18.

The quick answer

Cloud data migration is not a lift-and-shift exercise. Moving an on-premise data warehouse or data pipeline infrastructure to the cloud while rebuilding it with cloud-native architecture takes 12–24 weeks for a mid-market organisation — longer if source data is complex or if the project is scoped incorrectly at the start. The organisations that complete migrations on time and on budget do three things consistently: they assess before they build, they run source and cloud environments in parallel until the cloud environment is validated under real load, and they scope data quality remediation as a first-class workload rather than discovering it mid-migration. The organisations that fail treat migration as a technical project rather than a data project.

What a cloud data migration actually involves

Cloud data migration means moving your data infrastructure — the pipelines, the storage, the compute, the transformation logic, and the BI layer — from on-premise or legacy hosting to a cloud-native platform. This is distinct from simply moving files to cloud storage. A true cloud migration redesigns the architecture for cloud-native patterns rather than replicating on-premise design at cloud prices.

The components typically migrated:

**Data storage** — from on-premise SQL Server, Oracle, or Teradata to cloud-native storage (Azure Data Lake Storage, S3, Google Cloud Storage) and cloud warehouses (Snowflake, BigQuery, Azure Synapse, Redshift).

**ETL/ELT pipelines** — from SSIS, Informatica, or custom scripts running on on-premise servers to cloud pipeline tools (Azure Data Factory, dbt, Fivetran, Airflow). This is almost never a 1:1 migration — cloud-native ELT patterns are architecturally different from on-premise ETL patterns.

**Transformation logic** — stored procedures, views, and custom SQL that live in the on-premise database need to be assessed, redesigned where appropriate, and migrated to dbt models or cloud-native equivalents.

**Orchestration** — scheduled jobs and dependency chains managed by SQL Server Agent, Control-M, or Autosys need to be migrated to cloud orchestration (Azure Data Factory, Apache Airflow, Prefect).

**BI connectivity** — Tableau, Power BI, and other BI tools connected to on-premise databases need their connections updated to point at the cloud environment, often with connection type changes (live connections to cloud warehouses vs extracts from on-premise).
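The dependency chains described under orchestration can be captured as data before any tool is chosen. The following is a minimal sketch, assuming hypothetical job names and using Python's standard-library `graphlib`; it shows how a recorded dependency map yields a valid run order and exposes circular chains early. It is not tied to any particular orchestrator.

```python
from graphlib import TopologicalSorter

# Hypothetical job names: each job maps to the set of jobs it depends on.
job_dependencies = {
    "load_customers": set(),
    "load_orders": set(),
    "build_order_facts": {"load_customers", "load_orders"},
    "refresh_sales_dashboard": {"build_order_facts"},
}


def execution_order(dependencies):
    """Return one valid run order; raises graphlib.CycleError on a circular chain."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(job_dependencies)
print(order)
```

The same dictionary can later be translated into whichever orchestration tool the target platform uses.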

The four phases of a cloud data migration

Phase 1: Assessment (3–5 weeks)

The assessment is not optional. Organisations that skip it to save time routinely discover mid-migration that source data quality is significantly worse than assumed, that transformation logic is undocumented and complex, or that the scope is 2–3x larger than the initial estimate. The assessment produces:

- Complete inventory of source systems, tables, volumes, and dependencies

- Data quality baseline — what the actual quality of source data is, not what it is assumed to be

- Transformation logic catalogue — every stored procedure, view, and custom SQL documented and assessed for cloud migration complexity

- Pipeline dependency map — what runs when, in what order, with what dependencies

- Estimated migration scope by complexity tier (simple lift, redesign required, complex redesign)

- Cloud architecture design — what the target state looks like on the chosen cloud platform

The assessment costs $15,000–$35,000 for a mid-market organisation. It is the best money spent in the migration — it prevents scope surprises that cost 5–10x more to fix mid-project.
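A data quality baseline can start as a handful of counts per table. The sketch below is illustrative only: the table, column names, and sample rows are hypothetical, and a real assessment would run equivalent checks as SQL against the source systems. It counts duplicate keys, missing required values, and rows that reference a parent record that does not exist.

```python
from collections import Counter


def quality_baseline(rows, key, required, parent_keys=None, foreign_key=None):
    """Summarise duplicates, missing required values, and orphaned references."""
    key_counts = Counter(row[key] for row in rows)
    duplicate_keys = sorted(k for k, n in key_counts.items() if n > 1)
    missing = {
        col: sum(1 for row in rows if row.get(col) in (None, ""))
        for col in required
    }
    orphans = 0
    if parent_keys is not None and foreign_key is not None:
        orphans = sum(1 for row in rows if row.get(foreign_key) not in parent_keys)
    return {
        "row_count": len(rows),
        "duplicate_keys": duplicate_keys,
        "missing_required": missing,
        "orphaned_rows": orphans,
    }


# Hypothetical sample data standing in for a source extract.
customer_ids = {1, 2}
orders = [
    {"order_id": 100, "customer_id": 1, "order_date": "2024-01-05"},
    {"order_id": 100, "customer_id": 1, "order_date": "2024-01-05"},  # duplicate key
    {"order_id": 101, "customer_id": 9, "order_date": ""},  # orphan, missing date
]

baseline = quality_baseline(
    orders,
    key="order_id",
    required=["order_date"],
    parent_keys=customer_ids,
    foreign_key="customer_id",
)
print(baseline)
```

Recording these numbers per table during the assessment gives the remediation workload a measurable starting point.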

Phase 2: Foundation build (4–8 weeks)

Before migrating any data, build the cloud infrastructure that data will migrate into:

- Cloud platform provisioning (Azure, AWS, GCP)

- Storage account and data lake setup with access controls and encryption

- Network connectivity between on-premise and cloud (ExpressRoute, Direct Connect, VPN)

- Identity and access management — service accounts, role-based access control, secrets management

- Development and staging environments — separate from production, used for pipeline development and testing

- Monitoring and alerting infrastructure — before pipelines run in production

This phase also includes the initial pipeline development for the first data domain — the simplest, highest-priority source systems — so that Phase 3 begins with proven patterns rather than untested architecture.

Phase 3: Phased migration (8–16 weeks)

Migration happens domain by domain, not all at once. The sequence: start with the simplest, least-risky data domains to prove the migration pattern, then progress to complex domains with the benefit of experience from the simpler ones.

For each domain:

1. Migrate source data to the cloud Bronze layer

2. Build Silver transformation (data quality, typing, deduplication)

3. Build Gold layer (business logic, data products)

4. Update BI connections to point at cloud

5. Run parallel operation — source and cloud simultaneously — validating that outputs match

6. Cutover: deprecate the on-premise pipeline for that domain
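Step 2 in the list above (typing and deduplication in the Silver layer) can be sketched in a few lines. This is a simplified illustration in plain Python with hypothetical column names; in practice the same logic would more likely live in a dbt model or a warehouse SQL statement. Rows that fail typing are quarantined rather than silently dropped, and the latest record per key wins.

```python
from datetime import date


def to_silver(bronze_rows):
    """Type raw string fields and keep the latest record per customer_id."""
    latest = {}
    rejected = []
    for raw in bronze_rows:
        try:
            row = {
                "customer_id": int(raw["customer_id"]),
                "email": raw["email"].strip().lower(),
                "updated_on": date.fromisoformat(raw["updated_on"]),
            }
        except (KeyError, ValueError, AttributeError):
            rejected.append(raw)  # quarantine for review instead of dropping
            continue
        current = latest.get(row["customer_id"])
        if current is None or row["updated_on"] >= current["updated_on"]:
            latest[row["customer_id"]] = row
    return list(latest.values()), rejected


# Hypothetical Bronze rows: everything arrives as strings.
bronze = [
    {"customer_id": "1", "email": " A@Example.com ", "updated_on": "2024-01-01"},
    {"customer_id": "1", "email": "a@example.com", "updated_on": "2024-03-01"},
    {"customer_id": "x2", "email": "b@example.com", "updated_on": "2024-02-01"},
]

silver, rejected = to_silver(bronze)
print(silver)
print(rejected)
```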

The parallel operation period is not optional. It is the only reliable way to validate that the cloud environment produces the same outputs as the on-premise environment under real load, with real data, including edge cases that testing does not catch. Cutting over without parallel operation produces production incidents.
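One simple way to check that outputs match during parallel operation is to compare a row count and an order-independent content fingerprint for the same query on both sides. The sketch below assumes the rows have already been fetched and normalised to comparable Python values (the two platforms may represent decimals, timestamps, and nulls differently, so normalisation is a real step that is not shown here). The function names are illustrative.

```python
import hashlib


def fingerprint(rows):
    """Return (row count, combined hash) that ignores row order."""
    digests = sorted(
        hashlib.sha256(repr(tuple(row)).encode("utf-8")).hexdigest()
        for row in rows
    )
    combined = hashlib.sha256("".join(digests).encode("utf-8")).hexdigest()
    return len(digests), combined


def reconcile(source_rows, cloud_rows):
    """Compare two result sets by row count and content fingerprint."""
    src_count, src_hash = fingerprint(source_rows)
    cld_count, cld_hash = fingerprint(cloud_rows)
    return {
        "row_counts_match": src_count == cld_count,
        "content_matches": src_hash == cld_hash,
    }


# Hypothetical result sets from the on-premise and cloud versions of one report.
on_premise = [(1, "a", 10.0), (2, "b", 20.5)]
cloud_same = [(2, "b", 20.5), (1, "a", 10.0)]  # same rows, different order
cloud_drift = [(1, "a", 10.0), (2, "b", 20.0)]  # one value differs

matching = reconcile(on_premise, cloud_same)
drifting = reconcile(on_premise, cloud_drift)
print(matching)
print(drifting)
```

A matching row count with a differing fingerprint is the case that unit tests tend to miss, which is why the two checks are reported separately.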

Phase 4: Validation, cutover, and decommission (3–5 weeks)

Final validation: all domains migrated, all outputs reconciled against on-premise, all BI connections updated, and user acceptance testing completed. Performance validation under peak load. Governance review (access controls, data lineage, quality monitoring). Then cutover: the on-premise environment is demoted from primary to backup, and the cloud environment becomes the system of record. A 30-day window of dual operation follows, then decommission.

What cloud data migration costs

Migration cost is driven primarily by three factors: source complexity (number of source systems, volume of transformation logic, data quality issues requiring remediation), target platform (Snowflake, Azure Synapse, and Databricks each have different implementation characteristics), and migration velocity (faster timelines require more engineers working in parallel).

**Assessment**: $15,000–$35,000. Fixed scope. Non-negotiable starting point.

**Foundation build**: $20,000–$50,000. Cloud infrastructure setup, networking, security, monitoring. Relatively predictable scope.

**Migration execution**: The largest variable. For a mid-market organisation with 5–10 data domains, 3–5 source systems, and moderate transformation complexity: $80,000–$200,000. For organisations with high transformation complexity (large volumes of stored procedures, complex business logic), legacy source systems (AS/400, mainframe feeds), or data quality issues requiring remediation: $200,000–$500,000+.

**BI layer update**: Often underestimated. Updating Tableau or Power BI connections, validating dashboard outputs, retraining users on any new interfaces: $20,000–$60,000 depending on content volume.

The most important cost driver that organisations consistently underestimate: **data quality remediation**. Source data that has been tolerated in on-premise environments — duplicate records, referential integrity violations, inconsistent formats — must be addressed during migration. Discovering this mid-migration rather than during assessment typically adds 30–50% to the total cost and 4–8 weeks to the timeline.
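To see what these ranges add up to, the worked arithmetic below sums the four mid-market line items quoted above and then applies the 30–50% uplift for data quality issues discovered mid-migration. Pairing the low uplift with the low total and the high uplift with the high total is an assumption made for illustration; the figures themselves come from this section.

```python
# Mid-market ranges quoted in this section, in USD (low, high).
PHASE_COSTS_USD = {
    "assessment": (15_000, 35_000),
    "foundation_build": (20_000, 50_000),
    "migration_execution": (80_000, 200_000),
    "bi_layer_update": (20_000, 60_000),
}

# Uplift when data quality problems surface mid-migration, in percent.
LATE_DQ_UPLIFT_PCT = (30, 50)


def total_range(costs):
    """Sum the low ends and the high ends separately."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high


def with_uplift(cost_range, uplift_pct):
    """Apply the low uplift to the low total and the high uplift to the high total."""
    low, high = cost_range
    return (
        low * (100 + uplift_pct[0]) // 100,
        high * (100 + uplift_pct[1]) // 100,
    )


planned = total_range(PHASE_COSTS_USD)
overrun = with_uplift(planned, LATE_DQ_UPLIFT_PCT)
print(planned)  # (135000, 345000)
print(overrun)  # (175500, 517500)
```

On these numbers, late discovery costs somewhere between roughly $40,000 and $170,000 more than the planned total, which is more than the upper end of the assessment fee.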

The mistakes that derail migrations

**Skipping the assessment.** The most expensive mistake. A migration scoped without an assessment is scoped against assumptions about source data quality and transformation complexity that are almost always wrong. The mid-project scope expansion that follows is expensive, demoralising, and avoidable.

**Lift-and-shift of on-premise patterns.** Migrating SSIS packages to Azure Data Factory without redesigning them for cloud-native ELT patterns produces on-premise architecture at cloud prices. The result: cloud costs that are higher than on-premise, none of the scalability or elasticity benefits, and a platform that cannot support modern analytics workloads. Cloud migration is an opportunity to redesign, not just relocate.

**Cutting over without parallel operation.** Going live on the cloud environment without a parallel validation period means the first time real production load hits the cloud pipeline is also the first time you discover its failure modes. Every production incident in this scenario costs more than the parallel operation period would have.

**Treating BI as an afterthought.** The data platform migrates. The BI connections are updated. Users discover that dashboards look different, performance has changed, or data they relied on is not available in the same form. If the BI layer is not included in migration planning from the start, the last 10% of the migration takes 30% of the time.

**Underestimating data quality work.** Source data quality issues that were invisible in the on-premise environment — because analysts knew to work around them — become visible and blocking when automated pipelines try to process them. Data quality remediation should be scoped during the assessment and budgeted as a first-class workload, not treated as an afterthought.

**Not planning for cutover.** The transition from parallel operation to production cutover requires a detailed runbook: the sequence of steps, who does what, the rollback plan if something goes wrong, and the communication to business stakeholders about the cutover window. Migrations that skip cutover planning go live on a wing and a prayer.
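A runbook's core structure (ordered steps, a named owner, and a rollback action for each) can be expressed as data. The sketch below is a toy illustration with hypothetical step names and owners, not a substitute for a written runbook; it shows the rollback principle of undoing completed steps in reverse order when a later step fails.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    owner: str
    run: Callable[[], None]
    rollback: Callable[[], None]


def execute_runbook(steps):
    """Run steps in order; on failure, roll back completed steps in reverse."""
    completed = []
    for step in steps:
        try:
            step.run()
        except Exception as exc:
            for done in reversed(completed):
                done.rollback()
            return {"status": "rolled_back", "failed_step": step.name, "error": str(exc)}
        completed.append(step)
    return {"status": "cut_over", "failed_step": None, "error": None}


log = []


def failing_reconciliation():
    # Simulated failure for the example.
    raise RuntimeError("row counts differ")


# Hypothetical steps and owners.
steps = [
    Step("freeze on-premise writes", "DBA",
         lambda: log.append("freeze"), lambda: log.append("unfreeze")),
    Step("final reconciliation", "data engineer",
         failing_reconciliation, lambda: None),
    Step("repoint BI connections", "BI lead",
         lambda: log.append("repoint"), lambda: log.append("restore connections")),
]

result = execute_runbook(steps)
print(result)
print(log)
```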

Choosing the right cloud platform for migration

The cloud platform choice for the migration target should be made during the assessment, based on your existing infrastructure, team skills, and workload requirements. Three primary options for most organisations:

**Azure (Synapse + ADLS + ADF + Databricks)** — the natural choice for organisations with existing Microsoft infrastructure (SQL Server, Active Directory, Office 365/M365). Azure Data Factory for orchestration, ADLS Gen2 for storage, Synapse Analytics or Databricks for compute. Strong if Power BI is your primary BI tool — the integration is tighter than with any other cloud platform.

**Snowflake + Azure/AWS/GCP** — Snowflake runs on any cloud platform. For organisations that want best-of-breed SQL warehouse capabilities independent of cloud provider, Snowflake is the most portable and operationally simple option. Pairs well with Fivetran for ingestion and dbt for transformation.

**Databricks + Azure/AWS/GCP** — for organisations with significant ML and data engineering requirements alongside BI analytics. More complex to operate than Snowflake but more capable for Python-heavy pipelines and ML workloads. See Snowflake vs Databricks for the detailed comparison.

For most mid-market migrations from SQL Server or Oracle on-premise: **Snowflake or Azure Synapse with dbt** is the fastest path to production at reasonable operational complexity. For organisations with GCP or AWS infrastructure: BigQuery or Redshift respectively.

How long does it take

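A mid-market organisation migrating a SQL Server data warehouse with 5–8 data domains, 3–4 source systems, and moderate transformation complexity: **16–22 weeks** from assessment start to production cutover. This is shorter than the sum of the four phase ranges above (18–34 weeks) because the phases overlap in practice: Phase 2, for example, already includes pipeline development for the first data domain.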

The timeline extends for:

- High transformation complexity (large stored procedure libraries): +4–8 weeks

- Data quality remediation required: +4–8 weeks

- More source systems (10+): +4–8 weeks

- Legacy source systems (AS/400, mainframe): +4–8 weeks

- Large BI content library requiring connection updates: +2–4 weeks

The timeline compresses for:

- Simple source systems (SaaS tools with Fivetran connectors): −2–4 weeks

- Small data volume and simple transformation logic: −2–4 weeks

- Dedicated, experienced migration team (no competing priorities): −2–4 weeks

The most common cause of timeline extension is mid-project scope discovery — source data complexity that was not visible during a superficial assessment. A thorough assessment prevents this.
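The adjustments above can be combined into a rough range. The sketch below assumes the adjustments are independent and simply additive, which the figures in this section do not guarantee, so treat the output as a first-pass estimate rather than a forecast. The factor names are illustrative labels for the bullet points above.

```python
# Baseline and adjustments from this section, in weeks (low, high).
BASE_WEEKS = (16, 22)

ADJUSTMENTS = {
    "high_transformation_complexity": (4, 8),
    "data_quality_remediation": (4, 8),
    "ten_plus_source_systems": (4, 8),
    "legacy_source_systems": (4, 8),
    "large_bi_library": (2, 4),
    # Compressions: the best case saves 4 weeks, the worst case saves 2.
    "simple_saas_sources": (-4, -2),
    "small_simple_workload": (-4, -2),
    "dedicated_experienced_team": (-4, -2),
}


def estimate_weeks(factors):
    """Add each factor's adjustment to the baseline range."""
    low, high = BASE_WEEKS
    for factor in factors:
        delta_low, delta_high = ADJUSTMENTS[factor]
        low += delta_low
        high += delta_high
    return low, high


estimate = estimate_weeks([
    "data_quality_remediation",
    "legacy_source_systems",
    "dedicated_experienced_team",
])
print(estimate)  # (20, 36)
```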

FAQs

Should we migrate all data domains at once or phase it?

Phase it. A phased migration — domain by domain — produces faster value (the first domain is in the cloud and validated before the last domain is started), lower risk (early domains prove the migration pattern before it is applied to complex domains), and clearer accountability (each domain has a defined cutover date). Big-bang migrations compress all risk into a single cutover event and have a higher failure rate.

Do we need to migrate our BI tools at the same time?

BI connection migration is part of the data migration — the two cannot be fully separated. However, BI connection updates for each domain can be done as part of that domain's cutover rather than at the end of the full migration. This distributes the BI migration work across the project rather than concentrating it at the end.

What happens to our on-premise environment after migration?

Typically: parallel operation for 30 days post-cutover (both environments running simultaneously), then the on-premise environment is decommissioned — licences cancelled, hardware repurposed or returned. Some organisations retain on-premise as a cold backup for 90 days post-cutover before full decommission. Few organisations need the on-premise environment beyond 90 days once the cloud environment is validated.

How do we handle data that cannot move to the cloud due to data residency requirements?

Data residency requirements restrict where certain data can be stored, and are most common in healthcare, financial services, and government. (HIPAA itself does not prescribe a storage location, but healthcare contracts and national health-data laws can.) Most major cloud providers offer region-specific storage that satisfies data residency requirements for specific countries or regulatory regimes. For data that genuinely cannot leave a physical location, a hybrid architecture is the answer: sensitive data remains on-premise, processed by cloud pipelines that reach into the on-premise environment via a secure gateway. This adds complexity and cost but is manageable.

What is the difference between a cloud migration and a cloud-native rebuild?

A migration moves existing architecture to the cloud, adapting it for cloud-native patterns where necessary but maintaining continuity of existing data products and outputs. A cloud-native rebuild starts from a clean design — often using a migration as the opportunity to rethink the data model, governance structure, and BI layer from first principles. Most mid-market migrations are primarily migrations with selective redesign for the domains with the most technical debt. Full rebuilds are appropriate when the on-premise architecture is so far from cloud-native patterns that migration would produce a poor cloud implementation.

Our cloud engineering practice has delivered cloud data migrations across Azure, Snowflake, and Databricks for mid-market organisations across financial services, healthcare, and professional services. If you are planning a cloud data migration and want an experienced view of what your specific environment will take, book a free 30-minute audit and we will give you a realistic scope and timeline based on what you actually have.
