Migrating to Snowflake: From Legacy Data Warehouse to Cloud-Native Analytics

Migrating from a legacy on-premises data warehouse — Teradata, SQL Server, Oracle — to Snowflake is a multi-phase programme covering data migration, SQL dialect conversion, ETL pipeline rewiring, BI tool reconnection, and user transition. This guide covers the full migration lifecycle, common blockers, and how to structure the programme to minimise business disruption.

Why Organisations Migrate to Snowflake

Legacy on-premises data warehouses — Teradata, Oracle, SQL Server, Netezza — were built for a world of predictable data volumes, fixed hardware, and batch ETL. They require significant DBA effort, scale vertically at high cost, and impose operational burdens (hardware maintenance, OS patching, licence management) that cloud-native platforms eliminate.

Snowflake's architecture separates compute and storage, scales elastically in seconds, and charges only for the compute used. Organisations migrate to eliminate on-premises infrastructure, reduce operational overhead, enable independent scaling of compute and storage, and adopt SQL-accessible semi-structured data handling that legacy warehouses lack.

Migration Programme Phases

A Snowflake migration is not a simple data copy — it is a multi-phase programme spanning data migration, SQL conversion, ETL rewiring, security model reconstruction, BI reconnection, and user transition. Treat it as a project with discrete phases, not a bulk lift-and-shift.

### Phase 1 — Assessment and Planning

Before writing any migration code, document what you are migrating:

**Data inventory**: Catalogue every table, view, procedure, function, and schema in the source warehouse. Use information_schema or system catalogue queries to extract counts — tables by schema, row counts, compressed sizes, active usage in last 90 days. Flag tables with no query activity in the last year as migration candidates for archival, not active migration.

**SQL dialect gap analysis**: Compare source dialect to Snowflake SQL. Common migration issues:

- Date functions: DATEADD, DATEDIFF syntax varies by platform; Snowflake uses interval syntax

- String functions: CHARINDEX vs POSITION; STUFF vs INSERT

- Window functions: Most standard windowing translates well; platform-specific extensions do not

- Stored procedures and UDFs: Teradata BTEQ, Oracle PL/SQL do not translate to Snowflake JavaScript/Snowflake Scripting directly — require rewrite

- Implicit data type conversions: Teradata is permissive; Snowflake is stricter

**ETL pipeline inventory**: Identify every pipeline loading data into the source warehouse. Each pipeline must be updated to point at Snowflake instead of the legacy platform. Scope the ETL rewiring as part of the migration.

**BI and reporting inventory**: List every Tableau, Power BI, Looker, and SSRS/Business Objects report connecting to the source warehouse. Each connection requires updating.

### Phase 2 — Environment Setup

Create the Snowflake environment before migrating data:

**Account and role hierarchy**: Define ACCOUNTADMIN (minimal human users), SYSADMIN, and custom functional roles. Snowflake role-based access control uses a role hierarchy — lower roles inherit permissions granted to parent roles. Design the role hierarchy to match your security requirements before granting permissions.

**Warehouse (compute) sizing**: Create virtual warehouses sized for different workloads — an ETL warehouse, a BI reporting warehouse, and a developer warehouse. Start small (XS or S); scale up after performance testing. Enable auto-suspend and auto-resume to eliminate idle cost.

**Database and schema structure**: Mirror the source warehouse's database/schema structure or use the migration as an opportunity to rationalise. Define a raw landing schema for ELT staging, a transformation schema for dbt models, and a marts schema for BI-facing tables.

**Network and security**: Configure network policies to restrict access to corporate IP ranges. Enable multi-factor authentication for all human users. Configure key-pair authentication for service accounts. Enable audit logging to the SNOWFLAKE.ACCOUNT_USAGE schema.

### Phase 3 — Data Migration

Move historical data from the source warehouse to Snowflake:

**Extract**: Export tables from the legacy warehouse to intermediate storage — S3 for AWS, GCS for GCP, or Azure Blob. Export in Parquet for most efficient loading; CSV as fallback. For Teradata, use FastExport or TPT. For Oracle, use external tables or Data Pump with Parquet export via OCI.

**Schema creation**: Create target tables in Snowflake. Map source data types to Snowflake equivalents. Use Snowflake Schema Conversion Accelerator or write CREATE TABLE scripts manually for control over data type choices.

**Load**: Use COPY INTO from S3/GCS/Azure Blob to Snowflake. COPY INTO is Snowflake's bulk loading command — massively parallel, reads Parquet/CSV files from object storage. For large tables (100M+ rows), partition the export and load in parallel across multiple COPY INTO calls.

**Validate**: Row count comparison between source and target. Checksum or statistical sampling on key columns. Date range validation for time-series tables.

### Phase 4 — SQL and Procedure Migration

Convert SQL objects that do not translate directly:

**View migration**: Many views translate with minimal changes. Run each view definition against Snowflake and fix syntax errors one at a time. Common fixes: function name changes, implicit type cast issues, platform-specific date functions.

**Stored procedure migration**: Snowflake supports JavaScript, Python, Java, Scala, and Snowflake Scripting (SQL) for stored procedures. Teradata BTEQ and Oracle PL/SQL must be rewritten. Prioritise which procedures are still actively used before investing migration effort.

**UDF migration**: Scalar functions translate to Snowflake UDFs in JavaScript, Python, or Java. Table functions become Snowflake UDTFs.

### Phase 5 — ETL Pipeline Rewiring

Update every pipeline that loads data into the warehouse:

**ELT tools** (Fivetran, Airbyte): Reconfigure destination from legacy warehouse to Snowflake. Update destination credentials and schema/database names. Test each connector's output in Snowflake.

**Custom ETL** (Python scripts, Airflow DAGs): Update connection strings from source warehouse drivers to Snowflake Python connector or JDBC. Test data type handling — Snowflake handles VARIANT (semi-structured) differently from most legacy platforms.

**dbt**: Update profiles.yml to point at Snowflake. Validate model SQL against Snowflake dialect. Run dbt debug and dbt run against the Snowflake environment before cutover.

### Phase 6 — BI Reconnection

Update BI tool connections to point at Snowflake:

**Tableau**: Remove the legacy warehouse data source connection, reconnect to Snowflake using the Tableau Snowflake native connector. Map fields that changed names or data types. Validate calculated fields for syntax differences.

**Power BI**: Update the data source in Power Query. Snowflake connector requires Snowflake ODBC driver on the gateway machine for on-premises gateway deployments.

**Looker**: Update connection in the Looker Admin panel. Validate LookML model SQL against Snowflake dialect.

### Phase 7 — Cutover

**Pre-cutover**: Run both legacy and Snowflake environments in parallel for at least 2-4 weeks. Validate that Snowflake results match legacy warehouse results for critical reports. Address any discrepancies.

**Cutover window**: Schedule the cutover for a low-traffic period. Update ETL pipeline destinations to point at Snowflake. Update BI tool connections. Communicate to users that the warehouse is changing.

**Post-cutover monitoring**: Monitor query error rates, refresh failures, and performance for the first 2 weeks. Have rollback plan (reverting ETL to legacy warehouse) available for the first 30 days if critical issues emerge.

Common Migration Blockers

- **Stored procedure rewrite scope**: Teradata BTEQ scripts with thousands of lines require significant rewrite effort. Audit before estimating.

- **Implicit type conversion reliance**: SQL that relies on platform-specific implicit conversions fails in Snowflake. Add explicit CASTs.

- **TIMESTAMP handling**: Snowflake stores TIMESTAMP_NTZ (no timezone), TIMESTAMP_LTZ (with local timezone), and TIMESTAMP_TZ (with stored timezone) separately. Legacy warehouses often store timestamps without timezone context. Define a standard and migrate consistently.

- **Legacy BI tool versions**: Old versions of Tableau or Power BI may not have updated Snowflake connectors. Validate connector support before starting the BI reconnection phase.

Our data architecture practice manages Snowflake migrations from legacy data warehouses — contact us to discuss your migration requirements.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →