BlogCloud Engineering

ETL vs ELT: Which Pipeline Pattern Should You Use?

Austin Duncan
Austin Duncan
Managing Director & Principal Data Architect
·May 20, 20269 min read

ETL and ELT describe two different approaches to moving and transforming data. The right choice depends on your data volumes, transformation complexity, and cloud platform. Here is a practical breakdown of when each pattern fits and what drives the decision.

The quick answer

ETL (Extract, Transform, Load) transforms data before it lands in the destination. ELT (Extract, Load, Transform) moves raw data into the destination first, then transforms it there. The right choice for most organisations building on modern cloud data platforms is ELT — because cloud warehouses and lakehouses are powerful enough to run transformations efficiently after load, and ELT preserves raw data history in a way that ETL does not. But the distinction matters less than it once did, and there are specific situations where ETL remains the right pattern.

What ETL and ELT actually mean

Both patterns solve the same fundamental problem: data lives in source systems (CRM, ERP, application databases, event streams) and needs to be moved to a central data platform where it can be analysed.

**ETL** — Extract, Transform, Load — is the older pattern. Data is extracted from source systems, transformed (cleaned, reshaped, joined, aggregated) in an intermediate processing environment — historically a dedicated ETL server — and then loaded into the destination data warehouse in its final, analysis-ready form. The destination only ever sees transformed, structured data.

**ELT** — Extract, Load, Transform — inverts the transformation step. Data is extracted from source systems and loaded into the destination in raw form — full fidelity, minimal processing. Transformations then run inside the destination data platform, using its native compute engine (SQL in Snowflake, Spark in Databricks, BigQuery SQL in BigQuery). The destination holds both the raw source data and the transformed analytics-ready versions.

The practical difference is about where computation happens and when raw data is available.

Why ELT has become the dominant pattern

Three structural shifts have made ELT the preferred architecture for most modern data platforms:

**Cloud warehouse compute is cheap and elastic.** In the on-premise era, loading raw data into a data warehouse was expensive — warehouse storage and compute were costly, and processing raw data in the warehouse consumed capacity that was needed for analytics queries. This made pre-transformation (ETL) economically rational. Cloud warehouses on Snowflake, Databricks, and BigQuery scale compute independently of storage and charge for consumption rather than fixed capacity. Running transformations in the warehouse is now cost-effective.

**Raw data preservation has become a requirement.** Business requirements change. What looked unnecessary in the initial data model often turns out to be needed 18 months later. ETL pipelines that transform before loading discard raw data — if the transformation was wrong, or if requirements change, the source data is gone. ELT loads the raw data first, which means you always have the complete source record to re-transform when requirements evolve. The Bronze layer in medallion architecture is the ELT principle applied consistently: ingest everything, shape it later.

**dbt made in-warehouse transformation engineering-grade.** The historical argument for ETL was that in-warehouse SQL transformations were unmaintainable: no version control, no testing, no documentation, no lineage. dbt resolved this. dbt models are SQL SELECT statements with built-in testing, documentation, and dependency tracking — they behave like software, not like ad-hoc SQL scripts. Once in-warehouse transformation could be done with engineering discipline, the case for a separate ETL processing environment became much weaker.

When ETL is still the right choice

Despite the shift to ELT, ETL remains appropriate in specific situations:

**Sensitive data that must be masked before landing anywhere.** If your source systems contain PII, PHI, or financial data subject to strict access controls, and your data platform has broader access than the source, transforming (masking, tokenising, encrypting) before load ensures that raw sensitive data never enters the destination. For healthcare organisations handling unmasked patient data, or financial organisations handling unmasked account data, loading raw data first and transforming second creates a compliance window where unmasked data exists in the destination.

**Extremely large data volumes with significant reduction.** When source data is very large (multi-terabyte event streams) and the final analytics dataset is a small fraction of it (a 1% sample or an aggregated summary), pre-filtering and aggregating in an ETL layer before load can reduce destination storage and compute costs materially. This is less common than it once was — cloud storage is cheap — but for high-frequency IoT sensor data or clickstream data at very large scale, pre-aggregation before load remains economical.

**Legacy systems that cannot be touched by cloud credentials.** Some on-premise source systems have security architectures that prevent direct connection from cloud data platform tools. An ETL tool deployed on-premise can extract from the source, process locally, and push to the cloud destination. This is an architectural constraint rather than a preference, but it remains a real constraint in enterprise environments with strict network segmentation.

**Complex transformations that require procedural logic.** Highly complex transformations — multi-step stateful processing, complex rule engines, transformations that require calling external APIs mid-pipeline — are sometimes easier to implement in a full programming language (Python, Scala) in an ETL tool than in SQL inside a data warehouse. Apache Spark on Databricks blurs this line significantly, but there are edge cases where a dedicated ETL processing environment handles logic that SQL cannot.

The tools landscape

**ELT ingestion tools** handle the Extract and Load steps: moving data from source systems into your data platform in raw form. The dominant options:

- **Fivetran**: Fully managed connectors for 300+ data sources. High reliability, minimal engineering overhead, premium pricing. The right choice when connector coverage and managed operations matter more than cost.

- **Airbyte**: Open-source alternative to Fivetran with a large connector library. Can be self-hosted for cost control, or run as Airbyte Cloud. More engineering overhead than Fivetran, lower cost at scale.

- **Azure Data Factory**: Microsoft's managed data integration service. Native to the Azure ecosystem, strong for organisations already on ADF for other purposes. Broader capability than pure ELT tools — handles orchestration and some transformation logic.

- **Stitch**: Simpler, lower-cost Fivetran alternative for organisations with modest connector requirements.

**Transformation tools** handle the Transform step after data is loaded:

- **dbt**: The dominant in-warehouse transformation tool. SQL-based, with version control, testing, documentation, and lineage. Runs inside Snowflake, Databricks, BigQuery, Redshift, and others. The default choice for most organisations building a Silver/Gold layer.

- **Databricks Notebooks**: For transformations that require Python or Spark rather than SQL — complex ML feature engineering, stateful streaming transformations, or workloads that dbt cannot handle.

- **Stored procedures**: Legacy pattern. No version control, no testing, no lineage. Avoid for new development.

**ETL tools** (full extract-transform-load in one):

- **Informatica PowerCenter / IDMC**: Enterprise ETL platform, common in large financial services and healthcare organisations. High capability, high cost, significant operational overhead.

- **Talend**: Open-source and commercial ETL. Common in mid-market enterprise environments.

- **Azure Data Factory** with data flows: ADF's mapping data flows provide a visual ETL environment within the Azure ecosystem. Useful when transformation logic needs to be owned by non-engineers.

- **Alteryx**: Analytics and data preparation platform. Common in analytics-led organisations where business analysts own data preparation.

ELT in practice: the medallion architecture

The clearest application of ELT principles in modern data platform design is the medallion architecture (Bronze/Silver/Gold), which we cover in detail in what is a data lakehouse.

**Bronze layer** — the load step. Raw source data is ingested and stored with full fidelity. No transformations. No schema enforcement. The Bronze layer is the source of truth for what the source system actually sent, which is invaluable for debugging, replaying data, and handling schema changes.

**Silver layer** — the first transformation step. Data quality is applied: null handling, deduplication, type casting, schema enforcement, and light business logic. The Silver layer is clean, typed, and consistent, but not yet shaped for any specific analytics use case.

**Gold layer** — the analytics transformation step. Business logic is applied: joins across domains, metric calculations, aggregations, and the data product definitions that BI tools and analysts query. The Gold layer is the semantic layer materialised — the canonical definitions of revenue, customer, product, and every other business concept that needs to be consistent across tools.

This three-layer structure cleanly separates ingestion from transformation from presentation — which is exactly the separation that makes data platforms maintainable as requirements change.

Common mistakes when choosing between ETL and ELT

**Building ETL pipelines because that is what the team knows.** Many enterprise data teams have deep experience with Informatica, Talend, or SSIS. There is a natural tendency to apply familiar tools to new architectures. The result is on-premise ETL patterns implemented on cloud infrastructure — expensive, rigid, and disconnected from modern tooling. The cloud data platform investment should be accompanied by a tooling strategy that takes advantage of cloud-native transformation capabilities.

**Loading raw data into the warehouse without governance.** ELT's preservation of raw data is a feature — but an unmanaged Bronze layer that grows without retention policies, access controls, or clear ownership becomes a data swamp. The ELT pattern requires governance around the raw layer: what gets retained, for how long, who can access it, and how schema changes in the source are handled.

**Using the wrong tool for the workload.** dbt is excellent for SQL-based transformations in a data warehouse or lakehouse. It is not the right tool for stateful streaming transformations, complex ML feature engineering, or high-frequency event processing. Understanding where dbt ends and where Spark or a stream processing engine begins is important for designing a pipeline architecture that does not require engineering workarounds.

**Treating pipeline reliability as a monitoring problem.** Reliable pipelines require engineering discipline, not just better alerting. Idempotent design, data quality gates at ingestion, proper retry handling, and dead-letter queues for unprocessable records are engineering choices that prevent failures, not just detect them. See our detailed treatment in azure data architecture best practices.

FAQs

Is ETL dead?

No — ETL is still the right pattern in specific situations (PII masking before load, very large data reduction, on-premise source constraints). But ELT has become the default for most modern cloud data platform builds because cloud warehouse compute is cheap, raw data preservation matters, and tools like dbt have made in-warehouse transformation engineering-grade. If you are starting a new data platform project, your default should be ELT unless you have a specific reason for ETL.

Do we need both Fivetran and dbt?

Fivetran (or Airbyte) handles ingestion — Extract and Load. dbt handles transformation. They are complementary, not competing. Most modern data stacks use a dedicated ingestion tool alongside dbt: Fivetran/Airbyte loads raw data into Bronze, dbt transforms it into Silver and Gold. The combination is the standard ELT stack.

Can we use Azure Data Factory for everything?

ADF handles both ingestion and orchestration, and its mapping data flows provide ETL-style transformation capabilities. For Azure-first organisations, ADF is a reasonable choice for ingestion pipelines — especially when data sources are primarily other Azure services. For transformation logic, dbt is generally more maintainable than ADF data flows for SQL-based transformations. ADF and dbt are complementary: ADF orchestrates pipeline execution, dbt handles the transformation SQL.

What is the difference between a pipeline and a workflow?

A pipeline moves and transforms data — it is the code and infrastructure that ingests, processes, and loads data from source to destination. A workflow is the orchestration layer that schedules and coordinates pipeline execution: when does the Bronze layer refresh? What is the dependency between the Silver transform and the Gold materialisation? Tools like Apache Airflow, Prefect, and Azure Data Factory handle workflow orchestration. The pipeline is the work; the workflow is the scheduling and dependency management.

How do we handle schema changes in source systems?

ELT with a Bronze raw layer is more resilient to source schema changes than ETL, because raw data is stored as-is before any schema is enforced. When a source adds a column, the Bronze layer stores it automatically; the Silver transformation handles the schema change when the dbt model is updated. For critical schema changes that would break downstream transforms, schema evolution tooling in Delta Lake and Iceberg provides explicit handling. The worst outcome — a pipeline that fails silently and loads wrong data — is prevented by data quality tests at the Silver layer that catch unexpected schema changes before they propagate to Gold.

Our cloud engineering practice designs and builds ELT pipelines on Azure Data Factory, Fivetran, Airbyte, and dbt across Snowflake, Databricks, and BigQuery environments. If your pipelines are unreliable, your data quality is inconsistent, or you are planning a data platform build and want an experienced view on the right architecture, book a free 30-minute audit and we will tell you directly.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →