ETL — extract, transform, load — was the standard data integration pattern for decades. ELT — extract, load, transform — has replaced it in most modern data stacks. The shift is not just a letter swap; it reflects a fundamental change in where and how data transformation happens, with significant implications for architecture, tooling, and team skills.
ETL — extract, transform, load — describes a data integration pattern where data is extracted from source systems, transformed into the target format by a separate transformation engine, and then loaded into the data warehouse. The transformation step happens outside the warehouse: in a dedicated ETL tool (Informatica, Talend, SSIS), in a custom-built transformation application, or in an intermediate staging environment.
ELT — extract, load, transform — inverts the order of the last two steps. Data is extracted from source systems and loaded directly into the data warehouse in raw or lightly processed form. Transformation happens inside the warehouse using its own SQL engine.
This is not a trivial reordering. The shift reflects a fundamental change in where computational power is available and how data teams prefer to work.
Why ETL Was the Standard
ETL emerged when data warehouses were expensive, proprietary, and limited in compute capacity. Processing large data volumes inside an Oracle or Teradata warehouse was costly — both in licence fees and in compute resources that could not be dynamically scaled.
The ETL pattern made economic sense in this context: do as much transformation as possible outside the expensive warehouse, load only the final clean result, and minimise the work the warehouse needs to do. Dedicated ETL servers were cheaper to run for preprocessing than warehouse compute.
ETL tools provided GUI-based transformation configuration, connector libraries for common source systems, and some level of abstraction for transformation logic. For non-programmers managing data integration, these tools were accessible in a way that writing SQL or code was not.
The limitation of ETL: transformation logic defined in proprietary ETL tool formats (visual flows, XML configurations) is difficult to version control, test, and review. When logic is buried in a proprietary GUI, it is opaque to engineers who prefer code, and typically requires specialised knowledge of the tool to maintain.
Why ELT Replaced ETL
Several factors converged to make ELT the dominant pattern for modern data stacks:
**Cloud data warehouse scalability**: Snowflake, BigQuery, and Redshift separate storage from compute and scale elastically. Running a complex transformation in BigQuery costs money proportional to data scanned, but the compute is abundant and cheap compared to running the same transformation in a dedicated ETL server. The economic argument for keeping transformation outside the warehouse evaporated.
**SQL as a first-class transformation language**: dbt (data build tool) established the pattern of writing transformations as SQL SELECT statements, version-controlling them in git, testing them with schema and data quality tests, and running them against the warehouse. SQL is the language data practitioners already know; dbt makes it a sufficient tool for the full transformation layer.
**Git-native workflows**: ELT with dbt means transformation logic lives in SQL files in a git repository. Every change is reviewed, version-controlled, and reversible. This matches how software engineering teams prefer to work and is fundamentally different from visual ETL tools where logic is stored in database tables or proprietary file formats.
**Raw data preservation**: Loading raw data first means the complete source record is always available. If a transformation logic changes or a requirement is discovered retrospectively, historical data can be reprocessed from the raw layer. ETL that transforms before loading discards source data that the warehouse never sees.
The Modern ELT Architecture
The canonical modern data stack ELT pattern:
**Extraction and loading**: Fivetran, Airbyte, Stitch, or a custom connector extracts data from source systems (Salesforce, Stripe, PostgreSQL production database, marketing APIs) and loads raw data into the warehouse. Raw tables are the source layer — they mirror the source system schema with minimal modification.
**Transformation**: dbt models define the transformation logic. Staging models clean and standardise the raw tables (rename columns, cast types, handle nulls). Intermediate models join and enrich across sources. Mart models produce domain-specific analytical tables. All logic is in SQL, version-controlled in git, and tested with dbt tests.
**Orchestration**: Airflow, Prefect, or Dagster coordinates the extract-load runs and dbt transformation runs, managing dependencies between sources and ensuring transformations run on fresh data.
What ETL Is Still Right For
ELT is not universally better. ETL patterns remain appropriate in specific contexts:
**Complex non-SQL transformations**: Machine learning feature engineering, NLP preprocessing, complex statistical calculations, and API-enriched transformations that SQL cannot express cleanly. These genuinely need a transformation layer outside the warehouse. Modern stacks handle this with Python scripts, dbt Python models, or Spark jobs rather than traditional ETL tools.
**Streaming data**: Real-time stream processing (Kafka, Kinesis, Flink) is ETL in structure — transformation happens before or during loading to maintain low latency. Batch ELT is not appropriate for sub-minute data currency requirements.
**Data masking and sensitivity**: Some organisations require that sensitive fields be masked or hashed before they ever enter the warehouse. ETL-style preprocessing can handle this before load; ELT loads raw data that includes sensitive fields (which then requires strict access controls on the raw layer).
**Legacy environments**: Environments already built on mature ETL infrastructure with stable transformation logic have limited reason to migrate to ELT for existing workloads. The migration cost is real; the benefit depends on whether the development and maintenance experience of the ETL tooling is causing problems.
Practical Implications for Architecture
For a new data stack evaluation:
If your team is comfortable with SQL and git, and your data sources are served by modern SaaS connectors (Fivetran, Airbyte), and your data volumes fit within cloud warehouse economics, ELT with dbt is the correct default. The development experience, testability, and version-control discipline are materially better than equivalent ETL tooling.
If you have real-time streaming requirements, ML-intensive transformations, or specific pre-load data masking requirements, a hybrid approach is appropriate: ELT for the batch analytical layer, with purpose-built tools for the streaming or sensitive-data processing.
If you are inheriting an existing ETL environment, evaluate whether the current development and maintenance friction justifies migration. If the ETL layer is stable and the team is productive, incremental migration may not be worth the disruption. If the ETL layer is a bottleneck, migrating the most active transformation areas to dbt is a reasonable incremental approach.
Our data architecture practice evaluates and redesigns data integration layers — contact us to discuss whether your current architecture should move toward an ELT pattern.
A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.
Book a Call →