Data engineers and analytics engineers both work with data pipelines and SQL, but the roles have distinct scopes, tools, and career paths. This guide draws the line clearly — what data engineers own, what analytics engineers own, where the overlap is, and how the two roles divide responsibilities in organisations of different sizes.
Data engineers and analytics engineers both write SQL, both work with data pipelines, and both care about data quality. The roles are distinct enough that most organisations that have both treat them as different job families — but similar enough that the boundary is frequently misunderstood, and many organisations conflate them in ways that create accountability gaps.
Here is a clear articulation of what each role owns, where they overlap, and how the division of responsibility plays out in practice.
## What Data Engineers Own
Data engineers build and maintain the infrastructure that moves data. Their work begins at the source and ends at the data warehouse (or data lake) where the data lands. The core scope:
**Data ingestion pipelines:** Connecting to source systems (Salesforce, PostgreSQL, Stripe, Kafka topics) and extracting data reliably. This includes managing API rate limits, handling authentication and credential rotation, dealing with schema changes in source systems, and ensuring the extracted data lands complete and on schedule.
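**Streaming infrastructure:** Building and operating real-time data pipelines for event streams: Kafka producers and consumers, Flink or Spark Structured Streaming jobs, and Amazon Kinesis Data Firehose delivery streams. Streaming infrastructure requires different skills and operational practices than batch pipelines. Examples include monitoring consumer lag, handling late or out-of-order events, and choosing between at-least-once and exactly-once delivery semantics.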
**Data platform infrastructure:** Provisioning and maintaining the compute and storage layer. This covers warehouse and lakehouse compute (Redshift clusters, Databricks workspaces), object storage (S3 buckets with appropriate lifecycle policies and access controls), the orchestration platform (Airflow or Prefect), and the associated infrastructure-as-code.
**Pipeline orchestration:** Defining the dependency graph for data pipeline execution, handling failures and retries, alerting on pipeline health, and ensuring the end-to-end pipeline runs reliably on schedule.
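Here is a minimal sketch of what an orchestrator does with that dependency graph, using Python's standard-library `graphlib` (Python 3.9+). It runs tasks in topological order, retries failures, alerts once retries are exhausted, and skips everything downstream of a failure. Real orchestrators such as Airflow or Prefect add scheduling, parallelism, and state persistence on top. The task names here are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_pipeline(
    tasks: Dict[str, Callable[[], None]],
    deps: Dict[str, Set[str]],
    max_retries: int = 2,
    alert: Callable[[str], None] = print,
) -> Dict[str, str]:
    """Run tasks in dependency order and return a status per task."""
    status: Dict[str, str] = {}
    blocked: Set[str] = set()  # tasks that failed or were skipped
    for name in TopologicalSorter(deps).static_order():
        if deps.get(name, set()) & blocked:
            status[name] = "skipped"  # an upstream task did not succeed
            blocked.add(name)
            continue
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                status[name] = "success"
                break
            except Exception as exc:
                if attempt == max_retries:
                    status[name] = "failed"
                    blocked.add(name)
                    alert(f"{name} failed after {attempt + 1} attempts: {exc}")
    return status


# Demo: one transient failure (recovers on retry), one persistent failure.
attempts = {"extract_orders": 0}


def extract_orders() -> None:
    attempts["extract_orders"] += 1
    if attempts["extract_orders"] == 1:
        raise ConnectionError("transient network error")


def extract_customers() -> None:
    raise ValueError("source schema changed")


tasks = {
    "extract_orders": extract_orders,
    "extract_customers": extract_customers,
    "load_raw": lambda: None,
    "dbt_build": lambda: None,
}
deps = {"load_raw": {"extract_orders", "extract_customers"}, "dbt_build": {"load_raw"}}
alerts = []
status = run_pipeline(tasks, deps, alert=alerts.append)
```

Blind retries only help with transient errors. A production pipeline would usually distinguish a network timeout, which is worth retrying, from a schema change, which needs a person.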
**Performance and cost engineering:** Data warehouses at scale require active engineering to remain performant and cost-effective. Distribution keys, clustering, partition management, query cost monitoring, and warehouse rightsizing are data engineering concerns.
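Query cost monitoring can start simply: aggregate the warehouse's query history by user or service account and flag anything over a budget. Most cloud warehouses expose query history through system views. The sketch below assumes those rows have already been exported as records with hypothetical `user` and `bytes_scanned` fields. It uses bytes scanned as a proxy for cost rather than any particular vendor's pricing.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

GIB = 1024 ** 3
TIB = 1024 ** 4


def flag_heavy_users(
    query_log: Iterable[dict], budget_bytes: int
) -> List[Tuple[str, int]]:
    """Return (user, total_bytes) for users over budget, heaviest first."""
    totals: Dict[str, int] = defaultdict(int)
    for row in query_log:
        totals[row["user"]] += row["bytes_scanned"]
    over = [(user, total) for user, total in totals.items() if total > budget_bytes]
    return sorted(over, key=lambda item: item[1], reverse=True)


# Demo: one day of hypothetical query history, with a 1 TiB daily budget per user.
query_log = [
    {"user": "bi_dashboard_svc", "bytes_scanned": 3 * TIB},
    {"user": "analyst_a", "bytes_scanned": 200 * GIB},
    {"user": "bi_dashboard_svc", "bytes_scanned": 2 * TIB},
    {"user": "dbt_prod", "bytes_scanned": 1536 * GIB},
]
heavy = flag_heavy_users(query_log, budget_bytes=1 * TIB)
```

A dashboard service account at the top of such a list often points to a missing partition filter or an unclustered table. That is where the distribution, clustering, and partitioning work comes in.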
**Security and access control:** Ensuring data in transit and at rest is encrypted, managing IAM roles and service account permissions, enforcing network-level controls for data infrastructure, and ensuring compliance with data residency requirements.
## What Analytics Engineers Own
Analytics engineers build and maintain the transformation layer — the SQL code that converts raw ingested data into clean, modelled, tested analytical assets. Their work begins where data engineering ends (the raw data in the warehouse) and ends where analysts and BI tools begin.
**Staging models:** The first layer of transformation — one model per source table, renaming columns to consistent conventions, casting types, applying basic data cleaning (handling known nulls, standardising categorical values), and documenting field meanings.
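In dbt, a staging model is just a `SELECT` statement in its own file, for example `stg_orders.sql`. To keep the sketch runnable anywhere, an in-memory SQLite view stands in for the warehouse below. The raw table, its column names, and the rule that a missing amount means zero are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Raw table as landed by the ingestion connector: source naming, everything as text.
CREATE TABLE raw_orders (ID TEXT, CustID TEXT, order_ts TEXT, AMT TEXT, Status TEXT);
INSERT INTO raw_orders VALUES
    ('1', '10', '2024-01-05 09:30:00', '20.00', 'Shipped'),
    ('2', '11', '2024-01-06 14:00:00', '5.50', ' shipped '),
    ('3', '10', '2024-01-07 08:15:00', NULL, 'CANCELLED');

-- Staging model: one per source table; rename, cast, and standardise only.
CREATE VIEW stg_orders AS
SELECT
    CAST(ID AS INTEGER)              AS order_id,
    CAST(CustID AS INTEGER)          AS customer_id,
    order_ts                         AS ordered_at,  -- SQLite has no TIMESTAMP type; a warehouse would cast here
    CAST(COALESCE(AMT, '0') AS REAL) AS amount,      -- assumed rule: missing amount means 0
    LOWER(TRIM(Status))              AS status       -- 'Shipped' and ' shipped ' both become 'shipped'
FROM raw_orders;
""")

staged = conn.execute("SELECT * FROM stg_orders ORDER BY order_id").fetchall()
```

Joins and business logic are kept out of this layer, a common dbt convention. That way every downstream model starts from the same cleaned columns.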
**Data modelling:** Designing the dimensional model — fact tables that record business events at the appropriate grain, dimension tables that describe the entities involved, mart tables that pre-aggregate for specific analytical use cases. The data model design requires understanding both the business domain and the analytical use cases it needs to support.
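Here is a toy version of that layering, again with SQLite standing in for the warehouse. It has a fact table at one-row-per-order grain, a customer dimension, and a mart view pre-aggregated for a single use case (monthly revenue by region). The `fct_`/`dim_`/`mart_` prefixes and all table contents are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension: one row per customer, describing the entity.
CREATE TABLE dim_customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT, region TEXT);

-- Fact: one row per order (the grain), recording the business event.
CREATE TABLE fct_orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customers (customer_id),
    order_date  TEXT,
    amount      REAL
);

INSERT INTO dim_customers VALUES (10, 'Acme', 'EMEA'), (11, 'Globex', 'AMER');
INSERT INTO fct_orders VALUES
    (1, 10, '2024-01-05', 20.0),
    (2, 11, '2024-01-06', 5.5),
    (3, 10, '2024-02-01', 12.5);

-- Mart: pre-aggregated for one analytical use case.
CREATE VIEW mart_monthly_revenue_by_region AS
SELECT
    substr(f.order_date, 1, 7) AS order_month,
    d.region,
    SUM(f.amount)              AS revenue,
    COUNT(*)                   AS order_count
FROM fct_orders AS f
JOIN dim_customers AS d USING (customer_id)
GROUP BY order_month, d.region;
""")

mart = conn.execute(
    "SELECT * FROM mart_monthly_revenue_by_region ORDER BY order_month, region"
).fetchall()
```

The fact table's grain is one order, so the mart can sum `amount` without double counting. Joining a finer-grained table, such as order lines, before aggregating is the classic way that guarantee breaks.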
**Business logic implementation:** The canonical definitions of business metrics — what counts as an "active customer," how revenue is calculated, what determines a "churned" subscription — are implemented in dbt models. Analytics engineers own ensuring that these definitions are correct, consistent, and applied uniformly across the model.
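One way to picture "defined once, applied everywhere": the active-customer rule lives in a single query, and every report that needs it calls that query instead of re-deriving the logic. The 30-day, non-cancelled definition below is hypothetical. In dbt, the shared definition would typically be its own model that downstream models select from.

```python
import sqlite3
from typing import Set

# Hypothetical canonical definition: a customer is "active" as of a date if
# they placed at least one non-cancelled order in the preceding 30 days.
ACTIVE_WINDOW_DAYS = 30
ACTIVE_CUSTOMERS_SQL = """
SELECT DISTINCT customer_id
FROM fct_orders
WHERE status != 'cancelled'
  AND order_date >  date(:as_of, :window)
  AND order_date <= :as_of
"""


def active_customers(conn: sqlite3.Connection, as_of: str) -> Set[int]:
    """The only place the rule is implemented; every report reuses it."""
    params = {"as_of": as_of, "window": f"-{ACTIVE_WINDOW_DAYS} days"}
    return {row[0] for row in conn.execute(ACTIVE_CUSTOMERS_SQL, params)}


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fct_orders (order_id INTEGER, customer_id INTEGER, order_date TEXT, status TEXT);
INSERT INTO fct_orders VALUES
    (1, 10, '2024-03-20', 'shipped'),
    (2, 11, '2024-03-25', 'cancelled'),
    (3, 12, '2024-01-02', 'shipped'),
    (4, 13, '2024-03-30', 'shipped');
""")

# Two "reports" that agree because they share the definition.
active_ids = active_customers(conn, "2024-03-31")
active_count = len(active_customers(conn, "2024-03-31"))
```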
**Data quality testing:** Writing and maintaining dbt tests that enforce quality constraints on the data model. Investigating test failures, tracing them to upstream causes, and coordinating fixes with data engineers or source system owners.
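dbt's built-in generic tests (`unique`, `not_null`, `accepted_values`, `relationships`) compile to queries that select the failing rows. A test passes when its query returns nothing. The functions below are a simplified re-creation of that idea for illustration, not dbt's actual SQL. They interpolate identifiers directly, so they assume trusted table and column names.

```python
import sqlite3
from typing import Dict, Sequence


# Each function returns a query that selects the rows violating the rule.
def unique_sql(table: str, column: str) -> str:
    return (f"SELECT {column} FROM {table} WHERE {column} IS NOT NULL "
            f"GROUP BY {column} HAVING COUNT(*) > 1")


def not_null_sql(table: str, column: str) -> str:
    return f"SELECT * FROM {table} WHERE {column} IS NULL"


def accepted_values_sql(table: str, column: str, values: Sequence[str]) -> str:
    allowed = ", ".join("'" + v.replace("'", "''") + "'" for v in values)
    return f"SELECT * FROM {table} WHERE {column} NOT IN ({allowed})"


def relationships_sql(table: str, column: str, to_table: str, to_column: str) -> str:
    # Child rows whose foreign key has no matching parent row.
    return (f"SELECT c.{column} FROM {table} AS c "
            f"LEFT JOIN {to_table} AS p ON c.{column} = p.{to_column} "
            f"WHERE c.{column} IS NOT NULL AND p.{to_column} IS NULL")


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customers (customer_id INTEGER);
INSERT INTO dim_customers VALUES (10), (11);
CREATE TABLE fct_orders (order_id INTEGER, customer_id INTEGER, status TEXT);
INSERT INTO fct_orders VALUES
    (1, 10, 'shipped'), (2, 11, 'returned'), (2, 99, 'shipped'),
    (3, NULL, 'cancelled'), (4, 98, 'lost');
""")

checks: Dict[str, str] = {
    "unique_order_id": unique_sql("fct_orders", "order_id"),
    "not_null_order_id": not_null_sql("fct_orders", "order_id"),
    "not_null_customer_id": not_null_sql("fct_orders", "customer_id"),
    "accepted_values_status": accepted_values_sql(
        "fct_orders", "status", ["shipped", "cancelled", "returned"]),
    "relationships_customer_id": relationships_sql(
        "fct_orders", "customer_id", "dim_customers", "customer_id"),
}
# Number of failing rows per test; 0 means the test passes.
failures = {name: len(conn.execute(sql).fetchall()) for name, sql in checks.items()}
```

The investigation work starts from exactly these failing rows. The duplicate `order_id` and the orphaned customer keys are what gets traced upstream.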
**Documentation:** Writing model descriptions, column descriptions, and metric definitions in dbt's YAML property files (conventionally named schema.yml). The analytics engineer is responsible for making the transformation layer legible to analysts who did not build it.
**Collaboration with analysts:** Working with business analysts and data analysts to understand their reporting requirements, translate them into data model design decisions, and ensure the mart layer provides what analytical consumers need.
## Where the Roles Overlap
The overlap zone is the raw-to-staging boundary. Whether the raw source tables are cleaned in a managed connector (data engineering) or in a staging dbt model (analytics engineering) varies by team. Most commonly:
- Data engineers own the ingestion connector and ensure raw data lands correctly.
- Analytics engineers own the staging model that standardises the raw data.
- Both agree on what the raw data should contain and flag issues that require upstream fixes.
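The shared agreement in the last bullet can be made executable as a schema contract that runs right after ingestion. An upstream change is then caught at the raw-to-staging boundary instead of surfacing later as a confusing failure in a downstream model. This sketch uses SQLite's `PRAGMA table_info` to read the landed table's columns. The contract contents are hypothetical.

```python
import sqlite3
from typing import Dict, List

# Hypothetical contract agreed between data engineering and analytics engineering.
RAW_ORDERS_CONTRACT: Dict[str, str] = {
    "ID": "TEXT", "CustID": "TEXT", "order_ts": "TEXT", "AMT": "TEXT",
}


def contract_violations(conn: sqlite3.Connection, table: str,
                        contract: Dict[str, str]) -> List[str]:
    """Compare a landed table's columns and declared types against the contract."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    actual = {row[1]: row[2].upper() for row in conn.execute(f"PRAGMA table_info({table})")}
    problems = []
    for column, expected in contract.items():
        if column not in actual:
            problems.append(f"missing column {column}")
        elif actual[column] != expected:
            problems.append(f"{column}: expected {expected}, found {actual[column]}")
    for column in sorted(actual.keys() - contract.keys()):
        problems.append(f"unexpected column {column}")
    return problems


# Demo: the source renamed AMT to Amount and changed CustID's type.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (ID TEXT, CustID INTEGER, order_ts TEXT, Amount TEXT)")
violations = contract_violations(conn, "raw_orders", RAW_ORDERS_CONTRACT)
```

Whichever side detects a violation, the fix usually lands upstream, in the connector or the source system. This is why the contract works best when both roles own it jointly.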
The orchestration of dbt runs is another overlap area. Whether the dbt production run is triggered by Airflow (data engineering infrastructure) or by dbt Cloud's scheduler (analytics engineering tooling) depends on the team's setup. Either can work; the important thing is that someone owns the reliability of the dbt run and investigates failures.
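Whichever scheduler triggers it, owning the dbt run mostly means noticing when it fails and getting the relevant log lines to whoever will investigate. Here is a minimal sketch. It assumes the `dbt` CLI is on the PATH and takes a hypothetical `alert` callback (a Slack or pager hook, for instance). It relies only on dbt exiting with a non-zero status when a run fails. The demo substitutes Python one-liners for dbt so it runs without a dbt project.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_dbt(
    command: Sequence[str] = ("dbt", "build"),
    alert: Callable[[str], None] = print,
    timeout_s: float = 3600,
    tail_lines: int = 20,
) -> bool:
    """Run a dbt command; on failure, alert with the tail of its output."""
    result = subprocess.run(list(command), capture_output=True, text=True, timeout=timeout_s)
    if result.returncode == 0:
        return True
    output = (result.stdout + result.stderr).splitlines()
    tail = "\n".join(output[-tail_lines:])
    alert(f"`{' '.join(command)}` exited with code {result.returncode}:\n{tail}")
    return False


# Demo: stand-in commands instead of a real dbt project.
alerts = []
ok_run = run_dbt([sys.executable, "-c", "print('Completed successfully')"], alert=alerts.append)
failed_run = run_dbt(
    [sys.executable, "-c", "print('1 of 3 ERROR creating model'); raise SystemExit(1)"],
    alert=alerts.append,
)
```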
## How the Division Works at Different Team Sizes
### Small teams (under 5 people in the data function)
At this scale, the distinction barely exists. One person (or a small team) does everything: manages the Fivetran connections, writes the dbt models, monitors pipeline health, and supports analysts. The title might be "data engineer" or "analytics engineer" or just "data person."
Recognising the distinction matters not for job titles but for understanding what skills the team needs and what work is being done.
### Mid-size teams (5–20 people)
Specialisation starts to make sense. A data engineer (or small data engineering team) manages ingestion, platform infrastructure, and orchestration. One or more analytics engineers own the dbt layer and the data model. Analysts use the mart layer without building it.
The productivity gain from this specialisation is significant. Each person operates in their area of expertise. Neither the platform nor the data model is maintained as a side job by someone whose strengths lie elsewhere. Accountability is clear.
### Large teams (20+ people in the data function)
Data engineering typically splits into multiple teams aligned to infrastructure domains (ingestion, streaming, platform). Analytics engineering splits into multiple teams aligned to business domains (commercial, product, finance). The boundary between the two is well-defined and explicitly managed.
## Common Confusion Points
**"Our dbt models are data engineering work."** In some organisations, data engineers also write the dbt transformation layer. This is fine — but it means those data engineers need the SQL depth and business domain knowledge normally associated with analytics engineering. Many data engineers prefer the infrastructure and systems work and find the transformation layer work less engaging; many analytics engineers are not interested in infrastructure management. Conflating the roles works if the people involved can genuinely do both well.
**"Analytics engineers are just data analysts who learned dbt."** The data analyst role is primarily oriented toward answering business questions: building reports, doing ad-hoc analysis, and communicating findings to stakeholders. Analytics engineers are oriented toward building the infrastructure that enables analysis, and the transformation layer is their product. The skills overlap (both need SQL, both need business domain knowledge), but the orientation is different. Analytics engineers who miss that their product is the data model, not the analysis, tend to build models for the analyses they personally find interesting. What the organisation needs instead are models that serve its analytical needs.
**"Data engineers and analytics engineers are the same job."** They are not, and treating them as interchangeable creates accountability gaps. If no one specifically owns the transformation layer, it accumulates debt. If no one specifically owns the ingestion and platform layer, incidents are slow to resolve and infrastructure decisions are made without the engineering judgment they require.
Our data engineering consulting practice staffs both data engineers and analytics engineers and helps organisations design data team structures — contact us to discuss team structure for your data function.