
How to Hire a Data Team: Roles, Interview Questions, and Common Mistakes

Austin Duncan
Managing Director & Principal Data Architect
February 28, 2027 · 12 min read

The specific roles in a modern data team, what each role actually does, how to write job descriptions that attract the right candidates, the interview questions that separate strong from weak candidates in technical assessments, and the hiring mistakes that leave data teams underpowered or misstructured for years.

Building a data team that is appropriately structured and staffed is harder than it looks. The roles are not well-standardised across companies; the skills within roles vary significantly; and the hiring process for data roles routinely fails to distinguish candidates who sound technically proficient from candidates who actually are. This guide covers the roles, what to look for, how to interview effectively, and the mistakes that leave data teams structurally wrong for years.

## The Core Roles

### Data Engineer

Data engineers build and maintain the infrastructure that moves, stores, and processes data. Their primary output is reliable data pipelines — systems that get data from source to warehouse on schedule, with appropriate quality checks, and with the resilience to handle failures without manual intervention.

What actually distinguishes strong from weak data engineers in interviews:

Weak candidates describe the tools they have used. Strong candidates explain the decisions they made and the trade-offs they considered. "I used Airflow" tells you nothing. "I chose Airflow over Prefect because our infrastructure team was already familiar with Kubernetes, and Airflow's operator ecosystem mattered to us" tells you something.

Ask: "Tell me about a data pipeline you built that had a significant reliability problem. What was the problem, how did you diagnose it, and what did you change?"

A strong answer demonstrates: diagnosis process (not just "I added logging" but why the existing logging was insufficient), understanding of root cause (not just symptoms), and a solution that addresses the root cause rather than the symptom. A weak answer describes the symptoms without demonstrating diagnosis capability.

Core technical assessment:

- SQL proficiency at join/window function level

- Understanding of incremental vs full refresh patterns and when each applies

- Ability to explain what change data capture is and why it matters

- Python at a scripting/pipeline level (not software engineering level, but enough to build data processing scripts)

- Understanding of at least one cloud warehouse (Snowflake, BigQuery, or Redshift)
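The incremental versus full refresh distinction in the list above is easier to probe with something concrete on the table. The following is a minimal sketch, not a production pattern: it uses Python's built-in `sqlite3` as a stand-in for a warehouse, and the table names (`source_orders`, `target_orders`) and the `updated_at` high-watermark column are hypothetical choices made for illustration.

```python
import sqlite3


def full_refresh(conn):
    """Rebuild the target from scratch: simple and self-healing, but cost grows with source size."""
    conn.execute("DELETE FROM target_orders")
    conn.execute(
        "INSERT INTO target_orders SELECT id, amount, updated_at FROM source_orders"
    )


def incremental_refresh(conn):
    """Load only rows changed since the target's high-watermark, upserting on the primary key."""
    (watermark,) = conn.execute(
        "SELECT COALESCE(MAX(updated_at), '') FROM target_orders"
    ).fetchone()
    # Hard deletes in the source are never seen by this filter, which is
    # one reason change data capture exists.
    conn.execute(
        """
        INSERT OR REPLACE INTO target_orders (id, amount, updated_at)
        SELECT id, amount, updated_at FROM source_orders WHERE updated_at > ?
        """,
        (watermark,),
    )


def demo():
    # Hypothetical tables, created in memory purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE source_orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT);
        CREATE TABLE target_orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT);
        INSERT INTO source_orders VALUES (1, 10.0, '2024-01-01'), (2, 20.0, '2024-01-02');
        """
    )
    full_refresh(conn)
    # The source changes: order 2 is updated and order 3 arrives.
    conn.execute(
        "UPDATE source_orders SET amount = 25.0, updated_at = '2024-01-03' WHERE id = 2"
    )
    conn.execute("INSERT INTO source_orders VALUES (3, 30.0, '2024-01-03')")
    incremental_refresh(conn)
    return conn.execute("SELECT id, amount FROM target_orders ORDER BY id").fetchall()
```

A useful follow-up question is what this incremental load gets wrong: rows deleted at source, late-arriving rows stamped with an older `updated_at`, and ties on the watermark value. Candidates who raise these unprompted usually understand why change data capture matters.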

### Analytics Engineer

Analytics engineers build the transformation layer — the dbt models, the certified data sources, the dimensional data model that analysts use for reporting.

What actually distinguishes strong from weak analytics engineers:

Strong analytics engineers understand that their product is the data model — not the reports, not the analysis, but the transformation layer that makes analysis possible. They think about grain, about what downstream consumers need, about how the model will be used, not just whether the SQL runs.

Ask: "Walk me through how you would design the data model for a subscription business. What are the core entities, what is the grain of the key fact tables, and what are the questions a well-designed model should be able to answer without additional joins?"

A strong answer: identifies subscription, customer, and payment as core entities; thinks carefully about grain (one row per subscription-period? one row per billing event?); identifies the business questions the model needs to answer (MRR, churn rate, cohort retention) and works backwards to what the model needs to contain.
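As a rough illustration of what "thinking about grain" looks like in practice, the sketch below models one possible answer: a fact table with one row per subscription per active month. The table name `fct_subscription_month`, its columns, and the sample rows are all hypothetical, and `sqlite3` stands in for a warehouse. Other grains (for example one row per billing event) are equally defensible depending on the questions being asked.

```python
import sqlite3


def build_model():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Grain: one row per subscription per calendar month in which it was active.
        CREATE TABLE fct_subscription_month (
            subscription_id INTEGER,
            customer_id     INTEGER,
            month           TEXT,   -- 'YYYY-MM'
            mrr             REAL,
            PRIMARY KEY (subscription_id, month)
        );
        -- Made-up sample rows for illustration only.
        INSERT INTO fct_subscription_month VALUES
            (1, 100, '2024-01', 50.0),
            (1, 100, '2024-02', 50.0),
            (2, 101, '2024-01', 30.0),
            (3, 102, '2024-02', 20.0);
        """
    )
    return conn


def mrr_by_month(conn):
    """MRR falls out of the chosen grain with a single GROUP BY and no joins."""
    return conn.execute(
        "SELECT month, SUM(mrr) FROM fct_subscription_month GROUP BY month ORDER BY month"
    ).fetchall()


def churned_customers(conn, prev_month, month):
    """Customers with an active row in prev_month and none in month."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.customer_id
        FROM fct_subscription_month AS p
        WHERE p.month = ?
          AND NOT EXISTS (
              SELECT 1 FROM fct_subscription_month AS c
              WHERE c.customer_id = p.customer_id AND c.month = ?
          )
        ORDER BY p.customer_id
        """,
        (prev_month, month),
    ).fetchall()
    return [r[0] for r in rows]
```

With this grain, MRR and customer churn are both answerable from the one table. A candidate who chooses a billing-event grain instead should be able to explain what that choice makes easier (payment reconciliation) and what it makes harder (point-in-time MRR).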

Core technical assessment:

- Advanced SQL (window functions, complex joins, query optimisation for at least one warehouse)

- dbt fundamentals (model materialisation, testing, documentation)

- Understanding of dimensional modelling (fact vs dimension, grain, SCD patterns)

- Ability to explain business logic clearly (metric definitions, calculation choices)
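For the SCD item in the list above, it can help to have a small reference implementation in mind when judging answers. The following is a minimal sketch of a Type 2 slowly changing dimension in plain Python, assuming a single tracked attribute (`plan`) and ISO-formatted date strings. The class and function names are hypothetical; in a dbt project this logic would more typically live in a snapshot than in hand-written code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CustomerVersion:
    customer_id: int
    plan: str
    valid_from: str                 # ISO date, inclusive
    valid_to: Optional[str] = None  # ISO date, exclusive; None marks the current version


def apply_scd2(history: List[CustomerVersion], customer_id: int, plan: str, as_of: str) -> None:
    """Close the current version and append a new one if the tracked attribute changed."""
    current = next(
        (v for v in history if v.customer_id == customer_id and v.valid_to is None),
        None,
    )
    if current is not None:
        if current.plan == plan:
            return  # no change, nothing to record
        current.valid_to = as_of
    history.append(CustomerVersion(customer_id, plan, valid_from=as_of))


def plan_on(history: List[CustomerVersion], customer_id: int, date: str) -> Optional[str]:
    """Point-in-time lookup: which plan was the customer on at `date`?"""
    for v in history:
        if (
            v.customer_id == customer_id
            and v.valid_from <= date
            and (v.valid_to is None or date < v.valid_to)
        ):
            return v.plan
    return None
```

The point of asking about SCD patterns is the point-in-time lookup: a candidate should be able to say why overwriting the attribute in place (Type 1) would silently restate historical reports.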

### Data Analyst / BI Developer

Analysts and BI developers build the reports and dashboards that business users use to make decisions. They bridge the technical data layer and the business consumer layer.

**The common hiring mistake:** Evaluating technical skills without evaluating business communication. An analyst who builds beautiful dashboards that answer the wrong questions, or that answer the right questions in a way business stakeholders cannot understand, is not effective.

Ask: "Tell me about a dashboard you built that changed how someone made a decision." This question identifies analysts who understand that their job is enabling better decisions, not producing pretty charts.

Strong answers describe the business question the dashboard was designed to answer, the iterative process of working with stakeholders to refine the design, and the specific decision or behaviour change that resulted. Weak answers describe the technical features of the dashboard.

Core technical assessment:

- Proficiency in the relevant BI tool (Tableau, Power BI, Looker)

- SQL for ad-hoc analysis

- Data visualisation principles (when to use which chart type, why)

- Communication of analytical findings to non-technical stakeholders

### Data Scientist

Data scientists build models, run experiments, and provide analytical insight that goes beyond descriptive reporting into prediction and prescription.

The role is highly variable: some data scientist roles are primarily ML engineering (deploying models to production), others are primarily statistical analysis (A/B testing, causal inference), and others are primarily applied research. Hiring for one type of data scientist when you need another is a common mistake.

Clarify before hiring: does the role primarily involve production model deployment, statistical analysis, or research? The skills, seniority profile, and interview process differ substantially.

## Structuring the Data Team

### The First Data Hire

The most consequential data hiring decision is the first one. The first data hire sets the team's technical direction, culture, and scope definition for years.

The most common mistake: hiring a data analyst as the first data hire when the organisation actually needs data engineering infrastructure. An analyst who arrives to find no reliable data warehouse and no working pipelines spends their time on data plumbing rather than analysis — and the organisation concludes the analyst hire did not deliver value, rather than recognising the wrong hire was made.

The first data hire should match the organisation's most acute gap:

- No data infrastructure: hire a data engineer (or a versatile analytics engineer who can do both)

- Infrastructure exists but no analytical capability: hire an analytics engineer or senior analyst

- Both infrastructure and analytical capability exist: hire for whichever role is currently the team's tightest bottleneck

### Team Sizing by Stage

**Pre-Series B:** One versatile generalist who can do data engineering, analytics engineering, and analysis. The "Analytics Engineer" title often fits best — this person handles everything from pipelines to dashboards.

**Series B to Series C:** Two distinct roles: one data engineer focused on infrastructure, one analytics engineer focused on the transformation layer and BI. Analysts can now be hired as domain specialists who consume the certified data layer.

**Series C to Series D:** Specialised roles: data engineers, analytics engineers, BI developers, and potentially a data scientist. A data team lead or Head of Data to manage the function.

**Beyond Series D:** Domain-aligned data teams (commercial analytics, product analytics, finance analytics), a central data platform team, and potentially a data science function.

## Interview Process Design

The most reliable interview process for data roles combines:

1. **Technical screen (45 min):** SQL assessment appropriate to the role level. Written SQL to answer a business question, evaluated by a technical interviewer who can probe the logic and ask follow-up questions.

2. **System design or case study (60 min):** For data engineers, design a data pipeline for a described scenario. For analytics engineers, design a data model for a described business. For analysts, analyse a provided dataset and present conclusions. This is the highest-signal interview — it reveals whether candidates can apply skills to real problems.

3. **Behavioural interview (45 min):** Structured behavioural questions focused on work that is relevant to the role. For data engineers: "Tell me about a time a pipeline you owned failed in production. What happened?" For analysts: "Tell me about a time you found an insight that was surprising and changed someone's mind."

4. **Stakeholder conversation (30 min):** A conversation with a non-technical stakeholder the new hire would work closely with — a finance manager, a product manager, a commercial leader. This tests communication ability and fit with the business context.
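To make the technical screen in step 1 concrete, here is one possible exercise at window-function level: "return each customer's most recent order". This is an illustrative sketch only. The `orders` table and its rows are made up, and the query runs against Python's built-in `sqlite3`, which needs to be backed by SQLite 3.25 or later for window function support.

```python
import sqlite3

# One reasonable solution: rank each customer's orders newest-first, keep rank 1.
LATEST_ORDER_SQL = """
SELECT customer_id, order_id, amount
FROM (
    SELECT
        customer_id,
        order_id,
        amount,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id
            ORDER BY ordered_at DESC, order_id DESC  -- explicit tie-break
        ) AS rn
    FROM orders
) AS ranked
WHERE rn = 1
ORDER BY customer_id
"""


def latest_order_per_customer():
    # Hypothetical sample data; note that customer 10 has two orders on the same day.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            amount REAL,
            ordered_at TEXT
        );
        INSERT INTO orders VALUES
            (1, 10, 40.0, '2024-01-05'),
            (2, 10, 15.0, '2024-02-01'),
            (3, 11, 99.0, '2024-01-20'),
            (4, 10, 22.0, '2024-02-01');
        """
    )
    return conn.execute(LATEST_ORDER_SQL).fetchall()
```

The sample data deliberately includes a tie on `ordered_at`. Asking "what does your query return for customer 10, and is that deterministic?" is the kind of follow-up that separates candidates who have memorised the `ROW_NUMBER` pattern from those who understand it.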

## Common Hiring Mistakes

**Hiring for tool knowledge rather than principles:** A candidate who knows Snowflake's features by name but cannot explain why physical data layout matters (clustering in Snowflake, distribution and sort keys in Redshift) will make the same architectural mistakes on any platform. Tools change; principles persist.

**No technical take-home or live coding:** Hiring processes that skip a technical assessment often produce hires who can describe what they have done but cannot demonstrate that they can actually do it. The assessment does not need to be long: a 45-minute SQL exercise is sufficient.

**Hiring a team that is too homogeneous in skills:** A data team where everyone is a strong analyst but nobody understands data engineering infrastructure, or where everyone is a strong engineer but nobody has BI experience, is structurally incomplete. Map the existing skills gaps before each hire.

Our data architecture consulting practice advises organisations on data team structure and has access to data talent networks — contact us to discuss building your data team.
