BlogData Engineering

Analytics Engineering Workflow: From Raw Data to Trusted Metrics

James Okafor
James Okafor
Data & Cloud Engineer
·June 17, 202712 min read

Analytics engineering bridges the gap between data engineering and data analysis. The analytics engineer transforms raw data into clean, well-modeled, tested, and documented data assets that analysts can use with confidence. This guide covers the workflow, the tooling, and what separates an effective analytics engineering practice from a collection of SQL files.

Analytics engineering is the discipline that sits between data engineering (building and maintaining the data infrastructure that brings data in) and data analysis (using data to answer business questions). The analytics engineer transforms raw data — as-loaded from source systems — into clean, well-modeled, tested, and documented datasets that analysts can use with confidence, without needing to understand the underlying pipeline plumbing.

The discipline emerged because the traditional division of labour between data engineers (who build infrastructure and do not write business logic) and analysts (who write business logic in spreadsheets because the data they need is not in a usable form) produced a gap. Data landed in the warehouse but required significant manual work to make analytically useful. Analytics engineering occupies and bridges that gap.

The Core Workflow

The analytics engineering workflow follows a consistent pattern regardless of tooling:

**Understand the source data**: Before transforming, understand what you have. What do the tables represent? What does each column mean? What are the grain and cardinality of the key tables? What data quality issues exist? How does the source system model its entities (is a deleted record actually deleted, or flagged with a status column)?

Understanding source data requires reading source system documentation, querying the raw tables, and often talking to the team that owns the source system. This step is frequently underinvested, producing transformations that misinterpret the source data.

**Build staging models**: Staging models are one-to-one with source tables. They rename columns (source tables often have cryptic or inconsistent column names), cast types (VARCHAR '2024-01-15' becomes a DATE), standardise null handling (empty strings become nulls), and apply consistent naming conventions across sources.

Staging models should not join tables or apply business logic. Their job is to make source data clean and consistently formatted, not to answer business questions. One staging model per source table.

**Build intermediate models**: Intermediate models join across sources and apply business logic that does not belong in staging. A customer with attributes from CRM, product usage from the event system, and support history from the ticketing system is assembled in an intermediate model. Business rules about what constitutes an "active" customer or a "converted" lead live in intermediate models.

Intermediate models are the most design-intensive part of the analytics engineering workflow. They encode business logic that must be correct, consistent, and maintainable. Every intermediate model should be reviewed by someone who understands the business domain, not just the SQL.

**Build mart models**: Mart models are the consumption layer — what BI tools, analysts, and applications connect to. They are typically denormalised for query performance and ease of use: wide tables with fact measures and dimension attributes that a business user or BI tool can query without needing to understand joins.

One mart per analytical domain: a Finance mart for revenue and cost analysis, a Marketing mart for campaign and lead analysis, an Operations mart for throughput and capacity. Each mart is designed for its specific audience.

**Test throughout**: Tests verify that the data meets expected quality characteristics at every layer. Staging tests verify source data quality: primary keys are unique, required fields are not null. Intermediate tests verify business logic: entity counts match expectations, key metrics are within expected ranges. Mart tests verify the final output: metrics reconcile with source system reports, year-over-year comparisons are consistent.

Tests in dbt are declarative and run automatically. A model that fails a test fails the pipeline run — the failure does not reach production.

**Document as you build**: Documentation written at build time is the only documentation that will exist. Documentation written later is rarely complete. dbt's documentation layer allows adding column descriptions, business definitions, and model-level descriptions alongside the SQL. The dbt docs site makes this documentation searchable and linked to the lineage graph.

Version Control and Code Review

All analytics engineering code lives in git. This is non-negotiable. SQL files in git are:

- Version-controlled: every change is tracked, attributable, and reversible

- Reviewable: changes go through code review before merging, which catches both technical errors and business logic mistakes

- Deployable: CI/CD runs tests on the branch before merging to main

Code review for analytics engineering requires domain knowledge. A SQL review that only checks syntax and query structure misses the most important error class: business logic that is technically correct SQL but analytically wrong. Reviewers must understand what the model is supposed to do to verify that it does it.

Development and Production Environments

Analytics engineering code needs at least two environments: development and production.

Development environment: where models are developed and tested. Typically a schema per developer (analytics_dev_jsmith) that contains the developer's versions of models without affecting the production schema that BI tools connect to. dbt's target schema configuration makes this straightforward.

Production environment: the schema that BI tools, dashboards, and downstream consumers connect to. Production runs only from the main branch after code review and CI/CD tests pass. No direct modifications to production schemas by developers — all changes go through the git and CI/CD workflow.

Staging or QA environment (optional but valuable): a pre-production environment that runs the full production workflow against a representative data sample before promoting to production. Catches issues that CI/CD unit tests miss: downstream BI tool compatibility, dashboard rendering with real data volumes, performance under realistic query loads.

Metrics Layer

A persistent pattern in analytics engineering is the same metric defined multiple times in different models: revenue calculated differently in the Finance mart and the Marketing mart; active users defined differently for the product dashboard and the executive report. The divergence happens gradually and is discovered when numbers contradict.

A metrics layer (Metrics definitions in dbt, or a dedicated semantic layer tool like Cube or dbt Semantic Layer) centralises metric definitions. A metric named 'monthly_recurring_revenue' has one definition; every dashboard that displays it uses the same definition. When the definition needs to change, it changes in one place.

The metrics layer is the answer to the problem that BI tools face when building their own semantic layers independently: semantic layers in Tableau, Power BI, and Looker diverge from each other and from the warehouse transformation layer. A warehouse-level metrics layer is the authoritative definition that all tools reference.

Team Integration

Analytics engineers sit closest to both the data engineering team (understanding what the pipelines produce and what the data means) and the analytics team (understanding what business questions the data needs to answer). This positioning is the role's core value.

Effective analytics engineering practice means regular communication with both sides: syncing with data engineering on source system changes before they land; syncing with analysts and BI teams on what the mart layer needs to contain to answer the questions they are being asked.

Our data architecture and BI strategy practice builds analytics engineering workflows — from dbt setup to mart design to metrics layer — that give analysts data they can trust — contact us to discuss building an analytics engineering capability.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →