
Dagster: Asset-Oriented Data Orchestration for Analytics Teams

Eric Chen
BI Solutions Architect
November 10, 2027 · 12 min read

Dagster takes a different approach to data orchestration than Airflow — instead of defining workflows as graphs of tasks, Dagster defines workflows as graphs of data assets. This asset-oriented model aligns naturally with how analytics engineers think about their work and provides observability, dependency management, and testing capabilities that task-based orchestrators require more effort to achieve.

Dagster is a data orchestration platform built around the concept of software-defined assets. Where task-based orchestrators like Airflow define workflows as graphs of tasks to run, Dagster defines workflows as graphs of data assets to produce (tables, files, machine learning models, API responses), with the execution logic derived from the asset graph rather than defined separately.

This distinction has practical implications. In Airflow, you define a DAG of tasks; you separately know (or document) which tables those tasks produce. In Dagster, you define assets with their computation logic; the execution graph is derived automatically. The result is a tighter coupling between "what data we want" and "how we produce it" — and better observability into the state of the data, not just the state of the jobs.
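To make "derived automatically" concrete, here is a conceptual sketch in plain Python. It is not Dagster's implementation; the asset names are hypothetical, and the standard library's `graphlib` stands in for the orchestrator. Given only a statement of which asset depends on which, a valid execution order falls out of the graph:

```python
from graphlib import TopologicalSorter

# Hypothetical asset names; each asset maps to the set of assets it depends on.
asset_deps = {
    "raw_orders": set(),
    "raw_customers": set(),
    "fct_orders": {"raw_orders"},
    "dim_customers": {"raw_customers"},
    "customer_revenue": {"fct_orders", "dim_customers"},
}


def execution_order(deps):
    """Return one valid materialisation order in which upstream assets come first."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(asset_deps)
print(order)
```

Nothing in `asset_deps` says "run this, then that"; the ordering is a consequence of the declared dependencies, which is the property the asset-oriented model relies on.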

Core Concepts

**Asset** — a persistent data object produced by computation: a warehouse table, a file, a trained model, a feature store entity. Defined with the @asset decorator.

**Op** — the computation unit, analogous to an Airflow task. Ops are typically not defined directly — @asset automatically creates the underlying op.

**Job** — a subset of the asset graph that can be scheduled or triggered. A job selects which assets to materialise.

**Materialisation** — the execution of an asset's computation, producing or updating the asset. Dagster tracks metadata about each materialisation: rows produced, execution time, output types.

**Asset group** — a named collection of related assets, used for organisation and selection.

**Repository (code location)** — a collection of assets, jobs, schedules, and sensors grouped for deployment.

Defining Assets

A basic asset producing a warehouse table:

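    import pandas as pd
    from dagster import asset


    @asset(compute_kind="python", group_name="raw")
    def raw_orders(context) -> pd.DataFrame:
        # Extract from the source API (extract_from_api is a placeholder for your own loader)
        df = extract_from_api()
        # Attach the row count to this materialisation so it appears in the UI
        context.add_output_metadata({"num_rows": len(df)})
        return df


    # Naming raw_orders as a parameter declares the dependency and loads its value,
    # so it is not repeated in deps
    @asset(compute_kind="pandas", group_name="marts")
    def fct_orders(raw_orders: pd.DataFrame) -> pd.DataFrame:
        # Keep only completed orders for the fact table
        return raw_orders[raw_orders["status"] == "completed"].copy()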

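Dependencies are declared either by naming an upstream asset as a function argument, which also loads its value into the function, or through the deps parameter, which records the dependency without loading anything. A given upstream asset is declared one way or the other, not both. Dagster derives the execution graph from these declarations, so you never wire tasks together by hand.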

dbt Integration

Dagster provides native dbt integration through the dagster-dbt package. Each dbt model becomes a Dagster asset automatically:

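    from dagster import AssetExecutionContext
    from dagster_dbt import DbtCliResource, DbtProject, dbt_assets

    # The project's manifest.json must already exist (for example, from `dbt parse`)
    dbt_project = DbtProject(project_dir="/dbt")


    @dbt_assets(manifest=dbt_project.manifest_path)
    def dbt_project_assets(context: AssetExecutionContext, dbt: DbtCliResource):
        # Run `dbt build` and stream the results back to Dagster as events
        yield from dbt.cli(["build"], context=context).stream()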

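This generates one Dagster asset per dbt model, seed, and snapshot, with dependency relationships derived from the dbt manifest; dbt sources appear as upstream dependencies rather than as assets that dbt builds. The dbt argument is a DbtCliResource, supplied under the resource key dbt when the assets are registered. The dbt assets join the rest of Dagster's asset graph: upstream Python assets can feed into dbt models, and dbt model outputs can feed downstream Python assets.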

This integration is tighter than typical Airflow-dbt setups: dbt models appear as first-class assets in Dagster's UI, with materialisation history, run times, and downstream impact visible in the same interface as non-dbt assets.

Asset Checks

Dagster's asset checks are assertions about asset state, similar to dbt tests but defined in the Dagster layer:

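    import pandas as pd
    from dagster import AssetCheckResult, asset_check


    # fct_orders is the asset defined earlier; the parameter receives its materialised value
    @asset_check(asset=fct_orders)
    def fct_orders_row_count(fct_orders: pd.DataFrame) -> AssetCheckResult:
        row_count = len(fct_orders)
        return AssetCheckResult(
            # The check passes when the table contains at least one row
            passed=row_count > 0,
            metadata={"row_count": row_count},
        )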

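Asset checks run after the asset materialises, and their results are displayed on the asset's details page. By default a failed check is reported but does not stop downstream assets from materialising; passing blocking=True to @asset_check makes a failure halt downstream assets in the same run, much as a failed task halts its downstream tasks in Airflow.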

Partitioned Assets

Dagster supports partitioned assets — assets that represent multiple slices of data, each independently materialisable. Common partition types:

**Daily partitions** — one partition per day, allowing individual days to be rematerialised independently.

**Static partitions** — one partition per dimension value (e.g., one per region).

**Multi-dimensional partitions** — combinations of time and dimension partitions.

A daily partitioned asset:

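    from dagster import DailyPartitionsDefinition, asset


    @asset(partitions_def=DailyPartitionsDefinition(start_date="2024-01-01"))
    def daily_sales_summary(context):
        # The partition key is the date being processed, formatted as "YYYY-MM-DD"
        partition_date = context.partition_key
        # Process data for partition_date
        context.log.info(f"Processing sales for {partition_date}")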

Partitioned assets allow surgical backfills — rematerialising only the partitions that need updating — without reprocessing the entire dataset.

Observability and the Asset Catalog

Dagster's asset catalog is a central UI showing every asset, its current state, and its history:

**Current status** — each asset shows whether it is up-to-date (recently materialised successfully), stale (a dependency has been updated but the asset has not been rematerialised), or failed.

**Materialisation history** — every materialisation event with metadata (rows produced, execution time, any attached metadata), logs, and asset check results.

**Lineage** — the full lineage graph from sources through transformations to downstream assets, navigable in the UI.

**Freshness policies** — assets can have freshness policies that define how recently they should have been materialised; Dagster flags stale assets based on these policies.

This observability is one of Dagster's key advantages over Airflow. In Airflow, you can see whether a task succeeded; Dagster tells you whether the data asset is fresh and correctly produced.
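The difference between "stale" and "violating a freshness policy" can be illustrated in plain Python. This is a conceptual sketch only: it compares timestamps, whereas Dagster's own staleness tracking is more involved, and the function names, asset names, and timestamps are all hypothetical.

```python
from datetime import datetime, timedelta


def is_stale(asset, deps, last_materialised):
    """Stale: never materialised, or an upstream asset was materialised more recently."""
    mine = last_materialised.get(asset)
    if mine is None:
        return True
    return any(
        last_materialised.get(up) is not None and last_materialised[up] > mine
        for up in deps.get(asset, ())
    )


def violates_freshness(asset, last_materialised, max_lag, now):
    """Freshness violation: the latest materialisation is older than the allowed lag."""
    mine = last_materialised.get(asset)
    return mine is None or now - mine > max_lag


now = datetime(2027, 11, 10, 12, 0)
deps = {"fct_orders": ["raw_orders"]}
last_materialised = {
    "raw_orders": now - timedelta(hours=1),  # refreshed an hour ago
    "fct_orders": now - timedelta(hours=3),  # not rebuilt since then
}

print(is_stale("fct_orders", deps, last_materialised))
print(violates_freshness("fct_orders", last_materialised, timedelta(hours=2), now))
```

The two questions are independent: an asset can be within its freshness window yet stale relative to its inputs, or up to date with its inputs yet older than the business tolerates. A task-level success flag answers neither.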

When to Use Dagster vs Airflow

Choose Dagster when:

- The team does analytics engineering: the dbt integration and the asset-oriented model match how analytics engineers already think about data

- Observability is a primary requirement: the asset catalog, freshness policies, and lineage live in one place

- The team is starting fresh with orchestration and is not locked into Airflow

Choose Airflow when:

- The organisation already has Airflow infrastructure and expertise

- Workflows depend on Airflow's broad ecosystem of provider packages and operators

- Task-based orchestration fits the workflow better than asset-based orchestration

Consider both when:

- The organisation is migrating: Dagster can integrate with existing Airflow DAGs during the transition

Our data architecture practice designs orchestration architectures including Dagster implementations for analytics engineering teams — contact us to discuss data orchestration strategy for your stack.
