
Dagster: The Asset-Oriented Data Orchestration Platform

James Okafor
Data & Cloud Engineer
November 3, 2026 · 10 min read

How Dagster differs from Airflow — asset-based orchestration, software-defined assets, asset materialisation, observability, and the cases where Dagster reduces data pipeline complexity versus cases where Airflow remains the better choice.

Dagster is a data orchestration platform built around a fundamentally different concept than Airflow: instead of defining pipelines as directed acyclic graphs of tasks, Dagster defines pipelines as graphs of assets. An asset is any data artefact — a database table, a file, a trained machine learning model, a dashboard — and Dagster's job is to produce and maintain these assets according to their dependencies and freshness requirements.

This shift from task-orientation to asset-orientation changes how you think about, operate, and debug data pipelines in ways that matter for data engineering teams at scale.

The Asset Concept

In Airflow, a DAG defines the sequence of tasks to execute. The relationship between tasks and the data they produce is implicit by default: a task writes to a database table, but the DAG itself does not know that. Lineage, freshness monitoring, and data-level observability typically require additional tooling.

In Dagster, a software-defined asset (SDA) is explicitly defined as a Python object with:

- A **function** that computes the asset (reads its inputs, applies logic, writes its output)

- **Dependencies**: other assets it reads from

- **Metadata**: description, owners, tags, freshness policies

- **Partition definition**: how the asset is partitioned (by date, by region, etc.)

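Dagster builds the execution graph automatically from asset dependencies. A run materialises the assets you select, so you can include an asset's upstream dependencies in the same run, and with automation configured Dagster can materialise upstream assets that are stale or missing without manual selection.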

This explicitness is the core advantage: Dagster knows which assets are produced by which computation, which upstream data each asset depends on, and whether each asset is fresh. This makes lineage, freshness monitoring, and observability first-class features rather than add-ons.

Software-Defined Assets in Practice

A Dagster asset is defined with the @asset decorator:

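    import pandas as pd
    from dagster import asset


    def read_from_source(query: str) -> pd.DataFrame:
        # Hypothetical helper: replace with your own database client.
        raise NotImplementedError


    @asset
    def raw_orders() -> pd.DataFrame:
        # Read from the source database. The watermark value is a placeholder;
        # a real pipeline would load it from state and bind it as a parameter.
        watermark = "2026-01-01"
        return read_from_source(
            f"SELECT * FROM orders WHERE updated_at > '{watermark}'"
        )


    @asset
    def cleaned_orders(raw_orders: pd.DataFrame) -> pd.DataFrame:
        # Drop rows without a customer, then compute revenue on the remaining rows.
        return raw_orders.dropna(subset=["customer_id"]).assign(
            revenue=lambda df: df["quantity"] * df["unit_price"]
        )


    @asset
    def orders_by_region(cleaned_orders: pd.DataFrame) -> pd.DataFrame:
        # Aggregate revenue per region.
        return cleaned_orders.groupby("region")["revenue"].sum().reset_index()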

Dagster infers the dependency graph from function signatures: cleaned_orders depends on raw_orders because raw_orders is a function parameter. A run materialises the assets you select, so to rebuild orders_by_region from source you select it together with its upstream assets, or configure automation so that Dagster materialises upstream assets that are not current.
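To make the inference idea concrete, here is a toy sketch that uses only the Python standard library. It is not Dagster's implementation: `infer_dependencies` and `materialise_all` are hypothetical helpers, and the plain functions stand in for assets. It only shows how parameter names can be read as graph edges and how a topological order follows from them.

```python
import inspect
from graphlib import TopologicalSorter


def raw_orders():
    return [{"region": "EMEA", "quantity": 2, "unit_price": 5.0}]


def cleaned_orders(raw_orders):
    return [{**row, "revenue": row["quantity"] * row["unit_price"]} for row in raw_orders]


def orders_by_region(cleaned_orders):
    totals = {}
    for row in cleaned_orders:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["revenue"]
    return totals


def infer_dependencies(functions):
    """Map each function name to the parameter names that match other functions."""
    names = {fn.__name__ for fn in functions}
    return {
        fn.__name__: {p for p in inspect.signature(fn).parameters if p in names}
        for fn in functions
    }


def materialise_all(functions):
    """Run every function once, upstream first, passing results by parameter name."""
    by_name = {fn.__name__: fn for fn in functions}
    graph = infer_dependencies(functions)
    results = {}
    # TopologicalSorter takes {node: predecessors} and yields predecessors first.
    for name in TopologicalSorter(graph).static_order():
        results[name] = by_name[name](**{dep: results[dep] for dep in graph[name]})
    return results


functions = [orders_by_region, cleaned_orders, raw_orders]
dependencies = infer_dependencies(functions)
results = materialise_all(functions)
```

The functions are passed in reverse order on purpose: the execution order comes from the inferred graph, not from the order in which the functions are listed.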

dbt Integration

Dagster has first-class dbt integration via dagster-dbt. dbt models become Dagster assets automatically: Dagster reads the dbt project manifest and creates an asset for each dbt model, seed, and snapshot. dbt sources appear as upstream dependencies, and dbt tests are surfaced as checks on the assets they cover.

This integration enables:

- Scheduling dbt models as Dagster assets alongside non-dbt data processing

- Triggering dbt runs based on upstream data availability (a Fivetran sync completing triggers the relevant dbt models)

- Unified lineage across Python assets, dbt models, and other data artefacts

- Observability for dbt runs through Dagster's asset monitoring

For teams using dbt, the Dagster integration is often the most compelling reason to consider Dagster over Airflow — it treats dbt models as first-class assets with visibility into freshness and dependencies, rather than executing dbt as a black-box shell command.
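The manifest-driven approach can be illustrated without Dagster at all. The sketch below is not how dagster-dbt is implemented; it assumes only the manifest fields `nodes`, `resource_type`, and `depends_on.nodes`, and the project name `shop` and all model names are made up. It shows that the manifest already holds the lineage an orchestrator needs.

```python
def model_dependencies(manifest: dict) -> dict:
    """Return {model unique_id: [upstream unique_ids]} for dbt models only."""
    deps = {}
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") == "model":
            deps[unique_id] = list(node.get("depends_on", {}).get("nodes", []))
    return deps


# Trimmed, made-up manifest. A real one is written by dbt to
# target/manifest.json and contains many more fields.
manifest = {
    "nodes": {
        "model.shop.stg_orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["source.shop.raw.orders"]},
        },
        "model.shop.orders_by_region": {
            "resource_type": "model",
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
        "test.shop.not_null_stg_orders_customer_id": {
            "resource_type": "test",
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
    }
}

lineage = model_dependencies(manifest)
```

Here the test node is filtered out of the model graph, and the source shows up only as an upstream identifier of `stg_orders`.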

Observability: Knowing What Your Data Is

Dagster's asset-centric model enables a class of observability that task-based orchestrators cannot provide natively:

**Asset materialisation history**: for each asset, Dagster records every time it was materialised, how long it took, and whether it succeeded. You can see at a glance: is this table current? When was it last successfully produced?

**Freshness policies**: define how stale an asset can be before requiring re-materialisation. Dagster monitors asset freshness continuously and alerts when assets exceed their freshness threshold — without requiring custom monitoring code.
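Conceptually, a freshness check compares the last successful materialisation time with an allowed lag. The following is a plain-Python sketch of that comparison, not Dagster's freshness API; `is_stale` and its parameters are hypothetical names chosen for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stale(
    last_materialised: Optional[datetime],
    max_lag: timedelta,
    now: Optional[datetime] = None,
) -> bool:
    """Return True when the asset was never produced or is older than max_lag."""
    if last_materialised is None:
        return True
    current = now if now is not None else datetime.now(timezone.utc)
    return current - last_materialised > max_lag


# Fixed clock so the example is deterministic.
checked_at = datetime(2026, 11, 3, 12, 0, tzinfo=timezone.utc)
three_hours_old = is_stale(
    datetime(2026, 11, 3, 9, 0, tzinfo=timezone.utc), timedelta(hours=6), checked_at
)
one_day_old = is_stale(
    datetime(2026, 11, 2, 9, 0, tzinfo=timezone.utc), timedelta(hours=6), checked_at
)
```

With a six-hour threshold, the asset produced three hours ago is still fresh and the one produced a day earlier is stale.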

**Asset lineage**: the full dependency graph from raw source to final output is visible in Dagster's UI, with each node showing its current status (fresh, stale, failed, never materialised).

**Asset metadata**: assets can log structured metadata during materialisation (row counts, validation results, key statistics) that appears in the Dagster UI alongside the materialisation record. This makes it possible to see, at a glance, that today's orders table has 47,234 rows and the revenue total is within expected range — without writing separate monitoring code.

Dagster vs Airflow

The honest comparison:

Dagster wins:

- Asset-centric pipelines where lineage, freshness, and observability matter

- dbt-heavy pipelines where the dbt integration provides significant value

- New greenfield projects where there is no legacy Airflow investment

- Data teams that want a data-platform-focused tool rather than a general workflow tool

- Environments where non-engineers need to understand and monitor pipeline outputs

Airflow wins:

- Large existing Airflow investments with mature operator libraries and custom code

- General workflow orchestration beyond data pipelines (APIs, notifications, cross-system integration)

- Large community, vast provider ecosystem, more operators, more documentation

- Teams with deep Airflow expertise who are productive with its model

**The migration decision:** migrating an existing Airflow deployment to Dagster is a significant investment. The productivity benefits of Dagster's asset model are real but they accrue over time as the data platform grows. For large, mature Airflow deployments, the migration cost may not justify the switch unless the observability and lineage gaps in Airflow are causing active pain.

Dagster Cloud vs Self-Managed

Dagster Cloud (since rebranded as Dagster+) provides a managed version of Dagster with a hosted web UI (formerly called Dagit), managed metadata storage, and cloud-native deployment. The hybrid deployment model runs agent processes in your infrastructure while the cloud handles orchestration and metadata, which keeps data processing in your environment while reducing operational overhead.

For teams that want Dagster's capabilities without managing Dagster's infrastructure, Dagster Cloud is worth the subscription cost. Self-managed Dagster is appropriate for teams with strong DevOps capability or strict data residency requirements.

For data engineering teams evaluating orchestration tools, our data architecture consulting practice can help assess Dagster vs Airflow for your specific pipeline requirements — contact us to discuss your orchestration architecture.
