BlogData Engineering

Snowflake Streams and Tasks: Change Tracking and Scheduled Processing Explained

James Okafor
James Okafor
Data & Cloud Engineer
·July 3, 20269 min read

Snowflake Streams capture row-level changes to tables; Tasks schedule SQL or stored procedure execution. Together they enable lightweight CDC and event-driven processing without external orchestration tools.

The quick answer

Snowflake Streams and Tasks are two complementary features that, combined, enable lightweight change data capture and event-driven processing within Snowflake — without external orchestration tools. A Stream tracks row-level changes (inserts, updates, deletes) to a source table and exposes them as a queryable change set. A Task schedules SQL or stored procedure execution on a cron schedule or on a serverless trigger. Together, a Task that consumes a Stream processes only changed rows since the last consumption, providing an incremental processing pattern for CDC-style pipelines that stay entirely within Snowflake.

Snowflake Streams

A Stream is a change tracking object associated with a source table. It records which rows have been inserted, updated, or deleted since the Stream was last consumed. Querying the Stream returns a set of changed rows plus metadata columns: METADATA$ACTION (INSERT or DELETE), METADATA$ISUPDATE (true if the row is the new state of an updated row), and METADATA$ROW_ID (a unique ID for the changed row across all change events).

**Creating a stream**: CREATE STREAM mystream ON TABLE source_table. After creation, the Stream's offset is set to the current state of the table — changes from this point forward are captured.

**Consuming a stream**: query SELECT * FROM mystream to see pending changes. When you use Stream data in a DML statement (INSERT, MERGE, or UPDATE that references the Stream), Snowflake advances the Stream's offset — the consumed rows are cleared from the Stream's change set. If you only SELECT without DML, the offset does not advance — the rows remain available.

**Stream types**:

**Standard streams** (default): capture all changes — inserts, updates, and deletes. Each update appears as two rows: a DELETE of the old state and an INSERT of the new state.

**Append-only streams**: only capture inserts. More efficient for append-only tables (event logs, audit tables) where updates and deletes do not occur. Append-only streams are more performant than standard streams for high-insert-rate tables.

**Insert-only streams**: available for external tables (tables backed by S3 or other external storage). Captures new files added to the external stage.

**Stream staleness**: if a Stream's offset is not advanced within the table's DATA_RETENTION_TIME_IN_DAYS period (default 14 days, configurable up to 90 days), the Stream becomes stale and can no longer be consumed. Design your processing to consume streams at least as frequently as the retention window.

Snowflake Tasks

A Task is a Snowflake object that executes a SQL statement or a call to a stored procedure on a schedule. Tasks are the scheduling layer within Snowflake — similar to cron jobs or Airflow operators, but running entirely within the Snowflake compute layer.

**Scheduled tasks**: execute on a cron expression or interval. SCHEDULE = 'USING CRON 0 * * * * UTC' runs every hour. SCHEDULE = '5 MINUTE' runs every 5 minutes.

**Stream-triggered tasks (event-driven)**: WHEN SYSTEM$STREAM_HAS_DATA('mystream') conditions the task's execution on whether the stream has new data. If the stream has no pending changes, the task is a no-op (still counts toward the minimum schedule but does minimal work). This enables event-driven patterns — the task runs on schedule but only processes when there is new data.

**Task trees**: tasks can depend on other tasks, forming a directed acyclic graph (DAG) of scheduled work. A root task has a schedule; child tasks are triggered when their parent task succeeds. This models multi-step pipelines within Snowflake without an external orchestrator.

**Serverless tasks**: tasks without an assigned virtual warehouse execute on Snowflake-managed serverless compute, billed per second. Appropriate for short-running, frequently triggered tasks where provisioning a full virtual warehouse is wasteful.

**Task execution**: tasks run under the task owner's role. Ensure the task's role has the necessary privileges on all referenced tables, views, and stored procedures.

The Stream + Task pattern

The canonical use case: a source table receives new data (from Fivetran, Snowpipe, or manual inserts). A Stream captures the changes. A Task on a 5-minute schedule checks the stream using SYSTEM$STREAM_HAS_DATA, and when data is available, executes a MERGE statement that applies the changes to a target table.

This pattern:

1. Creates a Stream on the source table

2. Creates a Task on a short schedule with WHEN SYSTEM$STREAM_HAS_DATA

3. Task executes MERGE INTO target_table USING mystream ON matching_key ... — applying inserts, updates, and deletes from the stream to the target

The result: the target table is kept approximately in sync with the source table, with a lag equal to the task schedule (5 minutes, 1 hour, etc.), without external orchestration.

Limitations and when not to use Streams + Tasks

**Not a replacement for Kafka**: Streams + Tasks provide minute-level to hourly CDC within Snowflake. For sub-second event processing, operational systems depending on real-time data, or multi-consumer streaming (multiple consumers reading the same event stream independently), Kafka is the appropriate tool.

**Stream chaining complexity**: when you chain Streams and Tasks (Task A consumes Stream A and writes to Table B; Task B has a Stream on Table B), the chain can become difficult to debug when a task fails mid-chain. For complex multi-step pipelines, Airflow or Dagster provides better visibility, retry logic, and alerting than chained Snowflake tasks.

**Task visibility**: Snowflake's native task monitoring (INFORMATION_SCHEMA.TASK_HISTORY) shows execution history, but there is no built-in alerting for task failures beyond email notifications configured on the task. For production pipelines with SLA requirements, supplement with external monitoring.

**STREAM_HAS_DATA is not transactional**: SYSTEM$STREAM_HAS_DATA returns true when the stream has data at the moment the task evaluates the condition. In high-frequency insert scenarios, a task might evaluate the condition as false immediately after a batch of inserts clears before the check — the inserts would be captured on the next cycle.

Use cases

**CDC propagation within Snowflake**: a raw ingestion table (loaded by Snowpipe or Fivetran) feeds a staging layer. A Stream + Task propagates only changed rows to the staging table, implementing incremental processing without rebuilding the full staging table on each run.

**Audit logging**: a Stream on a sensitive table captures all changes. A Task periodically moves changed rows to an audit log table, providing a tamper-evident history of all modifications.

**Real-time (near) aggregates**: a Transaction stream feeds a Task that incrementally updates a pre-aggregated summary table (daily revenue totals) — keeping the aggregate fresh without full rebuilds.

**Triggering downstream workflows**: a Task can call an external function (Snowflake External Functions) or a Snowflake stored procedure that calls an HTTP endpoint — allowing Snowflake task execution to trigger downstream systems (a webhook to Slack, an API call to kick off a dbt Cloud job).

For the dynamic tables alternative for materialisation, see snowflake dynamic tables. For the broader Snowflake architecture, see snowflake architecture guide. For the external orchestration that handles complex multi-step pipelines, see apache airflow guide.

Our data architecture consulting practice implements Snowflake-native data pipelines — including Streams and Tasks for CDC propagation, incremental processing, and near-real-time aggregation. Book a free 30-minute audit to discuss your pipeline architecture.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →