dbt and Snowflake: The Modern Analytics Engineering Stack

dbt on Snowflake is the most widely adopted combination in the modern data stack — dbt for transformation logic and testing, Snowflake for scalable cloud warehousing. Together they enable analytics engineering teams to build reliable, tested, documented transformation pipelines that scale with data volume and team size. This guide covers the integration patterns, performance optimisation, and team workflow for running dbt on Snowflake in production.

dbt running on Snowflake is the most widely adopted configuration in the modern data stack. dbt handles the transformation layer — defining models as SELECT statements, managing dependencies, running tests, generating documentation — while Snowflake handles the compute and storage. The combination works well because they are philosophically aligned: both are SQL-first, both are designed for scale, and both operate through clearly defined interfaces (SQL for dbt models; Snowflake's compute architecture for execution) that keep the layers cleanly separated.

Why dbt and Snowflake Work Together

dbt's approach to transformation is to push computation into the database. Models are SQL SELECT statements that dbt wraps in CREATE TABLE AS or CREATE VIEW AS statements and executes against the database. The database — in this case Snowflake — does the actual computation. dbt manages the dependency graph, the execution order, and the materialisation strategy; Snowflake executes the queries.

This makes Snowflake's architectural properties directly relevant to dbt performance:

**Warehouse sizing** — Snowflake warehouse size directly determines query execution speed. An XS warehouse runs models slower than an XL warehouse on the same query. dbt's ability to target different Snowflake warehouses for different model groups allows engineering teams to right-size compute: large, complex models on a larger warehouse; small reference models on a smaller one.

**Scalable storage** — Snowflake's separation of compute and storage means storing many tables (one per dbt model) has no compute cost when those tables are not being queried. dbt's model proliferation — potentially hundreds of staging, intermediate, and mart models — incurs storage cost but not idle compute cost.

**Concurrency** — Snowflake supports high query concurrency within a warehouse. dbt's parallel model execution sends multiple model queries simultaneously; Snowflake's multi-cluster warehouses scale compute to match query concurrency.

Snowflake-Specific dbt Configuration

The dbt-snowflake adapter requires a profile configuration specifying the Snowflake connection:

dbt_project_name:

target: prod

outputs:

prod:

type: snowflake

account: "your-account-identifier"

user: "dbt_service_account"

authenticator: externalbrowser # or private_key_path for key-pair auth

role: TRANSFORMER

database: ANALYTICS

warehouse: TRANSFORMING

schema: dbt_prod

threads: 8

Key configuration fields:

**account** — the Snowflake account identifier. For single-region accounts, this is in the format "orgname-accountname". For older accounts, it may be in the legacy "account.region" format.

**role** — the Snowflake role used by dbt. The role needs CREATE TABLE, CREATE VIEW, and CREATE SCHEMA privileges in the target database. Using a dedicated TRANSFORMER role limits the blast radius of dbt operations.

**warehouse** — the Snowflake virtual warehouse used for model execution. dbt supports per-model warehouse configuration to route expensive models to larger warehouses.

**threads** — the number of models dbt runs in parallel. In Snowflake, concurrent queries within a warehouse are queued if the warehouse is saturated; threads above the warehouse's concurrency capacity will queue rather than run in true parallel.

Materialisation Strategies on Snowflake

dbt supports four core materialisation types, each with different performance and cost implications on Snowflake:

**view** — creates a Snowflake view. No storage cost; the view's query runs at select time. Appropriate for lightweight transformations and reference data that changes infrequently. Queries against views run the underlying SQL at runtime, which can be expensive if the view logic is complex and the downstream consumer runs frequently.

**table** — creates a Snowflake table by running a CREATE OR REPLACE TABLE AS SELECT. The table is rebuilt from scratch on every dbt run. Appropriate for models where full rebuilds are acceptable and query speed matters more than build time. Table rebuilds on large models are expensive — a full scan of the source data on every run.

**incremental** — creates a table on the first run; on subsequent runs, only new or updated rows are processed. Appropriate for large fact tables where scanning the full source on every run would be expensive. Requires a strategy for identifying new/updated rows (typically a timestamp column or a unique key).

**ephemeral** — not materialised as a database object; the model's SQL is inlined into downstream models as a CTE. Appropriate for intermediate transformation steps that should not be visible as separate tables but improve SQL readability.

Snowflake-specific materialisation options:

**Dynamic tables** — Snowflake's native incremental computation primitive. A dynamic table refreshes automatically when upstream data changes, using a configured lag target. dbt-snowflake supports dynamic table materialisation. For continuously refreshing data where traditional dbt scheduling creates too much latency, dynamic tables are worth evaluating.

**Clustering keys** — for large Snowflake tables queried with consistent filter predicates, adding clustering keys via the dbt table config reduces the data scanned per query. Configure clustering keys in the model's config block.

Incremental Model Patterns on Snowflake

Incremental models on Snowflake have Snowflake-specific performance characteristics worth understanding:

**append strategy** — new rows only, no updates. Cheapest to run: dbt inserts new rows without checking for existing matches. Appropriate for immutable append-only fact tables.

**delete+insert strategy** — identifies rows to delete using the unique_key, deletes them, then inserts new versions. Appropriate for fact tables where rows can be updated.

**merge strategy** — uses Snowflake's MERGE statement to upsert rows. More efficient than delete+insert for low-update-rate tables; potentially less efficient for high-update-rate tables where MERGE's row-by-row semantics are slower than a bulk delete+insert.

For large fact tables, the performance of incremental models depends heavily on how efficiently the incremental predicate (the WHERE clause identifying new rows) can leverage Snowflake's partition pruning. Timestamp columns used as incremental predicates should be clustering keys on large tables to avoid full table scans for each incremental run.

Warehouse Management for dbt

Enterprise dbt-on-Snowflake deployments typically use multiple warehouses for different workload types:

**Transforming warehouse** — used for dbt model execution. Sized based on the largest models in the project. Auto-suspend after a short idle period (60–120 seconds) to avoid idle cost.

**CI warehouse** — used for dbt CI runs (testing PRs before merge). Can be smaller than production since CI typically runs a subset of models; auto-suspend aggressive since CI runs are episodic.

**Reporting warehouse** — separate from the transforming warehouse and used for BI tool queries. Keeping reporting and transformation on separate warehouses prevents dbt runs from impacting dashboard query performance and vice versa.

Per-model warehouse configuration in dbt:

{{ config(materialized='table', snowflake_warehouse='TRANSFORMING_XL') }}

SELECT ...

This runs the model on the larger warehouse without affecting other models using the default warehouse.

Testing Snowflake-Specific Constraints

dbt tests run as queries against Snowflake, which has implications for test strategy:

Snowflake does not enforce primary key or unique constraints at the storage layer — it stores constraints as metadata but does not reject duplicate inserts. dbt's not_null and unique tests are therefore important not just for documentation but for actual data quality enforcement.

For foreign key relationships, dbt's relationships test fires a query checking that all values in a foreign key column exist in the referenced table. On large Snowflake tables, this test can be expensive. Consider testing referential integrity on samples (dbt's store_failures_as configuration) for very large tables rather than running full scans.

Our data architecture practice designs dbt-on-Snowflake transformation architectures for enterprise analytics teams — contact us to discuss your analytics engineering stack.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →