dbt and BigQuery: Analytics Engineering on Google Cloud

dbt on BigQuery is the standard analytics engineering configuration for data teams operating in the Google Cloud ecosystem. BigQuery's serverless compute model, partitioning and clustering, and cost-per-query billing interact with dbt's materialisation strategies in ways that differ significantly from warehouse-based platforms. This guide covers the BigQuery-specific patterns that make dbt projects performant and cost-efficient.

dbt on BigQuery is the dominant analytics engineering configuration for data teams in the Google Cloud ecosystem. BigQuery's serverless compute model — where queries are charged per byte scanned rather than per compute-hour — interacts with dbt's materialisation and execution patterns in ways that differ substantially from warehouse-based platforms like Snowflake or Redshift. Effective dbt-on-BigQuery usage requires understanding these differences, particularly around partitioning, clustering, cost management, and incremental model design.

BigQuery Adapter Configuration

The dbt-bigquery adapter profile:

dbt_project_name:

target: prod

outputs:

prod:

type: bigquery

method: service-account

project: your-gcp-project-id

dataset: analytics_prod

threads: 4

keyfile: /path/to/service-account.json

location: US

job_execution_timeout_seconds: 600

job_retries: 1

Key configuration options:

**method** — authentication method. service-account uses a JSON key file; oauth uses application default credentials; impersonation allows assuming a service account identity.

**project / dataset** — the GCP project and BigQuery dataset where dbt writes output tables and views.

**location** — the BigQuery multi-region (US, EU) or region where datasets are created. Match to your data locality requirements.

**job_execution_timeout_seconds** — maximum execution time for a single dbt model query. BigQuery queries that exceed this timeout fail; set appropriately for your largest models.

Partitioning and Clustering

BigQuery charges per byte scanned. On large tables, queries without partition pruning scan the entire table, generating significant cost. Partitioning and clustering reduce scanned bytes:

**Partitioning** — divides a table into segments by a partition column (typically a date or timestamp). Queries that filter on the partition column scan only the relevant partitions.

Configure in dbt model config:

{{ config(

materialized='table',

partition_by={

"field": "order_date",

"data_type": "date",

"granularity": "day"

}

) }}

BigQuery supports date, datetime, timestamp, and integer range partitioning. Date partitioning is the most common for event and transaction tables.

**Clustering** — orders data within partitions by up to four columns, enabling efficient pruning on those columns. Clustering is not as strict as partitioning (there is no physical partition boundary) but reduces bytes scanned for queries filtering on the clustered columns.

{{ config(

materialized='table',

partition_by={"field": "order_date", "data_type": "date"},

cluster_by=["region", "product_category"]

) }}

Use partitioning for the primary time dimension and clustering for the secondary dimensions most commonly used in filter predicates.

Incremental Models on BigQuery

BigQuery's per-byte pricing makes incremental models more important than on fixed-cost platforms. A full refresh of a 10TB table scans 10TB and generates proportional cost. An incremental run that processes only the last day's data might scan 50GB — two orders of magnitude less.

BigQuery-specific incremental strategies:

**insert_overwrite** — the recommended incremental strategy for BigQuery. Rather than merging row by row, it replaces complete partitions. The incremental predicate identifies which partitions need to be replaced, the replacement data is computed, and the affected partitions are atomically swapped.

{{ config(

materialized='incremental',

incremental_strategy='insert_overwrite',

partition_by={"field": "order_date", "data_type": "date"},

on_schema_change='sync_all_columns'

) }}

{% if is_incremental() %}

WHERE order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY)

{% endif %}

The 3-day lookback (rather than 1 day) accounts for late-arriving data that may backfill recent partitions.

**merge** — uses BigQuery's MERGE statement for row-level upserts. More expensive than insert_overwrite for large tables because MERGE is a row-level operation; use insert_overwrite for fact tables and merge for dimension tables with row-level updates.

**copy_partitions** — a dbt-bigquery specific strategy that uses BigQuery's table copy for partition operations, which is cheaper than scanning and recomputing the partition. Appropriate for tables where the partition content is fully replaced rather than merged.

Cost Management for dbt on BigQuery

Without attention to query costs, dbt on BigQuery can generate unexpectedly high bills. Key cost management practices:

**Monitor query costs in BigQuery INFORMATION_SCHEMA** — BigQuery's INFORMATION_SCHEMA.JOBS_BY_PROJECT view records bytes processed for every query job. Regular review of top-cost dbt jobs identifies models that need partitioning or incremental optimisation.

SELECT

job_id,

statement_type,

total_bytes_processed / POW(1024,4) as tb_processed,

query

FROM region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT

WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)

ORDER BY total_bytes_processed DESC

LIMIT 20

**Use views for lightweight models** — views materialised in BigQuery do not store data; they are evaluated at query time by downstream consumers. Using views for staging models eliminates the cost of materialising intermediate data, at the expense of running the staging SQL on every downstream query.

**Partition pruning verification** — BigQuery's query plan inspector shows how many partitions were pruned for a query. Verify that incremental predicates actually result in partition pruning; predicates that cannot be evaluated at partition-selection time (e.g., those involving subqueries or non-deterministic functions) may result in full table scans despite partition configuration.

**Reserved vs on-demand billing** — for high-query-volume environments, BigQuery's reserved slot pricing (flat monthly fee for a reserved compute capacity) can be significantly cheaper than on-demand per-byte pricing. Evaluate the economics based on your actual query volume.

BigQuery-Specific dbt Features

**dbt Python models** — BigQuery supports dbt Python models that run via BigQuery DataFrames or Spark on Dataproc. For transformations that are difficult to express in SQL — complex array manipulation, ML feature engineering, custom aggregations — Python models extend dbt's SQL-first approach with Python.

**Authorized views** — BigQuery supports authorized views that can query tables in other datasets without granting the view's users access to the underlying tables. This is the BigQuery pattern for row-level and dataset-level access control. dbt models used as the basis for Looker explores or other downstream BI tools should typically be configured as authorized views.

**External tables** — dbt can materialise models as BigQuery external tables pointing to Google Cloud Storage files. For large datasets that are not queried frequently, external tables avoid storage costs while maintaining queryability.

**Time travel** — BigQuery supports querying historical table states using FOR SYSTEM_TIME AS OF syntax. This is useful for debugging incremental model issues: querying the state of a table before the last dbt run to compare against the current state.

Our data architecture practice designs dbt-on-BigQuery transformation architectures for enterprise analytics teams operating in the Google Cloud ecosystem — contact us to discuss your analytics engineering stack.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →