BlogData Architecture

BigQuery Cost Optimisation: How to Control Your Bill Without Sacrificing Performance

Obed Tsimi
Obed Tsimi
Founder & Senior Tableau Architect
·January 24, 202711 min read

BigQuery charges by bytes scanned, which means every unoptimised query is a cost event. This guide covers the techniques that reduce BigQuery costs — table partitioning, clustering, column selection, BI Engine reservations, slot reservations vs on-demand, query cost governance, and the monitoring setup that makes cost problems visible before they become surprises.

BigQuery's on-demand pricing model — charging per TB of data scanned — is transparent and predictable at small scale. At large scale, with dozens of analysts running queries and dozens of automated pipelines processing data, it becomes one of the largest items in the data infrastructure budget. The good news: most BigQuery cost problems are caused by a small number of fixable patterns. This guide covers the techniques that produce the largest cost reductions.

Understand What You Are Paying For

Before optimising, understand the cost breakdown. BigQuery has two cost components:

**Storage:** Charged per GB per month. Active storage (tables queried in the last 90 days) costs slightly more than long-term storage (tables not queried in over 90 days, automatically cheaper). Storage costs are relatively predictable and usually not the primary cost driver.

**Query processing (on-demand):** Charged per TB of data scanned. This is where costs can grow unexpectedly. A query that scans 1TB of data costs approximately $5 (at $5/TB on-demand pricing). A query that scans 100TB costs $500.

The optimisation target: reduce the bytes scanned per query. Almost every other technique — partitioning, clustering, column selection — exists to accomplish this goal.

Table Partitioning

Partitioned tables divide data into segments based on a column value — typically a DATE or TIMESTAMP column. BigQuery uses partition metadata to skip partitions that cannot contain rows matching the query's WHERE clause.

A query on a partitioned table filtering on the partition column scans only the partitions within the filter range. The same query on an unpartitioned table scans the entire table.

Date/timestamp partitioning:

CREATE TABLE my_project.my_dataset.fact_orders

PARTITION BY DATE(order_date)

AS SELECT * FROM source_table;

A query filtering WHERE order_date >= '2025-01-01' AND order_date < '2025-02-01' scans only January 2025 partitions, not the full historical table.

**Integer range partitioning:** For tables without a natural date column but with a numeric partition column (e.g., customer_id % 100 for customer sharding), integer range partitioning is available.

**Partition expiration:** Set partition expiration to automatically delete old partitions:

ALTER TABLE my_dataset.fact_events

SET OPTIONS (partition_expiration_days = 365);

This automatically manages storage retention without manual cleanup jobs.

**Enforcing partition filters:** For tables where unfiltered queries are always a mistake (and expensive), enable require_partition_filter:

CREATE TABLE my_dataset.fact_events

PARTITION BY DATE(event_date)

OPTIONS (require_partition_filter = TRUE);

Queries on this table without a partition filter return an error rather than scanning the full table. This prevents accidental full-table scans in BI tools and notebooks.

Clustering

Clustering sorts data within partitions by specified columns. When a query filters on a clustered column, BigQuery uses block-level metadata to skip blocks that cannot contain matching rows — a technique called block pruning.

Clustering does not partition the table (you can combine partitioning and clustering); it sorts within partitions. The benefit depends on the cardinality and query pattern of the clustered columns.

Best columns to cluster:

- Columns frequently used in WHERE clause filters

- Columns used in JOIN conditions

- Columns used in GROUP BY

- High-cardinality columns (region, product_id, customer_id) benefit more than low-cardinality columns

CREATE TABLE my_dataset.fact_orders

PARTITION BY DATE(order_date)

CLUSTER BY customer_id, product_category;

A query filtering on both order_date (partition pruning) and customer_id (clustering block pruning) scans a fraction of the full table.

Column Selection

BigQuery is a columnar database — it reads only the columns referenced in a query. SELECT star reads every column, including those not needed by the query.

This matters because BigQuery charges by bytes scanned. A wide table (100 columns) scanned with SELECT star costs 100x more than the same table scanned with SELECT specific_column. For tables with many columns, always specify exactly the columns the query needs.

For dbt models: ensure the final SELECT in each model specifies columns explicitly rather than propagating SELECT star from upstream CTEs.

Materialised Views

For frequently-executed aggregation queries, materialised views pre-compute and cache the result. When a query matches a materialised view, BigQuery rewrites the query to use the cached result rather than re-scanning the base table.

Materialised views are most valuable for:

- Dashboard queries that run every time a user loads a report

- Regularly-scheduled analytical queries with stable results

- Aggregations on large tables that are slow and expensive to compute on every run

CREATE MATERIALIZED VIEW my_dataset.mv_daily_revenue

AS

SELECT

DATE(order_date) AS order_date,

product_category,

SUM(revenue) AS total_revenue,

COUNT(order_id) AS order_count

FROM my_dataset.fact_orders

GROUP BY 1, 2;

BigQuery automatically refreshes materialised views when the base table changes. Queries that match the materialised view structure are served from the cache without scanning the base table.

BI Engine

BigQuery BI Engine is an in-memory analysis service that caches frequently-accessed data for sub-second query response. Unlike standard BigQuery (where each query scans data from storage), BI Engine queries are served from memory.

BI Engine is configured as a reservation in GB — you purchase in-memory capacity and queries that fit within that capacity are served from memory.

**Cost model difference:** BI Engine is charged by the GB of reserved memory per hour, not by bytes scanned. For BI tools (Looker, Looker Studio, Power BI, Tableau) running many repeated queries against the same data, BI Engine can significantly reduce total query costs compared to on-demand scanning.

**When it makes sense:** When BI tool query patterns are repetitive (the same queries run many times per day) and the data that queries access fits within the BI Engine reservation, the fixed memory cost is lower than the variable scan cost.

Slot Reservations vs On-Demand

On-demand BigQuery charges per TB scanned. For unpredictable or intermittent query workloads, this is flexible and appropriate.

For organisations with high, predictable query volumes, slot reservations (BigQuery editions — Standard, Enterprise, Enterprise Plus) replace per-scan costs with reserved compute capacity:

- You purchase a number of slots (units of compute)

- Your queries run using those slots without per-scan charges

- You pay per slot-hour (or can commit to 1-year or 3-year commitments for significant discounts)

The break-even point depends on query volume and pattern. As a rough guideline: organisations spending over $5,000/month on on-demand BigQuery should evaluate whether slot reservations would be cheaper for their specific pattern.

Slot reservations require capacity planning — you need to purchase enough slots to handle peak concurrency without queuing. Autoscaling slots (available in Enterprise edition) add capacity dynamically during peaks, reducing the need for over-provisioning.

Query Cost Governance

**INFORMATION_SCHEMA.JOBS:** BigQuery's INFORMATION_SCHEMA.JOBS table records every query job with total_bytes_processed, total_bytes_billed, user_email, and query text. This is the foundation for cost attribution and governance.

SELECT

user_email,

SUM(total_bytes_billed) / 1e12 AS tb_billed,

SUM(total_bytes_billed) / 1e12 * 5 AS estimated_cost_usd

FROM my_project.region-us.INFORMATION_SCHEMA.JOBS

WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)

AND statement_type = 'SELECT'

GROUP BY 1

ORDER BY 2 DESC;

This query identifies the users generating the most cost over the past 30 days. Combine with query text analysis to identify the specific queries responsible for high costs.

**Custom cost quotas:** BigQuery allows setting per-user and per-project query byte quotas. Users who exceed the daily quota have their queries paused. This prevents individual analysts from accidentally running runaway queries that exhaust the monthly budget.

**Cost attribution labels:** Tag queries with project, team, or use-case labels using BigQuery query labels. This enables cost attribution by business unit in billing exports.

Our data architecture consulting practice designs cost-optimised BigQuery architectures — contact us to discuss BigQuery cost management for your environment.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →