BigQuery Performance Optimization: Partitioning, Clustering, and Query Tuning

Most BigQuery performance and cost problems trace to missing partitioning and clustering. Here is the systematic approach to diagnosing slow queries and reducing bytes scanned.

The quick answer

BigQuery performance problems almost always reduce to one cause: queries scanning more data than necessary. BigQuery charges by bytes scanned in on-demand pricing ($6.25/TB), and scanning unnecessary data is both expensive and slow. The fixes are structural: partition tables by a date or timestamp column so queries filter to relevant partitions; cluster tables on the columns in your WHERE and JOIN conditions; and ensure queries specify only the columns they need rather than using SELECT *. This guide covers the systematic approach to BigQuery query tuning.

Understanding BigQuery's execution model

BigQuery is a serverless distributed SQL engine. It automatically allocates compute slots based on query complexity and your capacity configuration. There is no cluster to size or virtual warehouse to configure — BigQuery manages compute. The primary levers are in how data is stored and how queries are written, not in how much compute is allocated.

**Slots**: the unit of compute in BigQuery. A slot executes a portion of the query plan. In on-demand pricing, BigQuery allocates up to 2,000 slots per project for large queries. In capacity pricing (reservations), you purchase a fixed number of slots. Slot contention (your queries competing for a limited slot pool) is a problem at scale — if slot utilisation is consistently high, additional slot reservations are the fix.

**Columnar storage**: BigQuery's native Capacitor format stores data in compressed columnar format. Queries that reference fewer columns scan less data. SELECT * reads all columns; SELECT col1, col2 reads only those two — a potentially 100x scan reduction on wide tables.

**Bytes processed**: the primary cost and performance metric in BigQuery on-demand. INFORMATION_SCHEMA.JOBS_BY_PROJECT.total_bytes_processed tracks bytes per query. Aggregate this over 30 days to find the highest-cost queries — those are your optimisation targets.

Partitioning

Partitioning is the highest-impact optimisation for most BigQuery workloads. A partitioned table is divided into segments based on a partition column — each segment is physically stored and processed independently. When a query filters on the partition column, BigQuery scans only the relevant segments.

**Partition types**:

**Time-unit partitioning**: partition by a DATE, DATETIME, or TIMESTAMP column — daily, monthly, or yearly granularity. The most common pattern for analytical data. A table partitioned by created_date with daily granularity has one partition per day. A query filtering to the last 30 days scans 30 partitions, not the full table.

**Integer range partitioning**: partition by an integer column with a defined range and interval. Less common; useful for data naturally segmented by integer keys (user ID ranges, geographic region codes).

**Ingestion time partitioning**: partition by when rows were loaded (not a data column). Useful for append-only tables without a natural time column.

**Partition expiration**: automatically expire and delete partitions after a specified number of days. Useful for tables where historical data is not needed — raw event tables where only the last 90 days are queried can expire older partitions, reducing storage cost.

**Require partition filter**: use table options (require_partition_filter = true) to prevent accidental full table scans. Queries on the table must include a WHERE clause filtering on the partition column — queries without it fail with an error. This is a guardrail that prevents expensive full table scans from undisciplined queries.

Clustering

Clustering orders the data within each partition (or within the full table, for non-partitioned tables) by the cluster columns. This allows BigQuery to skip blocks within a partition that do not match query filters on the cluster columns.

**Cluster column selection**: choose the columns most commonly in WHERE and JOIN conditions after the partition filter. A table partitioned by date and clustered by region and product_category will efficiently answer queries like SELECT * FROM events WHERE date = '2026-06-01' AND region = 'AMER' AND category = 'Software'.

**Cluster cardinality**: clustering is most effective when cluster columns have moderate cardinality (hundreds to thousands of distinct values) and queries filter to a small subset of that range. Very low cardinality (boolean, 2-3 value status) or very high cardinality (user GUIDs) cluster less effectively.

**Automatic reclustering**: BigQuery automatically maintains clustering as new data is inserted. Unlike Snowflake, there is no separate reclustering cost — BigQuery handles it transparently.

**Combined partitioning + clustering**: the standard pattern for large analytical tables. Partition by date; cluster by the next 1–4 most common filter dimensions. The partition filter narrows to a time range; clustering narrows within that range.

Query structure optimisation

**Avoid SELECT ***: specify only the columns needed. BigQuery scans all columns in the referenced table when SELECT * is used — this matters for wide tables (100+ columns). dbt's explicit column selection in models, and well-defined SELECT lists in ad-hoc queries, are both important.

**Filter early**: apply the most selective filters as early as possible in the query. Filters on partitioned columns must appear in the WHERE clause of the outermost query — not buried in a subquery or CTE — to enable partition pruning.

**Efficient JOIN patterns**: BigQuery performs distributed joins using hash join. Performance is best when: smaller tables are on the probe side (RIGHT side of the JOIN), large tables are filtered before joining, and join conditions use equality operators on well-typed columns.

**Wildcard tables**: if you query multiple tables with a common prefix (events_2026_01, events_2026_02, etc.), use the _TABLE_SUFFIX wildcard to query them together with a suffix filter. This is more efficient than UNION ALL of individual table queries.

**Materialised views**: pre-aggregate frequently queried rollups into materialised views. BigQuery automatically maintains materialised views when the base table changes. Smart materialisation (queries against the materialised view read from pre-computed results) reduces scan bytes for aggregate dashboard queries.

Diagnosing with INFORMATION_SCHEMA

**Top queries by bytes processed (last 30 days)**:

SELECT query, SUM(total_bytes_processed) as total_bytes, COUNT(*) as execution_count FROM INFORMATION_SCHEMA.JOBS_BY_PROJECT WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY) GROUP BY 1 ORDER BY 2 DESC LIMIT 20

This returns the 20 queries with the highest cumulative bytes processed — your highest-ROI optimisation targets.

**Queries with full table scans**: look for queries where total_bytes_processed approaches or equals the table's storage size. These are queries with no partition pruning.

**Slot utilisation**: INFORMATION_SCHEMA.JOBS_BY_PROJECT.total_slot_ms divided by the query's execution time gives average slot count. Queries consuming thousands of slots may benefit from query restructuring or increased slot reservations.

On-demand vs capacity pricing

At ~50TB/day of query volume, BigQuery capacity reservations (slot-based pricing) often become cheaper than on-demand ($6.25/TB). The calculation: 50TB/day * $6.25/TB * 30 days = $9,375/month; a 100-slot flat-rate reservation costs ~$2,000/month. The break-even varies by query pattern — workloads with sustained high scan volume benefit most from reservations.

For the data architecture that determines what gets stored in BigQuery, see data lakehouse vs data warehouse and bigquery vs snowflake. For the transformation layer performance, see dbt best practices.

Our data architecture consulting practice optimises BigQuery environments — partitioning and clustering design, query analysis, and cost governance. Book a free 30-minute audit to discuss your BigQuery performance and cost situation.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →