BigQuery Architecture Guide: Partitioning, Clustering, and Cost Control

BigQuery charges by bytes scanned. Every architectural decision — how tables are partitioned, whether clustering is applied, how queries are structured — directly affects the bill. This guide covers the key BigQuery design decisions and how to control costs without sacrificing query performance.

BigQuery is a serverless, columnar data warehouse that charges per byte scanned. Unlike traditional warehouses where you provision capacity and pay for it regardless of utilisation, BigQuery charges you for the queries you run. This pricing model has architectural consequences: every design decision — how you partition tables, whether you cluster them, how you structure queries — directly affects your monthly bill. This guide covers the key BigQuery architectural decisions and how to build a cost-efficient BigQuery environment without compromising query performance.

The BigQuery pricing model

BigQuery offers two compute pricing models:

**On-demand (per-query)**: You pay $5–$6.25 per terabyte of data scanned. The first 1TB per month is free. On-demand is appropriate for variable workloads, small organisations, or environments where query volume is unpredictable.

**Capacity reservations (slots)**: You purchase a fixed number of BigQuery slots (units of compute) per month. 100 slots is the minimum; enterprise commitments run into thousands of slots. You pay for the slots regardless of whether you use them. Capacity reservations make sense when monthly query spend exceeds approximately $1,500/month — at that volume, reservations are typically more cost-effective.

Storage pricing: $0.02/GB/month for active storage (queried within last 90 days), $0.01/GB/month for long-term storage (not queried for 90+ days). Storage is significantly cheaper than compute — the cost driver is almost always queries, not storage.

Partitioning

Table partitioning divides a table into discrete segments (partitions) based on a column value. Queries that include a filter on the partition column only scan the matching partitions — data in other partitions is not billed.

BigQuery supports four partition types:

**Date/timestamp partitioning**: The most common. Partitions the table by a DATE or TIMESTAMP column. Specify the granularity: DAY, MONTH, or YEAR. A query filtering on event_date BETWEEN '2026-05-01' AND '2026-05-31' scans only the 31 daily partitions matching those dates, not the full table.

**Integer range partitioning**: Partitions by an integer column with defined start, end, and interval values. Useful for user ID-based partitioning or other numeric identifiers.

**Ingestion time partitioning**: Automatically partitions rows based on when they were inserted into BigQuery, using _PARTITIONTIME pseudo-column. No partition column required on the table. Useful for append-only event streams where the query window is typically recent.

**Partition expiration**: Set a partition expiration on the table definition. Partitions older than the specified number of days are automatically deleted. Effective for retention compliance and cost control on high-volume tables.

Always partition tables larger than 10GB that have a natural date column. The cost savings and query performance improvement are immediately material at that scale.

Clustering

Clustering organises data within each partition (or within the full table if no partition is defined) by one or more columns. When a query includes a filter on a clustered column, BigQuery uses block-level metadata to skip blocks that do not match the filter — reducing bytes scanned further than partitioning alone.

Clustering is defined at table creation or modification. Up to four clustering columns can be specified; order matters — cluster on the columns used in the most frequent query filters first.

**Partitioning vs clustering**: Partitioning prunes entire partitions (can save 90%+ of bytes scanned for date-filtered queries). Clustering prunes blocks within a partition (typically saves 20–60% additionally). Use both together for large tables with frequent filter patterns.

**Automatic reclustering**: BigQuery automatically maintains clustering as new rows are inserted. There is no manual CLUSTER operation required — BigQuery handles it in the background.

Nested and repeated fields

BigQuery supports native JSON-like nested and repeated fields via STRUCT (nested record) and ARRAY (repeated) types. These allow denormalisation in the schema — instead of a separate line_items table joined to an orders table, you can store line_items as a nested ARRAY within the orders row.

For analytical queries, this pattern eliminates the need for joins between fact and sub-fact tables — a join that in traditional warehouses is one of the most expensive operations. BigQuery processes nested arrays efficiently in a single scan, whereas a separate table would require a join (and potentially a shuffle for distributed execution).

The trade-off: normalised schemas are more familiar to SQL practitioners and easier to query without UNNEST syntax. Nested schemas are more efficient for BigQuery but require UNNEST to flatten arrays for row-level analysis.

BigQuery views and materialised views

Standard views in BigQuery are SQL aliases — the underlying query runs at view query time. Materialised views precompute the query result and store it. Queries against a materialised view read the precomputed result; BigQuery automatically refreshes the materialised view as underlying data changes.

Materialised views are appropriate for expensive aggregate queries that are run frequently — a daily summary of user activity, a product-level revenue rollup. The materialised view reduces bytes scanned to the pre-aggregated result size rather than the full base table.

**Smart tuning**: BigQuery's smart tuning feature automatically rewrites queries to use materialised views when it determines that the materialised view covers the query — without requiring changes to the query. Define materialised views for common aggregate patterns; BigQuery routes qualifying queries automatically.

Authorised views and dataset access

BigQuery's security model is dataset-level. Access is granted on datasets; tables inherit dataset permissions. Authorised views allow sharing query results without sharing the underlying tables — the view runs under a service account that has table access, but the viewer only has access to the view, not the base table.

This is BigQuery's equivalent of Snowflake secure views. For analytical sharing where the recipient should see aggregated or filtered data but not the raw table, authorised views are the correct pattern.

Cost control best practices

**SELECT * is expensive**: Always select only the columns you need. In a columnar warehouse, selecting all columns defeats column pruning and scans all data. Audit SQL that uses SELECT * — replace with explicit column lists.

**Query caching**: BigQuery caches query results for 24 hours. Identical queries within the cache window are free. Enable caching for dashboards with frequently repeated queries against slowly changing data.

**Partitioned table expiration**: Set table and partition expiration on staging and intermediate tables. Staging tables that are rebuilt daily do not need to accumulate indefinitely.

**Quotas**: Set project-level and user-level daily query byte quotas via IAM policies. Prevents runaway ad-hoc queries from analysts from generating unexpected costs.

**INFORMATION_SCHEMA for cost audit**: Query INFORMATION_SCHEMA.JOBS_BY_PROJECT to identify the most expensive queries by user, by table, and by time period. Cost optimisation starts with understanding what is actually expensive.

For the broader data warehouse design context, see data warehouse design and bigquery vs snowflake. Our data architecture consulting practice optimises BigQuery environments for cost and performance — book a free architecture review.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →