BigQuery Best Practices: Performance, Cost, and Governance at Scale

The BigQuery practices that separate well-run data platforms from expensive, slow ones — partitioning strategy, clustering design, slot management, query cost control, data lifecycle policies, IAM governance, and the patterns that reduce monthly BigQuery bills by 50% or more.

BigQuery is deceptively easy to start using and surprisingly hard to use well. A team that provisions BigQuery and starts running queries without understanding its storage and execution model can generate five-figure monthly bills for work that should cost hundreds of dollars. This guide covers the practices that separate well-run BigQuery environments from expensive, slow ones.

Partitioning: The Most Important Design Decision

Partitioning is the single highest-impact optimisation in BigQuery. On-demand pricing charges for bytes scanned — a query that scans a 10TB table costs roughly $62.50 regardless of how many rows it returns. Partitioning eliminates entire date ranges from the scan when queries filter on the partition column.

**Partition by the column you filter on most.** For event tables, this is almost always the event timestamp. For slowly changing dimension tables, it may be a snapshot date. For a transactions table, it is typically transaction_date or created_at.

**Use DAY partitioning for most tables.** HOUR partitioning is appropriate for tables with very high ingestion rates (billions of rows per day) where you need intra-day query precision. MONTH or YEAR partitioning is appropriate for tables with low write frequency but very long retention.

**Require partition filters.** BigQuery allows setting require_partition_filter on a table, which forces queries to include a filter on the partition column or fail. This prevents accidental full-table scans from queries that forget date filters:

ALTER TABLE dataset.events

SET OPTIONS (require_partition_filter = true);

The most common BigQuery cost accident is a scheduled query or BI tool query that works correctly but silently scans the full unpartitioned table on every execution. Require partition filters surface these queries as errors rather than silent cost generators.

**Partition pruning requires direct column filters.** A filter WHERE DATE(event_time) = '2024-01-01' does not prune on a TIMESTAMP partition column — it requires evaluating the expression for every row. Use WHERE event_time BETWEEN '2024-01-01' AND '2024-01-02' instead. Similarly, WHERE event_time = CURRENT_DATE() works; WHERE DATE(event_time) = CURRENT_DATE() does not prune.

Clustering: The Second Design Decision

Clustering co-locates rows with similar column values within each partition, allowing BigQuery to skip column blocks that do not match query filters. It is complementary to partitioning — partitioning eliminates entire date partitions; clustering eliminates blocks within partitions.

Choose clustering columns based on the columns most commonly used in WHERE, GROUP BY, and JOIN clauses after the partition filter. Typical choices: customer_id, region, product_category, event_type.

BigQuery supports up to four clustering columns, applied in order. The first clustering column provides the most benefit; each subsequent column provides progressively less. Cluster on the most selective filter column first.

Clustering is maintained automatically by BigQuery as rows are inserted. For large tables with heavy insert load, BigQuery periodically runs automatic re-clustering to maintain the columnar sort order. There is no additional cost for automatic re-clustering on tables below 1TB.

Slot Management and Pricing Models

**On-demand pricing:** $6.25 per TB scanned. No capacity reservation. Suitable for teams with irregular query patterns, exploration workloads, and total monthly usage under roughly $5,000/month. On-demand slot allocation is shared across all BigQuery users in a project.

**Flat-rate (capacity) pricing:** You purchase dedicated slot reservations — 100-slot increments for standard tier, 500 for enterprise. Flat-rate is cost-effective when query volume is consistent and high. The break-even point against on-demand is approximately 30–40 TB scanned per day.

**BigQuery editions (2023+):** Google replaced the legacy flat-rate model with editions — Standard, Enterprise, and Enterprise Plus — with per-slot-hour pricing on annual, monthly, or flex commitments. Flex commitments allow scaling slot capacity up and down by the minute, enabling burst capacity without permanent over-provisioning.

**Slot reservations and assignments:** Under capacity pricing, slots are allocated to reservations (pools), and reservations are assigned to projects or folders. This allows billing separation between departments, preventing one team's heavy analytical workload from consuming all available slots.

Cost Control Practices

**Use the query validator.** Before running any large exploratory query, check the bytes-processed estimate in the BigQuery console. This is a reliable pre-execution cost indicator. Training your team to check this before running queries is the highest-leverage cost awareness practice.

**Set project-level quotas.** BigQuery allows per-user and per-project query cost quotas (daily TB limits). Set these at the project level to prevent individual queries from consuming disproportionate spend:

In the BigQuery API or console, set per-user quota limits at the project level under APIs & Services.

**Avoid SELECT *.** This is universal SQL advice, but in BigQuery it is a cost issue, not just style. Selecting unused columns from a wide table adds real bytes to the scan cost. Always project only the columns needed.

**Use INFORMATION_SCHEMA for cost attribution.** INFORMATION_SCHEMA.JOBS_BY_PROJECT (and JOBS_BY_USER, JOBS_BY_FOLDER) contains historical query metadata including bytes_processed, total_slot_ms, user_email, and statement_type. Build a cost attribution dashboard from this table to identify the users and queries generating the most spend:

SELECT

user_email,

SUM(total_bytes_processed) / POW(10, 12) as tb_scanned,

COUNT(*) as query_count

FROM region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT

WHERE DATE(creation_time) = CURRENT_DATE()

GROUP BY 1

ORDER BY 2 DESC;

**Use BI Engine for dashboard acceleration.** BigQuery BI Engine caches results in memory and serves queries sub-second without slot consumption. For BI tools (Looker, Looker Studio, Tableau) running repetitive queries against small aggregated result sets, BI Engine eliminates most query cost. BI Engine reservations start at $300/month for 1GB of in-memory cache — cost-effective for any environment with heavy dashboard usage.

Data Lifecycle and Storage Costs

BigQuery storage pricing: $0.02/GB/month for active storage, $0.01/GB/month for long-term storage (tables or partitions unchanged for 90+ days). For data that does not change after a certain point, ensuring it transitions to long-term pricing halves the storage cost.

**Partition expiration** automatically deletes old partitions after a specified number of days:

ALTER TABLE dataset.events

SET OPTIONS (partition_expiration_days = 365);

This is appropriate for raw event tables where you retain one year of history. It eliminates the need for manual cleanup processes and ensures old data automatically transitions to cheaper long-term pricing before being deleted.

**Table expiration** deletes temporary tables automatically — useful for tables created by queries writing intermediate results:

CREATE TABLE dataset.temp_analysis

OPTIONS (expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 1 DAY))

AS SELECT ...;

Temporary tables and query results are already managed by BigQuery's 24-hour result cache. If your team creates explicit temporary tables for multi-step analysis, ensure expiration is set.

IAM and Access Governance

BigQuery access control uses Google Cloud IAM. Roles are granted at the project, dataset, or table level. Key roles:

**bigquery.dataViewer:** Can query table data. Grant to analysts who need read access to specific datasets.

**bigquery.dataEditor:** Can query and modify tables. Grant to service accounts running dbt or ETL pipelines.

**bigquery.jobUser:** Can run BigQuery jobs (queries). Required in addition to data roles — a user with dataViewer but not jobUser cannot run queries.

**bigquery.admin:** Full control. Grant sparingly; restrict to data platform team.

**Column-level security** via policy tags restricts access to sensitive columns (PII, financial data) independent of table-level access. Analysts granted table access see masked values or query errors for policy-tagged columns unless they are also granted the policy tag fine-grained reader role.

**Row-level security** via row access policies restricts which rows a user can query. Useful for regional data residency requirements or department-level data segregation without maintaining separate tables per team.

**Authorised views and datasets** allow sharing query results without exposing underlying tables. An authorised view can query a dataset that the view's users cannot directly access. This is the correct pattern for sharing curated analytics with teams that should not have access to raw operational data.

Monitoring and Alerting

**INFORMATION_SCHEMA for operational queries:** INFORMATION_SCHEMA.JOBS provides query-level metadata including failures, slot consumption, and bytes processed. Monitor INFORMATION_SCHEMA for long-running queries (total_slot_ms / 1000 > 300 indicates a 5-minute query), frequent failures, and queries exceeding your expected cost threshold.

**BigQuery audit logs in Cloud Logging:** All BigQuery API calls are logged to Cloud Logging. Audit logs capture job submissions, data access, and configuration changes. Export audit logs to a BigQuery table (using Log Router) for cost-efficient historical analysis of access patterns and unusual activity.

**Cloud Monitoring alerts:** Set budget alerts on your BigQuery billing to trigger when monthly spend exceeds thresholds. Set metric alerts on bigquery.googleapis.com/storage/table_count and bigquery.googleapis.com/storage/stored_bytes to track storage growth.

A well-governed BigQuery environment should have cost attribution by user and project, partition filters enforced on high-value tables, BI Engine for dashboard workloads, and an automated process for reviewing INFORMATION_SCHEMA.JOBS for anomalous queries. The difference between a BigQuery environment that costs $1,000/month and one that costs $15,000/month for equivalent analytical output is almost always a combination of missing partitioning, missing column projection discipline, and no cost attribution to surface the worst offenders.

For data teams looking to reduce BigQuery costs or improve data platform governance, our data architecture consulting practice offers cloud warehouse optimisation assessments — contact us to discuss your BigQuery environment.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →