Cloud Data Warehouse Cost Optimization: How to Cut Spend Without Cutting Performance

Snowflake, BigQuery, and Redshift bills surprise organisations that build for scale without building for cost. Here is the systematic approach to cutting warehouse spend by 30–50% without degrading analytics performance.

The quick answer

Cloud data warehouse bills routinely surprise organisations. Snowflake, BigQuery, and Redshift all have cost models where unmanaged usage — large virtual warehouses left running, unoptimised queries scanning full tables, poorly designed incremental loads — results in bills 2–5x higher than necessary. The good news is that most organisations can reduce warehouse spend by 30–50% through a structured cost optimisation programme without degrading analytics performance. This guide covers the specific levers for each platform.

Why bills escalate

**Compute waste**: virtual warehouses or clusters left running when idle. A Snowflake medium warehouse (4 credits/hour) left running 24/7 costs ~$2,880/month — versus $192/month running only during business hours. Auto-suspend is often configured with too long a timeout (30 minutes instead of 5 minutes), leaving warehouses running for hours after the last query.

**Query inefficiency**: queries that scan full tables when partitioning or clustering would let them scan 1–5% of the data. A query scanning 1TB of data in BigQuery costs $5.00; the same query with proper partitioning might scan 20GB and cost $0.10. At scale, query scan optimisation delivers the highest ROI of any cost reduction measure.

**Materialisation mismatch**: dbt models or views that recompute expensive aggregations on every query because they are not materialised as tables. Alternatively, table materialisations that are rebuilt fully on every run when incremental would suffice.

**Storage inefficiency**: stale tables that are never queried but accumulate storage cost. Time travel retention set too high (Snowflake 90-day time travel on all tables adds significant storage cost for large, frequently updated tables).

**Extract and load volume**: Fivetran or Airbyte ingesting entire source tables on every sync rather than incremental deltas — both increasing ingestion costs and forcing warehouse transformations to process full table scans.

Snowflake cost optimisation

**Warehouse sizing**: start smaller than you think you need. Snowflake charges per second with a 60-second minimum per start. A small warehouse (2 credits/hour) that takes 4 minutes is cheaper than a medium warehouse (4 credits/hour) that takes 2 minutes — both cost the same amount. Benchmark query performance at different sizes before assuming larger is better.

**Auto-suspend**: set to 5 minutes for interactive query warehouses, 2 minutes for ETL warehouses that run in bursts. The default 10-minute auto-suspend wastes 5–8 minutes of credits per session. A 5-minute auto-suspend on a medium warehouse saves ~$45/month per warehouse compared to 10 minutes, assuming typical daily usage patterns.

**Multi-cluster warehouses**: use multi-cluster warehouses only for high-concurrency workloads (50+ concurrent users). For smaller teams, a single-cluster warehouse is sufficient and avoids the cost of spinning up additional clusters.

**Query profiling**: use the Query Profile in Snowflake's query history to identify expensive queries. Key signals: TableScan with no partition pruning (add clustering), Explode with high row count (review FLATTEN logic), large spill to local or remote storage (warehouse is undersized for the query).

**Clustering keys**: for large tables (500M+ rows) queried with consistent filter patterns (date range, customer_id), define a clustering key. Snowflake's automatic reclustering maintains the cluster over time. Without clustering, a query filtering by date scans every micro-partition; with clustering, it scans only the micro-partitions containing the date range.

**Search optimisation service**: for point lookup queries on large tables (SELECT where customer_id = X), the Search Optimization Service builds access paths that avoid full table scans. Costs ~1.5% of table storage per month; pays back quickly on warehouses doing high-volume point lookups.

**Time travel**: reduce time travel retention from 90 days (Enterprise default) to 14 days for large tables that do not require extended rollback capability. This reduces storage for fail-safe and time travel clone storage on large tables.

**Resource monitors**: set credit quotas and alerts at the warehouse level. A resource monitor that alerts at 80% of budget and suspends at 100% prevents runaway jobs from blowing past monthly budgets unnoticed.

BigQuery cost optimisation

**On-demand vs capacity pricing**: BigQuery on-demand pricing ($6.25/TB scanned) is variable. For predictable high-volume workloads, BigQuery reservations (slot-based capacity pricing) are typically cheaper above ~50TB/day of query volume. Evaluate your query scan volume to determine whether slot reservations pay off.

**Partitioning**: partition large tables by a date or timestamp column. BigQuery's on-demand pricing is based on bytes scanned — a query that filters by date on a partitioned table scans only the relevant partitions. This is the single highest-impact cost optimisation for most BigQuery deployments. Partition pruning can reduce scan volume by 99% on date-filtered queries.

**Clustering**: cluster tables on the columns most commonly used in WHERE and JOIN conditions. Clustering reduces scan volume within partitions. Works best when combined with partitioning on a date column and clustering on a second dimension (e.g., region or product_category).

**Require partition filter**: set a table option that requires queries to include a partition filter — prevents accidental full table scans from analysts who forget to add date filters.

**Materialised views**: for aggregate queries that run frequently (daily/weekly roll-ups, dashboard queries), materialised views pre-compute and cache results. BigQuery charges for the storage of the materialised view but the query reads from the cache rather than scanning the base table. Effective for high-frequency, identical queries.

**Query history analysis**: use INFORMATION_SCHEMA.JOBS_BY_PROJECT to identify the top 10 queries by bytes processed over the last 30 days. These are your optimisation targets — adding partitioning/clustering to the tables they scan produces immediate bill reductions.

Redshift cost optimisation

**Reserved instances**: Redshift provisioned clusters are significantly cheaper on 1-year or 3-year reserved pricing (35–63% discount vs on-demand). For stable production workloads, reserved pricing should always be used.

**Concurrency scaling**: disable Concurrency Scaling for workloads that can tolerate some queue time. Concurrency Scaling adds temporary compute capacity but at on-demand rates — unnecessary for workloads where queue delays are acceptable.

**Sort keys and distribution styles**: Redshift query performance and cost are heavily influenced by table sort keys (determines which rows are pruned based on filter conditions) and distribution style (determines how rows are distributed across nodes for join efficiency). Poorly designed sort keys and distribution styles cause full table scans and data redistribution during joins — both slow and compute-expensive.

**VACUUM and ANALYZE**: tables with many deletes and updates accumulate deleted rows (soft deletes) that Redshift still scans. Regular VACUUM reclaims storage and removes soft-deleted rows. ANALYZE keeps table statistics current for the query planner. Both are required maintenance for stable performance.

**Pause and resume**: Redshift Serverless and provisioned clusters can be paused when not in use (e.g., overnight, weekends). Pause/resume is a manual or scheduled operation — for clusters not needed 24/7, pausing during off-hours significantly reduces costs.

Cross-platform principles

**Tag and attribute costs**: tag Databricks clusters, Snowflake warehouses, and BigQuery projects with cost centre tags. Cost attribution makes it visible which teams or workloads are driving spend, enabling accountability and prioritisation of optimisation work.

**Compression and storage housekeeping**: audit tables that have not been queried in 90+ days. Drop or archive cold tables. For Snowflake, check storage usage with Snowflake's Account Usage views; for BigQuery, use INFORMATION_SCHEMA.TABLE_STORAGE; for Redshift, query SVV_TABLE_INFO.

**Incremental over full refresh**: for large fact tables, switch dbt models from full refresh to incremental materialisation. A 10-billion-row fact table rebuilt from scratch daily costs orders of magnitude more compute than an incremental model processing only the last day's data.

For the governance controls that support cost management, see snowflake architecture guide and bigquery vs snowflake. For the transformation layer optimisation, see dbt best practices.

Our data architecture consulting practice conducts cloud data warehouse cost optimisation engagements — analysing spend, identifying the highest-ROI opportunities, and implementing the changes. Typical outcomes are 30–50% spend reduction with 6–12 week payback. Book a free 30-minute audit to discuss your environment.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →