BlogData Architecture

What Is Data Warehouse Cost Optimization? Reducing Cloud Analytics Spend

James Okafor
James Okafor
Senior Data Engineer
·June 8, 202810 min read

Cloud data warehouse costs can grow rapidly as data volumes, query workloads, and team usage expand. This guide explains the primary cost drivers in Snowflake, BigQuery, and Redshift, and the optimization strategies that reduce spend without degrading analytical capability.

Cloud data warehouse costs grow in proportion to data volume, query workload, and team usage — and they can grow faster than anticipated when governance is weak. Organizations that deployed Snowflake or BigQuery without cost controls regularly encounter surprise bills as data volumes scale, new teams onboard, and analytical workloads expand.

Cost optimization is not about restricting analytical capability. It is about eliminating waste — compute that runs unnecessarily, queries that read more data than they need, storage that holds data no longer used — while preserving or improving the analytical experience.

Where Cloud Warehouse Costs Come From

Understanding cost drivers is the prerequisite for optimization. The major cost components differ slightly by platform:

### Snowflake Cost Drivers

**Virtual warehouse compute** — the primary cost driver for most Snowflake customers. Credits are consumed while virtual warehouses are running, regardless of whether they are actively processing queries. An auto-suspend timeout that is too long burns credits while the warehouse sits idle. An under-sized warehouse that runs four times as long as a larger warehouse would have run on the same query consumes more credits.

**Storage** — relatively low cost but grows continuously as data accumulates. Long-lived raw tables, failed pipeline artifacts, and unused data that is never cleaned up are common storage waste sources.

**Cloud services** — the overhead layer that handles authentication, query compilation, and metadata operations. Cloud services credits are typically small relative to compute but can grow with high-frequency automated queries (monitoring, metadata API polling).

**Data transfer** — cross-region or cross-cloud data transfer incurs network egress charges from the underlying cloud provider (AWS, GCP, Azure) in addition to Snowflake costs.

### BigQuery Cost Drivers

**Query bytes processed** (on-demand billing) — BigQuery bills per byte of data read by each query. Queries that scan full tables of large datasets generate costs proportional to the table size, regardless of how many rows they return.

**Slot usage** (flat-rate billing) — for organizations on capacity reservations, slot utilization against the committed capacity determines effective cost per query.

**Storage** — split between active storage (data modified in the last 90 days) and long-term storage (data not modified in more than 90 days, priced at roughly half the active rate). Partitioned tables where old partitions are infrequently modified are automatically billed at the long-term rate.

**Streaming inserts** — BigQuery's streaming insert API is billed per row inserted. High-frequency operational writes via the streaming API can accumulate significant cost. Batch loads from Cloud Storage are free.

### Redshift Cost Drivers

**Reserved or on-demand node hours** — cluster nodes run continuously in provisioned Redshift (unlike Snowflake's suspend capability). Unused clusters still incur full costs. Redshift Serverless bills per compute unit-second, similar to BigQuery.

**Spectrum queries** — querying external S3 data via Redshift Spectrum is billed by bytes scanned.

Optimization Strategies by Platform

### Snowflake Optimization

**Aggressive auto-suspend.** The default auto-suspend timeout is ten minutes. For most workloads, two minutes or less is appropriate. Each minute of unnecessary idle time is direct cost waste. For warehouses that serve interactive BI dashboards, the auto-resume delay (typically two to three seconds) is not user-visible.

**Right-size virtual warehouses.** A query that takes 90 seconds on an XS warehouse takes 45 seconds on a Small warehouse — but costs the same because the larger warehouse runs for half the time. For queries where latency matters, larger warehouses are not more expensive. For bulk loads and transformations where latency does not matter, smaller warehouses can reduce cost by running longer.

**Use separate warehouses for separate workloads.** A single large warehouse shared by BI queries and dbt transformation jobs cannot be right-sized for either workload and cannot be suspended when either is idle. Separate warehouses allow each workload to be sized and governed independently.

**Enable result caching.** Snowflake caches query results for 24 hours. Repeated identical queries return cached results at no compute cost. For BI dashboards that run the same queries repeatedly, result caching can eliminate a large fraction of compute consumption.

**Cluster tables by high-cardinality filter columns.** Snowflake's micro-partition architecture provides pruning for query filters on clustered columns. A table clustered by date allows date-filtered queries to skip the vast majority of micro-partitions, dramatically reducing data scanned and credits consumed.

### BigQuery Optimization

**Use partitioned and clustered tables.** Partitioning by date and clustering by high-cardinality filter columns allows queries to specify filters that skip partitions and clusters not needed. This is the highest-leverage BigQuery optimization. A two-year transaction table partitioned by date allows a query filtered to "last 30 days" to scan 4% of the data.

**Avoid SELECT star.** BigQuery scans every column in a SELECT star query. On tables with many columns, explicitly listing only the needed columns can reduce bytes scanned by an order of magnitude.

**Materialize expensive intermediate results.** Queries that run repeatedly and scan the same large underlying tables should be materialized as BigQuery materialized views or as tables refreshed on a schedule. The expensive scan runs once; all subsequent queries against the materialized result incur minimal cost.

**Use table expiration for temporary tables.** BigQuery allows tables to be configured with an expiration date. Development tables, intermediate results, and exploration datasets that are created and then forgotten contribute to both storage cost and query surface area. Setting automatic expiration prevents accumulation.

**Switch long-term storage to the flat-rate.** For organizations with large, stable datasets that are queried frequently, flat-rate (capacity) reservations for BigQuery often produce better economics than on-demand billing at scale.

### Shared Optimization Strategies

**Enforce query governance.** Without query governance, individual analysts can write and execute queries that scan terabytes of data for exploratory analysis. Query policies that set maximum bytes billed per query, require partitioned columns in WHERE clauses, and flag full-table scans above a threshold prevent runaway costs from experimental queries.

**Implement warehouse-level budgets and alerts.** Both Snowflake (resource monitors) and BigQuery (budgets and alerts in Cloud Billing) support automated alerting when spending exceeds defined thresholds within a period. Alerts that fire before a budget is exhausted allow intervention before the overage occurs.

**Audit unused tables and objects.** Large data warehouses accumulate tables that were created for a project and never used again, views that reference queries run once, and schema artifacts from failed experiments. Regular audits using information_schema views (which expose table creation dates, last modification dates, and query history) identify objects that can be dropped without analytical impact.

**Review dbt model materialization strategy.** dbt models that are materialized as tables incur storage costs and refresh costs on every pipeline run. Views avoid storage costs but add query time. Incremental models pay a cost for the merge/upsert operation on each run but avoid full-table replacement. Regularly reviewing which models justify which materialization strategy as data volumes change avoids unnecessary costs.

Our data architecture and cloud engineering practices conduct warehouse cost reviews and implement optimization programs for organizations where cloud analytics spend has grown beyond expectations. Contact us to discuss your warehouse cost optimization.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →