Data Warehouse Cost Management: Controlling Cloud Data Infrastructure Spend

Cloud data warehouse costs scale with usage in ways that are hard to predict without active management. This guide covers the cost levers in Snowflake, BigQuery, and Redshift, the monitoring and alerting setup that makes cost problems visible early, and the governance practices that prevent runaway spend without blocking analytical work.

Cloud data warehouse costs have a property that on-premise infrastructure costs do not: they scale with usage in near real-time. A single analyst running an expensive query, a pipeline that suddenly processes 100x its normal data volume, or an extract refresh that has grown from 10GB to 100GB over 18 months — all of these appear on next month's bill without any budget approval process. This makes cost management an active, ongoing operational discipline, not a one-time architecture decision.

Understanding the Cost Structure

Each major warehouse has a different cost model, which determines what to optimise.

**Snowflake:** Compute costs are charged by the second of virtual warehouse runtime, plus storage per TB per month. The primary cost lever is virtual warehouse usage — how long warehouses run and at what size. Warehouses that are not auto-suspended run continuously and accrue cost even when idle.

**BigQuery:** On-demand pricing charges per TB scanned by queries. Storage costs are per TB per month (active vs long-term storage). The primary cost lever is bytes scanned per query. A single unoptimised query scanning a large table can be expensive; a high-concurrency environment with many analysts running queries multiplies that cost.

**Redshift:** Redshift is primarily priced by reserved or on-demand node hours — you pay for provisioned compute capacity regardless of utilisation. The cost lever is rightsizing the cluster for actual workload requirements and using auto-scaling to handle peak concurrency without over-provisioning at baseline.

Cost Monitoring Setup

Cost surprises are the enemy of cost management. The first investment is making costs visible in near real-time.

**Snowflake:** ACCOUNT_USAGE.QUERY_HISTORY, ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY, and ACCOUNT_USAGE.STORAGE_USAGE expose cost driver data. Build a monitoring dashboard or scheduled alert that shows:

- Daily credit consumption by warehouse, with week-over-week comparison

- Top queries by credits consumed in the last 7 days

- Warehouses that consumed credits while idle (running but not executing queries)

- Storage growth rate by database/schema

Set budget alerts in Snowflake's account settings so that you are notified when monthly spend exceeds defined thresholds.

**BigQuery:** INFORMATION_SCHEMA.JOBS provides per-query bytes processed and estimated cost. Build a weekly cost attribution query by user and project. Enable billing budget alerts in the Google Cloud console.

**Redshift:** AWS Cost Explorer provides cluster-level cost data. CloudWatch metrics show cluster utilisation (CPU, disk, concurrency). An underutilised cluster at 15% CPU most of the day is over-provisioned.

Compute Cost Optimisation

### Snowflake

**Auto-suspend and auto-resume:** Every virtual warehouse should have auto-suspend configured. A warehouse that has been idle for 5 minutes should suspend automatically — not run continuously. The default auto-suspend is often set too long (30 minutes); 2–5 minutes is appropriate for most analytical warehouses.

**Warehouse rightsizing:** The default assumption is that a larger warehouse means faster queries. This is true for queries limited by compute, but many analytical queries are limited by data scanning — a larger warehouse does not reduce the amount of data scanned. Right-size warehouses by running a sample of typical queries at different sizes and measuring execution time and credit cost. For most analytical workloads, an X-Small or Small warehouse is sufficient.

**Query result caching:** Snowflake caches query results for 24 hours. Identical queries (same SQL text, same data) return cached results without consuming compute credits. BI tool queries that refresh dashboards on schedule hit the cache if the underlying data has not changed. Maximise cache hit rate by ensuring BI tools send consistent SQL.

**Separate warehouses by workload type:** Use different warehouses for different workload types — one for interactive analyst queries, one for dbt production runs, one for high-concurrency BI tool queries. This prevents batch jobs from consuming the virtual warehouse's resources while analysts are running queries.

### BigQuery

Covered in detail in the BigQuery cost optimisation guide — partitioning, clustering, column selection, and materialised views are the primary techniques.

**Scheduled queries cost:** BigQuery scheduled queries (used for dbt jobs or periodic aggregations) scan data on every run. Ensure dbt models use incremental materialisation for large tables to avoid scanning the full table on every production run.

**Looker/Data Studio persistent derived tables:** BI tools that create persistent derived tables in BigQuery generate regular scan cost. Audit which PDTs exist and ensure they are serving a sufficient query volume to justify their compute cost.

### Redshift

**Workload Management and Concurrency Scaling:** Concurrency Scaling adds cluster capacity automatically during peak concurrency, but charges by the second. Monitor concurrency scaling usage and ensure it is triggered by genuine peak demand, not by inefficient queries that could be optimised.

**Pause/resume for development clusters:** Development and testing Redshift clusters that are not used 24/7 should be paused overnight and on weekends using scheduled start/stop. A $1,000/month Redshift cluster used only during business hours costs roughly $350/month when paused outside working hours.

Storage Cost Management

Storage is usually the smaller cost component compared to compute, but it accumulates without active management.

**Retention policies:** Establish and enforce data retention policies. Raw ingested data that is more than 2 years old and has been processed through the warehouse is rarely re-queried. Archive or delete raw data beyond the retention threshold. Cloud object storage (S3, GCS) is cheaper than warehouse storage for long-term archival.

**Duplicate data identification:** It is common for organisations to have multiple copies of the same data — the same source data ingested by two different pipelines, or the same intermediate table materialised in two schemas for historical reasons. Audit your schemas for tables with similar names and overlapping data. Eliminating duplicates reduces both storage cost and the maintenance overhead of keeping multiple copies in sync.

**Intermediate table cleanup:** dbt projects often accumulate stale views and tables from old model names, renamed models, or models that were removed from the project but whose database objects were never dropped. Run dbt clean or use the dbt --full-refresh flag periodically to clean up stale objects. In Snowflake, use SHOW TABLES to identify objects in the staging schema that are not referenced by the current dbt project.

Cost Attribution and Accountability

Cost attribution answers: which team, project, or use case is responsible for which spend? Without attribution, cost management decisions cannot be made with accountability.

**Snowflake:** Use resource monitors with per-warehouse credit quotas and alert thresholds. Assign warehouses to specific teams or use cases (a dedicated warehouse for the finance team's dbt models, a separate one for the BI tool). Resource monitors alert the warehouse owner when their usage exceeds defined thresholds.

**BigQuery:** Use project-level billing accounts and query labels to attribute costs to specific teams or use cases. A query with label analytics_team='finance' appears attributed to the finance team in billing reports.

**Redshift:** AWS Cost Allocation Tags on the Redshift cluster enable cost attribution in AWS Cost Explorer.

The Cost Review Process

Monthly cost review should be a standing practice for any organisation spending more than $5,000/month on cloud data infrastructure. The review covers:

- Total spend for the month vs. prior month vs. budget

- Top cost drivers (which warehouses, queries, or users account for the largest share)

- New cost increases (what changed this month that caused costs to grow?)

- Cost optimisation opportunities identified since the last review

Cost management without a review process is wishful thinking. The review is what turns monitoring data into action.

Our data architecture consulting practice includes cloud data warehouse cost optimisation — contact us to discuss cost management for your analytics infrastructure.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →