Cloud FinOps for Data Teams: Managing and Reducing Cloud Data Costs

The FinOps practices that data teams can implement immediately to reduce cloud data costs — warehouse compute rightsizing, storage lifecycle policies, query cost attribution, and the governance framework that prevents cloud cost surprises.

Cloud data infrastructure — warehouses, data lakes, streaming platforms, compute clusters — is one of the largest and fastest-growing line items in enterprise technology budgets. Unlike traditional software where costs are predictable and fixed, cloud data costs scale with usage in ways that are difficult to predict and easy to let run out of control. A single misconfigured pipeline, an unoptimised query pattern, or a forgotten development cluster can add tens of thousands of dollars per month to a cloud bill.

FinOps (Financial Operations) for data teams is the discipline of managing cloud data costs deliberately — understanding where costs come from, attributing them to the right teams and workloads, and optimising the cost-performance tradeoff rather than defaulting to either unlimited spending or blunt cost-cutting.

Understanding Cloud Data Cost Drivers

Different cloud data services have different cost models, and optimisation strategies differ accordingly.

**Snowflake:** the primary cost drivers are virtual warehouse compute (billed in credits, at an hourly credit rate set by warehouse size) and storage (priced per TB per month). Credits are consumed whenever a warehouse is running, whether queries are executing or the warehouse is idle, so an always-on Large warehouse serving occasional queries wastes significant compute. The Snowflake FinOps levers: warehouse auto-suspend, right-sizing, and multi-cluster auto-scaling.

**BigQuery:** there are two pricing models, on-demand (pay per TiB scanned) and capacity-based (reserved slots, sold as BigQuery editions, which replaced the older flat-rate plans). On-demand is economical for variable, unpredictable workloads; reserved capacity is better for consistent, predictable query volumes. The BigQuery FinOps levers: query optimisation to reduce bytes processed, partitioning to reduce scan scope, and materialized views to reduce redundant aggregation.
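The choice between on-demand and reserved capacity comes down to a break-even calculation. The sketch below is a minimal illustration, not a pricing tool: the on-demand price per TiB and the monthly reservation cost are hypothetical placeholders, so substitute current figures for your region and edition.

```python
def breakeven_tib_per_month(reservation_cost_per_month: float,
                            on_demand_price_per_tib: float) -> float:
    """TiB scanned per month above which the reservation becomes cheaper."""
    return reservation_cost_per_month / on_demand_price_per_tib


def cheaper_model(tib_scanned_per_month: float,
                  reservation_cost_per_month: float,
                  on_demand_price_per_tib: float) -> str:
    """Compare the monthly on-demand bill with a fixed reservation cost."""
    on_demand_cost = tib_scanned_per_month * on_demand_price_per_tib
    return "on-demand" if on_demand_cost <= reservation_cost_per_month else "reserved"


# Hypothetical inputs: 6.25 USD per TiB on-demand, 2,000 USD/month reservation.
ON_DEMAND_PRICE = 6.25
RESERVATION_COST = 2000.0

print(breakeven_tib_per_month(RESERVATION_COST, ON_DEMAND_PRICE))
print(cheaper_model(100, RESERVATION_COST, ON_DEMAND_PRICE))
print(cheaper_model(500, RESERVATION_COST, ON_DEMAND_PRICE))
```

This comparison only covers cost. A reservation also caps the slots available to queries, so it is worth checking that the reserved capacity can actually serve the workload before switching.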

**AWS Redshift:** compute is the primary cost driver (node hours). Redshift pricing is based on node type and count — you pay for running nodes regardless of utilisation. The Redshift FinOps lever: pause/resume for development clusters, right-sizing node type and count, serverless for variable workloads.

**Databricks:** costs accrue from DBU (Databricks Unit) consumption across clusters — Jobs compute (lower DBU rate), All-Purpose compute (higher rate for interactive work). The Databricks FinOps lever: use Jobs compute for scheduled workloads instead of All-Purpose, auto-terminate idle clusters, right-size cluster node types.

**Cloud storage (S3, ADLS, GCS):** storage costs are relatively low (cents per GB per month) but data transfer costs — egress from cloud storage to compute in a different region, cross-region replication — can be substantial. The storage FinOps lever: same-region compute for S3/ADLS/GCS, lifecycle policies to tier old data to cheaper storage classes, compression to reduce storage volume.

Cost Attribution: Knowing Who Spends What

Cost attribution — assigning cloud costs to specific teams, projects, or workloads — is the foundation of FinOps. Without attribution, cloud costs are a shared pool that no individual team is accountable for. With attribution, teams can be held accountable for their spend and incentivised to optimise.

**Warehouse tagging.** In Snowflake, create separate warehouses per team or workload and tag them. Warehouse cost (credits consumed) is then attributable to the team that owns the warehouse. For BigQuery, use project-level billing with separate projects per team. For Databricks, use cluster tags and the usage dashboard.

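**Query cost attribution.** Some cloud warehouses provide per-query cost visibility. Snowflake's QUERY_HISTORY records execution time, warehouse, and cloud services credits for each query, from which compute cost can be estimated, and the ACCOUNT_USAGE view QUERY_ATTRIBUTION_HISTORY reports attributed compute credits per query directly. BigQuery's INFORMATION_SCHEMA.JOBS includes bytes processed and bytes billed. Aggregate per-query costs by user, by service account, or by dbt model to identify the most expensive workloads.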
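Once per-query costs have been exported, the aggregation step is simple. The sketch below assumes a hypothetical export in which each row already carries a `cost_usd` figure alongside the user and dbt model; the field names and values are illustrative and do not correspond to any warehouse's actual column names.

```python
from collections import defaultdict

# Hypothetical rows, shaped like an export from a warehouse query log.
query_log = [
    {"user": "dbt_prod", "model": "fct_orders", "cost_usd": 4.5},
    {"user": "dbt_prod", "model": "dim_customers", "cost_usd": 1.25},
    {"user": "analyst_a", "model": "adhoc", "cost_usd": 2.0},
    {"user": "bi_service", "model": "dashboard_sales", "cost_usd": 0.75},
    {"user": "analyst_a", "model": "adhoc", "cost_usd": 0.5},
]


def cost_by(rows, key):
    """Sum cost_usd per distinct value of `key`, most expensive first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["cost_usd"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for user, cost in cost_by(query_log, "user"):
    print(f"{user}: {cost:.2f} USD")
```

Calling `cost_by(query_log, "model")` gives the same ranking per dbt model, which is usually the more actionable view for pipeline owners.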

**Chargebacks vs showbacks.** A chargeback model bills individual teams for their cloud consumption, creating strong incentives to optimise. A showback model shares consumption data without billing, creating softer incentives. For data teams, showback is usually the appropriate starting point — teams can see their consumption and work to optimise without the organisational friction of internal billing.

Warehouse Compute Rightsizing

The most impactful FinOps lever for most organisations: matching warehouse size to actual query requirements.

**Snowflake warehouse rightsizing.** Snowflake warehouses range from X-Small to 6X-Large, with each size step doubling both compute and credit consumption per hour. Many analytical queries do not need a Large or X-Large warehouse: they complete nearly as fast on a Small or Medium because the bottleneck is I/O, not compute. Profile query execution on different warehouse sizes using the query history. If a query completes in 4 seconds on a Large and 5 seconds on a Medium, the Medium runs 25% longer at half the hourly rate, which cuts the compute cost of that query by about 37.5%.
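The Large versus Medium comparison can be worked through explicitly. The sketch below uses Snowflake's standard credit rates per warehouse size (each step doubles) and makes one simplifying assumption: the warehouse is already running and busy, so the 60-second minimum billing on resume is ignored. Because the Medium runs slightly longer at half the hourly rate, the per-query saving comes out lower than the 50% difference in hourly rate.

```python
# Credits per hour by warehouse size: each step up doubles the rate.
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}


def query_credits(size: str, runtime_seconds: float) -> float:
    """Credits attributable to one query, assuming per-second billing."""
    return CREDITS_PER_HOUR[size] * runtime_seconds / 3600


large = query_credits("L", 4)    # 4 seconds on a Large
medium = query_credits("M", 5)   # 5 seconds on a Medium
saving = 1 - medium / large

print(f"Large: {large:.5f} credits, Medium: {medium:.5f} credits")
print(f"Saving from using the Medium: {saving:.1%}")
```

The same function can be applied to runtimes pulled from the query history to decide, query by query, whether a smaller size gives up too much latency for the credits it saves.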

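Auto-suspend settings are equally important. A warehouse that auto-suspends after 1 minute of idle time instead of the default 10 minutes bills up to 90% less idle time after each burst of queries, which adds up quickly for intermittent workloads. Most development warehouses should auto-suspend after 1–2 minutes. Because Snowflake bills a minimum of 60 seconds each time a warehouse resumes, there is little benefit in going below 1 minute.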
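The effect of the auto-suspend setting can be estimated from a query schedule. The sketch below is a simplified model rather than Snowflake's exact billing logic: it assumes the warehouse resumes instantly when a query arrives, suspends exactly `auto_suspend_s` seconds after the last query finishes, and is billed for at least `min_billed_s` seconds per resume. The query schedule is hypothetical.

```python
def billed_seconds(queries, auto_suspend_s, min_billed_s=60):
    """Total billed seconds for (start_s, end_s) query intervals sorted by start."""
    total = 0
    session_start = None
    suspend_at = None  # time at which the running warehouse would suspend
    for start, end in queries:
        if session_start is None:
            session_start, suspend_at = start, end + auto_suspend_s
        elif start <= suspend_at:
            # Warehouse is still running: the query extends the session.
            suspend_at = max(suspend_at, end + auto_suspend_s)
        else:
            # Warehouse suspended before this query: close the previous session.
            total += max(suspend_at - session_start, min_billed_s)
            session_start, suspend_at = start, end + auto_suspend_s
    if session_start is not None:
        total += max(suspend_at - session_start, min_billed_s)
    return total


# Hypothetical workload: one 30-second query every 15 minutes for an hour.
queries = [(start, start + 30) for start in (0, 900, 1800, 2700)]

ten_minute = billed_seconds(queries, auto_suspend_s=600)
one_minute = billed_seconds(queries, auto_suspend_s=60)
print(ten_minute, one_minute, f"{1 - one_minute / ten_minute:.0%} less billed time")
```

For this schedule almost all of the billed time under the 10-minute setting is idle time, which is why the shorter setting removes most of the cost. A steadier workload with short gaps between queries would see a much smaller difference.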

**BigQuery slot optimisation.** For reserved-slot (capacity-based) BigQuery, monitor slot utilisation: unused reserved slots are wasted spend. For on-demand pricing, monitor bytes billed per query. Queries that process terabytes because of missing partition filters are typically the largest cost drivers.

Query Cost Optimisation

In on-demand pricing models (BigQuery, Athena), query cost is directly tied to data processed. Query optimisation reduces both cost and latency.

**Partition pruning.** Ensure queries that should filter to a partition always include a partition filter. A query on a 3-year events table that forgets the date filter processes the full 3 years instead of 1 day — potentially 1000x more data and cost.

**Column projection.** SELECT * is expensive in columnar databases — it reads all columns, even those not used. SELECT only the columns needed.
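Partition pruning and column projection multiply together, which a rough estimate makes visible. The sketch below assumes data is spread evenly across daily partitions and that the selected columns account for a known fraction of the table's bytes. Both are simplifications, and the table size and fractions are hypothetical.

```python
def scanned_tib(table_tib: float,
                partitions_total: int,
                partitions_read: int,
                column_fraction: float = 1.0) -> float:
    """Estimated TiB scanned, assuming evenly sized partitions."""
    return table_tib * (partitions_read / partitions_total) * column_fraction


TABLE_TIB = 10.0          # hypothetical 3-year events table
DAILY_PARTITIONS = 1095   # 3 years of daily partitions

# SELECT * with no date filter reads every partition and every column.
full_scan = scanned_tib(TABLE_TIB, DAILY_PARTITIONS, DAILY_PARTITIONS)

# One day of data, selecting columns that hold about 20% of the bytes.
pruned = scanned_tib(TABLE_TIB, DAILY_PARTITIONS, 1, column_fraction=0.2)

print(f"Full scan: {full_scan:.3f} TiB, pruned and projected: {pruned:.5f} TiB")
print(f"Reduction factor: {full_scan / pruned:,.0f}x")
```

Multiplying either figure by the on-demand price per TiB turns the estimate into a cost per query run.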

**Clustering and materialised views.** For frequently-run reports with known filter patterns, materialised views or clustering keys reduce bytes processed per query by enabling data skipping.

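**Query result caching.** Cloud warehouses cache query results. An identical query re-run within the cache window (typically 24 hours in Snowflake) returns the cached result at zero compute cost, provided the underlying data has not changed. Design BI tool query patterns to maximise cache hits, for example by avoiding non-deterministic functions such as CURRENT_TIMESTAMP in dashboard queries.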

Development vs Production Cost Separation

Development workloads — exploratory analysis, pipeline testing, dbt development — often run on the same infrastructure as production, inflating production costs and making attribution difficult. Separate development from production:

- Separate Snowflake warehouses (or BigQuery projects) for development vs production

- Lower-tier development warehouses (XS or S in Snowflake) that auto-suspend aggressively

- Developer notebooks on smaller cluster configurations

- dbt development against DuckDB locally where possible (zero cloud cost)

Establish a clear convention for development spend, for example a target that development activity costs no more than N% of production cost and is charged to the data engineering team budget. This keeps FinOps reporting clean and enables meaningful trend analysis.
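With development and production costs separated, tracking the ratio between them takes only a few lines. The sketch below assumes monthly cost totals are already available per environment. The figures and the 20% target are hypothetical, chosen only to illustrate the check.

```python
def dev_share(dev_cost: float, prod_cost: float) -> float:
    """Development cost as a fraction of production cost."""
    if prod_cost <= 0:
        raise ValueError("prod_cost must be positive")
    return dev_cost / prod_cost


def months_over_target(monthly_costs, max_dev_share):
    """Return the months whose dev/prod ratio exceeds the target."""
    return [month for month, (dev, prod) in monthly_costs.items()
            if dev_share(dev, prod) > max_dev_share]


# Hypothetical (dev_cost, prod_cost) totals in USD per month.
monthly_costs = {
    "2024-01": (1_000, 10_000),
    "2024-02": (2_500, 10_000),
    "2024-03": (1_800, 12_000),
}

print(months_over_target(monthly_costs, max_dev_share=0.20))
```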

Storage Lifecycle Management

Object storage cost (S3, ADLS, GCS) accumulates silently. Old data that is never accessed still costs money. Lifecycle policies automatically transition data to cheaper storage tiers or delete it after defined periods:

- S3 Standard → S3 Infrequent Access after 30 days: ~45% cheaper per GB

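- S3 Infrequent Access → S3 Glacier after 90 days: roughly 80% or more cheaper than Standard, depending on the Glacier class and region (retrieval fees and minimum storage durations apply)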

- Delete after 7 years (or whatever your retention requirement is)
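The tiering schedule above could be expressed as an S3 lifecycle configuration. The sketch below builds it as a Python dict in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The rule ID, the `raw/` prefix, and the bucket name are hypothetical, and 7 years is approximated as 7 × 365 days.

```python
import json

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "tier-then-expire-raw-data",   # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "raw/"},        # hypothetical prefix
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 7 * 365},     # adjust to your retention rule
        }
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))

# Applying it requires boto3 and AWS credentials, so the call is left commented out:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-data-lake-bucket",  # hypothetical bucket name
#     LifecycleConfiguration=lifecycle_configuration,
# )
```

Scoping the rule to a prefix rather than the whole bucket keeps frequently read data, such as curated tables, out of the colder tiers where retrieval is slower or incurs fees.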

Separately from lifecycle policies, audit your data lake for orphaned data — Parquet files from pipeline runs that were interrupted, development data that was never cleaned up, duplicate extracts. These accumulate over years and can represent significant wasted storage spend.
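An orphaned-data audit can start from a plain object listing. The sketch below assumes a hypothetical listing in which each object has a key, a size, and a last-modified timestamp, and it totals the bytes per top-level prefix that have not been modified within a chosen window. Last-modified time is only a proxy for "never accessed", so treat the result as a list of candidates to review, not a deletion list.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def stale_bytes_by_prefix(objects, now, max_age_days):
    """Sum bytes per top-level prefix for objects older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    totals = defaultdict(int)
    for obj in objects:
        if obj["last_modified"] < cutoff:
            prefix = obj["key"].split("/", 1)[0]
            totals[prefix] += obj["size_bytes"]
    return dict(totals)


# Hypothetical listing of a data lake bucket.
objects = [
    {"key": "tmp/run_0412/part-000.parquet", "size_bytes": 500,
     "last_modified": datetime(2022, 1, 10)},
    {"key": "tmp/run_0413/part-000.parquet", "size_bytes": 300,
     "last_modified": datetime(2021, 11, 3)},
    {"key": "dev/alice/extract.parquet", "size_bytes": 200,
     "last_modified": datetime(2022, 7, 1)},
    {"key": "prod/events/2024-05-20.parquet", "size_bytes": 900,
     "last_modified": datetime(2024, 5, 20)},
]

print(stale_bytes_by_prefix(objects, now=datetime(2024, 6, 1), max_age_days=365))
```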

