BlogData Engineering

Snowflake Performance Optimization: Query Tuning, Clustering, and Warehouse Sizing

James Okafor
James Okafor
Data & Cloud Engineer
·June 25, 202612 min read

Slow Snowflake queries and oversized warehouses are common and fixable. Here is the systematic approach to diagnosing and resolving Snowflake performance issues.

The quick answer

Snowflake performance problems almost always trace to one of three root causes: queries scanning more data than necessary (solved with clustering and partitioning), queries too large for the warehouse handling them (solved with correct warehouse sizing), or poorly structured SQL (full table scans, inefficient joins, missing filter push-down). This guide covers the systematic approach: diagnose with Query Profile, fix the highest-impact issues first, verify with benchmark queries.

Diagnosing with Query Profile

Every Snowflake query has a Query Profile — accessible from query history in the Snowflake UI (Classic Console or Snowsight). The profile shows the execution plan as a tree of operators, with time and row counts at each node. The most important signals:

**Bytes scanned vs bytes in table**: if a query scans 100% of a large table, there is no pruning happening. The filter conditions are not using the table's clustering or micro-partition statistics. This is the highest-priority signal.

**Spill to local storage / spill to remote storage**: spill means the intermediate results do not fit in the warehouse's in-memory working space and overflow to disk. Spill to local storage is less severe; spill to remote storage (S3/GCS) is very expensive — an order of magnitude slower than in-memory processing. Spill usually means the warehouse is too small for the query's intermediate result size, or the query itself needs optimisation.

**Partitions scanned vs partitions total**: under the TableScan node in the profile, this ratio shows pruning effectiveness. 100/50,000 partitions scanned is excellent; 45,000/50,000 is a problem.

**Most expensive node**: the profile highlights the most time-consuming operator. Focus optimisation on the bottleneck node, not the overall query structure.

**Explode/FLATTEN performance**: queries using FLATTEN on VARIANT/array data can be slow when the array is large. Check the row count expansion at the Explode node.

Clustering keys

Micro-partition pruning is Snowflake's primary mechanism for avoiding full table scans. Snowflake stores table data in micro-partitions (50–500MB compressed), and each micro-partition records the min/max values of every column. When a query filters on a column, Snowflake skips micro-partitions whose min/max range does not overlap the filter — this is partition pruning.

Clustering improves pruning by sorting data so that rows with similar values for the cluster key are physically co-located in the same micro-partitions. Without a clustering key, rows are inserted in arrival order — a query filtering on date range might find relevant rows scattered across every micro-partition in the table.

**When to add a clustering key**: for tables over 500M rows that are queried with consistent filter patterns. Common clustering keys: date/timestamp columns (most analytical queries filter by time period), foreign keys used in dashboard filters (customer_id, product_category), status columns used in WHERE clauses.

**Clustering key selection**: choose columns that appear most frequently in WHERE and JOIN conditions. Do not choose high-cardinality columns with completely random values (e.g., a GUID primary key with no range queries). Do not choose columns with very low cardinality (a boolean column with two values clusters poorly).

**Automatic clustering**: once a clustering key is defined, Snowflake Automatic Clustering maintains the cluster over time as new data is inserted. This consumes compute credits (billed separately from queries) — typically 15–30% of the table's monthly compute cost for frequently updated tables. Monitor automatic clustering credit consumption in Account Usage views.

Warehouse sizing

The relationship between warehouse size and query performance is not linear. Doubling the warehouse size (from Medium to Large, 4 to 8 credits/hour) does not double query speed for most queries — it provides more CPU cores for parallelism and more memory to avoid spill.

**When larger warehouses help**: queries that spill to storage (more memory eliminates spill), queries with heavy parallelism (large aggregations over many rows), dbt runs with many concurrent model builds, and high-concurrency user workloads (Snowflake multi-cluster handles this better than manual upsizing).

**When larger warehouses do not help**: simple queries on small tables (the bottleneck is network, not compute); queries where the bottleneck is partition scanning (clustering fixes this, not warehouse size); queries waiting in queue (multi-cluster warehouses fix queuing, not upsizing).

**Right-sizing approach**: run a representative benchmark set of your slowest queries at multiple warehouse sizes. Compare query wall time and total credit consumption (smaller warehouse x longer time vs larger warehouse x shorter time). The optimal size is often one step larger than where spill disappears from the Query Profile.

SQL optimisation patterns

**Filter push-down**: put the most selective filters early in the query. Snowflake's optimizer pushes filters down the plan, but complex subquery structures or non-standard SQL can prevent this. Ensure date range filters on large tables are in the top-level WHERE clause, not in nested subqueries.

**Avoid SELECT STAR on large tables**: SELECT * forces Snowflake to read every column from every scanned micro-partition. Snowflake uses columnar storage — reading only the columns needed by the query is significantly faster than reading all columns. Specify only needed columns.

**Efficient join order**: join the smallest table first when building hash tables. For a fact table joined to a dimension table, filter the fact table first (reduce its row count as early as possible) before joining to dimensions. Snowflake's optimizer handles most join ordering automatically, but extremely complex queries benefit from manual join order specification.

**Materialise expensive subqueries**: if the same expensive subquery appears multiple times in a query or is used in multiple downstream queries, materialise it as a temporary table or Snowflake dynamic table (for continuously refreshed materialised views). Recomputing the same expensive aggregation 10 times is avoidable.

**Use LIMIT with ORDER BY carefully**: SELECT ... ORDER BY col LIMIT 100 on a large table requires Snowflake to sort the entire result set before applying the limit. If the use case allows, add a WHERE clause to filter to a manageable subset before sorting.

Search Optimization Service

For workloads with high-frequency point lookup queries — SELECT where customer_id = 12345 or SELECT where order_id = 'ORD-99887766' — the Search Optimization Service builds access paths (similar to indexes in traditional databases) that allow Snowflake to find specific rows without scanning the full table.

Search Optimization costs ~1.5% of the table's storage per month. For tables with frequent point lookups, this typically pays back within the first day of reduced warehouse usage. Enable it with ALTER TABLE t ADD SEARCH OPTIMIZATION and monitor with SYSTEM$ESTIMATE_SEARCH_OPTIMIZATION_COSTS before enabling.

Materialised views and dynamic tables

**Materialised views**: pre-compute and store the result of a view definition. Queries against the materialised view read the pre-computed result rather than recomputing against the base tables. Automatically maintained by Snowflake when the base table changes. Best for: aggregation queries run frequently by many BI users; flattened VARIANT structures used in dashboards.

**Dynamic tables**: Snowflake's newer materialisation feature — define a transformation SQL query and a target refresh lag (e.g., "refresh within 5 minutes of changes"). Dynamic tables are closer to continuous materialised views. Best for: real-time or near-real-time dashboard data, replacing expensive per-query aggregations.

Extract and load optimisation

Poorly structured extracts create performance problems at the warehouse layer. Common patterns:

**Micro-file loading**: loading thousands of tiny files (under 1MB each) via COPY INTO creates excessive metadata overhead and slow load performance. Consolidate to files of 100–250MB before loading. Snowpipe handles this automatically for streaming use cases; for batch loads, consolidate upstream.

**Over-eager full refreshes**: replacing large tables entirely on every load instead of incrementally merging new rows. A full TRUNCATE + INSERT on a 1B-row table is expensive in compute and time; a MERGE based on the last-updated timestamp processes only changed rows.

For the warehouse selection context, see snowflake vs redshift and snowflake architecture guide. For cost optimisation beyond performance, see cloud data warehouse cost optimization. For the dbt layer performance that feeds Snowflake, see dbt best practices.

Our data architecture consulting practice conducts Snowflake performance reviews — diagnosing the most expensive queries, implementing clustering and warehouse rightsizing, and measuring the before/after impact. Book a free 30-minute audit to discuss your Snowflake environment.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →