SQL Query Optimisation: A Practical Guide for Data Engineers and Analysts

The techniques that actually reduce query cost and latency in production data warehouses — execution plan reading, partition pruning, materialisation strategy, join order, and the optimisation traps that waste engineering time without improving performance.

SQL query optimisation is one of the highest-leverage skills in data engineering. A single poorly written query running on a schedule can cost thousands of dollars per month and degrade warehouse performance for everyone. The techniques in this guide apply across cloud data warehouses — Snowflake, BigQuery, Redshift, and Databricks — with platform-specific notes where they diverge.

Start with the Execution Plan

Before optimising anything, read the execution plan. Every major warehouse exposes this:

- **Snowflake:** EXPLAIN statement, or the Query Profile UI in Snowsight (visual execution plan with time breakdowns per node)

- **BigQuery:** EXPLAIN plan in the console, or query execution details showing slot time, bytes processed per stage

- **Redshift:** EXPLAIN statement; STL_EXPLAIN table for historical plans

- **Databricks:** EXPLAIN EXTENDED or the Spark UI

What to look for:

**Full table scans** where partition pruning should occur. If your WHERE clause filters on a partition column but the plan shows bytes/rows scanned equal to the full table, the query is not pruning. Common cause: implicit type casting between filter value and column type.

**Cartesian joins** (cross joins) producing row counts far larger than either input. These usually appear as massive row count spikes mid-plan and indicate a missing join condition or a join key that is null in both tables.

**Spill to disk** on join or sort operations. In Snowflake this shows as spilled remote bytes in the Query Profile. In Databricks it shows in the Sort or HashAggregate nodes. Spill indicates the operation exceeds available memory — the fix is warehouse/cluster sizing or reducing intermediate row counts before the expensive operation.

**Skewed distribution** in distributed execution. If one node processes 10x the rows of others, a skewed join key (e.g., joining on a column where one value accounts for 40% of rows) is likely causing work imbalance. The fix is adding a secondary distribution key or filtering the skewed value out before the join.

Partition Pruning

Partition pruning is the most impactful single optimisation in most cloud warehouses. It eliminates data scan at the storage level before any compute runs.

**BigQuery:** Tables should be partitioned on the column most commonly used in WHERE filters — usually an event timestamp or a date column. Queries must filter on the exact partition column; filtering on a derived expression (DATE_TRUNC(created_at, MONTH)) does not trigger partition pruning unless the partition column is created_at and the filter is on created_at directly. Clustering on the next most selective filter column further reduces bytes scanned.

**Snowflake:** Uses micro-partition pruning based on column statistics. Snowflake automatically tracks min/max values per column per micro-partition (16MB compressed chunks). Queries filtering on columns with good clustering (low overlap between micro-partition ranges) prune effectively. Explicit clustering keys can be defined for tables where the natural insertion order produces poor pruning. Check the Partitions Scanned vs Partitions Total ratio in the Query Profile — anything above 30% pruned is good; sub-10% suggests clustering should be reviewed.

**Redshift:** Sort keys define the physical row order and enable zone map pruning (similar to Snowflake micro-partitions). COMPOUND sort keys prune on prefix columns; INTERLEAVED sort keys prune on any column but require more maintenance. Distribution keys (DISTKEY) determine which node stores which rows — good DISTKEY selection eliminates redistributing data during joins.

Join Optimisation

**Join order matters.** Most query optimisers choose join order automatically, but when they do not: join the smaller (more filtered) table first. Reducing row counts early reduces the work in every subsequent join.

**Filter before joining.** If you are filtering rows from a joined table, apply the filter in a CTE or subquery before the join rather than in the outer WHERE clause. The optimiser sometimes does this automatically (predicate pushdown), but it is not guaranteed — being explicit prevents large intermediate join results that are immediately filtered.

**Avoid joining on expressions.** Joining on LOWER(email) = LOWER(other.email) requires computing the expression for every row and prevents index/clustering use. Normalise data at ingestion or create a computed column.

**Broadcast joins for small tables.** When joining a large fact table to a small dimension table (< a few hundred thousand rows), force a broadcast join if your warehouse supports it. Snowflake does this automatically when one table is small; Databricks requires a broadcast hint. A broadcast join copies the small table to every node, eliminating shuffle.

Materialisation Strategy

Not every transformation should run as a query at execution time. Materialising intermediate results as tables (or Snowflake dynamic tables, BigQuery materialised views, dbt incremental models) converts expensive compute at query time into cheap storage reads.

Materialise when:

- An intermediate CTE is referenced multiple times in a single query (the warehouse may recompute it each time)

- The same transformation runs in multiple downstream queries

- A query aggregates a large fact table to a smaller grain that is then joined to other tables — materialise the aggregation

Avoid materialising when:

- The source table changes frequently and the materialised result will be stale for more than a few hours

- The intermediate result is only used once and is not particularly expensive to compute

- Storage cost of the materialised table exceeds the compute savings from not recomputing it

**dbt incremental models** implement a common materialisation pattern: on the first run, build the full table; on subsequent runs, process only new/changed records and merge them into the existing table. This is the right approach for fact tables where full rebuilds are prohibitively expensive but incremental processing is feasible.

Common Anti-Patterns

**SELECT *** is the most common warehouse query anti-pattern. Cloud warehouses are columnar — reading an unused column costs real money and compute. Always project only the columns you need. In BigQuery, SELECT * on a 10TB table scans 10TB regardless of how few rows the WHERE clause returns.

**Correlated subqueries** execute a subquery once per row of the outer query. A correlated subquery in a WHERE or SELECT clause on a million-row table runs the subquery a million times. Replace with a JOIN to a CTE that pre-computes the subquery result once.

**Scalar UDFs in WHERE clauses.** Custom scalar functions in WHERE clauses prevent partition pruning and force row-by-row evaluation. If you must use a UDF, apply it to a CTE first and filter the CTE result.

**UNION instead of UNION ALL.** UNION deduplicates results with an implicit DISTINCT operation. If duplicates are not possible (or not a problem), use UNION ALL — it is substantially faster because it skips the sort-based deduplication step.

**Unnecessary DISTINCT.** DISTINCT forces a sort and deduplication of the entire result set. Many uses of DISTINCT are compensating for a Cartesian join or incorrect aggregation upstream. Fix the root cause rather than adding DISTINCT to silence the symptom.

**Unindexed LIKE with leading wildcard.** WHERE description LIKE '%word%' cannot use any form of indexing and forces a full column scan. If full-text search is required, use a dedicated full-text index or search service rather than LIKE.

Platform-Specific Notes

**Snowflake:** Result cache returns cached results for identical queries run within 24 hours when the underlying table has not changed. For dashboards querying the same aggregated result repeatedly, this eliminates warehouse compute entirely. Ensure queries are parameterised consistently to maximise cache hit rate. Use query tags to attribute credit consumption to specific workloads or teams.

**BigQuery:** Prefer partitioned tables over LIMIT for cost control — LIMIT does not reduce bytes scanned. Use the query validator before running exploratory queries: the bytes processed estimate in the BigQuery console is reliable. Scheduled queries with date-based filters against partitioned tables can reduce monthly scan costs by 80-95% compared to equivalent unpartitioned queries.

**Databricks:** Use ZORDER BY on Delta tables to co-locate related values within data files, improving file pruning for point queries. Cache frequently accessed DataFrames with CACHE TABLE for iterative analysis. Use Photon engine for vectorised query execution on Delta tables — it significantly accelerates SQL aggregation workloads.

SQL query optimisation is ultimately about understanding what the warehouse is doing with your query and removing work it does not need to do. Read the execution plan first. Fix the most expensive node. Measure. Repeat.

For data teams looking to reduce warehouse costs or improve query performance at scale, our data engineering consulting practice offers warehouse optimisation assessments — contact us to discuss your specific environment.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →