SQL Query Performance Tuning: Making Slow Queries Fast

Slow SQL queries have predictable causes — full table scans, missing indexes, expensive joins, poor filter ordering, and non-sargable predicates. This guide covers the techniques for diagnosing slow queries using EXPLAIN/EXPLAIN ANALYZE, the optimisation strategies that produce the most improvement, and warehouse-specific performance considerations for Snowflake, BigQuery, and Redshift.

Why Queries Are Slow

SQL query performance problems have predictable root causes. Before tuning, understand what the query engine is actually doing:

**Full table scans**: Scanning every row in a table when a filter could have reduced the scan to a subset. The most common cause of slow queries on large tables.

**Poor join order**: The query engine joins tables in an order that produces large intermediate result sets, which are then filtered later. Reordering joins to apply the most selective filters earliest reduces intermediate set size.

**Non-sargable predicates**: Filter conditions that prevent index or partition use — function calls on the filter column, implicit type conversions, or LIKE conditions with leading wildcards.

**Missing partitioning**: Queries on large partitioned tables that do not include a partition filter scan the entire table.

**Insufficient or wrong statistics**: Query planners use statistics (row counts, cardinality, value distributions) to choose execution plans. Stale statistics lead to suboptimal plans.

**N+1 query patterns**: Running individual queries in a loop rather than one set-based query — common in ORM-generated SQL.

Reading EXPLAIN Plans

EXPLAIN (PostgreSQL, Redshift, BigQuery, Snowflake) shows the execution plan without running the query. EXPLAIN ANALYZE (PostgreSQL) runs the query and shows actual execution times.

Key elements to read in a plan:

**Seq Scan vs Index Scan**: Seq Scan reads every row (full table scan). Index Scan uses a B-tree index to locate matching rows. On large tables, Seq Scan is almost always the problem.

**Hash Join vs Nested Loop**: Hash Join builds a hash table of the smaller relation and probes it — efficient for large joins. Nested Loop iterates the outer table for every row of the inner table — efficient when the inner table is small; exponentially slow when both tables are large.

**Rows estimate vs actual rows**: A large discrepancy between the planner's row estimate and actual rows indicates stale statistics. Run ANALYZE on the table to refresh statistics.

**Cost**: Shown as (startup_cost..total_cost). Higher cost nodes are the bottleneck. EXPLAIN ANALYZE shows actual time.

Indexing for Performance

**B-tree indexes** (PostgreSQL, Redshift, SQL Server): Speed up equality and range predicates, ORDER BY. Create on columns used in WHERE, JOIN ON, and ORDER BY. Composite indexes: column order matters — the index is useful for queries filtering on (a), (a, b), or (a, b, c) but not just (b) or (c).

**Sargable predicates**: For an index to be used, the filter condition must be sargable (Search ARGument ABLE) — meaning the filter can use the index directly.

Sargable: WHERE created_at >= '2024-01-01'

Not sargable: WHERE YEAR(created_at) = 2024 (function call prevents index use)

Not sargable: WHERE CAST(user_id AS VARCHAR) = '12345' (type conversion)

Not sargable: WHERE name LIKE '%Smith' (leading wildcard)

For date filters, always filter on the raw column with comparison operators, not on functions applied to the column.

Snowflake Performance Tuning

**Clustering**: Snowflake organises data in micro-partitions. CLUSTER BY (date_column) physically orders micro-partitions so queries with date range predicates can skip non-matching partitions. Apply clustering to large tables with frequent range queries. Reclustering happens automatically for new data; run ALTER TABLE ... RECLUSTER for existing data.

**Virtual warehouse sizing**: A larger warehouse (L, XL) processes the same query faster by using more compute resources. For long-running single queries, scaling up is effective. For concurrent users, scaling out (additional clusters) is more effective.

**Result cache**: Snowflake caches query results for 24 hours. Identical queries (same SQL, same data, same user role) return the cached result instantly — no warehouse compute consumed. Design dashboards to use consistent query text to maximise cache hits.

**Materialised views**: Pre-computed aggregations refreshed automatically on base table change. Queries against the base table that match the materialised view's definition are automatically redirected to the materialised view. Most effective for aggregations run repeatedly by BI tools.

BigQuery Performance Tuning

**Partitioned tables**: Queries with WHERE clause filtering on the partition column scan only matching partitions. A 1-year daily-partitioned table with a filter on today scans 1/365 of the data. Require partition filters: require_partition_filter=TRUE prevents full table scans.

**Clustered tables**: CLUSTER BY reorders data within partitions by specified columns. Queries filtering on cluster columns skip non-matching blocks. Most effective when combined with partitioning.

**Avoid SELECT star**: BigQuery columnar storage reads only columns referenced in the query. SELECT * scans all columns. For a 100-column table, specifying the 5 needed columns reduces scan by 95%.

**Shuffle operations**: Cross-join, GROUP BY, and JOIN operations may require data movement (shuffle) across slots. Minimise shuffles by filtering before grouping, using partitioned tables to co-locate related data, and avoiding large-to-large table joins without selective filters.

**Slot consumption**: BigQuery on-demand processes queries using shared slot capacity. Queries consuming many slots complete faster but cost more. For concurrent workloads, slot reservations provide predictable capacity.

Redshift Performance Tuning

**Distribution keys**: DISTKEY co-locates rows with the same key value on the same node, eliminating data movement during JOIN operations on the DISTKEY column. Use DISTKEY on the most commonly joined column in large fact tables.

**Sort keys**: SORTKEY orders data on disk. Zone maps track min/max per 1MB disk block, enabling block-level skipping for range queries on the sort key column. Choose the column most used in range predicates.

**VACUUM and ANALYZE**: Deleted rows are not immediately reclaimed — VACUUM reclaims space and re-sorts data. ANALYZE updates statistics for the query planner. Run both after large DML operations.

**WLM configuration**: Workload management queues allocate memory and concurrency slots per queue. ETL jobs competing with BI queries for the same resources cause both to slow. Assign ETL to a low-concurrency, high-memory queue; BI to a high-concurrency, lower-memory queue.

General Query Optimisation Patterns

**Filter early**: Apply the most selective filters (high cardinality columns, date ranges) as early as possible in the query. Push WHERE conditions down to CTEs rather than filtering in the outer query.

**Avoid functions on filter columns**: Convert filter comparisons to be applied to constants, not columns. Instead of YEAR(order_date) = 2024, use order_date >= '2024-01-01' AND order_date < '2025-01-01'.

**Use EXISTS instead of IN for large lists**: IN with a subquery materialises the subquery into a list; EXISTS short-circuits as soon as a match is found. For large subqueries, EXISTS is more efficient.

**Aggregate before joining**: When joining an aggregation to another table, aggregate first in a CTE or subquery, then join the smaller result. Joining the full large table before aggregating moves more data through the join.

Our data architecture practice optimises data warehouse SQL performance and query architecture — contact us to discuss your warehouse performance requirements.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →