BlogData Engineering

SQL Window Functions: A Practical Guide for Data Analysts

James Okafor
James Okafor
Data & Cloud Engineer
·June 2, 202713 min read

Window functions are the most powerful analytical capability in SQL and the most underused. Most analysts reach for subqueries or self-joins for problems that window functions solve in a single query. This guide covers the full syntax, the key functions, and the practical patterns that appear repeatedly in analytical work.

Window functions are the part of SQL that separates analysts who write queries from analysts who think in SQL. They handle a class of problems — ranking, running totals, period-over-period comparisons, moving averages, percentile calculations — that are either impossible or extremely awkward without them. Every analyst working with a modern data warehouse should have window functions as automatic vocabulary.

What a Window Function Is

A regular aggregate function like SUM or COUNT collapses rows into a single result. A GROUP BY with SUM(revenue) produces one row per group. The individual row detail is gone.

A window function computes an aggregate across a defined set of rows — the "window" — but returns the result on every input row without collapsing them. You get the aggregate alongside the row-level detail.

The syntax is:

function_name(expression) OVER (

PARTITION BY partition_columns

ORDER BY order_columns

ROWS|RANGE frame_specification

)

All three clauses inside OVER are optional. Understanding what each does is the key to using window functions correctly.

**PARTITION BY**: Defines the grouping for the window. The function resets and recomputes for each partition, like a GROUP BY that preserves row-level detail. PARTITION BY customer_id means the function computes separately for each customer.

**ORDER BY**: Defines the ordering within the partition. Required for ranking functions and running totals; optional for pure aggregates.

**Frame specification**: Defines which rows within the partition contribute to the computation. ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW means all rows from the start of the partition to the current row — the standard for running totals. ROWS BETWEEN 6 PRECEDING AND CURRENT ROW means the current row and the 6 rows before it — a 7-day moving average if rows are daily.

Ranking Functions

ROW_NUMBER(), RANK(), and DENSE_RANK() are the most commonly used ranking functions.

**ROW_NUMBER()**: Assigns a unique sequential integer starting at 1 within each partition, ordered by the ORDER BY clause. Ties produce arbitrary but consistent ordering. Use for deduplication and top-N selection.

SELECT

customer_id,

order_date,

order_total,

ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS recency_rank

FROM orders

Wrapping this in a CTE and filtering WHERE recency_rank = 1 gives the most recent order per customer — one of the most common analytical patterns.

**RANK()**: Like ROW_NUMBER but ties get the same rank, and the next rank skips. Two rows tied for rank 1 both get rank 1, the next row gets rank 3. Use when ties should be visible as equal.

**DENSE_RANK()**: Like RANK but does not skip. Two rows tied for rank 1 both get rank 1, the next row gets rank 2. Use for percentile bucket assignment where gaps in ranking produce unintuitive results.

**NTILE(n)**: Divides the partition into n buckets and assigns each row to a bucket. NTILE(4) produces quartile assignments. Note: bucket boundaries depend on row count; uneven divisions are assigned to lower buckets.

**PERCENT_RANK() and CUME_DIST()**: Return the relative rank of a row as a fraction between 0 and 1. PERCENT_RANK computes (rank - 1) / (total rows - 1). CUME_DIST computes rank / total rows. CUME_DIST answers "what fraction of rows have a value less than or equal to this row" — useful for percentile calculations.

Running Totals and Cumulative Aggregates

Running totals use the frame specification explicitly:

SELECT

order_date,

daily_revenue,

SUM(daily_revenue) OVER (

ORDER BY order_date

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

) AS running_total

FROM daily_revenue

The ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW frame is the default when you specify ORDER BY without an explicit frame, but stating it explicitly makes intent clear.

For month-to-date calculations with a date spine, PARTITION BY with a month expression and ORDER BY date gives MTD per month:

SUM(daily_revenue) OVER (

PARTITION BY DATE_TRUNC('month', order_date)

ORDER BY order_date

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

)

For year-to-date, PARTITION BY year expression works the same way.

Moving Averages and Rolling Aggregates

A 7-day rolling average uses the frame to define the window:

AVG(daily_revenue) OVER (

ORDER BY order_date

ROWS BETWEEN 6 PRECEDING AND CURRENT ROW

)

ROWS BETWEEN 6 PRECEDING AND CURRENT ROW includes the current row plus the 6 rows before it — 7 rows total. For a centered moving average, use ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING.

Note the distinction between ROWS and RANGE frame modes. ROWS counts physical rows. RANGE counts rows within a value range. For date-based rolling aggregates with no gaps in the date spine, ROWS is usually correct. RANGE becomes relevant when multiple rows can share the same ORDER BY value and you want them treated identically.

LAG and LEAD

LAG and LEAD access values from other rows in the result set without a self-join.

LAG(expression, offset, default) OVER (PARTITION BY ... ORDER BY ...)

LEAD(expression, offset, default) OVER (PARTITION BY ... ORDER BY ...)

LAG looks backward in the ordered partition; LEAD looks forward. The offset defaults to 1 (one row). The default value is returned when there is no row at the offset (at the start or end of the partition).

Period-over-period comparison with LAG:

SELECT

month,

revenue,

LAG(revenue, 1) OVER (ORDER BY month) AS prior_month_revenue,

revenue - LAG(revenue, 1) OVER (ORDER BY month) AS mom_change,

(revenue - LAG(revenue, 1) OVER (ORDER BY month)) / LAG(revenue, 1) OVER (ORDER BY month) AS mom_pct_change

FROM monthly_revenue

For year-over-year with monthly data, LAG with offset 12 gives the same month in the prior year — no complex date math needed.

FIRST_VALUE and LAST_VALUE

FIRST_VALUE and LAST_VALUE return the first or last value within the window frame.

A common use: annotate every row with the customer's first order date without a separate lookup:

FIRST_VALUE(order_date) OVER (

PARTITION BY customer_id

ORDER BY order_date

ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

) AS first_order_date

Note the frame: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. Without this, LAST_VALUE with ORDER BY uses the default frame (UNBOUNDED PRECEDING to CURRENT ROW), which means LAST_VALUE returns the current row's value — not the last row in the partition. This is a common mistake that produces incorrect results silently.

Performance Considerations

Window functions execute after WHERE, GROUP BY, and HAVING — but before the final SELECT and ORDER BY. This means you cannot filter on a window function result in the same query layer; wrap in a CTE or subquery.

On columnar databases like Snowflake, BigQuery, and Redshift, window functions generally perform well because the execution engine can exploit column-oriented storage for the sort and partition operations. On row-oriented databases (PostgreSQL, MySQL), complex window functions on large tables may benefit from an index on the PARTITION BY and ORDER BY columns.

Multiple window functions in the same SELECT clause that share identical OVER clauses can often be computed in a single pass by the query planner. Explicit WINDOW clause syntax names a window definition once and reuses it:

SELECT

customer_id,

order_date,

revenue,

SUM(revenue) OVER w AS running_total,

AVG(revenue) OVER w AS running_avg,

ROW_NUMBER() OVER w AS row_num

FROM orders

WINDOW w AS (PARTITION BY customer_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

This is cleaner to read and may allow the query planner to reuse the computed window.

Our data architecture and BI strategy teams build analytical infrastructure that puts SQL patterns like these to work on the problems that matter to your business — contact us to discuss your analytics engineering needs.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →