Window functions are the most powerful analytical capability in SQL and the most underused. Most analysts reach for subqueries or self-joins for problems that window functions solve in a single query. This guide covers the full syntax, the key functions, and the practical patterns that appear repeatedly in analytical work.
Window functions are the part of SQL that separates analysts who write queries from analysts who think in SQL. They handle a class of problems — ranking, running totals, period-over-period comparisons, moving averages, percentile calculations — that are either impossible or extremely awkward without them. Every analyst working with a modern data warehouse should have window functions as automatic vocabulary.
What a Window Function Is
A regular aggregate function like SUM or COUNT collapses rows into a single result. A GROUP BY with SUM(revenue) produces one row per group. The individual row detail is gone.
A window function computes an aggregate across a defined set of rows — the "window" — but returns the result on every input row without collapsing them. You get the aggregate alongside the row-level detail.
The syntax is:
function_name(expression) OVER (
PARTITION BY partition_columns
ORDER BY order_columns
ROWS|RANGE frame_specification
)
All three clauses inside OVER are optional. Understanding what each does is the key to using window functions correctly.
**PARTITION BY**: Defines the grouping for the window. The function resets and recomputes for each partition, like a GROUP BY that preserves row-level detail. PARTITION BY customer_id means the function computes separately for each customer.
**ORDER BY**: Defines the ordering within the partition. Required for ranking functions and running totals; optional for pure aggregates.
**Frame specification**: Defines which rows within the partition contribute to the computation. ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW means all rows from the start of the partition to the current row — the standard for running totals. ROWS BETWEEN 6 PRECEDING AND CURRENT ROW means the current row and the 6 rows before it — a 7-day moving average if rows are daily.
Ranking Functions
ROW_NUMBER(), RANK(), and DENSE_RANK() are the most commonly used ranking functions.
**ROW_NUMBER()**: Assigns a unique sequential integer starting at 1 within each partition, ordered by the ORDER BY clause. Ties produce arbitrary but consistent ordering. Use for deduplication and top-N selection.
SELECT
customer_id,
order_date,
order_total,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS recency_rank
FROM orders
Wrapping this in a CTE and filtering WHERE recency_rank = 1 gives the most recent order per customer — one of the most common analytical patterns.
**RANK()**: Like ROW_NUMBER but ties get the same rank, and the next rank skips. Two rows tied for rank 1 both get rank 1, the next row gets rank 3. Use when ties should be visible as equal.
**DENSE_RANK()**: Like RANK but does not skip. Two rows tied for rank 1 both get rank 1, the next row gets rank 2. Use for percentile bucket assignment where gaps in ranking produce unintuitive results.
**NTILE(n)**: Divides the partition into n buckets and assigns each row to a bucket. NTILE(4) produces quartile assignments. Note: bucket boundaries depend on row count; uneven divisions are assigned to lower buckets.
**PERCENT_RANK() and CUME_DIST()**: Return the relative rank of a row as a fraction between 0 and 1. PERCENT_RANK computes (rank - 1) / (total rows - 1). CUME_DIST computes rank / total rows. CUME_DIST answers "what fraction of rows have a value less than or equal to this row" — useful for percentile calculations.
Running Totals and Cumulative Aggregates
Running totals use the frame specification explicitly:
SELECT
order_date,
daily_revenue,
SUM(daily_revenue) OVER (
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM daily_revenue
The ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW frame is the default when you specify ORDER BY without an explicit frame, but stating it explicitly makes intent clear.
For month-to-date calculations with a date spine, PARTITION BY with a month expression and ORDER BY date gives MTD per month:
SUM(daily_revenue) OVER (
PARTITION BY DATE_TRUNC('month', order_date)
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
For year-to-date, PARTITION BY year expression works the same way.
Moving Averages and Rolling Aggregates
A 7-day rolling average uses the frame to define the window:
AVG(daily_revenue) OVER (
ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
)
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW includes the current row plus the 6 rows before it — 7 rows total. For a centered moving average, use ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING.
Note the distinction between ROWS and RANGE frame modes. ROWS counts physical rows. RANGE counts rows within a value range. For date-based rolling aggregates with no gaps in the date spine, ROWS is usually correct. RANGE becomes relevant when multiple rows can share the same ORDER BY value and you want them treated identically.
LAG and LEAD
LAG and LEAD access values from other rows in the result set without a self-join.
LAG(expression, offset, default) OVER (PARTITION BY ... ORDER BY ...)
LEAD(expression, offset, default) OVER (PARTITION BY ... ORDER BY ...)
LAG looks backward in the ordered partition; LEAD looks forward. The offset defaults to 1 (one row). The default value is returned when there is no row at the offset (at the start or end of the partition).
Period-over-period comparison with LAG:
SELECT
month,
revenue,
LAG(revenue, 1) OVER (ORDER BY month) AS prior_month_revenue,
revenue - LAG(revenue, 1) OVER (ORDER BY month) AS mom_change,
(revenue - LAG(revenue, 1) OVER (ORDER BY month)) / LAG(revenue, 1) OVER (ORDER BY month) AS mom_pct_change
FROM monthly_revenue
For year-over-year with monthly data, LAG with offset 12 gives the same month in the prior year — no complex date math needed.
FIRST_VALUE and LAST_VALUE
FIRST_VALUE and LAST_VALUE return the first or last value within the window frame.
A common use: annotate every row with the customer's first order date without a separate lookup:
FIRST_VALUE(order_date) OVER (
PARTITION BY customer_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS first_order_date
Note the frame: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. Without this, LAST_VALUE with ORDER BY uses the default frame (UNBOUNDED PRECEDING to CURRENT ROW), which means LAST_VALUE returns the current row's value — not the last row in the partition. This is a common mistake that produces incorrect results silently.
Performance Considerations
Window functions execute after WHERE, GROUP BY, and HAVING — but before the final SELECT and ORDER BY. This means you cannot filter on a window function result in the same query layer; wrap in a CTE or subquery.
On columnar databases like Snowflake, BigQuery, and Redshift, window functions generally perform well because the execution engine can exploit column-oriented storage for the sort and partition operations. On row-oriented databases (PostgreSQL, MySQL), complex window functions on large tables may benefit from an index on the PARTITION BY and ORDER BY columns.
Multiple window functions in the same SELECT clause that share identical OVER clauses can often be computed in a single pass by the query planner. Explicit WINDOW clause syntax names a window definition once and reuses it:
SELECT
customer_id,
order_date,
revenue,
SUM(revenue) OVER w AS running_total,
AVG(revenue) OVER w AS running_avg,
ROW_NUMBER() OVER w AS row_num
FROM orders
WINDOW w AS (PARTITION BY customer_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
This is cleaner to read and may allow the query planner to reuse the computed window.
Our data architecture and BI strategy teams build analytical infrastructure that puts SQL patterns like these to work on the problems that matter to your business — contact us to discuss your analytics engineering needs.
A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.
Book a Call →