Window functions are the most powerful SQL feature most analysts underuse. They compute aggregations across rows related to the current row without collapsing the result set — enabling running totals, rankings, lag/lead comparisons, and moving averages that are difficult or impossible to write with GROUP BY alone.
Window functions are the SQL feature that most separates analysts who write maintainable analytical queries from analysts who write query abominations. Before window functions, computing running totals required self-joins. Ranking required correlated subqueries. Period-over-period comparisons required multiple CTEs that were difficult to read and slow to execute. Window functions replace all of those patterns with a single, readable function call.
They are supported in every modern SQL database — Snowflake, BigQuery, Redshift, DuckDB, Postgres, and SQL Server all implement the ANSI standard window function syntax. The syntax is identical across all of them with minor variations.
How Window Functions Work
A regular aggregate function (SUM, COUNT, AVG) collapses multiple rows into one. GROUP BY groups rows, the aggregate computes across the group, and the result has one row per group.
A window function computes an aggregate across a set of rows related to the current row, but it does NOT collapse the result set. Every input row produces exactly one output row. The "window" is the set of rows considered for each input row's computation.
The OVER() clause defines the window:
SELECT
order_id,
customer_id,
revenue,
SUM(revenue) OVER (PARTITION BY customer_id ORDER BY order_date) AS running_revenue
FROM orders
This adds a running_revenue column to every order row — the cumulative revenue for that customer up to and including that order date. The PARTITION BY divides rows into groups (one per customer). The ORDER BY defines the row ordering within each partition. The function (SUM) computes across the window.
Without PARTITION BY, the window is the entire result set. Without ORDER BY, the window includes all rows in the partition (unordered). The combination of PARTITION BY and ORDER BY defines precisely what "rows related to the current row" means.
Ranking Functions
Three ranking functions cover most needs:
**ROW_NUMBER()** assigns a unique sequential integer to each row within a partition. Ties get different numbers (the assignment is arbitrary for ties, but each tied row gets a distinct number). Use when you need exactly one row per rank — like the most recent order per customer:
SELECT *
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn
FROM orders
)
WHERE rn = 1
This is one of the most common window function patterns. The inner query assigns row numbers within each customer partition, ordered by date descending. The outer query filters to rn = 1 — one row per customer, the most recent order.
**RANK()** assigns the same rank to tied rows but leaves gaps. If three rows tie for rank 1, they all get rank 1 and the next rank is 4. Use RANK() when the gap matters analytically — to show that the #4 item is actually tied with #1–3.
**DENSE_RANK()** assigns the same rank to tied rows without gaps. Three rows tied for rank 1 all get rank 1, and the next distinct value gets rank 2. Use DENSE_RANK() when you want Top N by rank and ties should count as one rank position.
Running Totals and Cumulative Aggregates
Running totals are the most common window function use:
SELECT
date,
daily_revenue,
SUM(daily_revenue) OVER (ORDER BY date) AS cumulative_revenue
FROM daily_revenue_summary
The absence of PARTITION BY means the window is the entire result set. The ORDER BY date means each row's window includes all rows up to and including the current date row. The result: a cumulative sum that grows with each date.
Cumulative average (running average):
SELECT
date,
daily_revenue,
AVG(daily_revenue) OVER (ORDER BY date) AS cumulative_avg_revenue
FROM daily_revenue_summary
Same structure, different function. The cumulative average at each date reflects average daily revenue from the start of the period through that date.
Percent of total (without collapsing the result set):
SELECT
product_name,
revenue,
revenue / SUM(revenue) OVER () AS pct_of_total
FROM product_revenue
SUM(revenue) OVER () with no PARTITION BY and no ORDER BY computes the total revenue across all rows. Dividing each row's revenue by that total gives percent of total — without requiring a subquery or join.
LAG and LEAD: Period-over-Period Comparisons
LAG() accesses the value of a prior row. LEAD() accesses the value of a subsequent row. Both take the function argument (the column to access), an optional offset (default 1), and an optional default value for when the offset goes out of bounds.
Month-over-month revenue comparison:
SELECT
month,
revenue,
LAG(revenue, 1) OVER (ORDER BY month) AS prior_month_revenue,
revenue - LAG(revenue, 1) OVER (ORDER BY month) AS mom_change,
(revenue - LAG(revenue, 1) OVER (ORDER BY month)) / LAG(revenue, 1) OVER (ORDER BY month) AS mom_pct_change
FROM monthly_revenue
LAG(revenue, 1) returns the revenue from the prior month row. Subtracting it from current revenue gives the absolute change. Dividing gives the percentage change. All in a single query scan.
Year-over-year comparison requires a LAG offset matching the number of periods in a year. For monthly data, LAG(revenue, 12) returns the same month from 12 months ago:
SELECT
month,
revenue,
LAG(revenue, 12) OVER (ORDER BY month) AS prior_year_same_month,
(revenue - LAG(revenue, 12) OVER (ORDER BY month)) / LAG(revenue, 12) OVER (ORDER BY month) AS yoy_pct_change
FROM monthly_revenue
For daily data comparing year-over-year at the same day of week (to avoid day-of-week bias), LAG(revenue, 364) returns the revenue 52 weeks prior — same day of week, approximately the same calendar date one year ago.
Moving Averages and Rolling Windows
A rolling window specifies a frame of rows relative to the current row, rather than all rows from the start of the partition. The frame is defined with ROWS BETWEEN or RANGE BETWEEN:
SUM(revenue) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
This computes the sum of the 7 rows ending at the current row — the trailing 7-day total, useful for computing 7-day rolling revenue.
7-day rolling average:
SELECT
date,
daily_revenue,
AVG(daily_revenue) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_7d_avg
FROM daily_revenue
ROWS BETWEEN specifies physical row offsets. RANGE BETWEEN specifies logical value offsets — useful when dates have gaps and you want "all rows within 6 days" rather than "6 rows back regardless of date gaps."
Common frame specifications:
- ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW — all rows from partition start through current row (the default when ORDER BY is present without a frame clause)
- ROWS BETWEEN N PRECEDING AND CURRENT ROW — trailing N+1 rows
- ROWS BETWEEN CURRENT ROW AND N FOLLOWING — leading N+1 rows
- ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING — all rows in partition
NTILE: Quartile and Percentile Assignment
NTILE(n) divides the partition into n buckets of approximately equal size and assigns each row to a bucket. This is how you compute quartiles, quintiles, and deciles:
SELECT
customer_id,
total_revenue,
NTILE(4) OVER (ORDER BY total_revenue DESC) AS revenue_quartile
FROM customer_revenue
Quartile 1 contains the top 25% of customers by revenue. Quartile 4 contains the bottom 25%. Use this for customer segmentation, cohort analysis, or any distribution-based assignment.
NTILE distributes rows as evenly as possible — if the number of rows is not evenly divisible by n, the first few buckets get one extra row. This is usually fine for large result sets; be aware of it for small ones.
FIRST_VALUE and LAST_VALUE
FIRST_VALUE() returns the value of the first row in the window frame. LAST_VALUE() returns the value of the last row. Common use: within each partition, what is the first and last value of a given column?
SELECT
customer_id,
order_date,
revenue,
FIRST_VALUE(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS first_order_date,
FIRST_VALUE(revenue) OVER (PARTITION BY customer_id ORDER BY order_date) AS first_order_revenue
FROM orders
This adds every customer's first order date and first order revenue to every row — without a subquery or self-join.
LAST_VALUE() requires an explicit frame to work as expected. Without a frame clause, the default frame is UNBOUNDED PRECEDING TO CURRENT ROW, which means LAST_VALUE at any given row returns the current row's value, not the last row in the partition. To get the partition's last value: LAST_VALUE(column) OVER (PARTITION BY ... ORDER BY ... ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).
Performance Considerations
Window functions are generally efficient in columnar query engines. They avoid the self-joins and correlated subqueries they replace, which were often query performance disasters. However:
Large windows (UNBOUNDED PRECEDING) on unsorted data require a full sort before the window computation. If you are running multiple window functions with the same PARTITION BY and ORDER BY, the query engine typically executes the sort once. If you mix different orderings, you pay for multiple sorts.
For very large tables with complex window functions, materialising intermediate results in a CTE before applying window functions can help the query planner make better decisions.
Our data engineering practice writes analytical SQL for complex warehouse environments — contact us to discuss your data modelling and transformation requirements.
A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.
Book a Call →