BlogData Engineering

What Is a Window Function? Running Totals, Rankings, and Period Comparisons in SQL

James Okafor
James Okafor
Senior Data Engineer
·August 21, 202810 min read

SQL window functions perform calculations across a set of rows related to the current row — without collapsing them into a single aggregate result. This guide explains how window functions work, the common function types (ROW_NUMBER, RANK, LAG, LEAD, SUM OVER), and when to use them.

A SQL window function performs a calculation across a set of rows that are somehow related to the current row — defined by a PARTITION BY clause (which rows are in scope) and an ORDER BY clause (in what sequence) — without collapsing those rows into a single aggregate result. The calculation is computed for each row individually, with each row retaining its own identity in the result set.

Regular aggregate functions (SUM, COUNT, AVG) collapse a group of rows into one result row. Window functions compute an aggregate or positional result for each row based on a defined window of related rows, while still returning one result row per input row.

The canonical example: total revenue per customer, shown alongside each individual order. A GROUP BY query collapses to one row per customer; a window function computes the total while retaining every order row.

The OVER Clause

Every window function uses an OVER clause that defines the window:

SUM(revenue) OVER (PARTITION BY customer_id ORDER BY order_date) computes a running sum of revenue, partitioned by customer (each customer's running total is independent), ordered by date (the running total accumulates in date order).

**PARTITION BY** divides the rows into independent groups — the window function restarts for each partition. Without PARTITION BY, the window spans the entire result set. PARTITION BY customer_id means the window restarts for each customer.

**ORDER BY** defines the sequence within the partition — critical for running totals, rankings, and lag/lead calculations. Without ORDER BY, the window function has no defined sequence.

**ROWS BETWEEN / RANGE BETWEEN** controls which rows within the partition are included in the calculation relative to the current row. ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW includes all rows from the partition start through the current row — standard for running totals. ROWS BETWEEN 6 PRECEDING AND CURRENT ROW includes the 7 rows ending at the current row — for rolling 7-period averages.

Common Window Functions

**ROW_NUMBER():** Assigns a unique sequential integer to each row within the partition, ordered by the specified column. Used for deduplication (keep only ROW_NUMBER = 1 per customer per day), for pagination, and for identifying the latest record per entity.

**RANK() and DENSE_RANK():** Like ROW_NUMBER but with tied ranks. RANK gives the same rank to tied rows and skips subsequent ranks (1, 2, 2, 4); DENSE_RANK does not skip (1, 2, 2, 3). Used for top-N queries per partition.

**SUM() OVER and AVG() OVER:** Aggregate functions used as window functions. Running totals, rolling averages, cumulative percentages. SUM(revenue) OVER (PARTITION BY customer_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) produces a running customer revenue total.

**LAG() and LEAD():** Access a value from a previous or subsequent row within the partition. LAG(revenue, 1) returns the prior row's revenue; LEAD(revenue, 1) returns the next row's revenue. Critical for period-over-period calculations: revenue change from previous period is revenue - LAG(revenue, 1) OVER (PARTITION BY product_id ORDER BY month).

**FIRST_VALUE() and LAST_VALUE():** Return the first or last value in the window frame. FIRST_VALUE(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) returns the earliest order date for each customer on every row — the cohort acquisition date pattern.

**NTILE():** Divides the partition into N equal-sized buckets and assigns a bucket number to each row. NTILE(4) over revenue values produces quartiles; NTILE(10) produces deciles. Used for quantile-based segmentation.

Window Functions vs GROUP BY

The key distinction:

**GROUP BY:** One result row per group. Loses individual row identity. Cannot return both the individual order details and the customer total in the same query without a join.

**Window function:** One result row per input row. Retains individual row identity. Returns the customer total alongside each individual order in the same result set.

Both are appropriate in different scenarios. GROUP BY for aggregated reports where individual row detail is not needed. Window functions for analytical queries that need both individual detail and group context.

Performance Considerations

Window functions are computed after filtering but before HAVING and ORDER BY. The database must sort data within partitions, which can be expensive for large result sets. Indexes on PARTITION BY and ORDER BY columns help, but window function queries over very large tables on live connections may be slow.

In cloud data warehouses (BigQuery, Snowflake, Redshift), window functions are efficient for analytical workloads — they are first-class operations that the query optimizer handles well. On transactional databases with large tables, window functions are sometimes better pre-computed in a data warehouse transformation rather than computed on every query.

dbt models can use window functions in SQL to pre-compute running totals, rankings, and cohort attributes — materializing the result so Tableau queries a pre-computed mart table rather than re-executing expensive window function logic on every dashboard load.

Our data engineering services and data architecture practice designs SQL transformation models using window functions for complex analytical requirements. Contact us to discuss your data engineering needs.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →