BlogAnalytics

SQL CTEs: Common Table Expressions for Readable, Maintainable Queries

Eric Chen
Eric Chen
BI Solutions Architect
·January 26, 202810 min read

Common Table Expressions (CTEs) let you define named subqueries at the top of a SQL statement and reference them like temporary tables throughout the query. This guide covers CTE syntax, recursive CTEs, when to use CTEs versus subqueries or temporary tables, and patterns for using CTEs to build readable multi-step analytical queries.

What CTEs Are

A Common Table Expression (CTE) is a named, temporary result set defined at the beginning of a SQL statement using the WITH keyword. CTEs are referenced like tables within the same query. They are not persisted — they exist only for the duration of the query.

WITH monthly_revenue AS (

SELECT

DATE_TRUNC('month', order_date) AS month,

SUM(order_amount) AS revenue

FROM orders

GROUP BY 1

)

SELECT

month,

revenue,

LAG(revenue) OVER (ORDER BY month) AS prior_month_revenue,

revenue - LAG(revenue) OVER (ORDER BY month) AS month_over_month_change

FROM monthly_revenue

ORDER BY month

The CTE monthly_revenue is computed first, then the outer query references it to add window function calculations. The query reads as a sequence of logical steps rather than a nested tangle.

CTE vs Subquery

The same result is achievable with a subquery:

SELECT

month,

revenue,

LAG(revenue) OVER (ORDER BY month) AS prior_month_revenue

FROM (

SELECT

DATE_TRUNC('month', order_date) AS month,

SUM(order_amount) AS revenue

FROM orders

GROUP BY 1

) monthly_revenue

ORDER BY month

The query logic is identical. The CTE version is preferred because:

- **Readability**: The WITH clause at the top names and describes each logical step. Readers understand the structure before reading the main query. Subqueries buried inside FROM clauses require reading inside-out.

- **Reusability within the query**: A CTE defined once can be referenced multiple times in the same query. A subquery must be duplicated everywhere it is used.

- **Debugging**: You can test a CTE in isolation by running its body as a standalone query. Nested subqueries require extracting and reformatting.

CTEs do not have a performance advantage over equivalent subqueries — most query optimisers treat them identically. Choose CTEs for readability, not performance.

Multiple CTEs

Chain multiple CTEs before the main query using commas:

WITH

cohort_assignments AS (

SELECT

user_id,

DATE_TRUNC('month', MIN(event_date)) AS cohort_month

FROM events

GROUP BY user_id

),

monthly_activity AS (

SELECT

user_id,

DATE_TRUNC('month', event_date) AS activity_month

FROM events

GROUP BY user_id, DATE_TRUNC('month', event_date)

),

cohort_periods AS (

SELECT

ca.user_id,

ca.cohort_month,

ma.activity_month,

DATEDIFF('month', ca.cohort_month, ma.activity_month) AS period_number

FROM cohort_assignments ca

JOIN monthly_activity ma ON ca.user_id = ma.user_id

)

SELECT

cohort_month,

period_number,

COUNT(DISTINCT user_id) AS active_users

FROM cohort_periods

GROUP BY 1, 2

ORDER BY cohort_month, period_number

Each CTE can reference CTEs defined before it. Later CTEs build on earlier ones. The query reads like a series of transformation steps — the same logical structure as a dbt model with multiple CTEs, which is a common dbt convention.

The dbt CTE Convention

dbt models commonly use a specific CTE structure that has become standard in analytics engineering:

WITH

-- Import CTEs: bring in source data

orders AS (SELECT * FROM {{ ref('stg_orders') }}),

customers AS (SELECT * FROM {{ ref('stg_customers') }}),

-- Logical transformation CTEs

orders_with_revenue AS (

SELECT

order_id,

customer_id,

order_date,

order_amount,

order_amount * tax_rate AS tax_amount

FROM orders

),

customer_totals AS (

SELECT

customer_id,

COUNT(*) AS order_count,

SUM(order_amount) AS lifetime_revenue

FROM orders_with_revenue

GROUP BY customer_id

),

-- Final SELECT

final AS (

SELECT

o.order_id,

o.order_date,

c.customer_name,

ct.lifetime_revenue,

ct.order_count

FROM orders_with_revenue o

JOIN customers c USING (customer_id)

JOIN customer_totals ct USING (customer_id)

)

SELECT * FROM final

This structure makes dbt models readable by separating imports (no logic), transformations (logic), and the final SELECT (assembly). New readers can understand the model's purpose by skimming the CTE names.

Recursive CTEs

Recursive CTEs compute iterative results — useful for hierarchical data (org charts, product categories, BOM explosions) or date spine generation.

A recursive CTE has two parts:

1. An anchor member: the starting row(s) of the recursion

2. A recursive member: the query that references the CTE itself and adds the next level

**Generating a date spine**:

WITH RECURSIVE date_spine AS (

SELECT DATE '2024-01-01' AS date

UNION ALL

SELECT date + INTERVAL '1 day'

FROM date_spine

WHERE date < DATE '2024-12-31'

)

SELECT * FROM date_spine

The anchor produces 2024-01-01. The recursive member adds one day each iteration until the WHERE condition terminates recursion. The result: every date in 2024.

**Traversing an org hierarchy**:

WITH RECURSIVE org_hierarchy AS (

-- Anchor: top-level employees (no manager)

SELECT employee_id, employee_name, manager_id, 0 AS level

FROM employees

WHERE manager_id IS NULL

UNION ALL

-- Recursive: employees reporting to any employee already in the result

SELECT e.employee_id, e.employee_name, e.manager_id, oh.level + 1

FROM employees e

JOIN org_hierarchy oh ON e.manager_id = oh.employee_id

)

SELECT employee_id, employee_name, level

FROM org_hierarchy

ORDER BY level, employee_name

Note: Recursive CTEs require RECURSIVE keyword in PostgreSQL and most ANSI SQL databases. BigQuery uses recursive CTEs with WITH RECURSIVE. Snowflake supports recursive CTEs natively.

CTE vs Temporary Table

For a query that references the same CTE three or more times, a temporary table may be more efficient — the CTE result set is computed once and stored, rather than re-evaluated on each reference.

CREATE TEMP TABLE expensive_aggregation AS

SELECT customer_id, SUM(amount) AS total

FROM large_fact_table

GROUP BY customer_id;

-- Reference multiple times cheaply

SELECT a.customer_id FROM expensive_aggregation a WHERE total > 1000;

SELECT b.customer_id FROM expensive_aggregation b WHERE total < 100;

Most modern query optimisers materialise CTEs referenced multiple times automatically (particularly in Snowflake and BigQuery). Check execution plans to confirm whether the CTE is being re-evaluated or cached.

When NOT to Use CTEs

**Simple single-step queries**: A CTE wrapping a single SELECT is unnecessary complexity. Use CTEs when they add readability to multi-step logic.

**Extremely large intermediate result sets**: If a CTE produces billions of rows that are then immediately filtered in the outer query, a subquery with the filter pushed inside is more efficient — it avoids materialising the full CTE before filtering.

**Replacing views**: CTEs are per-query and not reusable across queries. If the same transformation logic is needed in multiple queries, create a view or a dbt model instead of duplicating the CTE.

Our data architecture practice designs SQL-based analytics transformations using dbt — contact us to discuss your analytics engineering requirements.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →