Common Table Expressions (CTEs) let you define named subqueries at the top of a SQL statement and reference them like temporary tables throughout the query. This guide covers CTE syntax, recursive CTEs, when to use CTEs versus subqueries or temporary tables, and patterns for using CTEs to build readable multi-step analytical queries.
What CTEs Are
A Common Table Expression (CTE) is a named, temporary result set defined at the beginning of a SQL statement using the WITH keyword. CTEs are referenced like tables within the same query. They are not persisted — they exist only for the duration of the query.
WITH monthly_revenue AS (
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(order_amount) AS revenue
FROM orders
GROUP BY 1
)
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prior_month_revenue,
revenue - LAG(revenue) OVER (ORDER BY month) AS month_over_month_change
FROM monthly_revenue
ORDER BY month
The CTE monthly_revenue is computed first, then the outer query references it to add window function calculations. The query reads as a sequence of logical steps rather than a nested tangle.
CTE vs Subquery
The same result is achievable with a subquery:
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prior_month_revenue
FROM (
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(order_amount) AS revenue
FROM orders
GROUP BY 1
) monthly_revenue
ORDER BY month
The query logic is identical. The CTE version is preferred because:
- **Readability**: The WITH clause at the top names and describes each logical step. Readers understand the structure before reading the main query. Subqueries buried inside FROM clauses require reading inside-out.
- **Reusability within the query**: A CTE defined once can be referenced multiple times in the same query. A subquery must be duplicated everywhere it is used.
- **Debugging**: You can test a CTE in isolation by running its body as a standalone query. Nested subqueries require extracting and reformatting.
CTEs do not have a performance advantage over equivalent subqueries — most query optimisers treat them identically. Choose CTEs for readability, not performance.
Multiple CTEs
Chain multiple CTEs before the main query using commas:
WITH
cohort_assignments AS (
SELECT
user_id,
DATE_TRUNC('month', MIN(event_date)) AS cohort_month
FROM events
GROUP BY user_id
),
monthly_activity AS (
SELECT
user_id,
DATE_TRUNC('month', event_date) AS activity_month
FROM events
GROUP BY user_id, DATE_TRUNC('month', event_date)
),
cohort_periods AS (
SELECT
ca.user_id,
ca.cohort_month,
ma.activity_month,
DATEDIFF('month', ca.cohort_month, ma.activity_month) AS period_number
FROM cohort_assignments ca
JOIN monthly_activity ma ON ca.user_id = ma.user_id
)
SELECT
cohort_month,
period_number,
COUNT(DISTINCT user_id) AS active_users
FROM cohort_periods
GROUP BY 1, 2
ORDER BY cohort_month, period_number
Each CTE can reference CTEs defined before it. Later CTEs build on earlier ones. The query reads like a series of transformation steps — the same logical structure as a dbt model with multiple CTEs, which is a common dbt convention.
The dbt CTE Convention
dbt models commonly use a specific CTE structure that has become standard in analytics engineering:
WITH
-- Import CTEs: bring in source data
orders AS (SELECT * FROM {{ ref('stg_orders') }}),
customers AS (SELECT * FROM {{ ref('stg_customers') }}),
-- Logical transformation CTEs
orders_with_revenue AS (
SELECT
order_id,
customer_id,
order_date,
order_amount,
order_amount * tax_rate AS tax_amount
FROM orders
),
customer_totals AS (
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(order_amount) AS lifetime_revenue
FROM orders_with_revenue
GROUP BY customer_id
),
-- Final SELECT
final AS (
SELECT
o.order_id,
o.order_date,
c.customer_name,
ct.lifetime_revenue,
ct.order_count
FROM orders_with_revenue o
JOIN customers c USING (customer_id)
JOIN customer_totals ct USING (customer_id)
)
SELECT * FROM final
This structure makes dbt models readable by separating imports (no logic), transformations (logic), and the final SELECT (assembly). New readers can understand the model's purpose by skimming the CTE names.
Recursive CTEs
Recursive CTEs compute iterative results — useful for hierarchical data (org charts, product categories, BOM explosions) or date spine generation.
A recursive CTE has two parts:
1. An anchor member: the starting row(s) of the recursion
2. A recursive member: the query that references the CTE itself and adds the next level
**Generating a date spine**:
WITH RECURSIVE date_spine AS (
SELECT DATE '2024-01-01' AS date
UNION ALL
SELECT date + INTERVAL '1 day'
FROM date_spine
WHERE date < DATE '2024-12-31'
)
SELECT * FROM date_spine
The anchor produces 2024-01-01. The recursive member adds one day each iteration until the WHERE condition terminates recursion. The result: every date in 2024.
**Traversing an org hierarchy**:
WITH RECURSIVE org_hierarchy AS (
-- Anchor: top-level employees (no manager)
SELECT employee_id, employee_name, manager_id, 0 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: employees reporting to any employee already in the result
SELECT e.employee_id, e.employee_name, e.manager_id, oh.level + 1
FROM employees e
JOIN org_hierarchy oh ON e.manager_id = oh.employee_id
)
SELECT employee_id, employee_name, level
FROM org_hierarchy
ORDER BY level, employee_name
Note: Recursive CTEs require RECURSIVE keyword in PostgreSQL and most ANSI SQL databases. BigQuery uses recursive CTEs with WITH RECURSIVE. Snowflake supports recursive CTEs natively.
CTE vs Temporary Table
For a query that references the same CTE three or more times, a temporary table may be more efficient — the CTE result set is computed once and stored, rather than re-evaluated on each reference.
CREATE TEMP TABLE expensive_aggregation AS
SELECT customer_id, SUM(amount) AS total
FROM large_fact_table
GROUP BY customer_id;
-- Reference multiple times cheaply
SELECT a.customer_id FROM expensive_aggregation a WHERE total > 1000;
SELECT b.customer_id FROM expensive_aggregation b WHERE total < 100;
Most modern query optimisers materialise CTEs referenced multiple times automatically (particularly in Snowflake and BigQuery). Check execution plans to confirm whether the CTE is being re-evaluated or cached.
When NOT to Use CTEs
**Simple single-step queries**: A CTE wrapping a single SELECT is unnecessary complexity. Use CTEs when they add readability to multi-step logic.
**Extremely large intermediate result sets**: If a CTE produces billions of rows that are then immediately filtered in the outer query, a subquery with the filter pushed inside is more efficient — it avoids materialising the full CTE before filtering.
**Replacing views**: CTEs are per-query and not reusable across queries. If the same transformation logic is needed in multiple queries, create a view or a dbt model instead of duplicating the CTE.
Our data architecture practice designs SQL-based analytics transformations using dbt — contact us to discuss your analytics engineering requirements.
A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.
Book a Call →