BlogData Engineering

What Is a CTE? Common Table Expressions for Readable SQL

James Okafor
James Okafor
Senior Data Engineer
·August 28, 20288 min read

A Common Table Expression (CTE) is a named temporary result set defined within a SQL query using the WITH clause. CTEs make complex SQL more readable by naming intermediate results and composing them step by step — the foundation of how dbt models are structured. This guide explains CTEs, how they work, and when to use them.

A Common Table Expression (CTE) is a named temporary result set defined within a SQL query using the WITH keyword. It gives an intermediate query result a name, making it referenceable in the main query body that follows. CTEs do not create tables or persist results — they exist only for the duration of the query in which they are defined.

CTEs transform complex SQL from a deeply nested mess into a sequence of named, readable steps. This is why they are the structural foundation of dbt models: each CTE represents one named transformation step, composable with the steps that follow.

Basic CTE Syntax

A CTE begins with WITH, followed by a name and a SELECT statement in parentheses. The main query follows the CTE definition and can reference the CTE by name as if it were a table.

Multiple CTEs can be chained in a single query — each CTE can reference the CTEs defined before it. This chaining is the mechanism for building transformation logic step by step: import source data, apply a first transformation, apply a second transformation, produce the final output. Each step is named and readable.

Why CTEs Make SQL More Readable

Without CTEs, complex analytical SQL nests subqueries inside other subqueries — the innermost subquery runs first, its result becomes the input to the next subquery, and so on. Reading nested subqueries requires mentally traversing inward before understanding what each layer does.

CTEs invert this: you read top to bottom, with each named step building on what came before. "orders: the raw orders" — "filtered_orders: orders in the last 90 days" — "customer_order_counts: count orders per customer from filtered_orders" — "final: join customer_order_counts to customers to get customer attributes." The logic reads like documentation.

This readability matters for maintenance. SQL that another engineer can quickly understand is SQL that can be correctly modified. SQL that requires 10 minutes of mental parsing to understand is SQL that introduces bugs when modified under time pressure.

CTEs and dbt

dbt models are SQL files that produce a single table or view. The convention for structuring dbt models uses CTEs:

The pattern: import CTEs reference upstream models (using dbt's ref() function), logical CTEs apply business logic step by step, and the final SELECT produces the model's output. This structure makes dbt models readable and testable — each CTE can be inspected by temporarily commenting out later CTEs and running the model up to a particular step.

dbt's compilation replaces ref() function calls with the actual table or view names before executing the SQL, but the authored form uses meaningful model names that serve as documentation.

Recursive CTEs

Recursive CTEs — defined with WITH RECURSIVE — allow a CTE to reference itself, enabling iterative queries that are otherwise impossible in standard SQL:

The canonical recursive CTE use case is traversing hierarchical data: organizational reporting chains, product category hierarchies, bill-of-materials structures. The CTE defines an anchor member (the root of the hierarchy) and a recursive member (the step that traverses one level down), and the database engine iterates until no more rows are added.

Recursive CTEs are supported in most production databases (PostgreSQL, BigQuery, Snowflake, SQL Server, and others). Syntax varies slightly between databases — BigQuery's RECURSIVE keyword placement differs from PostgreSQL's.

CTEs vs Subqueries vs Temp Tables

**Subqueries** are inline — nested inside the main query. They accomplish the same transformation logic as CTEs but are less readable. Subqueries that are referenced more than once in the main query must be written multiple times (or the database executes them multiple times); CTEs can be referenced multiple times with a single definition.

**Temporary tables** persist within the session (or transaction) and can be indexed. For very large intermediate results that are referenced multiple times, a temp table can be more performant than a CTE because it is computed once and stored, while a CTE may be re-evaluated on each reference (depending on the database's query optimizer). For cloud analytical warehouses, the difference is usually negligible.

**CTEs** are the right choice for complex analytical queries where readability is the primary concern and the intermediate results are not large enough to warrant materialization. In dbt, CTEs are always preferable to nested subqueries — dbt models are long-lived artifacts that must be readable for future maintenance.

Our data engineering services practice writes production-quality SQL and dbt transformation models using well-structured CTEs. Contact us to discuss your data engineering requirements.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →