BlogAnalytics

Cohort Analysis in SQL: Retention, Churn, and Lifetime Value

Eric Chen
Eric Chen
BI Solutions Architect
·December 22, 202711 min read

Cohort analysis groups users by a shared characteristic at a point in time — typically acquisition date — and tracks their behaviour over subsequent periods. This guide covers building cohort retention tables in SQL, calculating rolling retention rates, visualising cohort grids, and extending cohort analysis to revenue and lifetime value calculations.

What Cohort Analysis Is

Cohort analysis groups users by a shared characteristic at a point in time — typically when they first interacted with a product (acquisition cohort) — and tracks their behaviour across subsequent time periods. Instead of looking at aggregate user behaviour, cohort analysis compares the behaviour of users acquired in different periods or through different channels, revealing how retention, revenue, and engagement evolve over a user's lifetime.

For product and marketing teams, cohort analysis answers: Are users acquired last month retaining better than users acquired six months ago? Does the March cohort have higher 90-day retention than the February cohort? How much revenue does the typical user generate in months 1 through 12 after acquisition?

Building a Cohort Retention Table in SQL

The standard cohort analysis calculates, for each cohort (users grouped by acquisition month), what percentage of users were active in each subsequent month.

**Step 1 — Assign cohort period**:

WITH user_cohorts AS (

SELECT

user_id,

DATE_TRUNC('month', first_activity_date) AS cohort_month

FROM users

),

This assigns each user to their acquisition cohort — the calendar month they first appeared.

**Step 2 — Join activity back to cohort**:

user_activity AS (

SELECT

a.user_id,

uc.cohort_month,

DATE_TRUNC('month', a.activity_date) AS activity_month,

DATEDIFF('month', uc.cohort_month, DATE_TRUNC('month', a.activity_date)) AS period_number

FROM activity_events a

JOIN user_cohorts uc ON a.user_id = uc.user_id

),

period_number = 0 is the acquisition month; period_number = 1 is one month after acquisition; and so on.

**Step 3 — Count users per cohort per period**:

cohort_counts AS (

SELECT

cohort_month,

period_number,

COUNT(DISTINCT user_id) AS active_users

FROM user_activity

GROUP BY 1, 2

),

**Step 4 — Calculate retention rate**:

cohort_sizes AS (

SELECT cohort_month, active_users AS cohort_size

FROM cohort_counts

WHERE period_number = 0

)

SELECT

c.cohort_month,

c.period_number,

c.active_users,

cs.cohort_size,

ROUND(100.0 * c.active_users / cs.cohort_size, 1) AS retention_rate

FROM cohort_counts c

JOIN cohort_sizes cs ON c.cohort_month = cs.cohort_month

ORDER BY cohort_month, period_number

The result is a table where each row is one cohort-period combination with its retention rate. Pivot this into a grid for the classic cohort heatmap visualisation.

Pivoting the Cohort Grid

Most BI tools can pivot the long-format cohort table into a grid with cohorts as rows and period numbers as columns. In SQL:

SELECT

cohort_month,

MAX(CASE WHEN period_number = 0 THEN retention_rate END) AS period_0,

MAX(CASE WHEN period_number = 1 THEN retention_rate END) AS period_1,

MAX(CASE WHEN period_number = 2 THEN retention_rate END) AS period_2,

MAX(CASE WHEN period_number = 3 THEN retention_rate END) AS period_3,

MAX(CASE WHEN period_number = 6 THEN retention_rate END) AS period_6,

MAX(CASE WHEN period_number = 12 THEN retention_rate END) AS period_12

FROM retention_base

GROUP BY cohort_month

ORDER BY cohort_month

In Tableau or Power BI, use the long-format table with cohort_month on rows, period_number on columns, and retention_rate as the colour measure — this produces the retention heatmap without pivoting in SQL.

Revenue Cohort Analysis

Retention counts users; revenue cohort analysis tracks cumulative revenue generated per cohort. This reveals customer lifetime value by acquisition cohort — essential for evaluating marketing channel efficiency.

SELECT

uc.cohort_month,

DATEDIFF('month', uc.cohort_month, DATE_TRUNC('month', o.order_date)) AS period_number,

COUNT(DISTINCT o.user_id) AS paying_users,

SUM(o.revenue) AS cohort_revenue,

SUM(o.revenue) / cs.cohort_size AS revenue_per_cohort_user

FROM orders o

JOIN user_cohorts uc ON o.user_id = uc.user_id

JOIN cohort_sizes cs ON uc.cohort_month = cs.cohort_month

GROUP BY 1, 2, cs.cohort_size

ORDER BY cohort_month, period_number

Adding cumulative revenue with window functions:

SUM(cohort_revenue) OVER (

PARTITION BY cohort_month

ORDER BY period_number

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

) AS cumulative_revenue,

SUM(revenue_per_cohort_user) OVER (

PARTITION BY cohort_month

ORDER BY period_number

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

) AS cumulative_ltv

Cumulative LTV by period number lets you compare how quickly different cohorts are generating total lifetime value — cohorts that monetise faster have better unit economics.

Behavioural Cohorts

Not all cohorts are defined by acquisition date. Behavioural cohort analysis groups users by a specific action they took:

- Users who completed onboarding vs users who did not

- Users who used a specific feature in week 1

- Users who upgraded to paid in their first 7 days

Compare 30-day retention between users who completed onboarding and users who did not:

WITH onboarding_status AS (

SELECT

user_id,

MAX(CASE WHEN event_type = 'onboarding_complete' THEN 1 ELSE 0 END) AS completed_onboarding,

MIN(created_at) AS cohort_date

FROM events

WHERE DATEDIFF('day', user_created_at, event_date) <= 7

GROUP BY user_id

),

Compare retention curves between the two cohorts — if users who complete onboarding have 30-day retention of 40% vs 15% for those who do not, onboarding completion is a strong leading indicator and an intervention target.

Common Pitfalls

**Denominator consistency**: The cohort size denominator must be fixed at period 0. Do not recalculate it at each period or your retention rates will be inconsistent.

**Definition of active**: Define active precisely — any login, a meaningful action (completed a core task), or generated revenue. Different definitions produce different retention curves. Document the definition alongside the analysis.

**Partial cohorts at analysis time**: The most recent cohort (e.g., users acquired last month) has had less time to churn than older cohorts. Period 3 retention for last month's cohort is undefined or will be populated as data arrives. Handle by showing NULL or by flagging future periods for recent cohorts.

**Recency bias in visible data**: A cohort grid may appear to show improving retention for recent cohorts simply because those cohorts have not yet had time to churn. Compare cohorts at the same period number (e.g., all cohorts at period 3) for apples-to-apples comparison.

**User identification**: If user identity changes (anonymous to authenticated, account merges), cohort assignment may be inconsistent. Resolve identity before building cohort tables.

Connecting Cohort Analysis to Business Decisions

Cohort analysis is most valuable when connected to acquisition channels, product changes, and intervention programmes:

- Break cohorts by acquisition channel — which channels produce users with higher long-term retention?

- Break cohorts by month relative to a product change — did the March feature launch improve 30-day retention for the April cohort vs the March cohort?

- Identify the period where retention curves diverge between cohorts — where are users churning, and what changes might address it?

The numbers are the starting point for investigation, not conclusions. A better April cohort could be a product improvement or a seasonal acquisition channel shift. Cohort analysis surfaces the where and when; understanding why requires qualitative research alongside the quantitative signal.

Our BI strategy practice designs analytics frameworks including cohort analysis, retention tracking, and customer lifetime value models — contact us to discuss your analytics requirements.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →