Cohort analysis groups users by a shared characteristic at a point in time — typically acquisition date — and tracks their behaviour over subsequent periods. This guide covers building cohort retention tables in SQL, calculating rolling retention rates, visualising cohort grids, and extending cohort analysis to revenue and lifetime value calculations.
What Cohort Analysis Is
Cohort analysis groups users by a shared characteristic at a point in time — typically when they first interacted with a product (acquisition cohort) — and tracks their behaviour across subsequent time periods. Instead of looking at aggregate user behaviour, cohort analysis compares the behaviour of users acquired in different periods or through different channels, revealing how retention, revenue, and engagement evolve over a user's lifetime.
For product and marketing teams, cohort analysis answers: Are users acquired last month retaining better than users acquired six months ago? Does the March cohort have higher 90-day retention than the February cohort? How much revenue does the typical user generate in months 1 through 12 after acquisition?
Building a Cohort Retention Table in SQL
The standard cohort analysis calculates, for each cohort (users grouped by acquisition month), what percentage of users were active in each subsequent month.
**Step 1 — Assign cohort period**:
WITH user_cohorts AS (
SELECT
user_id,
DATE_TRUNC('month', first_activity_date) AS cohort_month
FROM users
),
This assigns each user to their acquisition cohort — the calendar month they first appeared.
**Step 2 — Join activity back to cohort**:
user_activity AS (
SELECT
a.user_id,
uc.cohort_month,
DATE_TRUNC('month', a.activity_date) AS activity_month,
DATEDIFF('month', uc.cohort_month, DATE_TRUNC('month', a.activity_date)) AS period_number
FROM activity_events a
JOIN user_cohorts uc ON a.user_id = uc.user_id
),
period_number = 0 is the acquisition month; period_number = 1 is one month after acquisition; and so on.
**Step 3 — Count users per cohort per period**:
cohort_counts AS (
SELECT
cohort_month,
period_number,
COUNT(DISTINCT user_id) AS active_users
FROM user_activity
GROUP BY 1, 2
),
**Step 4 — Calculate retention rate**:
cohort_sizes AS (
SELECT cohort_month, active_users AS cohort_size
FROM cohort_counts
WHERE period_number = 0
)
SELECT
c.cohort_month,
c.period_number,
c.active_users,
cs.cohort_size,
ROUND(100.0 * c.active_users / cs.cohort_size, 1) AS retention_rate
FROM cohort_counts c
JOIN cohort_sizes cs ON c.cohort_month = cs.cohort_month
ORDER BY cohort_month, period_number
The result is a table where each row is one cohort-period combination with its retention rate. Pivot this into a grid for the classic cohort heatmap visualisation.
Pivoting the Cohort Grid
Most BI tools can pivot the long-format cohort table into a grid with cohorts as rows and period numbers as columns. In SQL:
SELECT
cohort_month,
MAX(CASE WHEN period_number = 0 THEN retention_rate END) AS period_0,
MAX(CASE WHEN period_number = 1 THEN retention_rate END) AS period_1,
MAX(CASE WHEN period_number = 2 THEN retention_rate END) AS period_2,
MAX(CASE WHEN period_number = 3 THEN retention_rate END) AS period_3,
MAX(CASE WHEN period_number = 6 THEN retention_rate END) AS period_6,
MAX(CASE WHEN period_number = 12 THEN retention_rate END) AS period_12
FROM retention_base
GROUP BY cohort_month
ORDER BY cohort_month
In Tableau or Power BI, use the long-format table with cohort_month on rows, period_number on columns, and retention_rate as the colour measure — this produces the retention heatmap without pivoting in SQL.
Revenue Cohort Analysis
Retention counts users; revenue cohort analysis tracks cumulative revenue generated per cohort. This reveals customer lifetime value by acquisition cohort — essential for evaluating marketing channel efficiency.
SELECT
uc.cohort_month,
DATEDIFF('month', uc.cohort_month, DATE_TRUNC('month', o.order_date)) AS period_number,
COUNT(DISTINCT o.user_id) AS paying_users,
SUM(o.revenue) AS cohort_revenue,
SUM(o.revenue) / cs.cohort_size AS revenue_per_cohort_user
FROM orders o
JOIN user_cohorts uc ON o.user_id = uc.user_id
JOIN cohort_sizes cs ON uc.cohort_month = cs.cohort_month
GROUP BY 1, 2, cs.cohort_size
ORDER BY cohort_month, period_number
Adding cumulative revenue with window functions:
SUM(cohort_revenue) OVER (
PARTITION BY cohort_month
ORDER BY period_number
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_revenue,
SUM(revenue_per_cohort_user) OVER (
PARTITION BY cohort_month
ORDER BY period_number
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_ltv
Cumulative LTV by period number lets you compare how quickly different cohorts are generating total lifetime value — cohorts that monetise faster have better unit economics.
Behavioural Cohorts
Not all cohorts are defined by acquisition date. Behavioural cohort analysis groups users by a specific action they took:
- Users who completed onboarding vs users who did not
- Users who used a specific feature in week 1
- Users who upgraded to paid in their first 7 days
Compare 30-day retention between users who completed onboarding and users who did not:
WITH onboarding_status AS (
SELECT
user_id,
MAX(CASE WHEN event_type = 'onboarding_complete' THEN 1 ELSE 0 END) AS completed_onboarding,
MIN(created_at) AS cohort_date
FROM events
WHERE DATEDIFF('day', user_created_at, event_date) <= 7
GROUP BY user_id
),
Compare retention curves between the two cohorts — if users who complete onboarding have 30-day retention of 40% vs 15% for those who do not, onboarding completion is a strong leading indicator and an intervention target.
Common Pitfalls
**Denominator consistency**: The cohort size denominator must be fixed at period 0. Do not recalculate it at each period or your retention rates will be inconsistent.
**Definition of active**: Define active precisely — any login, a meaningful action (completed a core task), or generated revenue. Different definitions produce different retention curves. Document the definition alongside the analysis.
**Partial cohorts at analysis time**: The most recent cohort (e.g., users acquired last month) has had less time to churn than older cohorts. Period 3 retention for last month's cohort is undefined or will be populated as data arrives. Handle by showing NULL or by flagging future periods for recent cohorts.
**Recency bias in visible data**: A cohort grid may appear to show improving retention for recent cohorts simply because those cohorts have not yet had time to churn. Compare cohorts at the same period number (e.g., all cohorts at period 3) for apples-to-apples comparison.
**User identification**: If user identity changes (anonymous to authenticated, account merges), cohort assignment may be inconsistent. Resolve identity before building cohort tables.
Connecting Cohort Analysis to Business Decisions
Cohort analysis is most valuable when connected to acquisition channels, product changes, and intervention programmes:
- Break cohorts by acquisition channel — which channels produce users with higher long-term retention?
- Break cohorts by month relative to a product change — did the March feature launch improve 30-day retention for the April cohort vs the March cohort?
- Identify the period where retention curves diverge between cohorts — where are users churning, and what changes might address it?
The numbers are the starting point for investigation, not conclusions. A better April cohort could be a product improvement or a seasonal acquisition channel shift. Cohort analysis surfaces the where and when; understanding why requires qualitative research alongside the quantitative signal.
Our BI strategy practice designs analytics frameworks including cohort analysis, retention tracking, and customer lifetime value models — contact us to discuss your analytics requirements.
A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.
Book a Call →