BlogAnalytics

Customer Churn Analysis: Measuring, Predicting, and Reducing Churn

Austin Duncan
Austin Duncan
Project Manager & Data Strategist
·December 29, 202712 min read

Churn analysis quantifies how many customers stop using a product or service in a given period. This guide covers calculating churn rate and gross revenue churn in SQL, identifying leading indicators of churn using cohort analysis, building a basic predictive churn model with logistic regression, and designing intervention programmes using churn propensity scores.

Why Churn Analysis Matters

Churn is the rate at which customers stop using a product or service. For subscription businesses, churn directly determines whether growth compounds or erodes: a 5% monthly churn rate means replacing 46% of the customer base every year just to stay flat.

Churn analysis answers three distinct questions. Measurement: what is the churn rate, when does churn happen, and which customer segments churn most? Prediction: which active customers are at high risk of churning in the next 30-90 days? Intervention: which actions reduce churn probability, and what is the ROI of retention programmes?

Calculating Churn Rate in SQL

Churn rate definition varies by business model. Define precisely before building the query.

**Monthly user churn** (for subscription or active-user products):

WITH monthly_actives AS (

SELECT

DATE_TRUNC('month', activity_date) AS month,

user_id

FROM user_activity

GROUP BY 1, 2

),

churn_calc AS (

SELECT

m1.month,

COUNT(DISTINCT m1.user_id) AS active_users,

COUNT(DISTINCT m2.user_id) AS retained_users,

COUNT(DISTINCT m1.user_id) - COUNT(DISTINCT m2.user_id) AS churned_users

FROM monthly_actives m1

LEFT JOIN monthly_actives m2

ON m1.user_id = m2.user_id

AND m2.month = m1.month + INTERVAL '1 month'

GROUP BY m1.month

)

SELECT

month,

active_users,

churned_users,

ROUND(100.0 * churned_users / active_users, 2) AS churn_rate_pct

FROM churn_calc

ORDER BY month

**Gross Revenue Churn (GRC)**: Revenue lost from churned and downgraded customers as a percentage of prior-period revenue. Critical for subscription businesses — even if user count stays flat, revenue churn can be high if churning customers are high-value.

SELECT

month,

prior_mrr,

lost_mrr_from_churn,

lost_mrr_from_downgrade,

ROUND(100.0 * (lost_mrr_from_churn + lost_mrr_from_downgrade) / prior_mrr, 2) AS gross_revenue_churn_pct

FROM mrr_movements

**Net Revenue Retention (NRR)**: The amount retained plus expansion revenue, as a percentage of prior-period revenue. NRR above 100% means existing customers are growing faster than they churn — a key SaaS health metric.

Identifying When Churn Happens: Survival Analysis

Survival analysis tracks what percentage of a cohort survives (remains active) at each period after acquisition. It reveals whether churn is front-loaded (users churn immediately), distributed, or back-loaded (users stay for a while then leave).

WITH cohort_periods AS (

SELECT

u.user_id,

DATE_TRUNC('month', u.created_at) AS cohort_month,

generate_series(0, 24) AS period

FROM users u

),

SELECT

cohort_month,

period,

COUNT(DISTINCT cp.user_id) AS cohort_size,

COUNT(DISTINCT CASE WHEN a.user_id IS NOT NULL THEN cp.user_id END) AS surviving_users,

ROUND(100.0 * COUNT(DISTINCT CASE WHEN a.user_id IS NOT NULL THEN cp.user_id END) / MAX(cohort_size), 1) AS survival_rate

FROM cohort_periods cp

LEFT JOIN monthly_actives a

ON cp.user_id = a.user_id

AND DATE_DIFF('month', cp.cohort_month, a.month) = cp.period

GROUP BY 1, 2

The survival curve reveals at which period churn is highest — common patterns: a steep drop in months 1-3 (onboarding failure churn), gradual decline, or a delayed cliff at contract renewal dates.

Segmenting Churn by Customer Attributes

Aggregate churn rate rarely yields actionable insight. Break churn by segment to identify where it is concentrated:

SELECT

u.plan_type,

u.acquisition_channel,

u.company_size_band,

COUNT(DISTINCT CASE WHEN churn.user_id IS NOT NULL THEN u.user_id END) AS churned_users,

COUNT(DISTINCT u.user_id) AS active_users,

ROUND(100.0 * COUNT(DISTINCT CASE WHEN churn.user_id IS NOT NULL THEN u.user_id END) / COUNT(DISTINCT u.user_id), 1) AS churn_rate

FROM users u

LEFT JOIN churned_users churn ON u.user_id = churn.user_id

GROUP BY 1, 2, 3

ORDER BY churn_rate DESC

Common high-churn segments: users on free or trial plans, users who did not complete onboarding, users acquired through low-intent channels, users at small companies with lower budget stability.

Leading Indicators of Churn

Predicting churn requires identifying behaviours that precede churn. Build a feature table for each customer over a trailing period:

- Login frequency in last 30 days (declining = risk signal)

- Days since last login

- Feature adoption breadth (users of one feature churn faster than users of multiple)

- Support ticket volume and sentiment

- Contract renewal date proximity

- Plan tier and price sensitivity signals

- Stakeholder change (new decision-maker = re-evaluation risk)

Compute these features in SQL:

SELECT

u.user_id,

DATEDIFF('day', MAX(a.activity_date), CURRENT_DATE) AS days_since_last_active,

COUNT(DISTINCT CASE WHEN a.activity_date >= CURRENT_DATE - 30 THEN DATE(a.activity_date) END) AS active_days_last_30,

COUNT(DISTINCT a.feature_used) AS feature_breadth,

u.plan_type,

u.contract_renewal_date,

DATEDIFF('day', CURRENT_DATE, u.contract_renewal_date) AS days_to_renewal

FROM users u

LEFT JOIN activity_events a ON u.user_id = a.user_id

GROUP BY u.user_id, u.plan_type, u.contract_renewal_date

Basic Predictive Churn Model

With churn labels (churned in next 30 days: 1/0) and features per customer, a logistic regression provides a churn propensity score for each active customer.

In Python with scikit-learn:

from sklearn.linear_model import LogisticRegression

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import StandardScaler

X = features[['days_since_last_active', 'active_days_last_30', 'feature_breadth', 'days_to_renewal']]

y = features['churned_next_30_days']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)

model = LogisticRegression()

model.fit(X_train_scaled, y_train)

scores = model.predict_proba(scaler.transform(X[features]))[:, 1]

Score every active customer daily. Customers above a threshold (typically the top 10-20% of risk score) are flagged for intervention — customer success outreach, discount offer, proactive training session.

Intervention Design

Churn prediction without intervention design is reporting, not value creation. For each risk tier:

- **High risk, high value**: Customer success manager personal outreach within 24 hours — not an automated email, a human conversation to identify the root cause of declining engagement

- **High risk, medium value**: Automated in-app nudge to a feature they have not adopted; email offering a training session; proactive support check-in

- **Medium risk**: Automated re-engagement campaign highlighting unused features or new releases relevant to their use case

Track intervention outcomes: did customers who received outreach churn at a lower rate than a holdout group that did not? This measures the actual ROI of the retention programme — not just whether the model predicts churn, but whether interventions actually reduce it.

Our BI strategy practice designs churn analytics frameworks and customer health scoring models — contact us to discuss your retention analytics requirements.

Get your data architecture audit in 30 minutes.

A former Microsoft data architect audits your data foundation, identifies your top priorities, and sends you a written plan. Free. No pitch.

Book a Call →