Codeflix, a streaming video startup, is interested in measuring their user churn rate. Four months into launching Codeflix, management asked to look into subscription churn rates. It’s early on in the business and people are excited to know how the company is doing.
The marketing department is particularly interested in how the churn compares between two segments of users.
They provide a dataset containing subscription data for users who were acquired through two distinct channels.
The dataset provided contains one SQL table, subscriptions. Within the table, there are 4 columns:
1. Take a look at the first 100 rows of data in the subscriptions table. How many different segments?
SELECT * FROM subscriptions LIMIT 100;
SELECT DISTINCT segment FROM subscriptions; -- 2, segment 87 and 30
2. Determine the range of months of data provided. Which months to calculate churn for?
-- start
SELECT DISTINCT subscription_start FROM subscriptions ORDER BY 1; -- From 2016-12-01 to 2017-3-30
-- end
SELECT DISTINCT subscription_end FROM subscriptions ORDER BY 1; -- From 2017-01-01 to 2017-03-31 We can calculate churn rate from January 2017.
3. Calculate the churn rate for both segments (87 and 30) over the first 3 months of 2017. To get started, create a temporary table of months.
WITH months AS(
SELECT '2017-01-01' AS first_day,
'2017-01-31' AS last_day
UNION
SELECT '2017-02-01' AS first_day,
'2017-02-28' AS last_day
UNION
SELECT '2017-03-01' AS first_day,
'2017-03-31' AS last_day
) SELECT * FROM months;
-- Create a temporary table, cross_join, from subscriptions and the months.
WITH months AS(
SELECT '2017-01-01' AS first_day,
'2017-01-31' AS last_day
UNION
SELECT '2017-02-01' AS first_day,
'2017-02-28' AS last_day
UNION
SELECT '2017-03-01' AS first_day,
'2017-03-31' AS last_day
),
cross_join AS(
SELECT * FROM subscriptions
CROSS JOIN
months
) SELECT * FROM cross_join;
/* Create a temporary table, status, from the cross_join table that I created. This table should contain:
1. id selected from cross_join
2. month as an alias of first_day
3. is_active_87 created using a CASE WHEN to find any users from segment 87 who existed prior to the beginning of the month. This is 1 if true and 0 otherwise.
5. is_active_30 created using a CASE WHEN to find any users from segment 30 who existed prior to the beginning of the month. This is 1 if true and 0 otherwise.
Create a status_aggregate temporary table that is a SUM of the active and canceled subscriptions for each segment, for each month.
The resulting columns should be:
1. sum_active_87
2. sum_active_30
3. sum_canceled_87
4. sum_canceled_30
*/
/*
My Result(during 3 months):
1. sum_active_87 --> 1271
2. sum_active_30 --> 1525
3. sum_canceled_87 --> 476
4. sum_canceled_30 --> 144
Result(by month):
January:
sum_active_87 --> 278
sum_active_30 --> 291
sum_canceled_87 --> 70
sum_canceled_30 --> 22
February:
sum_active_87 --> 462
sum_active_30 --> 518
sum_canceled_87 --> 148
sum_canceled_30 --> 38
March:
sum_active_87 --> 531
sum_active_30 --> 716
sum_canceled_87 --> 258
sum_canceled_30 --> 84
*/
WITH
months AS(
SELECT '2017-01-01' AS first_day,
'2017-01-31' AS last_day
UNION
SELECT '2017-02-01' AS first_day,
'2017-02-28' AS last_day
UNION
SELECT '2017-03-01' AS first_day,
'2017-03-31' AS last_day
),
cross_join AS(
SELECT * FROM subscriptions
CROSS JOIN
months
),
status AS(
SELECT id, first_day AS month,
CASE
WHEN
subscription_start < first_day AND
segment == 87 AND
(subscription_end > first_day OR subscription_end IS NULL)
THEN 1 ELSE 0
END AS is_active_87,
CASE
WHEN
subscription_start < first_day AND
segment == 30 AND
(subscription_end > first_day OR subscription_end IS NULL)
THEN 1 ELSE 0
END AS is_active_30,
CASE
WHEN
subscription_end BETWEEN first_day AND last_day
AND segment == 87
THEN 1 ELSE 0
END AS is_canceled_87,
CASE
WHEN
subscription_end BETWEEN first_day AND last_day
AND segment == 30
THEN 1 ELSE 0
END AS is_canceled_30
FROM cross_join),
status_aggregate AS (
SELECT
month,
SUM(is_active_87) AS sum_active_87,
SUM(is_active_30) AS sum_active_30,
SUM(is_canceled_87) AS sum_canceled_87,
SUM(is_canceled_30) AS sum_canceled_30
FROM status GROUP BY 1),
churn_rate AS(
SELECT
month,
ROUND(sum_canceled_87*1.0/sum_active_87,2) AS churn_rate_87,
ROUND(sum_canceled_30*1.0/sum_active_30,2) AS churn_rate_30
FROM status_aggregate)
SELECT * FROM churn_rate;
4. Calculate the churn rates for the two segments over the three month period. Which segment has a lower churn rate?
/*
Result(by month): segment 30 has lower churn rate than segment 87.
January:
churn_rate_87 = 0.25
churn_rate_30 = 0.08
February:
churn_rate_87 = 0.32
churn_rate_30 = 0.07
March:
churn_rate_87 = 0.49
churn_rate_30 = 0.12
*/