In this blog, I want to introduce Markov chains and find out the customer lifetime value. Customer Lifetime Value is the net present value (NPV) of the future margin generated from a customer or a customer segment.

Concept

Markov Chains
In probability theory and statistics, a sequence or other collection of random variables is independent and identically distributed (i.i.d. or iid or IID) if each random variable has the same probability distribution as the others and all are mutually independent.
We have a set of states, S = {s1, s2,…,sr}. The process starts in one of these states and moves successively from one state to another. Each move is called a step. If the chain is currently in state \(s_i\), then it moves to state \(s_j\) at the next step with a probability denoted by \(p_{ij}\) , and this probability does not depend upon which states the chain was in before the current state.

That is the probability to be present in state j at time t+1 is only dependent on the state at time t. \[ P_{ij} = P(X_{t+1} = j | X_{t} = i) \]

Customer Lifetime Value
Customer lifetime value is important because, the higher the number, the greater the profits. You’ll always have to spend money to acquire new customers and to retain existing ones, but the former costs five times as much. When you know your customer lifetime value, you can improve it.

The customer segments can be represented as states of the Markov chain. Let {0, 1, 2, …, m} be the states of a Markov chain in which states {1, 2,…, m} denote different customer segments and state 0 denotes non-customer state.

The steady-state retention probability is given by \[ R_t = 1 - \frac{\pi_0(1-P_{00})}{1-\pi_0}\] Where \(P_{00}\) is the transition probability of State 0, and \(\pi_0\) is the steady-state distribution for State 0.

Similarly, Customer lifetime value for N periods is given by: \[ CLV = \sum_{t=0}^{N} \frac{P_I×P^t×R}{(1+i)^t} \] where PI is the initial distribution of customers in different states, P is the transition probability matrix, R is the reward vector (margin generated in each customer segment). The interest rate is i (discount rate), \(d = 1 + \frac{1}{1+i}\) is the discount factor.

Data

The data is obtained from the UCI machine learning repository. It is an Online Retail Data Set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail. The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers.

A sample data is shown below
raw_data
InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice CustomerID Country
566486 22729 ALARM CLOCK BAKELIKE ORANGE 3 2011-09-13 09:59:00 3.75 16098 United Kingdom
539337 21790 VINTAGE SNAP CARDS 12 2010-12-17 10:46:00 0.85 NA EIRE
550513 22969 HOMEMADE JAM SCENTED CANDLES 12 2011-04-18 16:37:00 1.45 16928 United Kingdom
580736 22326 ROUND SNACK BOXES SET OF4 WOODLAND 6 2011-12-06 08:55:00 2.95 12716 France
562339 23306 SET OF 36 DOILIES PANTRY DESIGN 4 2011-08-04 11:59:00 1.45 13571 United Kingdom
##   InvoiceNo          StockCode         Description       
##  Length:541909      Length:541909      Length:541909     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##     Quantity          InvoiceDate                    UnitPrice        
##  Min.   :-80995.00   Min.   :2010-12-01 08:26:00   Min.   :-11062.06  
##  1st Qu.:     1.00   1st Qu.:2011-03-28 11:34:00   1st Qu.:     1.25  
##  Median :     3.00   Median :2011-07-19 17:17:00   Median :     2.08  
##  Mean   :     9.55   Mean   :2011-07-04 13:34:57   Mean   :     4.61  
##  3rd Qu.:    10.00   3rd Qu.:2011-10-19 11:27:00   3rd Qu.:     4.13  
##  Max.   : 80995.00   Max.   :2011-12-09 12:50:00   Max.   : 38970.00  
##                                                                       
##    CustomerID       Country         
##  Min.   :12346    Length:541909     
##  1st Qu.:13953    Class :character  
##  Median :15152    Mode  :character  
##  Mean   :15288                      
##  3rd Qu.:16791                      
##  Max.   :18287                      
##  NA's   :135080

As I want to calculate CLV, I want to filter for customers that have done a transaction in Dec 2010 (only they will have enough representation in all states).

Filter for customers that have done a transaction in Dec 2010

cust_name <- (raw_data %>% 
                filter(month(InvoiceDate) == 12, year(InvoiceDate) == 2010, !is.na(CustomerID) ) %>% 
  dplyr::select(CustomerID) %>% 
  unique())$CustomerID
filtered_data <- raw_data %>% filter(CustomerID %in% cust_name) %>% group_by(InvoiceDate, CustomerID, Country) %>% 
  summarise(no_trans = n(), total_sales = sum(UnitPrice), mean_sales = mean(UnitPrice), total_quantity = sum(Quantity))

For random 10 customers, the total sales and number of items sold across time are shown in a bubble plot.

You can observe that there are gaps between purchases for different customers. ‘Gap’ is the difference in months between two successive purchases or the difference between the current month (despite no purchase) and the last purchase month. The frequency distribution of all the purchases at different gaps is shown below:

Cumulative frequency
gap_month Count cumsum
0 489 51.58228
1 108 62.97468
2 50 68.24895
3 33 71.72996
4 20 73.83966
5 30 77.00422
6 18 78.90295
7 18 80.80169
8 21 83.01688
9 6 83.64979
10 20 85.75949
11 38 89.76793
12 97 100.00000

From the above distribution, I am assuming that a customer who has not transacted for greater than 6 months is inactive.

Creating a Markov chain

Loading the libraries required in this section

At 2011-06-01, the state of each customer is given by:

elapsed_months <- function(end_date, start_date) {
    ed <- as.POSIXlt(end_date)
    sd <- as.POSIXlt(start_date)
    12 * (ed$year - sd$year) + (ed$mon - sd$mon)
}
final_classes <- filtered_data %>% filter(InvoiceDate < date('2011-07-01')) %>%
  group_by(CustomerID) %>% 
  summarise(recent_purchase_date = max(InvoiceDate)) %>% 
  mutate(Class1 = elapsed_months(date('2011-07-01'), date(recent_purchase_date))) %>% 
  mutate(Class1 = as.integer(Class1))
kable(final_classes %>% sample_n(10), align = 'c', caption = 'Initial state') %>% 
   kable_styling(full_width = F)
Initial state
CustomerID recent_purchase_date Class1
14735 2011-06-16 12:11:00 1
17524 2010-12-13 14:32:00 7
16385 2010-12-08 12:56:00 7
18011 2010-12-01 17:35:00 7
17228 2011-05-16 10:40:00 2
14388 2011-05-20 16:25:00 2
13178 2011-06-21 14:53:00 1
14533 2010-12-22 11:07:00 7
16858 2010-12-08 14:52:00 7
16244 2011-05-12 10:18:00 2

Here the states are defined as:

State Recency Level Explanation
1 1 Last purchase made this month
2 2 Last purchase made last month
3 3 Last purchase made 2 months ago
4 4 Last purchase made 3 months ago
5 5 Last purchase made 4 months ago
6 6 Last purchase made 5 months ago
7 7-12 Purchase made 6 months or before (Churn state)

Similarly the state of the customer at the start of each month is:

States after every month
CustomerID recent_purchase_date Class1 Class2 Class3 Class4 Class5 Class6
14299 2011-01-27 14:26:00 6 7 1 1 1 1
16086 2011-05-24 12:44:00 2 3 4 1 1 2
13495 2011-04-21 15:15:00 3 4 5 6 7 1
17581 2011-06-15 12:50:00 1 1 1 1 2 1
18071 2011-03-27 11:59:00 4 5 6 7 7 7
17238 2011-06-08 10:43:00 1 1 2 1 1 1
17525 2010-12-15 12:05:00 7 7 7 7 7 7
16752 2010-12-02 12:18:00 7 7 7 7 7 7
17580 2011-04-01 12:41:00 3 4 5 6 7 7
12682 2011-06-15 10:22:00 1 1 1 1 1 1

I can observe that every customer moves from one class (state) to another state every month (step). According to Markov, the probability of a customer to move to state j at any step is only given by the previous state i. \[ P_{ij} = P(X_{t+1} = j | X_{t} = i) \] For each interaction (Class1 to Class 2, month 2 to month 3, Step 3 to step 4), a transaction matrix can be created which has the probability of moving from i the state to j state.

But before I can create the one-step transition probabilities, I need to check whether the sequence of random variables can be approximated to a Markov chain. This is carried out using Anderson− Goodman test which is a chi-square test of independence.
The null and alternative hypotheses to check whether the sequence of random variables follows a Markov chain is stated below:
H0: The sequences of transitions (X1, X2, …, Xn) are independent (zero-order Markov chain)
HA: The sequences of transitions (X1, X2, …, Xn) are dependent (first-order Markov chain)
The corresponding test statistic is \[ \chi^2 = \sum_{i} \sum_{j} (\frac{(O_{ij} -E_{ij})^2}{E_{jj}}) \]

The transition probability matrix for the transition form State 1 to 2 is:

seq_matr <- markovchainFit(final_classes[3:4], method = "mle", name = 'CLT')
seq_matr$estimate
## CLT 
##  A  7 - dimensional discrete Markov Chain defined by the following states: 
##  1, 2, 3, 4, 5, 6, 7 
##  The transition matrix  (by rows)  is defined as follows: 
##            1         2         3    4         5         6         7
## 1 0.57777778 0.4222222 0.0000000 0.00 0.0000000 0.0000000 0.0000000
## 2 0.42957746 0.0000000 0.5704225 0.00 0.0000000 0.0000000 0.0000000
## 3 0.32000000 0.0000000 0.0000000 0.68 0.0000000 0.0000000 0.0000000
## 4 0.13461538 0.0000000 0.0000000 0.00 0.8653846 0.0000000 0.0000000
## 5 0.18181818 0.0000000 0.0000000 0.00 0.0000000 0.8181818 0.0000000
## 6 0.08333333 0.0000000 0.0000000 0.00 0.0000000 0.0000000 0.9166667
## 7 0.10800000 0.0000000 0.0000000 0.00 0.0000000 0.0000000 0.8920000

From the above matrix, P(4,5) = 0.8684211 means that the probability of moving from State 4 to State 5 is 86%. That means that a customer who has not purchased any item in 3 months has a 86% probability of not purchasing any item the next month also.

Similarly the Transition matrix for all the steps is:

sequenceMatr = list()
for(i in 1:5){
  sequenceMatr[[i]] <- markovchainFit(final_classes[(2+i):(3+i)], method = "map")$estimate@transitionMatrix
}
sequenceMatr
## [[1]]
##            1         2         3    4         5         6         7
## 1 0.57777778 0.4222222 0.0000000 0.00 0.0000000 0.0000000 0.0000000
## 2 0.42957746 0.0000000 0.5704225 0.00 0.0000000 0.0000000 0.0000000
## 3 0.32000000 0.0000000 0.0000000 0.68 0.0000000 0.0000000 0.0000000
## 4 0.13461538 0.0000000 0.0000000 0.00 0.8653846 0.0000000 0.0000000
## 5 0.18181818 0.0000000 0.0000000 0.00 0.0000000 0.8181818 0.0000000
## 6 0.08333333 0.0000000 0.0000000 0.00 0.0000000 0.0000000 0.9166667
## 7 0.10800000 0.0000000 0.0000000 0.00 0.0000000 0.0000000 0.8920000
## 
## [[2]]
##            1         2         3         4         5         6         7
## 1 0.61309524 0.3869048 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 2 0.36184211 0.0000000 0.6381579 0.0000000 0.0000000 0.0000000 0.0000000
## 3 0.39506173 0.0000000 0.0000000 0.6049383 0.0000000 0.0000000 0.0000000
## 4 0.13725490 0.0000000 0.0000000 0.0000000 0.8627451 0.0000000 0.0000000
## 5 0.22222222 0.0000000 0.0000000 0.0000000 0.0000000 0.7777778 0.0000000
## 6 0.11111111 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.8888889
## 7 0.08984375 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.9101562
## 
## [[3]]
##           1         2         3         4         5         6         7
## 1 0.6220238 0.3779762 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 2 0.4769231 0.0000000 0.5230769 0.0000000 0.0000000 0.0000000 0.0000000
## 3 0.4020619 0.0000000 0.0000000 0.5979381 0.0000000 0.0000000 0.0000000
## 4 0.3877551 0.0000000 0.0000000 0.0000000 0.6122449 0.0000000 0.0000000
## 5 0.2727273 0.0000000 0.0000000 0.0000000 0.0000000 0.7272727 0.0000000
## 6 0.2000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.8000000
## 7 0.1011673 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.8988327
## 
## [[4]]
##           1         2         3         4         5         6         7
## 1 0.5828877 0.4171123 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 2 0.4566929 0.0000000 0.5433071 0.0000000 0.0000000 0.0000000 0.0000000
## 3 0.3823529 0.0000000 0.0000000 0.6176471 0.0000000 0.0000000 0.0000000
## 4 0.2586207 0.0000000 0.0000000 0.0000000 0.7413793 0.0000000 0.0000000
## 5 0.1333333 0.0000000 0.0000000 0.0000000 0.0000000 0.8666667 0.0000000
## 6 0.1875000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.8125000
## 7 0.1042471 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.8957529
## 
## [[5]]
##           1         2         3         4         5         6         7
## 1 0.7118644 0.2881356 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 2 0.6666667 0.0000000 0.3333333 0.0000000 0.0000000 0.0000000 0.0000000
## 3 0.4492754 0.0000000 0.0000000 0.5507246 0.0000000 0.0000000 0.0000000
## 4 0.3095238 0.0000000 0.0000000 0.0000000 0.6904762 0.0000000 0.0000000
## 5 0.2790698 0.0000000 0.0000000 0.0000000 0.0000000 0.7209302 0.0000000
## 6 0.2307692 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.7692308
## 7 0.2170543 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.7829457

The Transition Probabilities at various steps seem to follow a pattern. I can do a likelihood ratio test to test the homogeneity of transition matrices. That means I want to find out if the changes in TP across time are random and I can take a constant TP to describe the different TP;s or not.
The Null and alternative hypothesis is:
\(H_0 : P_{ij} (t) = P_{ij}\)
\(H_1 : P_{ij} (t) \neq P_{ij}\)

The test statistic for a likelihood ratio test (Chi-square test) is given by \[ \chi^2 = \sum_{t} \sum_{i} \sum_{j} \frac{n(t)[\hat P_{ij}(t)-\hat P_{ij}]^2}{\hat P_{ij}} \] where n(t) is the number of customers in state i at time t. The test statistic follows a \(\chi^2\) distribution with (t − 1) × m × (m − 1) degrees of freedom.

verifyHomogeneity(sequenceMatr)
## Testing homogeneity of DTMC underlying input list 
## ChiSq statistic is 0.8694698 d.o.f are 192 corresponding p-value is 1
## $statistic
## [1] 0.8694698
## 
## $dof
## [1] 192
## 
## $pvalue
## [1] 1

As the p-value is greater than the \(\alpha = 0.05\), I am retaining the Null hypothesis that the transition probabilities are homogeneous.

The final TP will be as follows:

finalTP <- sequenceMatr[[1]]
for(i in 2:5){
  finalTP <- finalTP+sequenceMatr[[i]]
}
finalTP <- finalTP/5
CLT.mc <- new('markovchain', 
                  # states = colnames(finalTP),
                  transitionMatrix = finalTP,
                  name = 'CLT')
CLT.mc
## CLT 
##  A  7 - dimensional discrete Markov Chain defined by the following states: 
##  1, 2, 3, 4, 5, 6, 7 
##  The transition matrix  (by rows)  is defined as follows: 
##           1         2         3         4        5         6         7
## 1 0.6215298 0.3784702 0.0000000 0.0000000 0.000000 0.0000000 0.0000000
## 2 0.4783404 0.0000000 0.5216596 0.0000000 0.000000 0.0000000 0.0000000
## 3 0.3897504 0.0000000 0.0000000 0.6102496 0.000000 0.0000000 0.0000000
## 4 0.2455540 0.0000000 0.0000000 0.0000000 0.754446 0.0000000 0.0000000
## 5 0.2178342 0.0000000 0.0000000 0.0000000 0.000000 0.7821658 0.0000000
## 6 0.1625427 0.0000000 0.0000000 0.0000000 0.000000 0.0000000 0.8374573
## 7 0.1240625 0.0000000 0.0000000 0.0000000 0.000000 0.0000000 0.8759375

The Markov chain can be visualized as follows:

plotmat(t(CLT.mc@transitionMatrix), box.size = 0.05)

The initial frequency of the classes are:

initial_freq <- (final_classes %>% group_by(Class1) %>% summarise(frequency = n()/nrow(final_classes)))$frequency
initial_freq
## [1] 0.37974684 0.14978903 0.07911392 0.05485232 0.03481013 0.03797468
## [7] 0.26371308

As TP is the transition probability between states, I can find the number of people in each state at any time t using the following formula:

\[u^{(n)} = uP^{(n)}\]

From the TP, the frequency in the first stage is compared with the actual frequency at the 1st stage.

## # A tibble: 7 x 3
##   Class2 actual_frequency predicted_frequency
##    <dbl>            <dbl>               <dbl>
## 1      1           0.354               0.236 
## 2      2           0.160               0.0567
## 3      3           0.0854              0     
## 4      4           0.0538              0     
## 5      5           0.0475              0     
## 6      6           0.0285              0     
## 7      7           0.270               0

Similarly for 3rd stage

## # A tibble: 7 x 3
##   Class4 actual_frequency predicted_frequency
##    <dbl>            <dbl>               <dbl>
## 1      1           0.395               0.409 
## 2      2           0.134               0.153 
## 3      3           0.0717              0.0787
## 4      4           0.0612              0.0458
## 5      5           0.0316              0.0360
## 6      6           0.0338              0.0285
## 7      7           0.273               0.249

I can observe that a Markov chain can be used to find the number of people in each state after each stage. So the probability of people in each class after n steps is given below:

## [1] "n = 1"
##              1         2          3          4          5          6
## [1,] 0.3984503 0.1437229 0.07813888 0.04827924 0.04138312 0.02722729
##              7
## [1,] 0.2627984
## [1] "n = 14"
##              1         2          3          4         5          6
## [1,] 0.4260002 0.1610742 0.08392567 0.05113945 0.0385104 0.03005181
##              7
## [1,] 0.2092984
## [1] "n = 34"
##             1         2          3          4          5          6
## [1,] 0.427639 0.1618467 0.08442764 0.05152099 0.03886892 0.03040108
##              7
## [1,] 0.2052957
## [1] "n = 39"
##              1         2          3          4          5        6
## [1,] 0.4276527 0.1618532 0.08443183 0.05152418 0.03887192 0.030404
##              7
## [1,] 0.2052623
## [1] "n = 43"
##              1        2          3          4          5          6
## [1,] 0.4276567 0.161855 0.08443306 0.05152511 0.03887279 0.03040485
##              7
## [1,] 0.2052525
## [1] "n = 51"
##             1         2          3          4          5          6
## [1,] 0.427659 0.1618562 0.08443378 0.05152566 0.03887331 0.03040535
##              7
## [1,] 0.2052467
## [1] "n = 59"
##              1         2         3          4         5          6
## [1,] 0.4276594 0.1618563 0.0844339 0.05152575 0.0388734 0.03040544
##              7
## [1,] 0.2052457
## [1] "n = 68"
##              1         2          3          4          5          6
## [1,] 0.4276595 0.1618564 0.08443393 0.05152577 0.03887341 0.03040545
##              7
## [1,] 0.2052456
## [1] "n = 82"
##              1         2          3          4          5          6
## [1,] 0.4276595 0.1618564 0.08443393 0.05152577 0.03887341 0.03040546
##              7
## [1,] 0.2052455
## [1] "n = 87"
##              1         2          3          4          5          6
## [1,] 0.4276595 0.1618564 0.08443393 0.05152577 0.03887341 0.03040546
##              7
## [1,] 0.2052455

I can observe that as ‘n’ is increasing, the frequency of customers is tending towards constant. This constant is called steady state frequency. The steady state probability is :

steady.state.prob <- steadyStates(CLT.mc)
steady.state.prob
##              1         2          3          4          5          6
## [1,] 0.4276595 0.1618564 0.08443393 0.05152577 0.03887341 0.03040546
##              7
## [1,] 0.2052455

The probability change with steps can be visualized as below:

Customer Lifetime value

Now that I established that the customer segments can be represented as a Marcov chain, I can compute the customer lifetime value.
The monetary value at each state is

revenue_in_states <- filtered_data %>% left_join(final_classes, by = 'CustomerID') %>%
  filter(InvoiceDate < date('2011-07-01')) %>%
  dplyr::select(Class1, total_sales) %>%
  group_by(Class1) %>%
  summarise(avg_revenue = mean(total_sales))
kable(revenue_in_states, align = 'c', caption = 'States after steps') %>% 
   kable_styling(full_width = F)
States after steps
Class1 avg_revenue
1 56.51656
2 56.17425
3 56.45380
4 44.33256
5 46.58667
6 49.00257
7 52.82802

The steady-state retention probability is given by \[ R_t = 1 - \frac{\pi_0(1-P_0)}{1-\pi_0}\] Where \(P_{00}\) is the transition probability of the null state (State 7), and \(\pi_0\) is the steady state distribution of the Null state (State 7). Substituting I get:

## [1] 0.9679608

Similarly Customer lifetime value for 5 periods is given by: \[ CLV = \sum_{t=0}^{5} \frac{P_I×P^t×R}{(1+i)^t} \] where PI is the initial distribution of customers in different states, P is the transition probability matrix, R is the reward vector (margin generated in each customer segment). The interest rate is i (discount rate), \(d = 1 + \frac{1}{1+i}=0.95\) is the discount factor.
Substituting, I get:

CLT <- 0
for(k in 0:5){
  CLT = CLT + (0.95^k)*(initial_freq %*% (CLT.mc@transitionMatrix^k) %*% revenue_in_states$avg_revenue)
}
CLT
##          [,1]
## [1,] 484.3484

Therefore the customer lifetime value is 484.34.


Created by Achyuthuni Sri Harsha
References:
1. Business Analytics: The Science of Data-Driven Decision Making Available
2. Introduction to probability, Charles M. Grinstead and J. Laurie Snell, Available
3. CS294 Markov Chain Monte Carlo: Foundations & Applications (lecture by Prof. Alistair Sinclair in Fall 2009) Available
4. Computer Science Theory for the Information Age, Spring 2012 (course material) Available
6. Customer Analytics at Flipkart.com Available
5. IIMB BAI Class notes and practice problems

One Reply to “Customer Lifetime Value”

Leave a Reply