Customer Churn Modelling

Published by Scott Jenkins on

In this post, I describe how the mathematics of modelling the spread of disease can be applied to modelling how customers move between RFM segments. I use SciPy’s numerical ODE routine in Python and fit models to synthetic data.

Context:

Covid-19 has blighted countries and industries across the world. There seemed to be so little time to prepare and the rise of the use of the word ‘unprecedented’ was unprecedented. I already partly understood the reproduction number, R, as a measure of a disease’s ability to spread. It’s the average number of people that each infected person will pass the disease on to, and it governs the dynamics of how a population transitions through stages of susceptibility, infection and recovery.

I was interested in learning more of the theory of epidemiological modelling and wanted to build my own models in Python. My research began with SIR.

SIR Modelling:

Susceptible, Infected, Removed. These are the 3 states an individual can occupy during an epidemic. Individuals are initially at risk of catching the disease (Susceptible), may then become infected (Infected), and then either recover or die (Removed). SIR modelling aims to quantify how the population will transition between the 3 states.

SIR models are underpinned mathematically by a system of differential equations, which govern the flow of the population between states. Given these equations, building a model is straightforward with SciPy’s odeint function in Python. Fix parameters beta and gamma, set the initial conditions (how many in each group at time zero), and define the time steps to model.
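For reference, the standard SIR system is usually written as follows, using the same symbols as the rest of this post (beta, gamma and a fixed total population N):

```latex
\begin{aligned}
\frac{dS}{dt} &= -\frac{\beta S I}{N}, \\
\frac{dI}{dt} &= \frac{\beta S I}{N} - \gamma I, \\
\frac{dR}{dt} &= \gamma I, \qquad S + I + R = N .
\end{aligned}
```

The three right-hand sides sum to zero, which is what keeps the total population constant.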

In the following code, S, I and R are arrays describing the number of susceptible, infected and removed people over time. N is the total population size, which we assume is fixed. Gamma is the recovery rate, the reciprocal of the length in days that an infection lasts, and beta is the average number of people each infected person infects per day. Social distancing measures aim to reduce beta, and in doing so, lower R, calculated as beta/gamma.

In the graph below, I model the spread of a disease under the following conditions:

  • Population of 66 million people.
  • 100 people initially infected. 
  • The infection lasts 3 days.
  • On average, each infected person infects 1 other person per day.
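A minimal sketch of a model under these conditions, using odeint from scipy.integrate. The function and variable names are my own choices for illustration, and I take gamma to be the recovery rate, 1/3 per day for a 3-day infection, so that R = beta/gamma = 3. The 160-day horizon is also an assumption.

```python
import numpy as np
from scipy.integrate import odeint


def sir_derivatives(y, t, N, beta, gamma):
    """Right-hand side of the SIR equations for the state y = (S, I, R)."""
    S, I, R = y
    new_infections = beta * S * I / N  # flow from S to I per day
    recoveries = gamma * I             # flow from I to R per day
    return [-new_infections, new_infections - recoveries, recoveries]


N = 66_000_000        # total population, assumed fixed
I0 = 100              # initially infected
R_init = 0            # initially removed
S0 = N - I0 - R_init  # everyone else starts susceptible
beta = 1.0            # infections caused per infected person per day
gamma = 1.0 / 3.0     # recovery rate: a 3-day infection (assumption)

t = np.linspace(0, 160, 1601)  # days to model (horizon is an assumption)
solution = odeint(sir_derivatives, [S0, I0, R_init], t, args=(N, beta, gamma))
S, I, R = solution.T

print(f"R = beta / gamma = {beta / gamma:.1f}")
print(f"Peak infections: {I.max():,.0f} on day {t[I.argmax()]:.0f}")
```

Plotting S, I and R against t (for example with matplotlib) gives curves of the kind shown in the graph.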

Beta = 1 

In the next graph, I have reduced beta to 0.6. All other parameters remain the same.

Beta = 0.6

The above gives a clear example of how reducing beta ‘flattens the curve’. Basic SIR modelling understood, I wanted to apply these ideas to build a customer model.
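To put rough numbers on the flattening, here is a small sketch that reruns the same SIR model for both values of beta and reports the height and timing of the infection peak. The helper name peak_infected, the one-year horizon and gamma = 1/3 per day are assumptions made for illustration.

```python
import numpy as np
from scipy.integrate import odeint


def sir_derivatives(y, t, N, beta, gamma):
    """Right-hand side of the SIR equations for the state y = (S, I, R)."""
    S, I, R = y
    new_infections = beta * S * I / N
    return [-new_infections, new_infections - gamma * I, gamma * I]


def peak_infected(beta, gamma=1.0 / 3.0, N=66_000_000, I0=100, days=365):
    """Return (peak number infected, day of the peak) for a given beta."""
    t = np.linspace(0, days, days * 10 + 1)
    S, I, R = odeint(sir_derivatives, [N - I0, I0, 0], t,
                     args=(N, beta, gamma)).T
    return I.max(), t[I.argmax()]


results = {beta: peak_infected(beta) for beta in (1.0, 0.6)}
for beta, (peak, day) in results.items():
    print(f"beta = {beta}: peak of {peak:,.0f} infected around day {day:.0f}")
```

A lower beta should give a peak that is both lower and later, which is the 'flattened' curve.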

Customer Modelling:

Custom CRM models may have dozens or hundreds of ways of defining customers: through RFM modelling, channel(s), categories or demography, for example. Defining customer segments is another topic which I’m not going to go into here, but the model could be extended to cater for this.

Instead, I’ll focus on demonstrating a customer churn model with 4 segments described below. Then, I’ll state the assumptions in building my models, display the output with an example set of parameters and finally fit parameters to a set of synthetic data. Even with the (heavily) manufactured data, the model won’t be perfect: All models are wrong, but some are useful.

To move away from the sinister linguistics of disease, let’s begin with some terminology. Here are the 4 segments I will consider in my model. The arrows on the chart illustrate how people can move between the segments.

  • Uncapitalised: people who haven’t shopped with us and wouldn’t consider it. 
  • Prospective: people who haven’t shopped with us, but would consider it.
  • Active: people who currently shop with us.
  • Lapsed: people who used to shop with us, but haven’t done so recently.

UPAL

States defined, I have chosen to model with a month between time steps. Explicitly, t = 1 makes predictions for 1 month’s time, t = 2 makes predictions for 2 months’ time, and so on. Here are the 3 main pieces of logic I am using to build my models.

Logic 1:

Total population remains constant. I’m confident that the birth rate and mortality rate will not significantly impact any output.

Logic 2:

The rate of movement out of each segment is proportional to the current population of the segment.

Logic 3:

The probabilities of moving between segments are time-independent.

Pragmatically, the above logic allows us to uniquely define a model with:

  • A transition matrix which contains the probabilities of a customer moving segments in any given month.
  • The initial split of the population into the segments.
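In symbols, and with notation that is my own rather than anything standard to RFM work: write x_t for the row vector of segment sizes at month t and T for the transition matrix. The three pieces of logic then amount to the following.

```latex
x_{t+1} = x_t \, T
\quad\Longrightarrow\quad
x_t = x_0 \, T^{t},
\qquad
\sum_{j} T_{ij} = 1 \ \text{for every row } i .
```

Because every row of T sums to 1, the total population (the sum of the entries of x_t) is the same at every step, which is Logic 1.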

In the UPAL model, there are 5 parameters in our transition matrix, one for each direction of travel along the arrows between segments (3 arrows, 2 of which are bidirectional). I’ll imaginatively name them up, pu, pa, ap and la, where the first letter of each denotes the segment people are moving from and the second letter denotes the segment they are moving to.

UPAL Transition Matrix

Filling in the missing spaces is easy, since each row must add up to 1, and we know which segment pairs are not legal moves.
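As a sketch of that filling-in step: the function below places the five parameters in a 4x4 matrix (rows are the 'from' segment, columns the 'to' segment), sets every illegal move to zero, and puts whatever probability is left on the diagonal. The arrow layout is read off the parameter names alone, so treat it as an assumption, and the example values are purely illustrative.

```python
import numpy as np

SEGMENTS = ["U", "P", "A", "L"]  # Uncapitalised, Prospective, Active, Lapsed


def build_transition_matrix(up, pu, pa, ap, la):
    """Return the monthly transition matrix; rows = from, columns = to."""
    T = np.array([
        # to:  U    P    A    L
        [0.0, up, 0.0, 0.0],   # from U
        [pu, 0.0, pa, 0.0],    # from P
        [0.0, ap, 0.0, 0.0],   # from A
        [0.0, 0.0, la, 0.0],   # from L
    ])
    # The diagonal is the probability of staying put, chosen so that
    # every row adds up to 1.
    np.fill_diagonal(T, 1.0 - T.sum(axis=1))
    if (T < 0).any():
        raise ValueError("probabilities leaving a segment sum to more than 1")
    return T


# Illustrative values only (hypothetical).
T = build_transition_matrix(up=0.20, pu=0.15, pa=0.40, ap=0.20, la=0.10)
print(T)
```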

Transition Matrix

Mathematical Aside: rather than solve the SIR equations analytically (an exact analytical solution was first derived only in 2014), we have set up a Markov chain with the transition matrix defined above, and use a numerical method (e.g. Euler or Runge-Kutta) to step through the calculations.
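One simple way to step such a chain forward, assuming monthly discrete steps, is repeated multiplication of the population vector by the transition matrix. This is the same as a forward Euler step of size one month on the corresponding rate equations. The matrix values and the initial split below are hypothetical, chosen only to make the sketch runnable.

```python
import numpy as np

# Hypothetical transition matrix; rows = from, columns = to, order U, P, A, L.
T = np.array([
    [0.80, 0.20, 0.00, 0.00],
    [0.15, 0.45, 0.40, 0.00],
    [0.00, 0.20, 0.80, 0.00],
    [0.00, 0.00, 0.10, 0.90],
])


def simulate(T, x0, n_months):
    """Return an (n_months + 1, 4) array of segment shares, one row per month."""
    x = np.asarray(x0, dtype=float)
    rows = [x]
    for _ in range(n_months):
        x = x @ T  # one month: redistribute each segment along its row of T
        rows.append(x)
    return np.array(rows)


x0 = [0.60, 0.25, 0.10, 0.05]  # hypothetical initial split (U, P, A, L)
history = simulate(T, x0, n_months=24)
print(history[:3])
```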

Putting this all together, I wrote the following code.

Model_Code

It is currently set up to model this transition matrix:

T

When run, it outputs the following graph:

Churn Graph

So far, we’ve plugged a transition matrix into the model and read off the output. Now, I’d like to go in the other direction. Given a dataset, the task is to fit a transition matrix to it.

Modelling with Synthetic Data:

I jumped into Excel and produced the following series of customer population data. Fundamentally, I used a transition matrix with parameters 0.20, 0.15, 0.40, 0.20 and 0.10, but with random noise added at each step.
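The same kind of series can be sketched in Python rather than Excel. Several things here are assumptions: that the five values map to up, pu, pa, ap and la in the order listed, that the noise is Gaussian with a small standard deviation, and that the shares are renormalised after each noisy step so the total stays constant. The initial split is hypothetical.

```python
import numpy as np

# Assumed mapping of the five values: up, pu, pa, ap, la = 0.20, 0.15, 0.40, 0.20, 0.10
T = np.array([
    [0.80, 0.20, 0.00, 0.00],
    [0.15, 0.45, 0.40, 0.00],
    [0.00, 0.20, 0.80, 0.00],
    [0.00, 0.00, 0.10, 0.90],
])


def noisy_trajectory(T, x0, n_months, noise_sd=0.01, seed=0):
    """Step the chain forward, perturbing the segment shares at every step."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    rows = [x]
    for _ in range(n_months):
        x = x @ T                                        # model step
        x = x + rng.normal(0.0, noise_sd, size=x.shape)  # add random noise
        x = np.clip(x, 0.0, None)                        # no negative shares
        x = x / x.sum()                                  # keep the total fixed
        rows.append(x)
    return np.array(rows)


data = noisy_trajectory(T, [0.60, 0.25, 0.10, 0.05], n_months=24)
print(data[:3])
```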

Noisy_Data

With only 5 parameters, I was quickly able to run a grid search over the array of values below, recording the RMSE between the model and the actual values. Naively, testing p parameters across v values each results in v to the power p models, so more complicated models would require far more computing power.
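A sketch of such a grid search using itertools.product. The grid values, the function names and the noiseless stand-in for the observed data are all assumptions made for illustration. With 5 values for each of the 5 parameters, it tests 5^5 = 3125 models.

```python
from itertools import product

import numpy as np


def build_transition_matrix(up, pu, pa, ap, la):
    """Monthly transition matrix; rows = from, columns = to, order U, P, A, L."""
    T = np.array([
        [0.0, up, 0.0, 0.0],
        [pu, 0.0, pa, 0.0],
        [0.0, ap, 0.0, 0.0],
        [0.0, 0.0, la, 0.0],
    ])
    np.fill_diagonal(T, 1.0 - T.sum(axis=1))  # each row must sum to 1
    return T


def simulate(T, x0, n_months):
    """Segment shares for months 0..n_months, one row per month."""
    x = np.asarray(x0, dtype=float)
    rows = [x]
    for _ in range(n_months):
        x = x @ T
        rows.append(x)
    return np.array(rows)


def rmse(predicted, observed):
    """Root mean squared error over every month and segment."""
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))


def grid_search(observed, grid):
    """Try every combination of grid values; keep the lowest-RMSE one."""
    x0, n_months = observed[0], len(observed) - 1
    best_params, best_rmse = None, np.inf
    for params in product(grid, repeat=5):  # len(grid) ** 5 combinations
        model = simulate(build_transition_matrix(*params), x0, n_months)
        error = rmse(model, observed)
        if error < best_rmse:
            best_params, best_rmse = params, error
    return best_params, best_rmse


GRID = [0.10, 0.15, 0.20, 0.30, 0.40]         # hypothetical candidate values
TRUE_PARAMS = (0.20, 0.15, 0.40, 0.20, 0.10)  # assumed order: up, pu, pa, ap, la
# Noiseless stand-in for the observed series; swap in real or noisy data here.
observed = simulate(build_transition_matrix(*TRUE_PARAMS),
                    [0.60, 0.25, 0.10, 0.05], n_months=12)

best_params, best_rmse = grid_search(observed, GRID)
print(f"{len(GRID) ** 5} models tested")
print("best parameters:", best_params, "RMSE:", best_rmse)
```

Every extra parameter multiplies the number of models by the grid size, which is why this brute-force approach stops being practical for larger models.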

vals

I kept track of the lowest RMSE, taking the corresponding model as the one which best fit the data.

RMSE

We’ve now worked through the problem in both directions. I’ll stop here, and close with a couple of directions to motivate further extensions.

Further Thoughts:

So far, I’ve defined my transition matrix to be time-homogeneous, though I would be surprised if this assumption held. Population behaviour may be influenced by seasonality, buying strategy, marketing campaigns, and at a local level, the opening or refitting of stores. Dynamic matrix values may better capture this intricacy. The values may exhibit a relationship to marketing spend, or to customer reach.
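As a sketch of what a dynamic matrix might look like, the example below lets one entry, the Prospective to Active probability, swing with the time of year. The sinusoidal form, the 25% amplitude, the base values and the initial split are all hypothetical, chosen only to show the mechanics.

```python
import numpy as np


def seasonal_transition_matrix(month, amplitude=0.25):
    """Transition matrix for a given month, with a seasonal P -> A probability."""
    season = 1.0 + amplitude * np.sin(2.0 * np.pi * month / 12.0)
    up, pu, ap, la = 0.20, 0.15, 0.20, 0.10  # held constant (hypothetical)
    pa = 0.40 * season                       # varies through the year
    return np.array([
        [1 - up, up, 0.0, 0.0],
        [pu, 1 - pu - pa, pa, 0.0],
        [0.0, ap, 1 - ap, 0.0],
        [0.0, 0.0, la, 1 - la],
    ])


x = np.array([0.60, 0.25, 0.10, 0.05])  # hypothetical initial split (U, P, A, L)
rows = [x]
for month in range(24):
    x = x @ seasonal_transition_matrix(month)  # a different matrix each month
    rows.append(x)
history = np.array(rows)
print(history[:3])
```

The same pattern would work with entries driven by marketing spend or customer reach instead of the calendar.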

Estimating Customer Lifetime Value (CLV) can be broken into the problem of understanding customer behaviour within each segment, together with predicting customer trajectory through the segments. Running simulations on the above model helps with the second part. Overlaying customer spend estimates on those simulations could give a useful tool for modelling CLV.
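A very rough sketch of that overlay, assuming only Active customers spend and that they spend a fixed amount each month. The spend figure, the transition matrix and the function name are all hypothetical, and there is no discounting.

```python
import numpy as np

# Hypothetical transition matrix; rows = from, columns = to, order U, P, A, L.
T = np.array([
    [0.80, 0.20, 0.00, 0.00],
    [0.15, 0.45, 0.40, 0.00],
    [0.00, 0.20, 0.80, 0.00],
    [0.00, 0.00, 0.10, 0.90],
])
ACTIVE = 2                      # index of the Active segment
SPEND_PER_ACTIVE_MONTH = 50.0   # hypothetical monthly spend while Active


def expected_spend(start_segment, n_months):
    """Expected total spend over months 0..n_months for a single customer."""
    x = np.zeros(4)
    x[start_segment] = 1.0      # the customer starts in one known segment
    total = 0.0
    for _ in range(n_months + 1):
        total += x[ACTIVE] * SPEND_PER_ACTIVE_MONTH  # P(Active) times spend
        x = x @ T                                    # move on one month
    return total


for index, name in enumerate(["U", "P", "A", "L"]):
    print(name, round(expected_spend(index, n_months=24), 2))
```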

Until next time,

Scott

 
