Dark Matter and the Online Halo
Recently, I have been thinking about the Online Halo: the idea that having a physical retail store can increase local online sales through higher levels of brand awareness and purchasing intent. In this post, I explore ideas for measuring the impact of this effect, and for attributing online sales to stores.
Why Explore the Halo?
Homeware retailers such as Dunelm will not hold full inventory of all products in store, so store space has to be allocated carefully, and understanding the online halo is important for several reasons. It has significance in determining where new stores should be opened, in guiding a shift from transactional to inspirational store space, and in empowering store managers with knowledge of the impact that their store's trading has on the growing e-commerce sales contribution.
As to whether the halo exists in the first place, the International Council of Shopping Centers (ICSC) published a report last year with supporting evidence of increased web sales around the time of new stores opening.
Dark Matter
I’d like to make an analogy between the online halo effect and dark matter. Dark matter is composed of particles that do not absorb or reflect light, and so cannot be seen directly. We only know that dark matter exists because of the effect it has on objects that we can observe. In a similar vein, we can’t accurately measure the impact of the online halo in a static environment of equilibrium. It is only when new stores are opened, when old stores are closed, or when new housing developments add population to a store catchment that the value of the online halo can be properly assessed. This adds complexity to the problem: there are no labels saying which online orders were influenced by a store, so it is mostly one of unsupervised learning.
Approach:
From here, I explore two methods that companies could use to determine the value of an online halo. From a data and analytics standpoint, the first lends itself to a decision tree structure and the second may be a good fit for the logistic regression algorithm. Respectively, the methods give discrete and continuous solutions to the order allocation problem.
Binary classification of online orders to a store
In this scenario, I ask not how to attribute an order to a store, but whether it should be attributed at all. If a customer lives 50 miles from their nearest store, has never visited a Dunelm store, and has only heard of us through sponsorship of This Morning on ITV, their online transactions should not be attributed to a store. If a second customer visits their local Pausa coffee shop every Wednesday and buys a chest of drawers online for ease of delivery, then the transaction should be attributed to a store. Each scenario in between these states is less clear-cut.
In this model, the aim is to produce a list of rules to classify orders into ‘Allocate to nearest store’ and ‘Do not allocate to Store.’ As a basic example:
Is the order placed within 5 miles of a store?
- If Yes, allocate the order to the closest store.
- If No, does the order contain furniture?
  - If Yes, allocate the order to the closest store.
  - If No, do not allocate the order to a store.
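The basic example above can be written as a small function. This is only a minimal sketch: the `Order` class and its field names are hypothetical, and the 5-mile radius is simply the figure used in the example rather than a tested threshold.

```python
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical fields, chosen to mirror the two example rules.
    distance_to_nearest_store_miles: float
    contains_furniture: bool


def allocate_to_store(order: Order, radius_miles: float = 5.0) -> bool:
    """Return True for 'Allocate to nearest store', False otherwise."""
    # Rule 1: orders placed close to a store are attributed to it.
    if order.distance_to_nearest_store_miles <= radius_miles:
        return True
    # Rule 2: further away, only orders containing furniture are attributed.
    return order.contains_furniture


orders = [
    Order(2.0, False),   # near a store
    Order(20.0, True),   # far away, but contains furniture
    Order(20.0, False),  # far away, no furniture
]
decisions = [allocate_to_store(o) for o in orders]
print(decisions)
```

Each extra rule becomes another branch, which is why this approach maps naturally onto a decision tree.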
Rules Upon Rules
As seen in the example, rules can be layered up to consider a number of different features, with a fine balance between complexity and accuracy. Then, each time a new store is opened or an existing store closes, the change in online sales can be compared against what the existing rules predict, and the rules are updated in light of the new information.
Some possible rules could be based upon:
- Department: What is the HD participation? If low, then do not attribute to store.
- Drive Time between delivery postcode and store. If short, allocate to store.
- Online exclusivity: Don’t allocate these products to Store.
- Order basket size
- New or returning customer
- Product availability in store
- Product dimensions or delivery cost as a fraction of item value
- Time of Day or Day of Week
- SKU basket abandonment rate online
- Number of stores within an x mile radius
- Whether the local store is on a retail park, a stand-alone store, or a high street store
- The HD participation of the postcode.
How to set the rules? Compare online transaction patterns from before a store was opened to after it had opened. How did it shift? The features are chosen as a result of this analysis.
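One way to sketch this before-and-after comparison is to compute the relative change in online sales per segment (per department, say) in a store's catchment. The order records, field names, and figures below are made up for illustration, and the two periods are assumed to be of equal length.

```python
from collections import defaultdict


def sales_by_segment(orders, segment_key):
    """Sum order value per segment (e.g. per department)."""
    totals = defaultdict(float)
    for order in orders:
        totals[order[segment_key]] += order["value"]
    return dict(totals)


def uplift_by_segment(before, after, segment_key):
    """Relative change in sales per segment between two periods."""
    pre = sales_by_segment(before, segment_key)
    post = sales_by_segment(after, segment_key)
    return {
        seg: (post.get(seg, 0.0) - pre_value) / pre_value
        for seg, pre_value in pre.items()
        if pre_value > 0  # skip segments with no baseline sales
    }


# Made-up online orders from one catchment, before and after a store opening.
before = [
    {"department": "furniture", "value": 100.0},
    {"department": "bedding", "value": 200.0},
]
after = [
    {"department": "furniture", "value": 150.0},
    {"department": "bedding", "value": 210.0},
]
uplift = uplift_by_segment(before, after, "department")
print(uplift)
```

Segments whose sales shift most after the opening are candidates for rules. In practice the shift would probably need comparing against a catchment with no new store, so that general online growth is not mistaken for halo.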
The danger of adding more features, especially in a problem which is difficult to measure, is overfitting: building a model which overcomplicates the solution, enhancing current accuracy at the expense of future accuracy. In a nutshell, it is a solution which fits well on the current data but performs poorly when measured against additional data.
Rinse and Repeat
With the rules set, every order can be attributed to a store, or not. But so far, it’s still dark matter, since the rules were built on historic data: a snapshot of the online halo in previous years. We won’t know immediately whether the rules chosen are the right ones for the present day. They will need to be tested: find an area where a new store is planned and analyse the current HD transactions. Then, once the store is opened, analyse again and compare. Does the model put too much emphasis on department, or day of the week? Then remove these features. Does the number of departments in the order have a large impact on whether customers buy online or in-store? Then add this rule into the mix. This becomes an ongoing learning exercise. The logic is similar when assessing the impact of competitors opening new stores.
Regression modelling: allocating fractions of an order
An alternative view drops the assumption that each order is either fully allocated to a store or not at all. In a probabilistic sense, each online order was influenced by the store with a probability between 0 and 1. We will never be sure of the true value, but we can run a regression on the above features to obtain an estimate. Using the rules determined above to label each order as ‘Store attributed’ or ‘Not store attributed’, we can run logistic regression on the order data. This will output a set of regression coefficients which, provided the features are on comparable scales, can be interpreted as feature importances: a measure of how much each feature determines the likelihood of an order being attributed to a store.
With the model built, each order is scored against it, giving an output between 0 and 1: the fraction of the order to attribute to the store.
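To make the idea concrete, here is a minimal pure-Python sketch of logistic regression fitted by gradient descent. The two features, the toy data, and the function names are assumptions for illustration only. The labels are made up to follow a simple rule (short drive time or contains furniture), so the model can only echo the rules that produced them.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this order: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def store_fraction(x, w, b):
    """Fraction of an order (between 0 and 1) to attribute to the store."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Made-up features per order: [drive time in hours, contains furniture (0/1)].
X = [[0.1, 0], [0.2, 1], [0.3, 0], [1.5, 1],
     [1.5, 0], [2.0, 0], [1.8, 0], [0.4, 1]]
# Rule-derived labels: 1 = 'Store attributed', 0 = 'Not store attributed'.
y = [1, 1, 1, 1, 0, 0, 0, 1]

w, b = fit_logistic(X, y)
near = store_fraction([0.2, 0], w, b)  # short drive, no furniture
far = store_fraction([2.0, 0], w, b)   # long drive, no furniture
print(w, b, near, far)
```

On this toy data the drive-time coefficient comes out negative, so the attributed fraction falls as the drive to the store gets longer. A library implementation would be the more likely choice on real order data.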
Why do this? It gives another model for comparison. From an interpretability perspective, I don’t think it adds much.
Summary
This is an interesting area. Accurately measuring the online sales that result from offline activity is foundational to improving omnichannel performance. No doubt, I’ll have more thoughts on this once I dive into the data.