Defining your Data Question
In this post, I write about the importance of asking the right question when conducting data analysis.
What products are bought with rugs in-store or online?
Answer: All of them.
This seemingly innocent question got the team talking a couple of weeks ago. An email from one of the commercial teams gave the above brief for us to explore and then report back with actionable insights. For them, perhaps, a simple request. For us, a fine example of a query lacking direction.
Having completed my graduate rotation in PPC a couple of weeks ago, I now work within the recently formed Insight & Analytics ‘Data’ Team. This work ties in really nicely with my Decoded Data Science apprenticeship and is a good fit for my mathematical background.
I’ve written before about the importance of defining the problem, in the context of improving our material ordering calculation. Now this attitude is even more valuable. The team have swathes of data dating back several years. Some of it is our own (transactional, web analytics, product, store, supply chain, marketing and financial data), whilst other sources are external, such as market or weather data. The priority is not gaining access to information but knowing which information will add the most value to the business, and that only becomes clear once the aim is known.
What products are bought with rugs in-store or online?
Straight off the bat, we knew that this was a problem that could be answered using association rules, but before we threw the Apriori algorithm into the mix and output a list of rules together with their confidence and lift values, we paused to think.
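For anyone unfamiliar with confidence and lift, here is a minimal Python sketch of what they mean for rules of the form {rug} → {X}. The baskets and product names are invented for illustration. Because the left-hand side is fixed to a single item, the sketch simply counts co-occurring pairs rather than running Apriori's level-wise candidate generation, so treat it as an illustration of the metrics, not of the algorithm itself.

```python
from collections import Counter


def rules_for_item(baskets, antecedent):
    """Return {consequent: (support, confidence, lift)} for every rule
    {antecedent} -> {consequent} observed in the baskets."""
    n = len(baskets)
    item_counts = Counter()  # number of baskets containing each item
    pair_counts = Counter()  # number of baskets containing antecedent AND item
    for basket in baskets:
        items = set(basket)
        item_counts.update(items)
        if antecedent in items:
            pair_counts.update(items - {antecedent})

    a_count = item_counts[antecedent]
    rules = {}
    for consequent, both in pair_counts.items():
        support = both / n                # share of all baskets with both items
        confidence = both / a_count       # P(consequent | antecedent)
        baseline = item_counts[consequent] / n
        lift = confidence / baseline      # >1: bought together more than chance
        rules[consequent] = (support, confidence, lift)
    return rules


# Hypothetical baskets: product names are made up for this example.
baskets = [
    ["rug", "cushion", "throw"],
    ["rug", "cushion"],
    ["rug", "lamp"],
    ["cushion", "lamp"],
    ["throw"],
]

rules = rules_for_item(baskets, "rug")
for item, (s, c, l) in sorted(rules.items(), key=lambda kv: -kv[1][2]):
    print(f"rug -> {item}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

In this toy data, cushions appear in two of the three rug baskets, giving a confidence of about 0.67 and a lift slightly above 1. Note that every number depends on what counts as a "basket" and whether items are SKUs or categories, which is exactly why those choices need settling first.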
Two questions bubbled up from this moment of thought: What does the question mean, and why are we trying to answer it?
Breaking down the question, it was clear that there is some ambiguity. In just a couple of minutes, we were challenging the request, hoping to better understand how to approach the problem.
- What timeframe were we considering? Were we interested in products in the same basket as the rug, or products bought within a sufficiently short period after the rug purchase?
- At what level were we approaching the problem? Were we interested in which categories are bought together, or which SKUs?
- Which rugs were we interested in? Our best sellers? Those that are part of a wider décor range? Rugs with low levels of sales? Promotional products?
- Were we concerned with how frequently products are bought with the rugs, or were we happy to output a binary yes/no value?
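To show why the timeframe question matters, here is a rough Python sketch that builds "baskets" from the same transaction lines in two different ways: by transaction, or by collecting everything a customer bought within a window after buying a rug. The record layout, customer IDs, dates and the `window_days` parameter are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical transaction lines: (customer_id, transaction_id, purchase_date, product)
lines = [
    ("c1", "t1", date(2019, 3, 1), "rug"),
    ("c1", "t1", date(2019, 3, 1), "cushion"),
    ("c1", "t2", date(2019, 3, 10), "lamp"),
    ("c2", "t3", date(2019, 3, 2), "rug"),
    ("c2", "t4", date(2019, 5, 1), "throw"),
]


def same_transaction_baskets(lines):
    """One basket per transaction: only products on the same receipt count."""
    baskets = defaultdict(set)
    for _, txn, _, product in lines:
        baskets[txn].add(product)
    return dict(baskets)


def window_baskets(lines, anchor, window_days):
    """One basket per purchase of `anchor`: every product the same customer
    bought on that day or within `window_days` days afterwards."""
    by_customer = defaultdict(list)
    for cust, _, day, product in lines:
        by_customer[cust].append((day, product))
    baskets = []
    for purchases in by_customer.values():
        for anchor_day, product in purchases:
            if product != anchor:
                continue
            end = anchor_day + timedelta(days=window_days)
            baskets.append({p for d, p in purchases if anchor_day <= d <= end})
    return baskets


print(same_transaction_baskets(lines))
print(window_baskets(lines, "rug", window_days=14))
```

With a 14-day window the lamp joins the first customer's rug basket, whereas by transaction it does not. One caveat on the design: the windowed baskets only contain rug purchases, so they support confidence-style counts, but a lift calculation would also need comparable baskets for customers who did not buy a rug.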
In order to answer these questions, it is useful to know what we are hoping to achieve with this information. This way, we can ensure that we present our analysis in the most helpful way possible. Some of our suggestions included:
- Improving product recommendations online
- Adjusting / testing store layout, with commonly bought products positioned near the rug department
- Offering bundles or special offers on these products
- Sequencing these products together in our email marketing campaigns
- Laying out our distribution centres such that these products are easier to pick together
- Deciding which products to feature together in our TV advertising campaigns
- Using these relationships to inform similar product buys for next year’s ranges.
And so, armed with our list of questions to cut through the ambiguity, the next step was to sit down with the team to discuss what we would work towards and agree on our approach.
Spoiler: Apriori was released from the cage shortly afterwards.
Until next time,
Scott