Classifying products and improving margins through the industrialized analysis of cash receipts

ActiveViam | October 31, 2019

This article presents elements of a study carried out by the ActiveViam Data Lab. The Data Lab is a team of data science experts who rely on the latest technologies in the field, or develop their own tools, to help companies address complex, interdisciplinary problems.

Through the analysis presented here, we wanted to propose some credible and concrete ways to solve a recurring problem encountered by our customers:
How can they create a product classification system that is not based only on obvious characteristics but also takes customers’ expectations into account, all while managing tens or even hundreds of thousands of items? How can they use the results to improve their pricing and merchandising strategy? How can they price new products for which there is no sales history or comparison with competitors? The Data Lab has tried to find an innovative approach to these questions.

The Data Lab’s approach

The premise of the Data Lab is that modern data science tools, capable of automatically and rapidly analyzing data volumes on an industrial scale, can replace or supplement long and costly qualitative or statistical studies by exploiting a treasure available to all distributors: cash receipts.
For this use case, we relied on data from the retail drug store industry, a vertical characterized by a particularly large range of product references to manage and by the difficulty retailers have in comparing each product’s price to competitors’ prices, because of the large number of variations in packaging, quantity, brand, etc. We worked on tens of millions of anonymized receipts, representing about three months of a leading retailer’s sales.
Our data engineers created and tested an original classification algorithm. They relied on tools they developed themselves, which allow interactive and instant visualization of data within Python notebooks. The model was later run on data samples from other verticals in order to validate its reliability and relevance.


The first layer of the algorithm classifies the products and assigns them a score based on whether they are purchased:
• Always with at least one other product (score of 0)
• Always alone (score of 1)
• In between (from 0 to 1)
The second layer goes one step further: it ranks products higher or lower on a scale depending on whether they are most often purchased:
• With a wide range of other products (lower)
• With a limited set of products (higher)
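
The article does not give the formula behind the second layer. As a minimal sketch of one possible reading, the ranking could be a concentration index over each product's companions: close to 1 when a product is bought with a limited set of products, and approaching 0 when its companions are spread over a wide range. The function name `companion_concentration`, the Herfindahl-style formula, and the sample receipts below are all assumptions made for illustration, not the Data Lab's actual algorithm.

```python
from collections import Counter, defaultdict


def companion_concentration(baskets):
    """Return a Herfindahl-style companion index per product.

    Assumption: the index is the sum of squared shares of each companion
    among all co-purchases of the product. Products that are never bought
    with another product do not appear in the result.
    """
    companions = defaultdict(Counter)
    for basket in baskets:
        items = set(basket)  # ignore repeated lines for the same product
        for product in items:
            for other in items - {product}:
                companions[product][other] += 1

    index = {}
    for product, counts in companions.items():
        total = sum(counts.values())
        index[product] = sum((n / total) ** 2 for n in counts.values())
    return index


# Hypothetical receipts, for illustration only.
baskets = [
    ["sponge", "shower gel"],
    ["sponge", "shower gel"],
    ["shower gel", "shampoo"],
    ["shower gel", "toothpaste"],
]
scores = companion_concentration(baskets)
for product in sorted(scores):
    print(product, round(scores[product], 3))
```

In this toy data the sponge is only ever bought with shower gel, so it ranks higher than the shower gel, whose companions are spread over three different products.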

[Figure: classifying products from cash receipts]

In this illustration of how the algorithm works, product A is purchased 12 times: 4 times alone, 4 times with product B, and 4 times with product C. Product C is never purchased alone.
Product D is purchased 10 times: 8 times alone and 2 times with product C.
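
To make the first-layer score concrete, the receipts in this illustration can be rebuilt and scored. The sketch below assumes the score is simply the share of a product's receipts in which it is the only item, and that the receipts described in the illustration are the only ones that exist (so product B is never bought alone). The function name `solo_scores` is hypothetical; the article does not publish the exact formula.

```python
from collections import Counter

# Receipts reconstructed from the illustration.
# Assumption: no receipts exist beyond the ones described.
receipts = (
    [("A",)] * 4
    + [("A", "B")] * 4
    + [("A", "C")] * 4
    + [("D",)] * 8
    + [("D", "C")] * 2
)


def solo_scores(receipts):
    """Share of each product's receipts in which it is the only item."""
    total = Counter()
    alone = Counter()
    for receipt in receipts:
        items = set(receipt)
        for product in items:
            total[product] += 1
            if len(items) == 1:
                alone[product] += 1
    return {product: alone[product] / total[product] for product in total}


scores = solo_scores(receipts)
for product in sorted(scores):
    print(product, round(scores[product], 2))
# A 0.33
# B 0.0
# C 0.0
# D 0.8
```

Under these assumptions product A lands in between (4 receipts out of 12), product D is close to 1, and products B and C sit at 0.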

The third layer combines the first two indicators to define the following three categories:

[Figure: classifying products from cash receipts]
• “Driver” products (in green): These products are sometimes bought alone, but more frequently with complementary products. In the example, product A is one of them. These are the products that bring customers to the store. Their prices are more likely to be compared, and their absence from the shelves can lead customers to leave the store without buying anything. Examples in drug stores include baby diapers and shower gel.
• “Complementary” products (in red): These products are rarely bought alone. Most of the time they are purchased in addition to Driver products, either on impulse or because their usage is complementary. In the example, they are products B and C. For instance, sponges are purchased with shower gel.
• “Independent” products (in yellow): These products are bought alone most of the time. In the example, this is product D. In drug stores, this is the case for exclusive beauty or health products, for instance.
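
The three categories can be approximated with a simple threshold rule on the first-layer score. This is only a sketch: the thresholds `low` and `high` are hypothetical values chosen for the example, and because the article does not detail how the second indicator enters the combination, the rule below deliberately leaves it out. The score for product B is also an assumption, since the illustration does not state how often B is bought alone.

```python
def classify(solo_score, low=0.1, high=0.5):
    """Map a first-layer score to one of the three categories.

    Assumptions: `low` and `high` are illustrative cut-offs, and the
    second-layer indicator is ignored because its role is not documented.
    """
    if solo_score >= high:
        return "Independent"    # bought alone most of the time
    if solo_score <= low:
        return "Complementary"  # rarely bought alone
    return "Driver"             # sometimes alone, more often with others


# First-layer scores from the illustration (B's score is assumed to be 0).
illustration = {"A": 4 / 12, "B": 0.0, "C": 0.0, "D": 8 / 10}
categories = {product: classify(score) for product, score in illustration.items()}
print(categories)
```

With these cut-offs the result matches the categories given in the text: A is a Driver, B and C are Complementary, and D is Independent.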

An almost instant return on investment

During this study, most of our data scientists’ time was spent conceptualizing the relevant indicators and then analyzing the results. The modern tools they used (and in some cases developed) allowed data loading, visualization and calculations to take just a few minutes in total, even on such a large volume of data. In this way they were able to draw actionable insights that can be leveraged through several optimization strategies with an almost immediate return on investment:
• Increase the value of the average basket
By raising the margin on complementary products, for example with prices capped at 20% above competitors’ prices, a retailer can recover several hundred thousand euros of additional margin without damaging its price image.
• Improve the price image to grow volumes
At equal gross margin, lowering the price of sensitive products while raising the price of complementary products makes the retailer more competitive in the eyes of customers.
• Sell additional services
Having identified the primary customer behavior associated with “independent” products, the retailer can focus its upsell efforts for these items on the services and warranties associated with them, rather than on other products.
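
As a small worked example of the price cap mentioned in the first strategy, the sketch below limits a proposed price to 20% above the competitor's price and computes the extra margin per unit. The function name `capped_price` and all figures are hypothetical, and the calculation assumes that unit cost and sales volume stay unchanged, which a real pricing study would need to verify.

```python
def capped_price(target_price, competitor_price, cap=0.20):
    """Return target_price, limited to competitor_price * (1 + cap)."""
    return min(target_price, competitor_price * (1 + cap))


# Hypothetical complementary product, prices in euros.
current_price = 2.00
competitor_price = 1.90
target_price = current_price * 1.25  # proposed 25% increase

new_price = capped_price(target_price, competitor_price)
extra_margin_per_unit = new_price - current_price  # unit cost assumed constant

print(round(new_price, 2), round(extra_margin_per_unit, 2))
```

Here the proposed price of 2.50 exceeds the cap, so the price is held at 20% above the competitor. Multiplying the per-unit gain by the volumes of all complementary products gives an order of magnitude for the additional margin.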

As an extension of this analysis, we can imagine several ways to push the optimization further:
• Replicate the method at the level of each individual drugstore. We are likely to find different customer behaviors between suburban and downtown stores, for example.
• Identify and classify “sensitive product + complementary products” groups, both within and across categories, to improve merchandising.
• Cross-reference these elements with data from the retailer’s e-commerce website to further refine the results. Online, we can see for example which products are often added to a basket first, which are seldom added first, and which are often alone in the basket.

Towards predictive pricing for new products and marketplaces

This experiment shows how data science can bring valuable, immediately actionable insights to optimize retail pricing.
Combined with a flexible pricing engine able to integrate such results into price rules, data science will allow a predictive rather than reactive approach to pricing. By identifying the relevant product attributes related to a specific buying behavior, and classifying products according to those results, we can set prices for new products or for marketplace products without having to rely on sales history or competitive alignment rules.
