Training machine learning algorithms for ecommerce
The ecommerce machine learning algorithm – up close and personal
Over the past few years, the range of tools that let you apply machine learning (ML) to ecommerce has grown rapidly. At Qubit, we use artificial intelligence (AI) and machine learning to drive meaningful personalization at scale. In this article, we’re going to take a close-up look at what’s required to make use of new technologies such as machine learning (spoiler: without data, it’s not going to work!) and how Qubit is continuously investing in opening up the power of ML to our customers.
First of all, let’s look at the difference between artificial intelligence, machine learning and deep learning.
Artificial intelligence is the overarching concept that involves machines which can imitate human behavior.
Machine learning is a way of achieving AI. It describes the ability to learn without being explicitly programmed. It is a way of training an algorithm so that it can learn how to make decisions.
Deep learning refers to a specific class of machine learning algorithms, where algorithms are stacked together in “deep” layers. Most commonly, this stacking is done using artificial neural networks.
When looking at how ML can support your efforts in providing a more personalized experience to your customers, it’s important to clarify the different types of objective that can be achieved.
There are four major areas where machine learning can provide value:
- Scale out customer understanding, by automatically grouping customers into valuable segments.
- Predict customer preferences. This is usually the objective that is covered by Product Recommendations, but has the potential to be used in many different ways.
- Predict customer intent
- Predict customer value
At Qubit, we believe there are specific use cases and ways to achieve each of these objectives by creating personalized and targeted experiences for consumers, which is why they are the areas we prioritize.
A great example of predicting customer preferences is our real-time category prediction model, which uses real-time visitor behavior to produce a list of the categories we predict a visitor is most likely to interact with next. This is used in Qubit Aura, our mobile discovery tool, to increase engagement. The same model could also be used to generate product recommendations or tailor email campaigns.
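To make the idea of predicting categories from real-time behavior concrete, here is a minimal sketch in Python. It is not Qubit's model: it simply ranks categories by how often and how recently a visitor interacted with them. The function name `rank_categories`, the `half_life_s` parameter and the sample session are all hypothetical.

```python
from collections import defaultdict

def rank_categories(events, now, half_life_s=300.0):
    """Rank categories by recency-weighted interaction counts.

    events: iterable of (timestamp_seconds, category) pairs for one visitor.
    This is an illustrative heuristic, not a trained model.
    """
    scores = defaultdict(float)
    for ts, category in events:
        age = max(0.0, now - ts)
        # Each event's weight halves every `half_life_s` seconds.
        scores[category] += 0.5 ** (age / half_life_s)
    # Highest score first; ties broken alphabetically so output is stable.
    return sorted(scores, key=lambda c: (-scores[c], c))

# Made-up session: older views of shoes, recent views of dresses and bags.
session = [(0, "shoes"), (60, "shoes"), (540, "dresses"),
           (590, "bags"), (600, "dresses")]
ranked = rank_categories(session, now=600)
print(ranked)
```

In this sketch the two recent dress views outweigh the two older shoe views, so "dresses" is ranked first. A production model would learn these weights from data rather than fix them by hand.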
Why is investing in ML important?
Personalization is about understanding your visitors and using that information to make informed decisions that create a personalized experience. Qubit collected 340bn events across our customers last year to get a comprehensive picture of consumers’ shopping habits, which can be used for personalization and segmentation.
The volume of data needed to make these decisions is vast, and filtering, analysing and using it becomes harder and harder, increasingly beyond what humans can process in a timely way. But the data is important, so automation and ML are the next steps in personalization: overseen and directed by marketers and merchandisers, but opening up possibilities for scale and insight that have not been available before.
How do you train an ML model?
To train a machine learning model, you take a set of data, split it up and “feed” one subset of it to the model. You then compare the model’s output against the real results in the held-back part of the data, and repeat until the model’s accuracy reaches an acceptable level.
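The loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Qubit trains its models: the data is synthetic, the two features (`browse_score`, `cart_score`) and the purchase rule are invented, and the model is a simple perceptron chosen only because it fits in a short example.

```python
import random

def make_sessions(n, rng):
    """Synthetic sessions: (browse_score, cart_score) in [0, 1] -> purchased."""
    rows = []
    for _ in range(n):
        browse_score, cart_score = rng.random(), rng.random()
        label = 1 if browse_score + cart_score > 1.0 else 0  # made-up rule
        rows.append(((browse_score, cart_score), label))
    return rows

def predict(weights, bias, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def accuracy(weights, bias, rows):
    return sum(predict(weights, bias, x) == y for x, y in rows) / len(rows)

def train(rows, target=0.95, max_epochs=200, lr=0.1, seed=0):
    """Split the data, fit on one part, score on the held-back part, repeat."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    split = int(0.8 * len(rows))
    train_rows, test_rows = rows[:split], rows[split:]
    weights, bias = [0.0, 0.0], 0.0
    for epoch in range(1, max_epochs + 1):
        for x, y in train_rows:
            error = y - predict(weights, bias, x)  # -1, 0 or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
        held_out = accuracy(weights, bias, test_rows)
        if held_out >= target:  # stop once accuracy is acceptable
            break
    return weights, bias, held_out, epoch

rows = make_sessions(500, random.Random(42))
weights, bias, held_out, epochs = train(rows)
print(f"held-out accuracy {held_out:.2f} after {epochs} epoch(s)")
```

The important design choice is that accuracy is always measured on data the model did not see during fitting. The `max_epochs` cap means the loop stops even if the target is never reached.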
To get the most accurate model, you need three things:
The first requirement is a dataset to train the model, which comprehensively captures the variation in the phenomenon you are trying to model. The more variation, the larger the dataset needs to be to train the model sufficiently. One of the most famous examples is Google’s Inception model for image recognition, which was trained on more than a million labeled images from the ImageNet dataset.
For personalization, the dataset is usually user behavior. Because user behavior is so variable, more data is required.
The second requirement is a set of ML algorithms whose performance has been shown to scale with data. Qubit’s data platform enables us to collect all of the user behavior data in a structured way, which allows us to continuously improve our models.
The third requirement is a regime where the distribution of the data in the training set at least approximately matches the distribution of data in the real world. That is, the data you have on a phenomenon needs to be a fair representation (albeit at a smaller scale) of the patterns in that phenomenon in real life. For example, the data you have about products sold online needs to accurately represent all products sold online.
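One simple way to check the third requirement is to compare how often each category appears in the training data with how often it appears in live traffic. The Python sketch below uses total variation distance, which is 0 when the two distributions are identical and 1 when they do not overlap at all. The function names and the sample data are hypothetical, and this is only one of several possible drift measures.

```python
from collections import Counter

def shares(items):
    """Fraction of observations per category (assumes `items` is non-empty)."""
    counts = Counter(items)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def total_variation(train_items, live_items):
    """Half the summed absolute difference between two category distributions."""
    p, q = shares(train_items), shares(live_items)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))

# Made-up category views: the training set over-represents shoes,
# while live traffic is dominated by bags.
train_views = ["shoes"] * 50 + ["dresses"] * 30 + ["bags"] * 20
live_views = ["shoes"] * 20 + ["dresses"] * 30 + ["bags"] * 50

drift = total_variation(train_views, live_views)
print(f"total variation distance: {drift:.2f}")
```

A value close to 0 suggests the training set is a fair representation of live traffic. What counts as too much drift depends on the model and has to be decided case by case.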
All together, this means you need enough of the right data, and a model that works with that data on a large scale.
How is Qubit investing in ML?
Right from our very beginnings, we made the decision to be serious about data, and serious about the infrastructure that enables our customers to collect all the different types of information that are valuable to their businesses. This information has now become the foundation and the fuel for machine learning, in type, structure and volume.
Some of the different types of data we use to feed our machine learning models are:
- Generic user characteristics: device type, operating system name, time of activity
- User behavioral information: time on site, average price of products viewed
- Product catalog characteristics: most popular products, increasingly popular products
- Product information: product name, product price, product description
- User and product interaction: user-product co-occurrence, time in product
The more of this data that is available, the more accurate and valuable the output of machine learning can be.
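As a rough illustration of how some of the data types listed above could be derived from raw events, here is a small Python sketch. The event fields (`user`, `ts`, `type`, `product`, `price`, `device`) are assumptions made for this example and are not Qubit's actual schema.

```python
from collections import Counter
from statistics import mean

# Made-up events; every field name here is hypothetical.
events = [
    {"user": "u1", "ts": 0, "type": "view", "product": "p1", "price": 40.0, "device": "mobile"},
    {"user": "u1", "ts": 90, "type": "view", "product": "p2", "price": 60.0, "device": "mobile"},
    {"user": "u2", "ts": 10, "type": "view", "product": "p1", "price": 40.0, "device": "desktop"},
    {"user": "u2", "ts": 30, "type": "view", "product": "p1", "price": 40.0, "device": "desktop"},
]

def user_features(events, user):
    """Generic and behavioral features for one user (assumes at least one view)."""
    own = [e for e in events if e["user"] == user]
    views = [e for e in own if e["type"] == "view"]
    return {
        "device_type": own[0]["device"],
        "time_on_site_s": max(e["ts"] for e in own) - min(e["ts"] for e in own),
        "avg_price_viewed": mean(e["price"] for e in views),
    }

def co_occurrence(events):
    """User-product interaction: number of views per (user, product) pair."""
    return Counter((e["user"], e["product"]) for e in events if e["type"] == "view")

def most_popular(events, n=1):
    """Catalog characteristic: the n most viewed products."""
    counts = Counter(e["product"] for e in events if e["type"] == "view")
    return [product for product, _ in counts.most_common(n)]

u1 = user_features(events, "u1")
pairs = co_occurrence(events)
top = most_popular(events)
print(u1, pairs[("u2", "p1")], top)
```

Features like these only work across customers if every event has the same fields and types.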
To make sure we have the right quantity and quality of data, we have collected over 8 years’ worth of ecommerce data. We have also taken the burden out of “preprocessing” (the various steps you go through to standardize and clean up the source data) by standardizing our data model across our entire customer base, tweaking the schema for each vertical. This means that any data point we collect, regardless of channel, is of a known type and structure.
Structured, clean and consistent data is like gold dust for machine learning. It means we can develop a model once, and retrain it across all of our customers without having to start again from scratch, making us faster to build, test and scale our models. Qubit Aura is a great example of this: we can turn on the category preference model used to drive the navbar for any of our customers, because they all use the same standardized data attributes used to train the model.
Putting ML models to work
To build and train a machine learning model, you need the right volume of data, the right data structure and the right algorithms. Using it effectively means working at scale, because ML is for the things that are too big or too fast for humans to deal with, such as reaching all of your customers with personalization in real time. And to do that, you need a pipeline that can collect and provide that data at scale and the right infrastructure to harness the power of these models.
To understand more about how we’re harnessing ML to deliver 1:1 personalization in Qubit Aura, download our new guide, Rethinking product discovery on mobile.