A report published by McKinsey in November 2021, based on its own research, found that 71% of users expect a personalised experience when browsing the internet[1]. In particular, they expect to be shown the right content at the right point in their shopping journey. This principle is the cornerstone of contextual commerce and marketing, whose overall goal is to increase the conversion rate. However, contextualisation requires collecting a great deal of data if the online retailer is to make the most accurate product or content recommendations possible. This is where recommendation engines come in: they use artificial intelligence and machine learning to perform this complex task.
What is a recommendation engine?
A recommendation engine is a tool that uses artificial intelligence and machine learning to automatically suggest content or products to internet users. Its recommendations are generated automatically. They are also dynamic and personalised, because the engine is fed large quantities of user data daily, or even in real time. Its performance therefore depends on its ability to process this data.
How does a recommendation engine work?
Recommending online content involves a series of steps.
Data collection
A recommendation engine collects 2 types of relevant data to build the user profile:
• Implicit data: This is data the user does not intentionally provide, which is collected in the course of their internet activity. This includes, for example, pages visited, clicks, words used in a search, and purchase history.
• Explicit data: This is information the user deliberately provides. It can include "likes" on social media, reviews of purchases, content the user publishes on the internet and exchanges with other users.
Data storage
The performance of a recommendation engine depends in part on its capacity to store the data it collects. The engines therefore rely on very powerful technological building blocks that can handle the three dimensions of Big Data[2]:
• Volume: The engines store gigantic volumes of digital data.
• Velocity: They cope with the enormous speed at which data is generated.
• Variety: They handle an ever-growing diversity of data types.
Data analysis
The recommendation engine then analyses the data that has been collected and stored. It follows a methodology, that is, a predetermined set of rules established in line with the strategy of the company that owns the website. Approaches differ depending on many parameters, such as the goals of the recommendation, the nature of the site and the type of information collected and analysed.
There are 3 common approaches, also called filters, namely:
- collaborative
- content-based
- hybrid
Collaborative filter: This method looks for similarities between the histories of a panel of users and the history of the current user. It rests on an assumption: if the panel and the target user share preferences for certain items, the target user is likely to also like other items the panel has chosen. The recommendations are based solely on collected interaction data (clicks, purchases, ratings). The engine does not analyse the characteristics of the items themselves.
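To make the idea concrete, here is a minimal Python sketch of user-based collaborative filtering. It assumes ratings are held in a simple dictionary. The users, items, ratings and the choice of cosine similarity are all illustrative assumptions, not a description of any particular engine. Every other user acts as a member of the "panel", and their influence is weighted by how closely their history matches the target user's.

```python
from math import sqrt

# Hypothetical interaction data: user -> {item: rating out of 5}
ratings = {
    "alice": {"coffee_machine": 5, "milk_frother": 4, "kettle": 1},
    "bob":   {"coffee_machine": 5, "milk_frother": 5, "grinder": 4},
    "carol": {"kettle": 5, "toaster": 4},
}

def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse rating vectors (0.0 if no items in common)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend_collaborative(target: str, ratings: dict, k: int = 3) -> list:
    """Rank items the target has not seen by the similarity-weighted ratings of other users."""
    seen = ratings[target]
    scores = {}
    for other, their_ratings in ratings.items():
        if other == target:
            continue
        sim = cosine_similarity(seen, their_ratings)
        if sim <= 0:
            continue  # users with nothing in common contribute nothing
        for item, rating in their_ratings.items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend_collaborative("alice", ratings))
# → ['grinder', 'toaster']
```

Bob's history closely matches Alice's, so his highly rated grinder comes first. Carol overlaps with Alice only on a kettle Alice disliked, so the toaster ranks well below it. Production engines usually work on far larger and sparser data, often with more scalable techniques such as matrix factorisation.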
Content-based filter: This method assumes that the target user is likely to like items whose features resemble those of items they have liked in the past. It takes a two-pronged approach, analysing data from both the e-commerce site and the consumer. On the site side, this data could be the features of products in a catalogue for an e-commerce site, or the subject matter covered for a media site. On the customer side, the engine builds a profile of the target user by cataloguing their preferences according to the attributes found in the site's data. Recommendations are then made from the matches identified between the features of the referenced items and the user's tastes. They draw only on the user's own past behaviour, without reference to data on other visitors.
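A content-based filter can be sketched in a few lines under some simplifying assumptions. In this sketch, each catalogue item is described by a hypothetical set of weighted feature tags. The user's profile is the average of the features of the items they liked. Unseen items are ranked by cosine similarity to that profile. Only the user's own likes and the catalogue are used, which matches the "no other visitors" point above.

```python
from math import sqrt

# Hypothetical catalogue: item -> feature weights (category or attribute tags)
catalogue = {
    "espresso_machine": {"coffee": 1.0, "appliance": 1.0},
    "milk_frother":     {"coffee": 1.0, "accessory": 1.0},
    "coffee_beans":     {"coffee": 1.0, "grocery": 1.0},
    "toaster":          {"breakfast": 1.0, "appliance": 1.0},
    "tea_infuser":      {"tea": 1.0, "accessory": 0.5},
}

def build_profile(liked: list, catalogue: dict) -> dict:
    """User profile = average of the feature vectors of the items the user liked."""
    profile = {}
    for item in liked:
        for feature, weight in catalogue[item].items():
            profile[feature] = profile.get(feature, 0.0) + weight / len(liked)
    return profile

def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(a[f] * b[f] for f in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend_content_based(liked: list, catalogue: dict, k: int = 2) -> list:
    """Rank unseen catalogue items by how closely their features match the profile."""
    profile = build_profile(liked, catalogue)
    candidates = [item for item in catalogue if item not in liked]
    return sorted(candidates, key=lambda item: cosine(profile, catalogue[item]), reverse=True)[:k]

print(recommend_content_based(["espresso_machine", "milk_frother"], catalogue))
# → ['coffee_beans', 'toaster']
```

The toaster appears only because it shares the "appliance" tag with the espresso machine. This shows that a content-based filter is only as good as the attributes the site records.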
Hybrid filter: A strategy that combines the two previous types of filtering. The target visitor's preferences are cross-referenced both with community data (the histories of similar users) and with the attributes of catalogue items. The results can also be combined with suggestions from other recommendation models.
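Filters can be hybridised in several ways. One simple option is a weighted hybrid, sketched here under stated assumptions. Each filter's scores are rescaled to the range 0 to 1, then blended with a weight `alpha`. The function names, the min-max rescaling and the default `alpha` of 0.5 are illustrative choices rather than a standard recipe. The input scores are hypothetical.

```python
def min_max(scores: dict) -> dict:
    """Rescale scores to [0, 1] so the two filters are on a comparable scale."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 1.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}

def hybrid_rank(collab: dict, content: dict, alpha: float = 0.5) -> list:
    """Weighted hybrid: alpha * collaborative score + (1 - alpha) * content-based score.
    An item scored by only one filter gets 0 for the other component."""
    c1, c2 = min_max(collab), min_max(content)
    combined = {item: alpha * c1.get(item, 0.0) + (1 - alpha) * c2.get(item, 0.0)
                for item in set(c1) | set(c2)}
    return sorted(combined, key=combined.get, reverse=True)

# Hypothetical scores produced by two separate filters for the same user
collab_scores = {"grinder": 3.4, "coffee_beans": 2.0, "toaster": 0.5}
content_scores = {"coffee_beans": 0.58, "toaster": 0.29, "tea_infuser": 0.18}
print(hybrid_rank(collab_scores, content_scores))
# → ['coffee_beans', 'grinder', 'toaster', 'tea_infuser']
```

The coffee beans rise to the top because both filters rate them well, even though neither filter ranked them first on its own. Raising `alpha` gives more weight to community data, and lowering it gives more weight to item attributes.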
Filter bubbles and serendipity
Filter bubble is the term for what happens to a user when an algorithm only retrieves items and information similar to their existing preferences. The user risks being locked into a single, closed environment, without exploring other spheres of life or ways of thinking. Many have raised strong concerns about the ethics of this. The algorithms make their selection in the background, without the user's control or consent beyond their agreement to let their own data be used. Hence the case for introducing an element of chance, which academic studies call serendipity, into recommendation engine algorithms.
Serendipity is "the fact of finding interesting or valuable things by chance"[3].
Algorithms can also leave room for chance by deliberately proposing random content that falls outside the user's close or known areas of interest.
Combining known user interests with random content allows recommendation engines to work in a way that is both effective and more ethical. Customers are not trapped within the bubble of what is already familiar to them, and they have the opportunity to make broader, better-informed choices.
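One simple way to mix known interests with chance is sketched here, under stated assumptions. The engine keeps the top personalised results and fills the last few slots with items drawn at random from categories the user has not shown interest in. The function name, the item-to-category mapping and the "replace the last slots" rule are hypothetical illustrations. The research literature describes more elaborate ways to measure and promote serendipity.

```python
import random

def add_serendipity(ranked: list, categories: dict, known_interests: set,
                    n_random: int = 1, seed=None) -> list:
    """Keep the top personalised items and fill the last n_random slots with
    items drawn at random from categories outside the user's known interests."""
    rng = random.Random(seed)
    # Candidate pool: items in unfamiliar categories that are not already recommended
    pool = [item for item, category in categories.items()
            if category not in known_interests and item not in ranked]
    n = min(n_random, len(pool), len(ranked))  # never replace more slots than exist
    picks = rng.sample(pool, n)
    return ranked[:len(ranked) - n] + picks

# Hypothetical catalogue: item -> category
categories = {
    "espresso_machine": "coffee", "milk_frother": "coffee", "coffee_beans": "coffee",
    "yoga_mat": "sport", "travel_guide": "books",
}
ranked = ["coffee_beans", "milk_frother", "espresso_machine"]
print(add_serendipity(ranked, categories, {"coffee"}, n_random=1, seed=7))
```

The size of `n_random` sets the trade-off. More random slots widen the user's horizons but dilute immediate relevance, so sites typically reserve only a small share of the list for the unexpected.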
What data does a recommendation engine rely on?
Socio-demographic data
Collecting socio-demographic data on customers is an essential step for recommendation engines. Age, gender, cultural background, language, family situation and professional background influence not just internet browsing behaviour but also purchasing behaviour. In e-commerce, for example, 30% of men make at least one purchase per week, compared to 24% of women. Similarly, millennials account for almost 35% of online shopping, compared to 30% for Generation X and only 15% for baby boomers[4]. These socio-demographic characteristics also influence people's interests and the categories of items they buy. For example, a Eurostat study found that 16–24-year-olds were the age group that bought the most clothes online, followed closely by 25–54-year-olds. The over-55s, on the other hand, are among the biggest buyers of furniture and home furnishings[5].
Personal data
Alongside socio-demographic data, the personal data that recommendation engines collect, known as feedback, is essential for creating a consumer profile. It comes in two main types. Explicit feedback is information customers willingly volunteer about themselves, their likes or dislikes. Implicit feedback is generated by observing shoppers' behaviour, such as the number of clicks or the time spent on a page. Collecting and cross-referencing these two types of data narrows down and fine-tunes the content to be recommended, as well as the right time and place to recommend it. Amazon's product recommendations work this way, for example: the algorithms analyse users' wish lists, the products they search for, the time they spend on each product page and their purchase history. As a result, the platform's product suggestions are often relevant.
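The cross-referencing of implicit and explicit feedback can be sketched as a single preference score per item. This is not Amazon's method. The event weights, the two-point cap on the dwell-time bonus and the way star ratings shift the score are all hypothetical values chosen for illustration. A real engine would tune them from its own data.

```python
from dataclasses import dataclass, field

# Hypothetical weights; a real engine would tune these from its own data
IMPLICIT_WEIGHTS = {"view": 1.0, "click": 2.0, "wishlist": 4.0, "purchase": 8.0}
EXPLICIT_WEIGHT = 2.0  # how strongly a star rating shifts the score

@dataclass
class FeedbackProfile:
    implicit: dict = field(default_factory=dict)  # item -> accumulated behavioural score
    explicit: dict = field(default_factory=dict)  # item -> star rating (1 to 5) given by the user

    def record_event(self, item: str, event: str, seconds_on_page: float = 0.0) -> None:
        """Implicit feedback: weight the event type and add a capped dwell-time bonus."""
        bonus = min(seconds_on_page / 60.0, 2.0)  # at most 2 points for time on page
        self.implicit[item] = self.implicit.get(item, 0.0) + IMPLICIT_WEIGHTS.get(event, 0.0) + bonus

    def rate(self, item: str, stars: int) -> None:
        """Explicit feedback: a rating the user volunteered."""
        self.explicit[item] = stars

    def preference(self, item: str) -> float:
        """Cross-reference both signals: a rating above 3 stars raises the
        behavioural score, a rating below 3 lowers it."""
        score = self.implicit.get(item, 0.0)
        if item in self.explicit:
            score += EXPLICIT_WEIGHT * (self.explicit[item] - 3)
        return score

profile = FeedbackProfile()
profile.record_event("coffee_machine", "view", seconds_on_page=90)
profile.record_event("coffee_machine", "wishlist")
profile.record_event("kettle", "purchase")
profile.rate("kettle", 1)
print(profile.preference("coffee_machine"), profile.preference("kettle"))
# → 6.5 4.0
```

The kettle was actually bought, but the poor rating the customer later gave it pulls its score below the coffee machine, which was only viewed and wish-listed. Neither signal alone would have captured this.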
But it is social media that give recommendation engines access to an exceptional volume, variety and accuracy of user data[6]. This includes the content that internet users view and share, and also their interactions and exchanges with other users. Social media algorithms precisely define the interests and demographic profiles of each visitor, which means that social commerce recommendations are extremely well targeted and effective.
Databases and transactions
The contents of databases and the various types of transaction history are analysed and organised to optimise content recommendation. E-commerce sites and marketplaces draw on the data held in their CRM (Customer Relationship Management), PIM (Product Information Management) and DMP (Data Management Platform) systems. The product attributes stored there, notably in the PIM, allow the algorithms to create relevant links during the recommendation process. For example, an e-commerce site for kitchen items might suggest a milk frother to a user viewing the product page for a coffee machine. These product features are also supplemented by transaction histories such as popularity with users, bestsellers, other users' “also-bought” choices and customer reviews. In the same way, content platforms analyse how visitors interact with their content: how often they visit and interact, the reviews they post, the time they spend reading, and so on.
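An “also-bought” selection can be derived from transaction history by counting how often pairs of products appear in the same order. Here is a minimal sketch of that co-occurrence approach. The order data and function names are hypothetical. A real site would combine these counts with product attributes, popularity and reviews, as described above.

```python
from collections import Counter
from itertools import combinations

def count_pairs(orders: list) -> Counter:
    """Count, for every pair of products, how many orders contain both."""
    pairs = Counter()
    for order in orders:
        for a, b in combinations(sorted(set(order)), 2):
            pairs[(a, b)] += 1
    return pairs

def also_bought(product: str, pairs: Counter, k: int = 3) -> list:
    """Products most often bought in the same order as `product`."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a == product:
            scores[b] += n
        elif b == product:
            scores[a] += n
    return [item for item, _ in scores.most_common(k)]

# Hypothetical order history from a kitchen-items shop
orders = [
    ["coffee_machine", "milk_frother"],
    ["coffee_machine", "milk_frother", "descaler"],
    ["coffee_machine", "descaler"],
    ["milk_frother", "coffee_machine"],
    ["kettle", "toaster"],
]
print(also_bought("coffee_machine", count_pairs(orders)))
# → ['milk_frother', 'descaler']
```

Sorting each order's products before pairing ensures that (coffee_machine, milk_frother) and (milk_frother, coffee_machine) are counted as the same pair.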