Wang, and Yeung 2014) and autoencoders (Sedhain
et al. 2015).
Our system is built on TensorFlow (Xu et al. 2016), an open-sourced version of Google Brain (Dean and Corrado 2012). Our model learns approximately 3 million parameters and is trained on millions of customer transactions.
This work is concerned with the first part of IKEA's recommender, namely the component that captures the context and generates lists of candidate products. The paper is structured as follows: Section 2 presents a brief, bird's-eye view of the system. Section 3 describes the reasoning and process behind the data pre-processing. Finally, Section 4 presents the inspiration for and architecture of the model.
2 SYSTEM OVERVIEW
Figure 1 depicts a high-level overview of the IKEA recommender system. It consists of two discrete components: the data pre-processing unit and the component responsible for producing recommendations. The customer's shopping cart and location are also sketched, as two additional sources of input.
Figure 1: Conceptual diagram of the IKEA skip-gram context-aware recommender. Data are retrieved from the operational database and fed to the neural network through the data pre-processing unit. In real time, the customer's location and shopping cart are fed to the model as extra input.
Initially, we feed the customer's historical data, stored in the database, into the pre-processing unit. The unit shapes the data into a form that the neural network accepts. Moreover, it logically transforms the data so that the network can discover the intricate patterns behind customers' purchases and behaviour (more on this in Section 3).
The training of the algorithm is done periodically, offline. In real time, we feed the customer's location and shopping cart into the model. Whenever a customer adds something to the shopping cart, or moves to a new in-store location, the system produces candidate items. The primary job of the algorithm, and the one within the scope of this paper, is to create a list of items that might be of interest to the customer and are located nearby. In the next step, a ranking algorithm sorts these candidate items to create personalized recommendations; this will be addressed in a following publication.
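The listing below is a minimal sketch of this two-stage serving flow: candidate generation conditioned on the shopping cart and in-store location, a nearby filter, and a separate ranking step. All names (score_candidates, is_nearby, rank_candidates) are illustrative placeholders under our reading of the system, not the production API.

def score_candidates(cart, location, catalogue):
    # Placeholder: score every catalogue item against the current context
    # (in the real system this role is played by the skip-gram model).
    return {item: 1.0 for item in catalogue if item not in cart}

def is_nearby(item, location):
    # Placeholder: check whether the item is stocked close to the customer.
    return True

def rank_candidates(scores, cart):
    # Placeholder for the downstream ranking model (future publication).
    return sorted(scores, key=scores.get, reverse=True)

def recommend(cart, location, catalogue, top_n=10):
    scores = score_candidates(cart, location, catalogue)
    nearby = {i: s for i, s in scores.items() if is_nearby(i, location)}
    return rank_candidates(nearby, cart)[:top_n]

# Toy call: triggered whenever the cart or the in-store location changes.
print(recommend(cart={"chair_123"}, location="showroom_A",
                catalogue={"chair_123", "table_456", "lamp_789"}))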
The candidate generation algorithm is inspired by the work done on language models (Collobert and Weston 2008; Mikolov et al. 2006, 2013). Its job is to find the correlations between the items that customers choose together, as well as the reasoning behind these purchases (e.g., same color, brand, style, etc.). In Section 4, we present the model architecture and logic in detail. During training, we make use of offline metrics such as precision, recall, and ranking loss, but the real value of a recommender lies not only in predicting the held-out data in a test set, but also in discovering new items that might be of interest to customers, even if they were unaware of their existence. Thus, we can only draw a safe conclusion using specifically designed A/B tests in production.
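As a concrete illustration of the offline evaluation, the snippet below computes precision and recall at k against a customer's held-out purchases; it is a generic sketch of these standard metrics, not the exact evaluation code used in production.

def precision_recall_at_k(recommended, relevant, k=10):
    # recommended: ranked list of item IDs produced by the model
    # relevant: set of item IDs the customer actually bought (held out)
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy example: one relevant item retrieved in the top 3.
p, r = precision_recall_at_k(["P001", "P002", "P003", "P004"], {"P002", "P101"}, k=3)
print(p, r)  # 0.333..., 0.5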
3 DATA PRE-PROCESSING
To take advantage of the work done in language models, and especially the word2vec notion (Goldberg and Levy 2014), we need to transform the products in such a way that they can be viewed as words in a document. This transformation permits us to learn informative distributed vector representations for each product in a high-dimensional embedding space. To achieve this, we need a data pre-processing component. We cannot characterize this as feature engineering, because the features fed into the algorithm are still raw product IDs.
We view the customers' purchases on a specific
visit, i.e., on a unique date, as a set of purchased
products 𝑃, whose elements are taken from a set of
items 𝐼, which encloses every product. If we assume a set 𝐶 that contains every product category, we can filter the set 𝑃 by product category, thus creating subsets 𝑃_𝑐 ⊆ 𝑃 that consist of what each customer bought on a specific date, grouped by product category. We treat the resulting sequences of transactions, i.e., the elements of 𝑃_𝑐, as "purchase sentences", where each product 𝑖 ∈ 𝐼 is a "word" composing the "sentence". This way we can compose "chapters" for each category, which are in the end concatenated into a "book" or "document" to train our model. Figure 2 depicts the process.
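The listing below sketches this construction with pandas: purchases are grouped per customer, visit date, and product category into "purchase sentences", sentences of the same category form a "chapter", and the concatenated chapters form the training "document". The table schema and the toy rows are illustrative assumptions, not IKEA's actual data model.

import pandas as pd

transactions = pd.DataFrame({
    "customer_id":   [1, 1, 1, 2, 2],
    "purchase_date": ["2019-05-01"] * 3 + ["2019-05-02"] * 2,
    "category":      ["sofas", "sofas", "lighting", "sofas", "sofas"],
    "product_id":    ["P001", "P002", "P101", "P003", "P001"],
})

# One "purchase sentence" per customer, visit, and category: the products
# bought together play the role of words in a sentence.
sentences = (transactions
             .groupby(["customer_id", "purchase_date", "category"])["product_id"]
             .apply(list))

# Sentences of the same category form a "chapter"; concatenating all
# chapters yields the "document" used to train the skip-gram model.
chapters = {cat: list(group) for cat, group in sentences.groupby(level="category")}
document = [sentence for chapter in chapters.values() for sentence in chapter]
print(document)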