information (Trattner and Elsweiler, 2017). As in
everyday life, they also help in e-commerce and in
traditional commerce, where businesses use them to
match their products as closely as possible to the
needs of their customers (Schafer et al., 1999).
Recommender systems in the food industry have
become more relevant over the years, and the
demand for them is increasing (Trattner and
Elsweiler, 2017).
This paper focuses on an individual’s lifestyle and
eating habits. Through a multimodal analysis of
user-recipe interactions, the recommendation models
predict possible future food choices, and a
comparison between the two shows whether the
acquired results are similar or match completely.
With the ubiquity of the Internet, individuals now
tend to share every aspect of their lives, and food
recipes are no exception. Many cooking websites,
only a few clicks away, provide recipes and food
content (e.g., descriptions, meal photos, cooking
videos, how-to guides), as well as useful functions for
searching and filtering.
The dataset used in this research comes from
Food.com and is publicly available on Kaggle (Li, 2019).
Based on this data, the main goal is to build a
system that can efficiently predict which recipes may
interest a user by learning from the user’s
past choices and preferences, while also adding
diversity to the recommendations. This would reduce
the effort users spend searching through enormous
databases of recipes on websites, only to find ones
that are not to their liking.
2 RELATED WORK
The core function of recommender systems is to
recommend items that the user would actually take
into consideration (Ricci et al., 2011), using either
content-filtered recommendations, which consider the
user’s past choices and preferences, or collaboratively
filtered recommendations, which consider similar users
and their choices and interests (Shokeen and Rana, 2020).
Some researchers argue that good food
recommendations must take into account both the
quantity and the specificity of the ingredients in the
recipes that the user has browsed or cooked (Ueda et
al., 2014). Accordingly, algorithms often find similar
recipes based on overlapping ingredients, either
treating each ingredient equally or identifying key
ingredients. Geleijnse et al. (2011) built a
personalized recipe recommendation system that
suggests recipes to users based on past food choices
and nutrition intake.
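The ingredient-overlap idea described above can be illustrated with a minimal sketch. It is not the method of any cited work: it assumes Jaccard similarity as the overlap measure, and the recipes, the `weights` table, and the helper names are all hypothetical, invented for illustration only.

```python
def jaccard(a, b):
    """Unweighted overlap: every ingredient counts equally."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weighted_jaccard(a, b, weight):
    """Overlap in which key ingredients count more than staples.

    `weight` maps ingredient -> importance; unlisted ingredients count 1.0.
    """
    a, b = set(a), set(b)
    shared = sum(weight.get(i, 1.0) for i in a & b)
    total = sum(weight.get(i, 1.0) for i in a | b)
    return shared / total if total else 0.0


# Hypothetical recipes (not taken from the Food.com dataset).
recipes = {
    "tomato_pasta": ["pasta", "tomato", "garlic", "salt"],
    "garlic_bread": ["bread", "garlic", "butter", "salt"],
    "tomato_soup": ["tomato", "onion", "garlic", "salt"],
}

# Assumed importance weights: staples such as salt are down-weighted.
weights = {"salt": 0.1, "tomato": 2.0, "pasta": 2.0, "bread": 2.0}


def most_similar(name, score):
    """Return the other recipe with the highest similarity to `name`."""
    others = [r for r in recipes if r != name]
    return max(others, key=lambda r: score(recipes[name], recipes[r]))


best = most_similar("tomato_pasta", jaccard)
best_weighted = most_similar(
    "tomato_pasta", lambda a, b: weighted_jaccard(a, b, weights)
)
```

Treating every ingredient equally corresponds to `jaccard`, while `weighted_jaccard` is one possible way to let key ingredients dominate the comparison.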
Although there are, to our knowledge, several
options, we chose a hybrid, multimodal approach
that combines the user’s recipe history with the
ingredients those recipes contain, so that the
recommendations include recipes that are familiar
to the user as well as recipes that introduce new
ingredients. In this paper, we therefore make a first
attempt to investigate how good the recommendations
can be when they are based on the user’s previous
experience, that is, on their food choices and
preferences and on the ingredients in
those food choices.
Mapping a user’s whole history into a single
structure, however, requires a specific approach. If
interactions are mapped to a bipartite user-recipe
interaction graph, the recommendation problem can be
transformed into a link prediction problem (Li and
Chen, 2009). The constructed bipartite graph can
capture important information about the relationships
between users and recipes (Li and Chen,
2009). Because a weighted network is much more
informative than an unweighted one, various
techniques can be applied to determine the link
weights (Zhou et al., 2007).
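As a rough sketch of this framing, the snippet below builds a weighted bipartite user-recipe graph and scores missing links. Everything here is an assumption made for illustration: the interactions are invented, the rating is used directly as the edge weight, and the path-based score is a simple heuristic rather than the weighting scheme of Zhou et al. (2007) or the link predictor used in this paper.

```python
from collections import defaultdict

# Hypothetical (user, recipe, rating) interactions, invented for illustration.
interactions = [
    ("u1", "r1", 5), ("u1", "r2", 4),
    ("u2", "r1", 4), ("u2", "r3", 5),
    ("u3", "r2", 2), ("u3", "r4", 5),
]

# Bipartite graph stored as two adjacency maps; the rating is the edge weight.
user_edges = defaultdict(dict)    # user   -> {recipe: weight}
recipe_edges = defaultdict(dict)  # recipe -> {user: weight}
for user, recipe, rating in interactions:
    user_edges[user][recipe] = rating
    recipe_edges[recipe][user] = rating


def link_scores(user):
    """Score unseen recipes for `user`.

    Each user -> recipe -> other user -> candidate recipe path contributes
    the product of its three edge weights to the candidate's score.
    """
    scores = defaultdict(float)
    for seen, w1 in user_edges[user].items():
        for other, w2 in recipe_edges[seen].items():
            if other == user:
                continue
            for candidate, w3 in user_edges[other].items():
                if candidate not in user_edges[user]:
                    scores[candidate] += w1 * w2 * w3
    return dict(scores)


scores_u1 = link_scores("u1")
ranked_u1 = sorted(scores_u1, key=scores_u1.get, reverse=True)
```

A high score for a (user, recipe) pair is read as a likely future link, which is what turns recommendation into link prediction. With unweighted edges, every path would count the same and the strongly rated interactions would no longer stand out.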
Although low-dimensional node embeddings
in large graphs have proven useful in link
prediction problems such as this one, many of the
approaches require all nodes to be present
during embedding training (Hamilton et al., 2017).
Because these approaches generalize poorly to unseen
nodes, they are not well suited to
recommendation problems. GraphSAGE and its
heterogeneous version, HinSAGE, efficiently
generate node embeddings for previously unseen
data: instead of training an individual embedding
for each node, they learn a function that generates
embeddings by looking at the node’s local
neighborhood (Hamilton et al., 2017).
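The inductive idea can be sketched with a single GraphSAGE-style layer that uses a mean aggregator. This is a simplified illustration under stated assumptions, not the implementation used in this paper: neighbor sampling, normalization, and training are omitted, and the graph, the feature vectors, and the matrix `W` (which would normally be learned) are hypothetical fixed values.

```python
def mean(vectors):
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def embed(node, features, neighbors, W):
    """One GraphSAGE-style layer with a mean aggregator.

    The node's own features are concatenated with the mean of its
    neighbors' features, multiplied by the shared matrix W, and passed
    through a ReLU. W belongs to the layer, not to any single node.
    """
    own = features[node]
    if neighbors[node]:
        agg = mean([features[n] for n in neighbors[node]])
    else:
        agg = [0.0] * len(own)
    x = own + agg  # list concatenation: [own features | aggregated features]
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in W]


# Hypothetical toy graph with 2-dimensional node features.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}

# Assumed fixed weights (2 outputs x 4 inputs); in practice these are learned.
W = [
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, -1.0],
]

embedding_a = embed("a", features, neighbors, W)

# A node that was absent when W was fixed still gets an embedding,
# because only its features and its neighborhood are required.
features["d"] = [0.0, 2.0]
neighbors["d"] = ["a"]
embedding_d = embed("d", features, neighbors, W)
```

Because `W` is shared by all nodes, adding node `d` requires no retraining, which is the property that makes this family of models attractive when new users or recipes keep arriving.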
In parallel, many meta-path-based algorithms have
been proposed in recent years to carry out data
mining tasks over heterogeneous information
networks (HINs), including similarity search,
personalized recommendation, and object clustering.
In particular, the concept of meta-paths, which
connect two nodes through a sequence of relations
between node types, is widely used to exploit the
rich semantics of HINs. In this research, the
metapath P1: U → R → U means that two users
have cooked the same recipe,
whereas the metapath P2: R → U → R means that