Novel Topic Models for Content Based Recommender Systems
Kamal Maanicshah, Manar Amayri and Nizar Bouguila
Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Quebec, Canada
Keywords:
Topic Biterm Models, Generalized Dirichlet Distribution, Beta-Liouville Distribution, Mixture Allocation,
Recommender Systems.
Abstract:
Content based recommender systems play a vital role in applications related to user suggestions. In this paper,
we introduce novel topic models which help tackle the recommendation task. Being one of the prominent
approaches in the field of natural language processing, topic models like latent Dirichlet allocation (LDA) try
to identify patterns of topics across multiple documents. Due to the proven efficiency of generalized Dirichlet
allocation and Beta-Liouville allocation in recent times, we use these models for better performance. In
addition, since it is a known fact that co-occurrences of words are commonplace in text documents, the models
have been designed with this reality in mind. Our models follow a mixture based design to achieve better
topic quality. We use variational inference for estimating the parameters. Our models are validated with two
different datasets for recommendation tasks.
1 INTRODUCTION
Recommendation systems have become an insepa-
rable part of a variety of online services like web
search, news articles, movies, etc. in recent years
(Pazzani and Billsus, 2007). Most of the recent ad-
vancement in this field is centred on collaborative fil-
tering (Bobadilla et al., 2011) and content based fil-
tering (Pazzani and Billsus, 2007). While the for-
mer method is based on modelling the activities of
users with similar behaviour in a platform, the latter
works on modelling the likes of an individual user in
the platform. Both approaches have their own merits
and are used depending on the task at hand. In this
article, we explore a content based recommender sys-
tem based on novel topic models. Topic modelling
refers to an unsupervised learning approach that ex-
tracts topics from documents and groups the words
belonging to each topic. Latent Dirichlet allocation
(LDA) is one of the most famous topic models used
for this purpose (Blei et al., 2003). Since the intro-
duction of LDA a number of research ideas have been
proposed to improve the vanilla model to suit differ-
ent applications. For example, LDA has been demonstrated to model multilingual topics simultaneously (Mimno et al., 2009). This would
help in tagging documents appropriately irrespective
of language. Another improvement over LDA which
can be used for supervised classification of images
has also proved to be effective (Chong et al., 2009).
LDA has also been used for creating recommenda-
tion systems (Nagori and Aghila, 2011). LDA can
extract topics from the description of user activity
which could help suggest new items that the user
might be interested in. It is well known that models which take into account the co-occurrences of words tend to give a boost for topic modelling tasks (Opper and Saad, 2001). This led us to choose a design which incorporates the possibility of bigram words such as ‘Thank you’, ‘high school’, etc. Recent research has shown that using mixture models in conjunction with LDA helps in extracting better topics (Opper and Saad, 2001). In our models we integrate this idea to improve recommendations. There have also been studies involving the use of alternative priors to the Dirichlet for the topic proportions in a document. Generalized Dirichlet (GD) and Beta-Liouville (BL) distributions have proved to be efficient substitutes for the Dirichlet (Bakhtiari and Bouguila, 2016; Bakhtiari and Bouguila, 2014). The generalized Dirichlet (GD) distribution has a general covariance structure which might help to better fit the data, as compared to the Dirichlet, which has a negative covariance structure. The drawback of GD, however, is that twice as many parameters have to be estimated as for the Dirichlet distribution. The BL distribution helps to overcome this drawback while still offering a general covariance structure. Based on these theoretical grounds and experimental proofs, we decided to use these distributions as priors in our models to provide a better fit to the data.
Parameter estimation plays a crucial role in machine learning models.
Most of the approaches on LDA based models men-
tion variational inference and Gibbs sampling as ef-
fective methods for estimating the parameters (Blei
et al., 2003; Liu et al., 2020). However, in the case of
pure Bayesian approaches such as Gibbs sampling,
computations are not always tractable for complex
priors (Attias, 1999). Variational inference on the
other hand, approximates the posterior probability in-
stead of calculating it which gives guaranteed conver-
gence (Hu et al., 2019). Hence, we choose variational
inference as our parameter estimation method. Fur-
thermore, as opposed to the frequently used method
for inferring the variational solutions as established
in (Blei et al., 2003), we employed the method used
for mixture models in (Fan et al., 2012). This makes
the mathematical computation of variational solutions
easier. We evaluate our model, with two challenging
datasets. One of them is for anime recommendation
and the other is for recommendation of movies from
Netflix. We estimate the performance of the model
based on coherence score for both datasets. In addi-
tion, since we had enough ground truth data to vali-
date the Netflix dataset, we estimate the accuracy of
predictions as well. The rest of the paper is organized
as follows: The description of the proposed models is
given in Section 2. This is followed by the variational
algorithm to estimate the parameters in Section 3. The
experiments performed on the datasets with our mod-
els are detailed in Section 4. We finally conclude in
Section 5 with our findings.
2 MODEL DESCRIPTION
Let us assume that we have a set of D documents in a corpus. We denote the number of words in a document d = 1, 2, ..., D by $N_d$. For the $n$-th word among the $N_d$ words in a document d, $w_{dn}$ can be represented as a V-dimensional indicator vector, where $w_{dnv} = 1$ if the word $w_{dn}$ is the $v$-th word in the vocabulary and 0 otherwise. We also have another latent variable $Z = \{\vec{z}_{dn}\}$ which specifies to which among the K topics the word has maximum affinity. $z_{dnk}$ follows a similar convention: $z_{dnk} = 1$ if the word belongs to the $k$-th topic of the K different topics and 0 otherwise. The distribution of words for a topic k is given by a multinomial with parameters $\vec{\beta}_k$, which has a Dirichlet prior with parameter $\vec{\lambda}_k$. $p(\vec{\theta} \mid \Phi)$ is the distribution of the prior for the topic proportions of the documents and takes the form of a GD or BL distribution with parameter $\Phi$. In addition to this, we also have another latent variable $Y = (\vec{y}_1, \vec{y}_2, \ldots, \vec{y}_D)$ corresponding to the mixture model, which is an L-dimensional one-hot encoded indicator vector showing which component the document belongs to. Hence, $y_{dl} = 1$ if the document is sampled from the $l$-th component and 0 if not. Y in turn is sampled from a multinomial distribution with parameters $\vec{\pi} = (\pi_1, \pi_2, \ldots, \pi_L)$ with the constraints $0 \leq \pi_l \leq 1$ and $\sum_{l=1}^{L} \pi_l = 1$. According to these assumptions, for a corpus $W$ containing D documents, we can write the marginal as,

$$p(W \mid \vec{\pi}, \vec{\Phi}, \vec{\beta}) = \prod_{d=1}^{D} \int \Bigg[ \sum_{y_d} p(\vec{\theta}_d \mid y_d, \vec{\Phi})\, p(y_d \mid \vec{\pi}) \prod_{n=1}^{N_d} \sum_{z_{dn}} p(w_{dn}, w_{d(n-1)} \mid z_{dn}, \vec{\beta})\, p(z_{dn} \mid \vec{\theta}_d) \Bigg] d\vec{\theta}_d \quad (1)$$

where $\vec{\Phi}$ and $\vec{\beta}$ are the parameters of the prior distributions for the document topic proportions and the topic word proportions respectively. Here, $w_{d(n-1)} = v_{n-1}$ and $w_{dn} = v_n$ incorporate the dependency of adjacent words on the topic latent variable. Based on this general structure, we can define the priors based on GD and BL distributions as mentioned in the following subsections.
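To make the notation concrete, the short sketch below builds this indicator representation for a toy corpus; the documents and vocabulary are invented for illustration and are not part of our datasets.

```python
import numpy as np

# Hypothetical toy corpus: each document is a list of tokens.
docs = [["high", "school", "student", "fights", "monsters"],
        ["space", "crew", "explores", "portals"]]

# Build the vocabulary (an index v for every distinct word).
vocab = sorted({token for doc in docs for token in doc})
word_to_v = {token: v for v, token in enumerate(vocab)}
V = len(vocab)

# w[d] is an (N_d x V) matrix; w[d][n, v] = 1 iff the n-th word of
# document d is the v-th vocabulary entry (the indicator w_dnv).
w = []
for doc in docs:
    one_hot = np.zeros((len(doc), V), dtype=int)
    for n, token in enumerate(doc):
        one_hot[n, word_to_v[token]] = 1
    w.append(one_hot)

print(V, w[0].shape)  # vocabulary size and (N_1, V)
```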
2.1 Latent Generalized Dirichlet
Bi-Term Mixture Allocation
(Bi-LGDMA)
In the case of Bi-LGDMA, the prior for the topic proportions is generated from a GD distribution. Let us consider a GD distribution with parameters $(\sigma_{l1}, \sigma_{l2}, \ldots, \sigma_{lK}, \tau_{l1}, \tau_{l2}, \ldots, \tau_{lK})$. The probability density function of the topic proportions can be written as,

$$p(\vec{\theta}_d \mid \vec{\sigma}_l, \vec{\tau}_l) = \prod_{k=1}^{K} \frac{\Gamma(\tau_{lk} + \sigma_{lk})}{\Gamma(\tau_{lk})\Gamma(\sigma_{lk})}\, \theta_{dk}^{\sigma_{lk} - 1} \Bigg(1 - \sum_{j=1}^{k} \theta_{dj}\Bigg)^{\gamma_{lk}} \quad (2)$$

where $\gamma_{lk} = \tau_{lk} - \tau_{l(k+1)} - \sigma_{l(k+1)}$ for $k = 1, 2, \ldots, K-1$ and $\gamma_{lK} = \tau_{lK} - 1$ for $k = K$. As mentioned earlier, owing to the fact that using a mixture model over the topic proportions helps improve the model (Chien et al., 2018), we introduce mixture models as,
$$p(\vec{\theta}_d \mid \vec{y}_d, \vec{\sigma}, \vec{\tau}) = \prod_{l=1}^{L} \Bigg[ \prod_{k=1}^{K} \frac{\Gamma(\tau_{lk} + \sigma_{lk})}{\Gamma(\tau_{lk})\Gamma(\sigma_{lk})}\, \theta_{dk}^{\sigma_{lk} - 1} \Bigg(1 - \sum_{j=1}^{k} \theta_{dj}\Bigg)^{\gamma_{lk}} \Bigg]^{y_{dl}} \quad (3)$$
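As a concrete illustration of Eqs. (2)-(3), the snippet below evaluates the generalized Dirichlet log-density of a topic-proportion vector for a single mixture component. It is a minimal sketch under our notation, with made-up parameter values; it is not the inference code of the paper.

```python
import numpy as np
from scipy.special import gammaln

def gd_log_pdf(theta, sigma, tau):
    """Log-density of a generalized Dirichlet distribution (one component).
    theta: (K,) topic proportions with sum(theta) < 1; sigma, tau: (K,)."""
    K = len(theta)
    # gamma_k = tau_k - tau_{k+1} - sigma_{k+1} for k < K, and tau_K - 1 for k = K
    gamma = np.empty(K)
    gamma[:-1] = tau[:-1] - tau[1:] - sigma[1:]
    gamma[-1] = tau[-1] - 1.0
    log_norm = gammaln(sigma + tau) - gammaln(sigma) - gammaln(tau)
    cumsum = np.cumsum(theta)
    return np.sum(log_norm + (sigma - 1.0) * np.log(theta)
                  + gamma * np.log(1.0 - cumsum))

# Hypothetical parameter values for one mixture component, K = 4 topics.
theta   = np.array([0.3, 0.25, 0.2, 0.15])
sigma_l = np.array([2.0, 1.5, 1.2, 1.1])
tau_l   = np.array([3.0, 2.5, 2.0, 1.5])
print(gd_log_pdf(theta, sigma_l, tau_l))
```

Under the mixture of Eq. (3), the density of a document assigned to component l is simply this quantity raised to the power $y_{dl}$, i.e. only the selected component contributes.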
The latent variable Y is governed by a multinomial distribution with parameter $\vec{\pi}$, which holds the mixing weights of the mixture model. This is denoted by $p(y_d \mid \vec{\pi}) = \prod_{l=1}^{L} \pi_l^{y_{dl}}$. Contrary to bigrams, where the probability of two words occurring together is considered, we take into account that logically these bigrams end up belonging to the same topic and consider them as bi-terms associated with the same topic. This gives us,

$$p(w_{d(n-1)}, w_{dn} \mid z_{dn}, \vec{\beta}) = \prod_{k=1}^{K} \prod_{v=1}^{V} \beta_{kv}^{\left(w_{d(n-1)(v-1)} + w_{dnv}\right) z_{dnk}} \quad (4)$$
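Equation (4) ties each pair of adjacent words to a single topic indicator. A minimal sketch of how such bi-terms can be formed from a tokenised document is shown below; the example sentence is hypothetical, and the paper does not prescribe a particular preprocessing pipeline.

```python
def extract_biterms(tokens):
    """Return the adjacent word pairs (w_{n-1}, w_n) of a document.
    Each pair is treated as a bi-term assumed to share one topic."""
    return [(tokens[n - 1], tokens[n]) for n in range(1, len(tokens))]

print(extract_biterms(["thank", "you", "very", "much"]))
# [('thank', 'you'), ('you', 'very'), ('very', 'much')]
```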
The relation between the topic proportions and the latent variable $\vec{z}_d$ is given by the multinomial $p(z_{dn} \mid \vec{\theta}_d) = \prod_{k=1}^{K} \theta_{dk}^{z_{dnk}}$. In general, it is well known that introducing conjugate priors over the undetermined parameters helps improve parameter estimation (Fan et al., 2012). However, in the case of GD the conjugate priors are intractable. For this reason, we introduce a Gamma prior for the parameter $\vec{\sigma}$ as $p(\sigma_{lk}) = \mathcal{G}(\sigma_{lk} \mid \upsilon_{lk}, \nu_{lk}) = \frac{\nu_{lk}^{\upsilon_{lk}}}{\Gamma(\upsilon_{lk})}\, \sigma_{lk}^{\upsilon_{lk} - 1} e^{-\nu_{lk} \sigma_{lk}}$, where $\mathcal{G}(\cdot)$ indicates a Gamma distribution. Similarly, following the same convention, the prior for $\vec{\tau}$ is given by $p(\tau_{lk}) = \mathcal{G}(\tau_{lk} \mid s_{lk}, t_{lk})$. In addition, we also apply variational smoothing as mentioned in (Blei et al., 2003), which helps to eliminate problems that arise due to sparsity in the data. Assuming a Dirichlet prior over $\vec{\beta}$ we can define,

$$p(\vec{\beta}_k \mid \vec{\lambda}_k) = \frac{\Gamma\left(\sum_{v=1}^{V} \lambda_{kv}\right)}{\prod_{v=1}^{V} \Gamma(\lambda_{kv})} \prod_{v=1}^{V} \beta_{kv}^{\lambda_{kv} - 1} \quad (5)$$
We assume a GD distribution over $\vec{\theta}_d$ given by the equation,

$$p(\vec{\theta}_d \mid \vec{g}_d, \vec{h}_d) = \prod_{k=1}^{K} \frac{\Gamma(g_{dk} + h_{dk})}{\Gamma(g_{dk})\Gamma(h_{dk})}\, \theta_{dk}^{g_{dk} - 1} \Bigg(1 - \sum_{j=1}^{k} \theta_{dj}\Bigg)^{\zeta_{dk}} \quad (6)$$

where $\zeta_{dk} = h_{dk} - g_{d(k+1)} - h_{d(k+1)}$ while $k \leq K - 1$ and $\zeta_{dk} = h_{dk} - 1$ when $k = K$. This helps us in deriving the variational solutions. Thus, considering the parameters $\Theta = \{Z, \vec{\beta}, \vec{\theta}, \vec{\sigma}, \vec{\tau}, \vec{y}\}$, the joint distribution can be written as,

$$\begin{aligned} p(W, \Theta) = {} & p(W \mid Z, \vec{\beta})\, p(\vec{z} \mid \vec{\theta})\, p(\vec{\theta} \mid \vec{\sigma}, \vec{\tau}, \vec{y})\, p(\vec{y} \mid \vec{\pi}) \\ & \times p(\vec{\theta} \mid \vec{g}, \vec{h})\, p(\vec{\beta} \mid \vec{\lambda})\, p(\vec{\sigma} \mid \vec{\upsilon}, \vec{\nu})\, p(\vec{\tau} \mid \vec{s}, \vec{t}) \end{aligned} \quad (7)$$
2.2 Latent Beta-Liouville Bi-Term
Mixture Allocation (Bi-LBLMA)
By following similar assumptions, we can construct our Bi-LBLMA model with some changes. The basic idea here is to replace the prior for the topic proportions with a BL distribution. Considering a BL distribution with parameters $(\mu_{l1}, \mu_{l2}, \ldots, \mu_{lK}, \sigma_l, \tau_l)$, we can write the prior as,

$$p(\vec{\theta}_d \mid \vec{y}_d, \vec{\mu}, \vec{\sigma}, \vec{\tau}) = \prod_{l=1}^{L} \Bigg[ \frac{\Gamma\left(\sum_{k=1}^{K} \mu_{lk}\right)}{\prod_{k=1}^{K} \Gamma(\mu_{lk})} \frac{\Gamma(\sigma_l + \tau_l)}{\Gamma(\sigma_l)\Gamma(\tau_l)} \prod_{k=1}^{K} \theta_{dk}^{\mu_{lk} - 1} \times \Big[\sum_{k=1}^{K} \theta_{dk}\Big]^{\sigma_l - \sum_{k=1}^{K} \mu_{lk}} \Big[1 - \sum_{k=1}^{K} \theta_{dk}\Big]^{\tau_l - 1} \Bigg]^{y_{dl}} \quad (8)$$
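For comparison with the GD sketch above, a minimal evaluation of the Beta-Liouville log-density of Eq. (8) for a single component might look as follows; the parameter values are again hypothetical. Note that the BL prior needs only K + 2 parameters per component against 2K for the GD.

```python
import numpy as np
from scipy.special import gammaln

def bl_log_pdf(theta, mu, sigma, tau):
    """Log-density of a Beta-Liouville distribution (one component).
    theta: (K,) proportions with sum(theta) < 1; mu: (K,); sigma, tau: scalars."""
    s = theta.sum()
    return (gammaln(mu.sum()) - gammaln(mu).sum()
            + gammaln(sigma + tau) - gammaln(sigma) - gammaln(tau)
            + np.sum((mu - 1.0) * np.log(theta))
            + (sigma - mu.sum()) * np.log(s)
            + (tau - 1.0) * np.log(1.0 - s))

# Hypothetical parameter values for illustration only.
theta = np.array([0.3, 0.25, 0.2, 0.15])
print(bl_log_pdf(theta, mu=np.array([1.5, 1.2, 1.0, 0.8]), sigma=2.0, tau=3.0))
```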
Assuming Gamma priors for the parameters, for the same reasons as for Bi-LGDMA, the priors are given by $\mathcal{G}(\mu_{lk} \mid \upsilon_{lk}, \nu_{lk})$, $\mathcal{G}(\sigma_l \mid s_l, t_l)$ and $\mathcal{G}(\tau_l \mid \epsilon_l, \Lambda_l)$ respectively. The variational distribution for the topic proportions in the case of Bi-LBLMA consequently takes the form,

$$p(\vec{\theta}_d \mid \vec{f}_d, g_d, h_d) = \frac{\Gamma\left(\sum_{k=1}^{K} f_{dk}\right)}{\prod_{k=1}^{K} \Gamma(f_{dk})} \frac{\Gamma(g_d + h_d)}{\Gamma(g_d)\Gamma(h_d)} \prod_{k=1}^{K} \theta_{dk}^{f_{dk} - 1} \times \Big[\sum_{k=1}^{K} \theta_{dk}\Big]^{g_d - \sum_{k=1}^{K} f_{dk}} \times \Big[1 - \sum_{k=1}^{K} \theta_{dk}\Big]^{h_d - 1} \quad (9)$$

The rest of the equations are the same as mentioned in the previous subsection. Making these changes, we can write the joint probability for Bi-LBLMA as,

$$\begin{aligned} p(W, \Theta) = {} & p(W \mid Z, \vec{\beta})\, p(\vec{z} \mid \vec{\theta})\, p(\vec{\theta} \mid \vec{\mu}, \vec{\sigma}, \vec{\tau}, \vec{y})\, p(\vec{y} \mid \vec{\pi}) \\ & \times p(\vec{\theta} \mid \vec{f}, \vec{g}, \vec{h})\, p(\vec{\beta} \mid \vec{\lambda})\, p(\vec{\mu} \mid \vec{\upsilon}, \vec{\nu}) \\ & \times p(\vec{\sigma} \mid \vec{s}, \vec{t})\, p(\vec{\tau} \mid \vec{\epsilon}, \vec{\Lambda}) \end{aligned} \quad (10)$$

where $\Theta = \{Z, \vec{\beta}, \vec{\theta}, \vec{\mu}, \vec{\sigma}, \vec{\tau}, \vec{y}\}$ represents the parameters of the model.
3 VARIATIONAL INFERENCE
Having defined the models, the next step is to estimate the parameters. In this article we use the variational method employed in (Fan et al., 2012). The basic idea of variational inference is to assume a distribution $Q(\Theta)$ which is bound to be an approximation of the true posterior $p(\Theta \mid W)$, and then to minimize the difference between the two distributions until they are similar. This is done by calculating the Kullback-Leibler (KL) divergence between the two distributions. The KL divergence between $Q(\Theta)$ and $p(\Theta \mid W)$ can be written as,

$$KL(Q \,\|\, P) = -\int Q(\Theta) \ln \frac{p(\Theta \mid W)}{Q(\Theta)}\, d\Theta \quad (11)$$
We can simplify this equation as,

$$KL(Q \,\|\, P) = \ln p(W) - L(Q) \quad (12)$$

where $L(Q) = \int Q(\Theta) \ln \frac{p(W, \Theta)}{Q(\Theta)}\, d\Theta$ is the lower bound. Theoretically, when $KL(Q \,\|\, P)$ is 0 the two distributions are identical. Hence, maximizing the lower bound $L(Q)$ will minimize the value of the KL divergence and consequently bring it closer to 0. Following mean-field theory (Opper and Saad, 2001), we consider the parameters to be independent of each other, since the true posterior becomes intractable otherwise. $Q(\Theta)$ can now be written as a product over the individual parameters as $Q(\Theta) = \prod_{j=1}^{J} Q_j(\Theta_j)$, with J being the total number of parameters. The optimal solution for each of the parameters can be found by calculating the expectations with respect to all the parameters except the current one. This can be expressed as,

$$Q_j(\Theta_j) = \frac{\exp \left\langle \ln p(W, \Theta) \right\rangle_{\neq j}}{\int \exp \left\langle \ln p(W, \Theta) \right\rangle_{\neq j}\, d\Theta_j} \quad (13)$$

Once initialized with some random values, the variational solutions are updated iteratively, thus increasing the lower bound. The optimal variational solutions for all the parameters are obtained at convergence. The variational solutions for our models are given in the following subsections.
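Concretely, the estimation procedure reduces to a coordinate-ascent loop: each factor of $Q(\Theta)$ is updated in turn using the current expectations of the others, and the lower bound is monitored until it stops changing. The skeleton below is a schematic sketch only; update_fns and lower_bound_fn are placeholders standing in for the model-specific solutions derived in the following subsections, not functions provided by any library.

```python
def coordinate_ascent_vi(data, params, update_fns, lower_bound_fn,
                         max_iter=200, tol=1e-4):
    """Generic mean-field loop: each variational factor is updated in turn
    using the current expectations of the others (Eq. 13), while the lower
    bound L(Q) is tracked until there is no considerable change."""
    prev_bound = float("-inf")
    for _ in range(max_iter):
        for name, update in update_fns.items():
            # Each update stands in for one of the model-specific solutions,
            # e.g. Eqs. (14)-(29) for Bi-LGDMA or Eqs. (30)-(45) for Bi-LBLMA.
            params[name] = update(data, params)
        bound = lower_bound_fn(data, params)
        if abs(bound - prev_bound) < tol:   # converged
            break
        prev_bound = bound
    return params
```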
3.1 Variational Solutions for
Bi-LGDMA
Calculating the variational solutions for Eq. 7 yields the following equations:

$$Q(Y) = \prod_{d=1}^{D} \prod_{l=1}^{L} r_{dl}^{y_{dl}}, \qquad Q(Z) = \prod_{d=1}^{D} \prod_{n=1}^{N_d} \prod_{k=1}^{K} \phi_{dnk}^{z_{dnk}} \quad (14)$$

$$Q(\vec{\sigma}) = \mathcal{G}(\vec{\sigma} \mid \vec{\upsilon}^{*}, \vec{\nu}^{*}), \qquad Q(\vec{\tau}) = \mathcal{G}(\vec{\tau} \mid \vec{s}^{*}, \vec{t}^{*}) \quad (15)$$

$$Q(\vec{\beta}) = \prod_{k=1}^{K} \frac{\Gamma\left(\sum_{v=1}^{V} \lambda_{kv}^{*}\right)}{\prod_{v=1}^{V} \Gamma(\lambda_{kv}^{*})} \prod_{v=1}^{V} \beta_{kv}^{\lambda_{kv}^{*} - 1} \quad (16)$$

$$Q(\vec{\theta}) = \prod_{d=1}^{D} \prod_{k=1}^{K} \frac{\Gamma(g_{dk}^{*} + h_{dk}^{*})}{\Gamma(g_{dk}^{*})\Gamma(h_{dk}^{*})}\, \theta_{dk}^{g_{dk}^{*} - 1} \Bigg(1 - \sum_{j=1}^{k} \theta_{dj}\Bigg)^{\zeta_{dk}} \quad (17)$$

where,

$$r_{dl} = \frac{\rho_{dl}}{\sum_{l=1}^{L} \rho_{dl}}, \qquad \phi_{dnk} = \frac{\delta_{dnk}}{\sum_{k=1}^{K} \delta_{dnk}}, \qquad \pi_l = \frac{1}{D} \sum_{d=1}^{D} r_{dl} \quad (18)$$

$$\rho_{dl} = \exp\Bigg\{ \ln \pi_l + R_l + \sum_{k=1}^{K} \Bigg[ (\bar{\sigma}_{lk} - 1)\, \langle \ln \theta_{dk} \rangle + \bar{\gamma}_{lk}\, \Big\langle \ln \Big(1 - \sum_{j=1}^{k} \theta_{dj}\Big) \Big\rangle \Bigg] \Bigg\} \quad (19)$$

$$\delta_{dnk} = \exp\Big\{ \big(w_{d(n-1)(v-1)} + w_{dnv}\big)\, \langle \ln \beta_{kv} \rangle + \langle \ln \theta_{dk} \rangle \Big\} \quad (20)$$

Here, $R$ is the Taylor series approximation of $\big\langle \ln \frac{\Gamma(\sigma + \tau)}{\Gamma(\sigma)\Gamma(\tau)} \big\rangle$ and is given by,

$$\begin{aligned} R = {} & \ln \frac{\Gamma(\bar{\sigma} + \bar{\tau})}{\Gamma(\bar{\sigma})\Gamma(\bar{\tau})} + \bar{\sigma} \big[\Psi(\bar{\sigma} + \bar{\tau}) - \Psi(\bar{\sigma})\big] \big(\langle \ln \sigma \rangle - \ln \bar{\sigma}\big) + \bar{\tau} \big[\Psi(\bar{\sigma} + \bar{\tau}) - \Psi(\bar{\tau})\big] \big(\langle \ln \tau \rangle - \ln \bar{\tau}\big) \\ & + 0.5\, \bar{\sigma}^2 \big[\Psi'(\bar{\sigma} + \bar{\tau}) - \Psi'(\bar{\sigma})\big] \big\langle (\ln \sigma - \ln \bar{\sigma})^2 \big\rangle + 0.5\, \bar{\tau}^2 \big[\Psi'(\bar{\sigma} + \bar{\tau}) - \Psi'(\bar{\tau})\big] \big\langle (\ln \tau - \ln \bar{\tau})^2 \big\rangle \\ & + \bar{\sigma}\, \bar{\tau}\, \Psi'(\bar{\sigma} + \bar{\tau}) \big(\langle \ln \sigma \rangle - \ln \bar{\sigma}\big) \big(\langle \ln \tau \rangle - \ln \bar{\tau}\big) \end{aligned} \quad (21)$$

$$\upsilon_{lk}^{*} = \upsilon_{lk} + \sum_{d=1}^{D} \langle y_{dl} \rangle \Big[ \Psi(\bar{\sigma}_{lk} + \bar{\tau}_{lk}) - \Psi(\bar{\sigma}_{lk}) + \bar{\tau}_{lk}\, \Psi'(\bar{\sigma}_{lk} + \bar{\tau}_{lk}) \big(\langle \ln \tau_{lk} \rangle - \ln \bar{\tau}_{lk}\big) \Big]\, \bar{\sigma}_{lk} \quad (22)$$

$$s_{lk}^{*} = s_{lk} + \sum_{d=1}^{D} \langle y_{dl} \rangle \Big[ \Psi(\bar{\tau}_{lk} + \bar{\sigma}_{lk}) - \Psi(\bar{\tau}_{lk}) + \bar{\sigma}_{lk}\, \Psi'(\bar{\tau}_{lk} + \bar{\sigma}_{lk}) \big(\langle \ln \sigma_{lk} \rangle - \ln \bar{\sigma}_{lk}\big) \Big]\, \bar{\tau}_{lk} \quad (23)$$

$$\nu_{lk}^{*} = \nu_{lk} - \sum_{d=1}^{D} \langle y_{dl} \rangle\, \langle \ln \theta_{dk} \rangle \quad (24)$$

$$t_{lk}^{*} = t_{lk} - \sum_{d=1}^{D} \langle y_{dl} \rangle\, \Big\langle \ln \Big[ 1 - \sum_{j=1}^{k} \theta_{dj} \Big] \Big\rangle \quad (25)$$

$$g_{dk}^{*} = g_{dk} + \sum_{n=1}^{N_d} \langle z_{dnk} \rangle + \sum_{l=1}^{L} \langle y_{dl} \rangle\, \bar{\sigma}_{lk} \quad (26)$$

$$h_{dk}^{*} = h_{dk} + \sum_{l=1}^{L} \langle y_{dl} \rangle\, \bar{\tau}_{lk} + \sum_{j=k+1}^{K} \phi_{dnj} \quad (27)$$

$$\lambda_{kv}^{*} = \lambda_{kv} + \sum_{d=1}^{D} \sum_{n=1}^{N_d} \phi_{dnk} \big( w_{d(n-1)v} + w_{dnv} \big) \quad (28)$$

$$\pi_l = \frac{1}{D} \sum_{d=1}^{D} r_{dl} \quad (29)$$

In the above equations, $\langle \cdot \rangle$ indicates the expectation of the variable, whose values are detailed in (Maanicshah et al., 2023). We calculate equations 14 - 17 by repeatedly updating the parameters until there is no considerable change in the lower bound estimates. At this point of convergence, we will have the optimal values for the variational solutions.
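In practice, the responsibilities in Eq. (18) are normalised exponentials of the log-scores in Eqs. (19)-(20), so it is convenient to compute them in log-space. The helper below is our own illustrative sketch, not part of the derivation, and the paper does not discuss numerical details; the example values are arbitrary.

```python
import numpy as np

def normalize_responsibilities(log_scores):
    """Turn unnormalised log-scores (e.g. ln rho_dl or ln delta_dnk) into
    responsibilities that sum to one, using the log-sum-exp trick."""
    shifted = log_scores - log_scores.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)

# Example: log rho for one document over L = 3 mixture components.
print(normalize_responsibilities(np.array([-105.2, -103.9, -110.4])))
```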
3.2 Variational Solutions for Bi-LBLMA
Similar to the previous section, we can derive the following variational solutions for Eq. 10. The only difference is the change in $Q(\vec{\theta})$ and in some definitions of related variables. The variational solutions are hence given by,

$$Q(Y) = \prod_{d=1}^{D} \prod_{l=1}^{L} r_{dl}^{y_{dl}}, \qquad Q(Z) = \prod_{d=1}^{D} \prod_{n=1}^{N_d} \prod_{k=1}^{K} \phi_{dnk}^{z_{dnk}} \quad (30)$$

$$Q(\vec{\mu}) = \mathcal{G}(\vec{\mu} \mid \vec{\upsilon}^{*}, \vec{\nu}^{*}), \qquad Q(\sigma_l) = \mathcal{G}(\sigma_l \mid s_l^{*}, t_l^{*}) \quad (31)$$

$$Q(\tau_l) = \mathcal{G}(\tau_l \mid \epsilon_l^{*}, \Lambda_l^{*}), \qquad Q(\vec{\beta}) = \prod_{k=1}^{K} \frac{\Gamma\left(\sum_{v=1}^{V} \lambda_{kv}^{*}\right)}{\prod_{v=1}^{V} \Gamma(\lambda_{kv}^{*})} \prod_{v=1}^{V} \beta_{kv}^{\lambda_{kv}^{*} - 1} \quad (32)$$

$$Q(\vec{\theta}) = \prod_{d=1}^{D} \frac{\Gamma\left(\sum_{k=1}^{K} f_{dk}^{*}\right)}{\prod_{k=1}^{K} \Gamma(f_{dk}^{*})} \frac{\Gamma(g_d^{*} + h_d^{*})}{\Gamma(g_d^{*})\Gamma(h_d^{*})} \prod_{k=1}^{K} \theta_{dk}^{f_{dk}^{*} - 1} \times \Big[\sum_{k=1}^{K} \theta_{dk}\Big]^{g_d^{*} - \sum_{k=1}^{K} f_{dk}^{*}} \Big[1 - \sum_{k=1}^{K} \theta_{dk}\Big]^{h_d^{*} - 1} \quad (33)$$

where,

$$r_{dl} = \frac{\rho_{dl}}{\sum_{l=1}^{L} \rho_{dl}}, \qquad \phi_{dnk} = \frac{\delta_{dnk}}{\sum_{k=1}^{K} \delta_{dnk}}, \qquad \pi_l = \frac{1}{D} \sum_{d=1}^{D} r_{dl} \quad (34)$$

$$\begin{aligned} \rho_{dl} = \exp\Bigg\{ & \ln \pi_l + R_l + S_l + \sum_{k=1}^{K} (\bar{\mu}_{lk} - 1)\, \langle \ln \theta_{dk} \rangle \\ & + \Big(\bar{\sigma}_l - \sum_{k=1}^{K} \bar{\mu}_{lk}\Big) \Big\langle \ln \sum_{k=1}^{K} \theta_{dk} \Big\rangle + (\bar{\tau}_l - 1) \Big\langle \ln \Big[ 1 - \sum_{k=1}^{K} \theta_{dk} \Big] \Big\rangle \Bigg\} \end{aligned} \quad (35)$$

Due to intractability, we use Taylor series expansions for $\big\langle \ln \frac{\Gamma\left(\sum_{k=1}^{K} \mu_{lk}\right)}{\prod_{k=1}^{K}\Gamma(\mu_{lk})} \big\rangle$ and $\big\langle \ln \frac{\Gamma(\sigma + \tau)}{\Gamma(\sigma)\Gamma(\tau)} \big\rangle$, denoted by $R$ and $S$ respectively. The approximations are given as,

$$\begin{aligned} R_l = {} & \ln \frac{\Gamma\left(\sum_{k=1}^{K} \bar{\mu}_{lk}\right)}{\prod_{k=1}^{K} \Gamma(\bar{\mu}_{lk})} + \sum_{k=1}^{K} \bar{\mu}_{lk} \Big[ \Psi\Big(\sum_{k=1}^{K} \bar{\mu}_{lk}\Big) - \Psi(\bar{\mu}_{lk}) \Big] \big( \langle \ln \mu_{lk} \rangle - \ln \bar{\mu}_{lk} \big) \\ & + \frac{1}{2} \sum_{k=1}^{K} \bar{\mu}_{lk}^2 \Big[ \Psi'\Big(\sum_{k=1}^{K} \bar{\mu}_{lk}\Big) - \Psi'(\bar{\mu}_{lk}) \Big] \big\langle (\ln \mu_{lk} - \ln \bar{\mu}_{lk})^2 \big\rangle \\ & + \frac{1}{2} \sum_{a=1}^{K} \sum_{b=1, a \neq b}^{K} \bar{\mu}_{la}\, \bar{\mu}_{lb} \Big[ \Psi'\Big(\sum_{k=1}^{K} \bar{\mu}_{lk}\Big) \big( \langle \ln \mu_{la} \rangle - \ln \bar{\mu}_{la} \big) \big( \langle \ln \mu_{lb} \rangle - \ln \bar{\mu}_{lb} \big) \Big] \end{aligned}$$

$$\begin{aligned} S = {} & \ln \frac{\Gamma(\bar{\sigma} + \bar{\tau})}{\Gamma(\bar{\sigma})\Gamma(\bar{\tau})} + \bar{\sigma} \big[\Psi(\bar{\sigma} + \bar{\tau}) - \Psi(\bar{\sigma})\big] \big(\langle \ln \sigma \rangle - \ln \bar{\sigma}\big) + \bar{\tau} \big[\Psi(\bar{\sigma} + \bar{\tau}) - \Psi(\bar{\tau})\big] \big(\langle \ln \tau \rangle - \ln \bar{\tau}\big) \\ & + 0.5\, \bar{\sigma}^2 \big[\Psi'(\bar{\sigma} + \bar{\tau}) - \Psi'(\bar{\sigma})\big] \big\langle (\ln \sigma - \ln \bar{\sigma})^2 \big\rangle + 0.5\, \bar{\tau}^2 \big[\Psi'(\bar{\sigma} + \bar{\tau}) - \Psi'(\bar{\tau})\big] \big\langle (\ln \tau - \ln \bar{\tau})^2 \big\rangle \\ & + \bar{\sigma}\, \bar{\tau}\, \Psi'(\bar{\sigma} + \bar{\tau}) \big(\langle \ln \sigma \rangle - \ln \bar{\sigma}\big) \big(\langle \ln \tau \rangle - \ln \bar{\tau}\big) \end{aligned} \quad (36)$$

$$\upsilon_{lk}^{*} = \upsilon_{lk} + \sum_{d=1}^{D} \langle y_{dl} \rangle\, \bar{\mu}_{lk} \Bigg[ \Psi\Big(\sum_{k=1}^{K} \bar{\mu}_{lk}\Big) - \Psi(\bar{\mu}_{lk}) + \Psi'\Big(\sum_{k=1}^{K} \bar{\mu}_{lk}\Big) \sum_{a \neq k}^{K} \big( \langle \ln \mu_{la} \rangle - \ln \bar{\mu}_{la} \big)\, \bar{\mu}_{la} \Bigg] \quad (37)$$

$$\nu_{lk}^{*} = \nu_{lk} - \sum_{d=1}^{D} \langle y_{dl} \rangle \Big[ \langle \ln \theta_{dk} \rangle - \Big\langle \ln \sum_{k=1}^{K} \theta_{dk} \Big\rangle \Big] \quad (38)$$

$$s_l^{*} = s_l + \sum_{d=1}^{D} \langle y_{dl} \rangle \Big[ \Psi(\bar{\sigma}_l + \bar{\tau}_l) - \Psi(\bar{\sigma}_l) + \bar{\tau}_l\, \Psi'(\bar{\sigma}_l + \bar{\tau}_l) \big( \langle \ln \tau_l \rangle - \ln \bar{\tau}_l \big) \Big]\, \bar{\sigma}_l \quad (39)$$

$$t_l^{*} = t_l - \sum_{d=1}^{D} \langle y_{dl} \rangle \Big\langle \ln \Big[ \sum_{k=1}^{K} \theta_{dk} \Big] \Big\rangle \quad (40)$$

$$\epsilon_l^{*} = \epsilon_l + \sum_{d=1}^{D} \langle y_{dl} \rangle \Big[ \Psi(\bar{\tau}_l + \bar{\sigma}_l) - \Psi(\bar{\tau}_l) + \bar{\sigma}_l\, \Psi'(\bar{\tau}_l + \bar{\sigma}_l) \big( \langle \ln \sigma_l \rangle - \ln \bar{\sigma}_l \big) \Big]\, \bar{\tau}_l \quad (41)$$

$$\Lambda_l^{*} = \Lambda_l - \sum_{d=1}^{D} \langle y_{dl} \rangle \Big\langle \ln \Big[ 1 - \sum_{k=1}^{K} \theta_{dk} \Big] \Big\rangle \quad (42)$$
$$f_{dk}^{*} = f_{dk} + \sum_{n=1}^{N_d} \langle z_{dnk} \rangle + \sum_{l=1}^{L} \langle y_{dl} \rangle\, \bar{\mu}_{lk} \quad (43)$$

$$g_d^{*} = g_d + \sum_{n=1}^{N_d} \sum_{k=1}^{K} \langle z_{dnk} \rangle + \sum_{l=1}^{L} \langle y_{dl} \rangle\, \bar{\sigma}_l \quad (44)$$

$$h_d^{*} = h_d + \sum_{l=1}^{L} \langle y_{dl} \rangle\, \bar{\tau}_l \quad (45)$$

The expectations in these equations are defined with respect to the BL distribution in (Maanicshah et al., 2022). Similar to Bi-LGDMA, we calculate equations 30 - 33 repeatedly until convergence to find the optimal solutions.
4 EXPERIMENTAL RESULTS
To evaluate the performance of our models, we build a system for anime recommendation based on a dataset from Kaggle containing information about anime (https://www.kaggle.com/datasets/marlesson/myanimelist-dataset-animes-profiles-reviews) and another for recommending movies based on the Netflix prize data (https://www.kaggle.com/datasets/netflix-inc/netflix-prize-data). We compare our models with the widely used LDA and examine how they weigh up against the unmodified latent generalized Dirichlet allocation (LGDA) and latent Beta-Liouville allocation (LBLA) models. The idea of our recommendation system is that we find the Euclidean distance between the document topic proportions $\phi_{dk}$ of the query document and those of the rest of the documents. We can then find the top N recommendations for that query. The following subsections detail our experiments on the two datasets.
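A minimal sketch of this retrieval step is shown below; the document-topic matrix is randomly generated purely for illustration, and recommend() is a hypothetical helper name rather than code from our system.

```python
import numpy as np

def recommend(theta, query_idx, top_n=10):
    """Return the indices of the top_n documents whose topic proportions are
    closest (Euclidean distance) to those of the query document."""
    dist = np.linalg.norm(theta - theta[query_idx], axis=1)
    dist[query_idx] = np.inf          # do not recommend the query itself
    return np.argsort(dist)[:top_n]

# Illustration with a random document-topic matrix (D = 1126, K = 5).
rng = np.random.default_rng(0)
theta = rng.dirichlet(np.ones(5), size=1126)
print(recommend(theta, query_idx=42, top_n=5))
```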
4.1 Anime Recommendation
This dataset consisted of three files containing information about anime, reviews by users and user profile details. The anime file had details of around 16K anime such as title, synopsis, genre, airing date, etc. The profiles file had details of users and the anime they have added as favourites. The reviews file had information on the reviews each user had written for different anime. All of this data was extracted from https://myanimelist.net. From the anime details file, the data that helps for content based recommendation is mainly the synopsis. However, the synopsis was not available for some of the anime within the data. Hence we used the myanimelist API to extract the missing synopses. There were cases in which some of the titles referred to a parent anime; the description of the parent anime was used in these cases. We ignore anime where the synopsis is too short. After applying these constraints we were left with around 1126 anime to use for our content based recommendation system. In the case of this dataset, there were very few user profiles with more than 20 anime in their favourites list, which was not enough to evaluate our models. To understand the relevance of the topics that have been extracted by our model, we calculated the UMass coherence score (Mimno et al., 2011), which takes into account the probability of two words within a topic occurring together in the corpus. It is given by,

$$\text{score}_{\text{UMass}}(k) = \sum_{i=2}^{M_k} \sum_{j=1}^{i-1} \log \frac{p(w_i, w_j) + 1}{p(w_i)} \quad (46)$$
$M_k$ in the above equation indicates the number of top words taken into consideration for the topic; in our case this value is 10. The equation basically calculates the relevancy of the words within a topic by finding the ratio of the probability of two words $w_i$ and $w_j$ occurring together to the probability of the word $w_i$ for which the score is being calculated.
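For clarity, a direct, unoptimised implementation of Eq. (46) might look as follows; the probabilities are estimated as document frequencies over a toy corpus, and both the corpus and the topic's top words are hypothetical examples, not data from our experiments.

```python
import numpy as np

def umass_coherence(top_words, docs):
    """UMass coherence of one topic, following Eq. (46). p(.) is estimated as
    the fraction of documents containing the word(s); top words are assumed
    to occur at least once in the corpus."""
    D = len(docs)
    doc_sets = [set(doc) for doc in docs]
    def p(*words):
        return sum(all(w in s for w in words) for s in doc_sets) / D
    score = 0.0
    for i in range(1, len(top_words)):          # i = 2 .. M_k
        for j in range(i):                      # j = 1 .. i-1
            score += np.log((p(top_words[i], top_words[j]) + 1) / p(top_words[i]))
    return score

# Toy example: top words of one topic and a tiny corpus.
docs = [["magic", "school", "student"], ["portal", "world", "magic"],
        ["school", "club", "student"]]
print(umass_coherence(["magic", "school", "student"], docs))
```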
Figure 1 shows the coherence scores of the topics derived from LDA, latent generalized Dirichlet allocation (LGDA), latent Beta-Liouville allocation (LBLA), Bi-LGDMA and Bi-LBLMA for different values of L. It can be seen that
using a GD and BL prior helps in obtaining better top-
ics with a higher coherence score. Bi-LBLMA per-
forms better than Bi-LGDMA according to our ex-
periments, which is due to the fact that choosing the
parameters for Bi-LGDMA is a little harder than Bi-
LBLMA. We calculated the coherence scores for dif-
ferent values of K to find the correct number of top-
ics for the model. The best results were observed
when K was set to 5 as observed in Figure 1. Both
Bi-LGDMA and Bi-LBLMA performed well when
L = 3. In the case of Bi-LBLMA we see that the
coherence is very close when L = 3 and L = 4. In
these situations choosing the L as 3 or 4 will give
similar recommendations. This being a quantitative assessment of the model, to qualitatively see how the model performs, Tables 1 and 2 show a few of the top ten suggestions for a query anime for the two models.
‘Bleach’ is an anime based on travelling between
worlds through portals in the action genre. The anime
suggested by Bi-LGDMA aligns with this concept
of inter-dimensional portals and magic. Similarly,
the test query for Bi-LBLMA was an anime called
‘Dragon Ball’ which involves super-human fighting.
It is interesting to see that our model identified the
sequel to the original anime followed by a few other
anime like ‘Boku no Hero Academia’ which also falls
under the same category.

Figure 1: Coherence score for the anime dataset for different values of K and L.

Table 1: Query results for Anime data with Bi-LGDMA.
S. No. Bleach
1 Fullmetal Alchemist
2 Rosario to Vampire
3 World Trigger
4 FLCL
5 Tenjou Tenge
4.2 Netflix Movie Recommendation
The Netflix dataset is bigger compared to the anime dataset. It consists of details pertaining to the ratings of different users for around 17000 movies released before the year 2006. However, the problem with this dataset is that the synopses of the movies were not available. Hence, we scraped Wikipedia pages to get these details and then used them for content based recommendation. We selected the movies released after 2000 so that we are familiar with them for qualitative testing. This gave us around 4000 movies with descriptions. From the user details, we consider that a user likes a movie when they rate it as 4 or 5. We selected users who had liked at least 300 movies. This left us with 900 users as ground truth. These conditions are only to quantitatively assess our models and can be ignored in real-time applications. When queried with a movie that a user likes, if one of the top N recommendations by our model is present in the list of movies liked by that user, then we consider it a hit. By using this logic, we can calculate the accuracy of our model as the ratio of the total number of hits to the total number of queries.
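The hit-based accuracy described above can be sketched as follows; recommend_fn stands for any top-N retrieval routine such as the recommend() sketch given earlier, and the variable names are our own.

```python
def hit_rate(recommend_fn, liked, top_n=15):
    """Fraction of (user, liked movie) queries for which at least one other
    movie liked by that user appears in the top_n recommendations.
    liked: list of sets of movie indices, one set per user."""
    hits = queries = 0
    for user_movies in liked:
        for query in user_movies:
            queries += 1
            recs = set(recommend_fn(query, top_n))
            if recs & (user_movies - {query}):   # at least one hit
                hits += 1
    return hits / queries if queries else 0.0
```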
We also calculate the coherence scores of our topics as in the previous subsection; these are graphed in Figure 2. We can see that both our models perform best when L = 2 and K = 5.
The performance improvement achieved by our models compared to the widely used LDA model proves the efficiency of our models in representing the topics better.

Table 2: Query results for Anime data with Bi-LBLMA.
S. No. Dragon Ball
1 Dragon Ball Z
2 Dragon Ball Super Movie: Broly
3 Boku no Hero Academia
4 Yu-Gi-Oh Duel Monsters
5 Fate/stay night

Figure 2: Coherence score for the Netflix dataset for different values of K and L.

In addition to these analyses, Table 3 shows the accuracy of the different models. Though both Bi-LGDMA and Bi-LBLMA give comparatively better accuracy, the improvement for Bi-LGDMA is not as large as that for Bi-LBLMA.

Table 3: Accuracy of recommendation at N = 15 for Netflix Data.
Model Accuracy
LDA 85.59
LGDA 84.40
Bi-LGDMA 86.00
LBLA 86.50
Bi-LBLMA 87.36

Similar to the last experiment,
we also check the quality of recommendations for two sample queries. This is shown in Tables 4 and 5. We can see that Bi-LGDMA recommends a set of teenage and kids' action movies like ‘Agent Cody Banks’ when queried with the movie ‘The Pacifier’, which is a kids' action comedy. In the case of Bi-LBLMA, ‘Resident Evil’ is a zombie movie where a virus causes infected people to attack the non-infected. The recommendations from our model found similar plot lines like ‘Dawn of the Dead’, ‘Sasquatch’, etc., which are movies based on virus outbreaks, being hunted by animals, and so on.
Table 4: Query results for Netflix data with Bi-LGDMA.
S. No. The Pacifier
1 Agent Cody Banks
2 Agent Cody Banks 2: Destination London
3 Lilo and Stitch 2
4 101 Dalmatians II: Patch’s London Adventure
5 Mean Creek
Table 5: Query results for Netflix data with Bi-LBLMA.
S. No. Resident Evil
1 Dawn of the Dead
2 Sasquatch
3 Wrong Turn
4 Evil Remains
5 Dead Birds
5 CONCLUSION
We have introduced two novel models for topic modelling and applied them to recommendation tasks. The models are found to be effective when compared to widely used models such as LDA. From the example queries, we see that our models are able to deliver promising suggestions that the user might like. The improvement achieved by using GD and BL distributions is also clearly seen. Using biterms in conjunction with our models tends to improve the results considerably. In particular, the Bi-LBLMA model proves to be a good alternative to LDA based on the results from both experiments.
REFERENCES
Attias, H. (1999). A variational baysian framework for
graphical models. In Solla, S., Leen, T., and Müller, K., editors, Advances in Neural Information Processing Systems, volume 12, Cambridge, Massachusetts. MIT Press.
MIT Press.
Bakhtiari, A. S. and Bouguila, N. (2014). Online learning
for two novel latent topic models. In Linawati, Ma-
hendra, M. S., Neuhold, E. J., Tjoa, A. M., and You, I.,
editors, Information and Communication Technology,
pages 286–295, Berlin, Heidelberg. Springer Berlin
Heidelberg.
Bakhtiari, A. S. and Bouguila, N. (2016). A latent beta-
liouville allocation model. Expert Systems with Appli-
cations, 45:260–272.
Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent
dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022.
Bobadilla, J., Hernando, A., Ortega, F., and Bernal, J.
(2011). A framework for collaborative filtering rec-
ommender systems. Expert Systems with Applica-
tions, 38(12):14609–14623.
Chien, J.-T., Lee, C.-H., and Tan, Z.-H. (2018). Latent
dirichlet mixture model. Neurocomputing, 278:12–
22. Recent Advances in Machine Learning for Non-
Gaussian Data Processing.
Chong, W., Blei, D., and Li, F.-F. (2009). Simultaneous im-
age classification and annotation. In 2009 IEEE Con-
ference on Computer Vision and Pattern Recognition,
pages 1903–1910.
Fan, W., Bouguila, N., and Ziou, D. (2012). Variational
learning for finite dirichlet mixture models and appli-
cations. IEEE transactions on neural networks and
learning systems, 23(5):762–774.
Hu, C., Fan, W., Du, J.-X., and Bouguila, N. (2019). A
novel statistical approach for clustering positive data
based on finite inverted beta-liouville mixture models.
Neurocomputing, 333:110–123.
Liu, Y., Du, F., Sun, J., and Jiang, Y. (2020). ilda: An
interactive latent dirichlet allocation model to im-
prove topic quality. Journal of Information Science,
46(1):23–40.
Maanicshah, K., Amayri, M., and Bouguila, N. (2022).
Improving topic quality with interactive beta-liouville
mixture allocation model. In 2022 IEEE Symposium
Series on Computational Intelligence (SSCI), pages
1143–1148.
Maanicshah, K., Amayri, M., and Bouguila, N. (2023).
Interactive generalized dirichlet mixture allocation
model. In Structural, Syntactic, and Statistical Pattern
Recognition: Joint IAPR International Workshops, S+
SSPR 2022, Montreal, QC, Canada, August 26–27,
2022, Proceedings, pages 33–42. Springer.
Mimno, D., Wallach, H. M., Naradowsky, J., Smith, D. A.,
and McCallum, A. (2009). Polylingual topic models.
In Proceedings of the 2009 Conference on Empirical
Methods in Natural Language Processing: Volume 2 -
Volume 2, EMNLP ’09, page 880–889, USA. Associ-
ation for Computational Linguistics.
Mimno, D., Wallach, H. M., Talley, E., Leenders, M., and
McCallum, A. (2011). Optimizing semantic coher-
ence in topic models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’11, page 262–272, USA. Association for Computational Linguistics.
Nagori, R. and Aghila, G. (2011). Lda based integrated
document recommendation model for e-learning sys-
tems. In 2011 International Conference on Emerging
Trends in Networks and Computer Communications
(ETNCC), pages 230–233.
Opper, M. and Saad, D. (2001). Advanced mean field meth-
ods: Theory and practice. MIT press, Cambridge,
Massachusetts.
Pazzani, M. J. and Billsus, D. (2007). Content-Based
Recommendation Systems, pages 325–341. Springer
Berlin Heidelberg, Berlin, Heidelberg.