with the system can be easily interpreted by analyz-
ing the content previously requested. The assumption
here is that the current item (and/or its genre) influences the user's next choice.
Recommender systems have greatly benefited from probabilistic modeling techniques based on LDA. Recent works have indeed shown empirically that probabilistic latent topic models represent the state of the art in the generation of accurate personalized recommendations (Barbieri and Manco, 2011; Barbieri et al., 2011b; Barbieri et al., 2011a). Probabilistic techniques offer some advantages over traditional deterministic models: notably, they do not minimize a particular error metric but are designed to maximize the likelihood of the model given the data, which is a more general approach; moreover, they can model a distribution over rating values, which allows the confidence of the model in a given recommendation to be assessed; finally, they allow prior knowledge to be included in the generative process, thus enabling a more effective modeling of the underlying data distribution. Notably, when preferences are implicitly expressed through selection (that is, when no rating information is available), simple LDA best models the probability that an item is actually selected by a user (Barbieri and Manco, 2011).
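As a simple illustration of this selection probability (in our own notation, using the standard LDA parameterization rather than the notation of the cited works), the probability that user u selects item i is obtained by marginalizing over the K latent topics:

p(i \mid u) = \sum_{k=1}^{K} p(k \mid \theta_u)\, p(i \mid \phi_k) = \sum_{k=1}^{K} \theta_{u,k}\, \phi_{k,i},

where \theta_u denotes the topic mixture associated with user u and \phi_k the item distribution of topic k.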
A simple approach to modeling sequential data within a probabilistic framework has been proposed in (Cadez et al., 2000). In this work, the authors present a framework based on mixtures of Markov models, which is applied to clustering and visualizing user navigation behavior on a web site. Albeit simple, the proposed model suffers from the limitation that a single latent topic underlies all the observations in a single sequence.
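To make this limitation explicit, in our own notation (not that of (Cadez et al., 2000)) the likelihood of a sequence x_1, \ldots, x_T under a mixture of K first-order Markov chains can be written as

p(x_1, \ldots, x_T) = \sum_{k=1}^{K} \pi_k \, p(x_1 \mid k) \prod_{t=2}^{T} p(x_t \mid x_{t-1}, k),

so that a single component k is responsible for the whole sequence and no switching among latent states can occur within it.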
This approach has been superseded by other methods based on latent semantic indexing and LDA. In (Wallach, 2006; X. Wang and Wei, 2007), for example, the authors propose extensions of the LDA model which assume a first-order Markov chain for the word generation process. In the resulting Bigram Model (BM) and Topical n-grams, the current word depends on the current topic and on the previous word observed in the sequence. The LDA Collocation Model (Griffiths et al., 2007) introduces a further set of random variables x (for bigram status), denoting whether the current token forms a bigram with the previous one. The bigram status makes the model more flexible than Wallach's model, which always generates bigrams.
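The role of the bigram status variable can be sketched as a single generative step, as in the following illustrative code (the parameter names theta_d, phi, sigma and psi are our own assumptions and are not taken from the cited papers):

import numpy as np

def generate_word(prev_word, theta_d, phi, sigma, psi):
    # One illustrative generative step of an LDA-Collocation-style model.
    # theta_d: document topic proportions, shape (K,)
    # phi:     per-topic word distributions, shape (K, V)
    # sigma:   per-previous-word bigram distributions, shape (V, V)
    # psi:     per-previous-word probability of forming a bigram, shape (V,)
    x = np.random.rand() < psi[prev_word]              # bigram status variable
    if x:
        # the token forms a bigram: condition on the previous word only
        return np.random.choice(sigma.shape[1], p=sigma[prev_word])
    # otherwise the token is generated from a latent topic, as in plain LDA
    topic = np.random.choice(theta_d.shape[0], p=theta_d)
    return np.random.choice(phi.shape[1], p=phi[topic])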
Hidden Markov models (Bishop, 2006, Chapter
13) are a general reference framework for modeling
sequence data. HMMs assume that sequential data are
generated using a Markov chain of latent variables,
with each observation conditioned on the state of the
corresponding latent variable. The resulting likelihood can be interpreted as an extension of a mixture model in which the mixture component for each observation is not chosen independently, but depends on the component chosen for the previous observation. The work in (Gruber et al., 2007) follows this direction and proposes a Hidden Topic Markov Model (HTMM) for text documents. HTMM defines a Markov chain over the latent topics of a document. The corresponding generative process assumes that all words in the same sentence share the same topic, while successive sentences can either retain the previous topic or introduce a new one. The topics in a document thus form a Markov chain whose transitions are governed by a binary topic transition variable ψ: when ψ = 1, a new topic is drawn for the n-th sentence; otherwise, the topic of the previous sentence is reused.
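A minimal sketch of this topic chain (purely illustrative; the switching probability epsilon and the other parameter names are our own assumptions, not the notation of (Gruber et al., 2007)) is the following:

import numpy as np

def sample_sentence_topics(num_sentences, theta_d, epsilon):
    # theta_d: document-level topic distribution, shape (K,)
    # epsilon: probability of switching to a new topic at each sentence
    topics = []
    z = np.random.choice(theta_d.shape[0], p=theta_d)       # topic of the first sentence
    topics.append(z)
    for n in range(1, num_sentences):
        psi = np.random.rand() < epsilon                    # topic transition variable
        if psi:
            z = np.random.choice(theta_d.shape[0], p=theta_d)  # draw a new topic
        # otherwise the previous sentence's topic is reused
        topics.append(z)
    return topics  # all words in sentence n are then drawn from topic topics[n]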
Following the research direction outlined above, in this paper we study the effects of “contextual” information in the probabilistic modeling of preference data. We focus on the case where the context can be inferred from the analysis of sequence data, and we propose a topic model which explicitly exploits dependency information for providing recommendations. The issue has been addressed in related papers (e.g., (Wallach, 2006)); here, we review and extend the approaches in the literature, concentrating on the effects of such modeling on recommendation accuracy, since accuracy directly reflects how well user behavior is modeled.
In short, the contributions of the paper can be sum-
marized as follows.
1. We propose a unified probabilistic framework to model dependencies in preference data, and instantiate the framework in accordance with a specific assumption on the sequentiality of the underlying generative process;
2. For the proposed instance, we provide the corresponding ranking function, which can be used to generate personalized and context-aware recommendation lists;
3. We finally show that the proposed sequential modeling of preference data better fits the underlying data, as it yields more accurate recommendations in terms of precision and recall.
The paper is structured as follows. In Sec. 2 we in-
troduce sequential modeling, and specify in Sec. 2.1
the corresponding item ranking functions for sup-
porting recommendations. The experimental evalu-
ation of the proposed approaches is then presented in
Sec. 3, in which we measure the performance of the
approaches in a recommendation scenario. Section 4