model with contextual information. Let’s consider the
relation watches(User, Movie, LastMovieWatchedByUser, Month),
which says that a user watches a movie in a given month and
also records the last movie that the user watched.
Such a relation can be modeled by a four-way tensor
which, after reconstruction and normalization, gives us
$\hat{P}(User, Movie, LastMovieWatchedByUser, Month)$.
Naturally, the contingency tables for tensors are very sparse,
in particular since the involved variables often have many
thousands of states; the goal of this paper is to exploit
structure in the data, visualized as graphical models, to
generate data-efficient models. Graphical models are a
common approach for exploiting independencies in
high-dimensional domains.
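As a rough illustration of the sparsity argument, the sketch below stores the contingency table of a four-way relation as a sparse dictionary of cell counts and normalizes it into an empirical joint distribution. It is a minimal sketch under stated assumptions: the integer ids, the state-space sizes, and the helper names `sparse_empirical_joint` and `fill_fraction` are hypothetical. The raw normalized counts stand in for $\hat{P}$ here without the reconstruction (factorization) step the paper applies.

```python
from collections import Counter
from math import prod


def sparse_empirical_joint(tuples):
    """Normalize observed tuple counts into an empirical joint distribution.

    Only non-empty cells are stored, so memory grows with the number of
    distinct observed tuples rather than with the size of the full tensor.
    """
    counts = Counter(tuples)
    total = sum(counts.values())
    return {cell: c / total for cell, c in counts.items()}


def fill_fraction(p_hat, shape):
    """Fraction of tensor cells that received at least one observation."""
    return len(p_hat) / prod(shape)


# Hypothetical (user, movie, last_movie, month) ids; sizes are illustrative only.
shape = (1_000, 5_000, 5_000, 12)
observed = [(0, 1, 2, 0), (0, 1, 2, 0), (3, 4, 1, 5), (7, 2, 4, 11)]

p_hat = sparse_empirical_joint(observed)
print(p_hat[(0, 1, 2, 0)])          # 0.5
print(fill_fraction(p_hat, shape))  # about 1e-11
```

With these assumed sizes the tensor has 3·10^11 cells, so even a few million observed tuples would fill at most about 10^-5 of them, which is the sparsity that motivates exploiting structure.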
We believe that this new application of graphical
models can lead to interesting and powerful models.
A particular benefit is the modularity of the approach,
which permits separate optimization of local models;
this is, of course, a general benefit of graphical models,
in particular of Bayesian networks and decomposable
models (Lauritzen, 1996).
The paper is organized as follows. In the next sec-
tion, we describe related work. In Section 3 we de-
scribe the basic idea and in Section 4 we develop the
approach using data from a social network. We show
that contextual information can improve the predic-
tion. Section 5 contains our conclusions.
2 RELATED WORK
Graphical models have a long history in expert
systems and statistical modeling (Lauritzen, 1996).
Graphical models have also been applied to relational
domains. Prominent examples are Probabilistic Re-
lational Models (Koller and Pfeffer, 1998; Getoor
et al., 2007), Markov Logic Networks (Domingos and
Richardson, 2007), and Infinite Hidden Relational
Models (Kemp et al., 2006; Xu et al., 2006). Although
very general, these models can still be tricky to apply
to a given relational domain: Probabilistic Relational
Models require involved structural optimization, Markov
Logic Networks depend on the availability of rule sets and
logical expressions that are (approximately) valid in the
domain, and Infinite Hidden Relational Models require
complex inference processes. Here, we focus on modeling
a single relation, which leads to simpler, scalable
models. The sampling assumptions in this paper
are similar to the ones made in the pLSI model (Hof-
mann, 1999) and the underlying assumptions in some
matrix and tensor decomposition approaches (Ren-
dle et al., 2010; Wermser et al., 2011), although in
these papers, this sampling assumption is not stated
explicitly. The difference is that here we exploit inter-
dependencies in the domain using graphical models
whereas those approaches form a joint clustering and
factorization model, respectively. It might be inter-
esting to note that (Rendle et al., 2010) uses a simpli-
fied factorized model which consists of sums of terms
defined for individual interactions whereas we obtain
products of simple interaction components. The ar-
gument that higher-order tensor models permit the in-
tegration of contextual background information was
also made in (Wermser et al., 2011).
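To make the contrast between sums and products of interaction terms concrete, the sketch below scores every (user, movie, last movie) triple in two ways: as a sum of two pairwise dot-product terms, in the spirit of the simplified factorized model described for (Rendle et al., 2010), and as a normalized product of two nonnegative pairwise tables, as a graphical model combines local components. The parameterization, sizes, and function names are hypothetical illustrations; neither function is claimed to reproduce the exact model of either paper.

```python
import numpy as np


def additive_scores(user_f, movie_f_u, movie_f_l, last_f):
    """Sum of pairwise terms: score(u, m, l) = <user_u, movie_m> + <movie'_m, last_l>.

    Returns an unnormalized score tensor of shape (users, movies, last_movies).
    """
    user_movie = user_f @ movie_f_u.T  # (users, movies)
    movie_last = movie_f_l @ last_f.T  # (movies, last_movies)
    return user_movie[:, :, None] + movie_last[None, :, :]


def product_distribution(phi_um, phi_ml):
    """Product of nonnegative pairwise components, normalized to a distribution.

    P(u, m, l) is proportional to phi_um[u, m] * phi_ml[m, l].
    """
    unnorm = phi_um[:, :, None] * phi_ml[None, :, :]
    return unnorm / unnorm.sum()


rng = np.random.default_rng(0)
n_users, n_movies, k = 4, 5, 2

# Hypothetical latent factors for the additive model.
uf = rng.normal(size=(n_users, k))
mfu = rng.normal(size=(n_movies, k))
mfl = rng.normal(size=(n_movies, k))
lf = rng.normal(size=(n_movies, k))
S = additive_scores(uf, mfu, mfl, lf)

# Hypothetical nonnegative pairwise tables for the product model.
P = product_distribution(rng.random((n_users, n_movies)), rng.random((n_movies, n_movies)))
print(S.shape, P.shape, P.sum())
```

One practical difference: in the product form a near-zero component suppresses the whole cell, whereas in the additive form a large term can compensate for a small one.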
There is a large literature on matrix completion
methods, which we apply to model the interactions
in the graphical model (Candès and Recht, 2008). In
particular, the winning entry to the NETFLIX com-
petition used matrix completion approaches (Takacs
et al., 2007; Bell et al., 2010). Tensor factoriza-
tion has become an area of growing interest. A re-
cent overview has been provided in (Kolda and Bader,
2009).
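As a rough illustration of low-rank matrix completion, the sketch below fits a rank-k factorization $R \approx U V^\top$ to the observed entries only, using alternating regularized least squares (ALS). ALS is one standard technique chosen here for brevity; it is not necessarily the method used in the cited NETFLIX entries or in this paper, and the function name `als_complete`, the rank, and the regularization weight are assumptions. The squared loss implicitly corresponds to a Gaussian noise model.

```python
import numpy as np


def als_complete(R, mask, rank=2, lam=1e-3, n_iters=100, seed=0):
    """Fill in missing entries of R by alternating regularized least squares.

    Only entries where mask is True enter the loss; the rest are predicted
    from the fitted low-rank factors U (rows) and V (columns).
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = rng.normal(scale=0.1, size=(n, rank))
    V = rng.normal(scale=0.1, size=(m, rank))
    reg = lam * np.eye(rank)
    for _ in range(n_iters):
        # Update each row factor using only that row's observed columns.
        for i in range(n):
            Vo = V[mask[i]]
            U[i] = np.linalg.solve(Vo.T @ Vo + reg, Vo.T @ R[i, mask[i]])
        # Update each column factor using only that column's observed rows.
        for j in range(m):
            Uo = U[mask[:, j]]
            V[j] = np.linalg.solve(Uo.T @ Uo + reg, Uo.T @ R[mask[:, j], j])
    return U @ V.T


rng = np.random.default_rng(1)
truth = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 15))  # hypothetical rank-2 data
mask = rng.random(truth.shape) < 0.7                         # about 70% of entries observed
R_hat = als_complete(np.where(mask, truth, 0.0), mask, rank=2, n_iters=200)

rmse_observed = np.sqrt(np.mean((R_hat - truth)[mask] ** 2))
rmse_missing = np.sqrt(np.mean((R_hat - truth)[~mask] ** 2))
print(rmse_observed, rmse_missing)
```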
In (Yu et al., 2006; Salakhutdinov and Mnih,
2007) contextual information was included in matrix
completion approaches; they employ a Gaussian noise
model, which is better suited to continuous and ordinal
quantities, such as a user's rating of a movie, than to
the likelihood that a relation exists, as modeled here.
Also, those approaches
often have difficulties in situations where only posi-
tive examples for a relation are available; they need to
distinguish between true negatives (e.g., it is known
that a user does not like a movie) and missing in-
formation (e.g., it is unknown if a user likes a par-
ticular movie). Bernoulli and Gaussian sampling ap-
proaches have been pursued in (Chu et al., 2006; Chu
and Ghahramani, 2009).
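The positive-only issue can be illustrated with a multinomial view of the data, in the spirit of the multinomial model named in the next section: if observed (user, movie) pairs are treated as independent draws from a joint distribution, the log-likelihood is a sum over the observed pairs only, so unobserved cells never need to be labeled as negative or missing. The sketch below computes the maximum-likelihood estimate (normalized counts) and its log-likelihood for a toy dataset; the data and the helpers `multinomial_mle` and `log_likelihood` are hypothetical, and no smoothing or factorization is applied.

```python
import numpy as np


def multinomial_mle(pairs, n_users, n_movies):
    """Maximum-likelihood joint P(user, movie) from positive-only pairs."""
    counts = np.zeros((n_users, n_movies))
    for u, m in pairs:
        counts[u, m] += 1
    return counts / counts.sum()


def log_likelihood(P, pairs):
    """Multinomial log-likelihood: only the observed pairs contribute."""
    return float(sum(np.log(P[u, m]) for u, m in pairs))


pairs = [(0, 0), (0, 0), (1, 2), (2, 1)]  # hypothetical observed (user, movie) pairs
P = multinomial_mle(pairs, n_users=3, n_movies=3)
ll = log_likelihood(P, pairs)             # 2*log(1/2) + 2*log(1/4) = log(1/64)

# Mixing in a uniform distribution lowers the likelihood of the training pairs.
P_smoothed = 0.5 * P + 0.5 / P.size
print(ll, log_likelihood(P_smoothed, pairs))
```

A Bernoulli or Gaussian model, by contrast, assigns a target value to every cell, so each unobserved cell would have to be declared either a negative or a missing entry.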
3 RELATIONAL POPULATIONS, GRAPHICAL STRUCTURES, AND THE MULTINOMIAL MODEL
In this section we describe the standard object-
centered sampling model and contrast it with the
relation-oriented sampling model used in this paper.
3.1 Standard Object-oriented Sampling Assumption
Traditionally, statistical units, i.e. data points, are
associated with objects and statistical models con-
GRAPHICAL MODELS FOR RELATIONS - Modeling Relational Context