gory, as well as to learn characteristic items for each
user community, or to model community interests and
transitions among topics of interest. Experiments on
both the Netflix and MovieLens data show the
effectiveness of the proposed model.
2 PRELIMINARIES AND RELATED WORK
Users' preferences can be represented by an
$M \times N$ rating matrix $R$, where $M$ is the cardinality of
the user set $\mathcal{U} = \{u_1, \cdots, u_M\}$ and $N$ is the
cardinality of the item set $\mathcal{I} = \{i_1, \cdots, i_N\}$.
The rating value associated with the pair $\langle u, i \rangle$ will be
denoted as $r^u_i$. Typically the number of users and items can be
very large, with $M \gg N$, and preference values fall within a
fixed integer range $\mathcal{V} = \{1, \cdots, V\}$, where 1 denotes
the lowest interest value. Users tend to express their
interest in only a restricted number of items; thus,
the rating matrix is characterized by an exceptional
sparseness factor (e.g., more than 95%). Let $\delta(u,i)$ be
a rating-indicator function, which is equal to 1 if the
user $u$ has rated/purchased the item $i$, and zero otherwise.
Let $\mathcal{I}(u)$ denote the set of products rated by the user
$u$: $\mathcal{I}(u) = \{i \in \mathcal{I} : \delta(u,i) = 1\}$; symmetrically,
$\mathcal{U}(i)$ denotes the set of users who have expressed their
preference on the item $i$.
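To make the notation concrete, the following is a minimal sketch of a sparse rating store together with the indicator function and the two derived sets. The identifiers (`ratings`, `delta`, `items_of`, `users_of`) and the toy data are illustrative assumptions and do not come from the original work.

```python
# Sparse representation of R: only observed (user, item) pairs are stored.
# All names and values here are hypothetical toy data.
users = ["u1", "u2", "u3"]
items = ["i1", "i2", "i3", "i4"]
ratings = {("u1", "i1"): 5, ("u1", "i3"): 2, ("u2", "i2"): 4}


def delta(u, i):
    """Rating indicator: 1 if user u has rated item i, 0 otherwise."""
    return 1 if (u, i) in ratings else 0


def items_of(u):
    """I(u): the set of items rated by user u."""
    return {i for i in items if delta(u, i) == 1}


def users_of(i):
    """U(i): the set of users who rated item i."""
    return {u for u in users if delta(u, i) == 1}


# Fraction of the M x N matrix that is unobserved.
sparsity = 1 - len(ratings) / (len(users) * len(items))
```

Storing only the observed pairs keeps memory proportional to the number of ratings rather than to $M \times N$, which matters at the sparsity levels mentioned above.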
Latent factor models are the most representative
and effective model-based approaches for CF. The
underlying assumption is that the preference value associated
with the pair $\langle u, i \rangle$ can be decomposed into
a set of contributions that represent the interaction
between the user and the target item on a set of features.
Assuming that a set of $K$ features determines the
user's interest in a given item, a rating is the result
of the influence of these features on users and items:
$$\hat{r}^u_i = \sum_{z=1}^{K} U_{u,z} V_{z,i},$$
where $U_{u,z}$ is the response of the user $u$ to the feature
$z$ and $V_{z,i}$ is the response of the item $i$ on the
same feature.
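As an illustration of the factorization above, the sketch below computes the predicted rating as the inner product between a row of $U$ and a column of $V$. The function name `predict_rating` and the factor values are assumptions chosen for the example; they are not learned parameters.

```python
def predict_rating(user_factors, item_factors, u, i):
    """Predicted rating: sum over z of U[u][z] * V[z][i].

    user_factors is an M x K nested list (U),
    item_factors is a K x N nested list (V).
    """
    num_features = len(item_factors)
    return sum(user_factors[u][z] * item_factors[z][i]
               for z in range(num_features))


# Toy factors with M = 2 users, K = 2 features, N = 3 items (made-up values).
U = [[1.0, 0.5],
     [0.2, 2.0]]
V = [[3.0, 1.0, 0.0],
     [2.0, 4.0, 1.0]]

# User 0 on item 1: 1.0 * 1.0 + 0.5 * 4.0
r_hat = predict_rating(U, V, 0, 1)
```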
Several learning schemes have been proposed to
overcome the sparsity of the original rating matrix
and to produce accurate models. The learning phase
may be implemented in a deterministic way, via
gradient descent (Funk, 2006), or, following a
probabilistic approach, by maximizing the log-likelihood of
the model via the Expectation Maximization
algorithm. The latter leads to the definition of the
Aspect Model (Hofmann and Puzicha, 1999), also known
as pLSA. According to the user community
variant, the rating value $r$ is conditionally independent
of the user's identity given her respective
community $Z$; thus, the probability of observing the
rating value $r$ for the pair $\langle u, i \rangle$ can be computed as
$$p(r|u,i) = \sum_{z=1}^{K} p(r|i,z)\, p(z|u),$$
where $p(z|u)$ measures how well the preference values given by $u$ fit
the behavior of the community $z$, and $p(r|i,z)$ is
the probability that a user belonging to the community
$z$ assigns the rating value $r$ to $i$.
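The mixture in the user community variant can be sketched as follows. This is only an illustration of how $p(r|u,i)$ is assembled from already estimated parameters; it does not implement the EM training procedure. The nested-list layout and all probabilities are assumptions made for the example.

```python
def rating_distribution(p_r_given_iz, p_z_given_u, u, i, num_values):
    """Return [p(r=1|u,i), ..., p(r=V|u,i)] as sum_z p(r|i,z) * p(z|u).

    p_r_given_iz[i][z][r - 1] holds p(r|i,z);
    p_z_given_u[u][z] holds p(z|u).
    """
    num_communities = len(p_z_given_u[u])
    return [
        sum(p_r_given_iz[i][z][r] * p_z_given_u[u][z]
            for z in range(num_communities))
        for r in range(num_values)
    ]


# Toy parameters: one user, one item, K = 2 communities, V = 2 rating values.
p_z_given_u = [[0.25, 0.75]]
p_r_given_iz = [[[0.5, 0.5],    # community 0 is indifferent about item 0
                 [0.0, 1.0]]]   # community 1 always assigns rating 2

dist = rating_distribution(p_r_given_iz, p_z_given_u, 0, 0, 2)
```

Because each $p(\cdot|i,z)$ and $p(\cdot|u)$ is a proper distribution, the returned list also sums to one.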
Only a few co-clustering approaches have been
proposed for CF data. An application of weighted
Bregman co-clustering (Scalable CC) to rating data is
discussed in (George and Merugu, 2005). The
two-sided clustering model for CF (Hofmann and Puzicha,
1999) is based on the strong assumptions that each
person belongs to exactly one user community, that
each item belongs to exactly one group of items, and
that the rating value is independent of the user and
item identities given their respective cluster
memberships. Let $\mathcal{C} = \{c_1, \cdots, c_K\}$ be the user clusters and
let $c(u) : \mathcal{U} \to \mathcal{C}$ be a function that maps each user to
the respective cluster. Similarly, let $\mathcal{D} = \{d_1, \cdots, d_L\}$
be a set of disjoint item clusters, and let $d(i) : \mathcal{I} \to \mathcal{D}$
be the corresponding mapping function. According to
the two-sided clustering model, the probability of
observing the preference value $r$ conditioned on the pair
$\langle u, i \rangle$ is the following:
$$p(r|u, i, c(u) = c, d(i) = d) = p(r|c, d)$$
where $p(r|c, d)$ are Bernoulli parameters and the cluster
memberships are estimated by employing a
variational inference approach.
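Under hard cluster assignments, evaluating the two-sided model reduces to a table lookup, as the sketch below illustrates. The cluster maps and the table `p_r_given_cd` are hypothetical values assumed to be already estimated; the variational inference step is not shown.

```python
def two_sided_probability(p_r_given_cd, user_cluster, item_cluster, u, i, r):
    """p(r|u,i) under hard assignments: look up p(r | c(u), d(i)).

    p_r_given_cd[c][d][r - 1] holds p(r|c,d) for rating values r = 1..V.
    """
    c = user_cluster[u]   # c(u): the single cluster of user u
    d = item_cluster[i]   # d(i): the single cluster of item i
    return p_r_given_cd[c][d][r - 1]


# Toy setting: 2 user clusters, 2 item clusters, V = 2 (made-up values).
user_cluster = {"ann": 0, "bob": 0, "eve": 1}
item_cluster = {"m1": 0, "m2": 1}
p_r_given_cd = [
    [[0.9, 0.1], [0.2, 0.8]],   # user cluster 0 vs item clusters 0 and 1
    [[0.5, 0.5], [0.3, 0.7]],   # user cluster 1 vs item clusters 0 and 1
]

p_ann = two_sided_probability(p_r_given_cd, user_cluster, item_cluster, "ann", "m2", 2)
p_bob = two_sided_probability(p_r_given_cd, user_cluster, item_cluster, "bob", "m2", 2)
```

Users in the same cluster (here `ann` and `bob`) necessarily receive identical rating distributions, which is the practical consequence of the strong assumption stated above.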
The Flexible Mixture Model (FMM) (Jin et al.,
2006) extends the Aspect and the two-sided models by
allowing each user/item to belong to multiple clusters,
which are determined simultaneously according to a
co-clustering approach. Assume the existence of $K$
user clusters indexed by $c$ and $L$ item clusters indexed
by $d$. Let $p(c_k)$ be the probability of observing
the user cluster $k$ and $p(u|c_k)$ the probability
of observing the user profile $u$ given the cluster $k$;
using the same notation for the item clusters, the joint
probability $p(u,i,r)$ is defined as:
$$p(u,i,r) = \sum_{c=1}^{K} \sum_{d=1}^{L} p(c)\, p(d)\, p(u|c)\, p(i|d)\, p(r|c,d)$$
The predicted rating associated with the pair $\langle u, i \rangle$ is then
computed as:
$$\hat{r}^u_i = \sum_{r=1}^{V} r \, \frac{p(u,i,r)}{\sum_{r'=1}^{V} p(u,i,r')}$$
The major drawback of the FMM lies in the
complexity of the training procedure, which stems
from the computation of the probabilities $p(c,d|u,i,r)$
during the Expectation step.
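The two FMM formulas can be sketched directly as a double sum over cluster pairs followed by a normalized expectation over rating values. The parameter layout (a dictionary of nested lists) and the toy probabilities are assumptions made for illustration; parameter estimation via EM is not shown.

```python
def fmm_joint(params, u, i, r):
    """p(u,i,r) = sum_c sum_d p(c) p(d) p(u|c) p(i|d) p(r|c,d)."""
    total = 0.0
    for c, p_c in enumerate(params["p_c"]):
        for d, p_d in enumerate(params["p_d"]):
            total += (p_c * p_d
                      * params["p_u_given_c"][c][u]
                      * params["p_i_given_d"][d][i]
                      * params["p_r_given_cd"][c][d][r - 1])
    return total


def fmm_predict(params, u, i, num_values):
    """Expected rating under p(r|u,i) = p(u,i,r) / sum_r' p(u,i,r')."""
    joint = [fmm_joint(params, u, i, r) for r in range(1, num_values + 1)]
    norm = sum(joint)
    return sum(r * p for r, p in enumerate(joint, start=1)) / norm


# Toy parameters: 2 user clusters, 1 item cluster, 2 users, 1 item, V = 2.
params = {
    "p_c": [0.5, 0.5],
    "p_d": [1.0],
    "p_u_given_c": [[1.0, 0.0],    # cluster 0 generates user 0 only
                    [0.0, 1.0]],   # cluster 1 generates user 1 only
    "p_i_given_d": [[1.0]],
    "p_r_given_cd": [[[0.5, 0.5]],   # cluster 0: ratings 1 and 2 equally likely
                     [[0.0, 1.0]]],  # cluster 1: always rating 2
}

pred_user0 = fmm_predict(params, 0, 0, 2)
pred_user1 = fmm_predict(params, 1, 0, 2)
```

Each evaluation of the joint costs on the order of $K \cdot L$ operations; the Expectation step computes a posterior over every cluster pair for every observed rating, which gives an intuition for the training cost mentioned above.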
A co-clustering extension of LDA (Blei et al., 2003)
CHARACTERIZING RELATIONSHIPS THROUGH CO-CLUSTERING - A Probabilistic Approach