Discussion. The optimal values of k are almost identical for the three approaches. This is not surprising for GNMF and wSVD, which are very similar methods (their only difference is the additional non-negativity constraint in GNMF), but it is interesting for our ranking approach: explaining the users' pairwise preferences appears to be as hard as explaining their ratings, since both require the same number of hidden factors.
Although not strictly equivalent, the ranking error used for evaluation is closely related to the ranking error optimized by our approach, whereas GNMF and wSVD optimize a squared error measuring how well they predict the ratings. This makes these preliminary results somewhat disappointing, as we would logically have expected our approach to achieve the best ranking error. The good performance of GNMF is not surprising, since it already performed well (at least better than wSVD) with respect to rating prediction (Pessiot et al., 2006). As for wSVD, its ranking error could be improved by increasing the number of iterations, but its high algorithmic complexity makes it difficult to use on real datasets such as MovieLens, especially when the number of items is large; in our experiments we had to stop it after only 20 iterations because of its slowness. wSVD is also limited by its lack of regularization, which is usually needed to avoid overfitting.
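To make this distinction concrete, the following sketch contrasts the two kinds of objectives. It is illustrative only: the names R, mask, U, V and prefs are assumptions rather than the paper's implementation, and the ranking loss shown is the 0/1 count used for evaluation, not the smooth surrogate one would actually optimize during training.

    import numpy as np

    def squared_rating_loss(R, mask, U, V):
        # Objective of the rating prediction methods (GNMF, wSVD): squared error
        # on the observed ratings, with U and V the rank-k user and item factors.
        pred = U @ V.T
        return np.sum(mask * (R - pred) ** 2)

    def pairwise_ranking_error(prefs, U, V):
        # Evaluation-style ranking error: fraction of preference triples (u, i, j),
        # meaning user u prefers item i to item j, violated by the predicted scores.
        scores = U @ V.T
        violations = sum(1 for (u, i, j) in prefs if scores[u, i] <= scores[u, j])
        return violations / len(prefs)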
Several directions need to be explored to complete and improve these preliminary results. The first concerns user-level normalization: when we minimize a sum of errors (the sum of squared errors over the ratings in GNMF and wSVD, the sum of ranking errors over the pairwise preferences in our approach), users who have rated many items tend to carry higher errors, so the learning phase focuses on these users while neglecting the others. This problem can be avoided by giving each user the same importance, i.e. by considering normalized errors in which each user's error is divided by the number of their pairwise preferences. The mean ranking error we define for evaluation is in fact a normalized error, since we only consider one test pairwise preference per user. This is why we expect learning with normalized errors to give better experimental results.
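A minimal sketch of this normalization, under the assumption that preference triples and a precomputed score matrix are available (prefs and scores are hypothetical names): each user's error is divided by the number of their pairwise preferences before averaging over users.

    from collections import defaultdict

    def normalized_ranking_error(prefs, scores):
        # Each user's error is divided by the number of their pairwise preferences,
        # so users who rated many items no longer dominate the objective.
        errors = defaultdict(float)
        counts = defaultdict(int)
        for u, i, j in prefs:              # user u prefers item i to item j
            counts[u] += 1
            if scores[u, i] <= scores[u, j]:
                errors[u] += 1.0
        # average of per-user normalized errors: every user weighs the same
        return sum(errors[u] / counts[u] for u in counts) / len(counts)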
A second direction we want to explore is a more careful study of stopping criteria. We stopped GNMF and our ranking approach after fixed numbers of iterations, which seemed to correspond to empirical convergence. In future experiments, we will instead stop them when the training error stops decreasing, which will allow a more thorough comparison of the three methods with respect to training time.
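One possible stopping rule of this kind is sketched below, under the assumption that one update pass and the current training error are available as callables (step and training_error are hypothetical names, not part of our implementation).

    def train_until_converged(step, training_error, max_iters=200, tol=1e-4):
        # Stop when the training error no longer decreases (relative improvement
        # below tol) rather than after a fixed number of iterations.
        previous = float("inf")
        for it in range(max_iters):
            step()                       # one update pass of the learning algorithm
            current = training_error()   # training error after that pass
            if previous - current < tol * max(previous, 1.0):
                return it + 1            # empirical convergence reached
            previous = current
        return max_iters                 # iteration cap reached without converging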
Another question we need to study concerns regularization. It is an important feature of a learning algorithm, as it is used to prevent overfitting the training data and thus to avoid bad predictions on unseen data. In both GNMF and our ranking approach, µ_U and µ_I are the regularization terms: setting µ_U = µ_I = 0 means no regularization, and the higher they are, the more the matrix norms are penalized. In our experiments we fixed µ_U = µ_I for simplicity; by doing so, we implicitly gave equal importance to each variable of our model. In future work, we will study the exact influence of these regularization terms and how they should be set.
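For illustration, here is a sketch of how such terms typically enter an objective, written with a squared-error data-fit term and squared-norm penalties; the exact form used by GNMF and by our ranking approach may differ, so this should be read as an assumption-laden example rather than our implementation.

    import numpy as np

    def regularized_objective(R, mask, U, V, mu_U, mu_I):
        # Data-fit term plus norm penalties on the factor matrices.
        # mu_U = mu_I = 0 disables regularization; larger values penalize
        # the matrix norms more strongly and help against overfitting.
        data_fit = np.sum(mask * (R - U @ V.T) ** 2)
        penalty = mu_U * np.sum(U ** 2) + mu_I * np.sum(V ** 2)
        return data_fit + penalty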
Detailed Results. MRE results for several values of the rank k, each method shown around its optimum:

    GNMF:     k=7: 0.2696   k=8: 0.2688   k=9: 0.2658   k=10: 0.2679   k=11: 0.2684
    wSVD:     k=5: 0.2847   k=6: 0.2862   k=7: 0.2803   k=8: 0.2770    k=9: 0.2786
    Ranking:  k=6: 0.2752   k=7: 0.2744   k=8: 0.2737   k=9: 0.2743    k=10: 0.2753
4 CONCLUSION AND PERSPECTIVES
The rating prediction approach is still actively used and studied for collaborative filtering problems. Proposed solutions come from various machine learning fields such as classification, regression, clustering, dimensionality reduction or density estimation. Their common approach is to decompose the recommendation process into a rating prediction step followed by a recommendation step. But from the recommendation perspective, we believe alternatives to rating prediction should be considered. In this paper, we proposed a new ranking approach for collaborative filtering: instead of predicting the ratings as most methods do, we predict scores that respect pairwise preferences between items, as we think correctly ordering the items matters more than correctly predicting their ratings. We proposed a new algorithm for ranking prediction, defined a new evaluation protocol, and compared our approach to two rating prediction approaches. While the preliminary results are not as good as we expected with respect to the mean ranking error, we are confident they can be explained and improved by studying user-level normalization, convergence criteria and regularization. We also plan to explore the relations between collaborative filtering and other tasks such as text analysis, e.g. text segmentation (Caillet et al., 2004), and multitask learning (Ando and Zhang, 2005), in order to extend our work to other