the pairwise similarity computations for the larger
clusters. The data points in the curves are averaged
over 5 repetitions and labelled with the number of
predefined clusters.
5 CONCLUSIONS AND DISCUSSION
This paper suggests two methods for speeding up the
building of an item-based CF model for top-N
recommendations over implicit datasets. By splitting
items into clusters, and computing pairwise
similarities only for items within the same cluster,
we reduced the computation time dramatically. The
first approach is based on LSH and the second on the
cardinality of the item consumption set. Our
experiments show that the cardinality approach
outperformed LSH, resulting in no decrease in
precision while reducing the computation time up to
10% for the larger dataset.
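The cluster-then-compare idea summarized above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the function names, the toy data, and the equal-size cut of the cardinality-sorted item list are all assumptions made for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient of two sets; 0.0 when both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cluster_by_cardinality(item_users, n_clusters):
    """Sort items by consumption-set size and cut the list into
    roughly equal groups (an assumed scheme, for illustration)."""
    ordered = sorted(item_users, key=lambda i: (len(item_users[i]), i))
    size = max(1, -(-len(ordered) // n_clusters))  # ceiling division
    return [ordered[k:k + size] for k in range(0, len(ordered), size)]


def within_cluster_similarities(item_users, clusters):
    """Compute pairwise similarities only for items sharing a cluster."""
    sims = {}
    for cluster in clusters:
        for i, j in combinations(cluster, 2):
            score = jaccard(item_users[i], item_users[j])
            if score > 0:
                sims[(i, j)] = score
    return sims


# Hypothetical implicit-feedback data: item -> set of consuming users.
item_users = {
    "a": {1, 2},
    "b": {2, 3},
    "c": {1, 2, 3, 4, 5, 6},
    "d": {2, 3, 4, 5, 6, 7},
}
clusters = cluster_by_cardinality(item_users, n_clusters=2)
sims = within_cluster_similarities(item_users, clusters)
```

With k clusters of roughly equal size, the number of compared pairs drops from about n^2/2 to about n^2/(2k), which is where the time saving comes from.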
The cardinality method implicitly assumes that
similar items also have similar popularity. Although
this may be plausible in general, it holds in our case
only because the similarity function we chose was the
Jaccard coefficient, which exploits exactly this
property. That is, an item with low cardinality will
have a very small similarity score, if any, with an
item with high cardinality, because the intersection
between them, the numerator of equation (1), is at
most the smaller cardinality and is therefore small
relative to their union.
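The argument above can be made concrete with a bound that follows directly from the definition of the Jaccard coefficient: the intersection is at most the smaller set and the union is at least the larger set, so the score cannot exceed min(|A|, |B|) / max(|A|, |B|). The sketch below checks this on made-up data; the helper name jaccard_upper_bound and the set sizes are illustrative assumptions.

```python
def jaccard(a, b):
    """Jaccard coefficient of two sets; 0.0 when both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def jaccard_upper_bound(size_a, size_b):
    """Largest Jaccard score possible for sets of the given sizes,
    since |A & B| <= min(|A|, |B|) and |A | B| >= max(|A|, |B|)."""
    larger = max(size_a, size_b)
    return min(size_a, size_b) / larger if larger else 0.0


# Best case for a rare item: every one of its users also consumed
# the popular item. The score still cannot exceed the size ratio.
rare = set(range(10))        # consumed by 10 users
popular = set(range(1000))   # consumed by 1000 users

score = jaccard(rare, popular)
bound = jaccard_upper_bound(len(rare), len(popular))
```

Even in this best case the score equals the bound of 10/1000, so skipping pairs with very different cardinalities discards only low-scoring candidates.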
We therefore suggest that an efficient clustering
method, used as a pre-processing step for item-based
model computation, requires some resemblance between
the clustering metric and the proximity metric used
for item similarity. Otherwise the results may look
somewhat arbitrary, unless the clustering method is
completely generic, like the LSH method suggested
above. For instance, content-based clustering that
groups movie items by genre may be a good clustering
method if the proximity metric used for item-item
similarity considers the genres of a movie as part of
the similarity computation. A successful clustering
method will not only be cheap, but will also
encapsulate a hint from the proximity metric that is
later used to calculate the similarity scores. We
therefore recommend the LSH clustering method, which
is unrelated to the similarity metric, when one
cannot define a clustering method that is correlated
with the similarity metric.
An additional benefit of our methods is that the
item-pair computations for different clusters are
independent and can run in parallel, further reducing
the actual time required for computing the item-based
model.
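Because clusters share no item pairs, each one can be handed to a separate worker. The sketch below is one possible arrangement, not the setup used in the paper: the function names and toy clusters are assumptions, and a thread pool is used only to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient of two sets; 0.0 when both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cluster_similarities(cluster):
    """All pairwise scores inside one cluster.
    `cluster` is a list of (item_id, user_set) pairs."""
    return {
        (i, j): jaccard(users_i, users_j)
        for (i, users_i), (j, users_j) in combinations(cluster, 2)
    }


def parallel_similarities(clusters, executor):
    """Map clusters onto the executor's workers and merge the results.
    Keys never collide because clusters are disjoint."""
    merged = {}
    for partial in executor.map(cluster_similarities, clusters):
        merged.update(partial)
    return merged


# Hypothetical clusters produced by an earlier clustering step.
clusters = [
    [("a", {1, 2}), ("b", {2, 3})],
    [("c", {1, 2, 3, 4}), ("d", {3, 4, 5, 6})],
]
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel_sims = parallel_similarities(clusters, pool)
```

In CPython, threads do not speed up CPU-bound pure-Python work, so a real run would more likely pass a ProcessPoolExecutor (with the usual main-module guard) or distribute clusters across machines. The function accepts any executor for that reason.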
ICAART 2015 - International Conference on Agents and Artificial Intelligence