1M dataset much faster than the Mahout library while
achieving similar prediction error values. However, the
Mahout library works well on the sparse and
large Book-Crossing dataset.
7 CONCLUSIONS
This paper focuses on one of the most popular
recommendation algorithms, i.e. k Nearest Neighbors, and
the similarity measures that can be used with it. We
showed that evaluation measures for a ranking task,
i.e. precision, recall, F1-score or nDCG, are not always
a good choice for selecting the best neighborhood
size k. Better results can be obtained with
a simple prediction error such as MAE. We also
identified differences in the kNN algorithm
implementation between the Mahout and LensKit libraries and
showed that they influence the k value, which is much
larger for the Mahout library.
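Selecting k by prediction error can be illustrated with a minimal sketch. It is not the Mahout or LensKit implementation: the toy ratings, the function names, the 1 / (1 + distance) form of the Euclidean similarity and the leave-one-out protocol are all assumptions made for illustration.

```python
import math

# Toy ratings (user -> {item: rating}); purely illustrative, not from the paper.
RATINGS = {
    "u1": {"a": 5, "b": 3, "c": 4, "d": 4},
    "u2": {"a": 3, "b": 1, "c": 2, "d": 3, "e": 3},
    "u3": {"a": 4, "b": 3, "c": 4, "d": 3, "e": 5},
    "u4": {"a": 3, "b": 3, "c": 1, "d": 5, "e": 4},
    "u5": {"a": 1, "b": 5, "c": 5, "d": 2, "e": 1},
}


def euclidean_similarity(a, b):
    """Assumed form 1 / (1 + distance) over co-rated items; 0.0 if none."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    distance = math.sqrt(sum((a[i] - b[i]) ** 2 for i in common))
    return 1.0 / (1.0 + distance)


def predict(ratings, user, item, k):
    """Similarity-weighted mean of the k most similar users who rated `item`."""
    # Hide the target rating so it cannot influence the similarity.
    profile = {i: r for i, r in ratings[user].items() if i != item}
    neighbors = sorted(
        (
            (euclidean_similarity(profile, other), other[item])
            for name, other in ratings.items()
            if name != user and item in other
        ),
        reverse=True,
    )[:k]
    weight = sum(sim for sim, _ in neighbors)
    if weight == 0.0:
        # No usable neighbor: fall back to the user's mean rating.
        return sum(profile.values()) / len(profile)
    return sum(sim * rating for sim, rating in neighbors) / weight


def leave_one_out_mae(ratings, k):
    """Mean absolute error when each known rating is hidden and predicted."""
    errors = [
        abs(predict(ratings, user, item, k) - rating)
        for user, items in ratings.items()
        for item, rating in items.items()
    ]
    return sum(errors) / len(errors)


def best_k(ratings, candidates):
    """Return the candidate k with the lowest leave-one-out MAE."""
    return min(candidates, key=lambda k: leave_one_out_mae(ratings, k))


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(k, round(leave_one_out_mae(RATINGS, k), 3))
    print("selected k:", best_k(RATINGS, (1, 2, 3, 4)))
```

On real data the candidate range for k would be much wider, and the selected value depends on how a given library forms the neighborhood, which is the implementation difference noted above.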
We compared different similarity measures
according to the average time of generating
recommendations and the prediction error MAE. The Euclidean
similarity obtains the best results with the Mahout
library according to the considered criteria. The LensKit
library generates recommendations on the MovieLens
1M dataset much faster than the Mahout library while
achieving similar prediction error values. Nevertheless,
the Mahout library is less time-consuming than
LensKit when run on a sparse dataset, even if this
dataset is large.
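The two criteria used in this comparison, average time per generated prediction and MAE, could be collected with a small harness such as the sketch below. The `evaluate` function, the `constant_predictor` stand-in and the test triples are hypothetical; a real experiment would pass in a library-backed recommender instead.

```python
import time


def evaluate(predict, test_set):
    """Return (average seconds per prediction, MAE) for a predict(user, item) callable."""
    total_time = 0.0
    total_error = 0.0
    for user, item, actual in test_set:
        start = time.perf_counter()
        estimate = predict(user, item)
        total_time += time.perf_counter() - start
        total_error += abs(estimate - actual)
    n = len(test_set)
    return total_time / n, total_error / n


def constant_predictor(user, item):
    """Hypothetical stand-in for a real recommender: always predicts 3.0."""
    return 3.0


# Illustrative (user, item, true rating) triples, not taken from any dataset.
TEST_SET = [("u1", "a", 4.0), ("u2", "b", 2.0), ("u3", "c", 3.0)]

if __name__ == "__main__":
    avg_time, mae = evaluate(constant_predictor, TEST_SET)
    print(f"avg time: {avg_time:.6f} s, MAE: {mae:.3f}")
```

Timing only the call to `predict` keeps data loading and model building out of the measurement; whether those costs should be included is a design choice that depends on the experiment.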
Things You Might Not Know about the k-Nearest Neighbors Algorithm