Things You Might Not Know about the k-Nearest Neighbors Algorithm

Aleksandra Karpus, Marta Raczyńska, Adam Przybylek


Recommender Systems aim to suggest potentially interesting items to a user. The most common kind of Recommender System is Collaborative Filtering, which follows the intuition that users who liked the same things in the past are likely to be interested in the same things in the future. One Collaborative Filtering method is the k-Nearest Neighbors algorithm, which finds the k users most similar to an active user and then computes recommendations from that subset of users. The main aim of this paper is to compare two implementations of the k-Nearest Neighbors algorithm, from the Mahout and LensKit libraries, as well as six similarity measures. We investigate how implementation differences between the libraries influence the optimal neighborhood size k and the prediction error. We also show that measures such as F1-score and nDCG are not always a good basis for choosing the best neighborhood size k. Finally, we compare the similarity measures by the average time needed to generate recommendations and by the prediction error.
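
To make the user-based k-Nearest Neighbors idea concrete, the following is a minimal sketch in plain Python. It is not the Mahout or LensKit implementation. The function names, the toy ratings, the choice of cosine similarity, and the decision to pick neighbors only among users who rated the target item are all assumptions made for illustration; real libraries may differ on each of these points.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two users' rating dicts (item -> rating).

    The dot product runs over co-rated items; each norm runs over all of
    that user's ratings. This is one common variant (an assumption here).
    """
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(ratings, active, item, k):
    """Predict the active user's rating for `item` from the k most similar users.

    Neighbors are chosen among users who rated `item` and have positive
    similarity to the active user (an illustrative choice). Returns None
    when no such neighbor exists.
    """
    candidates = []
    for user, user_ratings in ratings.items():
        if user == active or item not in user_ratings:
            continue
        sim = cosine_similarity(ratings[active], user_ratings)
        if sim > 0:
            candidates.append((sim, user))

    # Keep the k most similar candidates.
    neighbors = sorted(candidates, key=lambda pair: pair[0], reverse=True)[:k]
    if not neighbors:
        return None

    # Similarity-weighted average of the neighbors' ratings for the item.
    weighted_sum = sum(sim * ratings[user][item] for sim, user in neighbors)
    total_sim = sum(sim for sim, _ in neighbors)
    return weighted_sum / total_sim


# Hypothetical toy data: user -> {item -> rating}.
toy_ratings = {
    "alice": {"a": 5, "b": 3},
    "bob": {"a": 5, "b": 3, "c": 4},
    "carol": {"a": 1, "b": 5, "c": 2},
    "dave": {"c": 5},
}

if __name__ == "__main__":
    for k in (1, 2):
        print(k, predict(toy_ratings, "alice", "c", k))
```

Even in this small sketch, the prediction for the same user and item changes with k, and choices such as when neighbors are filtered or how similarity is computed are exactly the kind of implementation detail that can shift the optimal neighborhood size.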

