AToMRS: A Tool to Monitor Recommender Systems
André Costa, Tiago Cunha and Carlos Soares
Faculdade de Engenharia da Universidade do Porto, Porto, Portugal
Keywords:
Recommender Systems, Collaborative Filtering, Evaluation.
Abstract:
Recommender systems arose in response to the excess of available online information. These systems present,
to a given individual, suggestions of items that may be relevant. The monitoring and evaluation of these systems
are fundamental to the proper functioning of many business-related services. The goal of this paper is to
create a tool capable of collecting, aggregating and supervising the results obtained from the evaluation of
recommender systems. To achieve this goal, a multi-granularity approach is developed and implemented in order
to organize the different levels of the problem. The tool also aims to tackle the lack of mechanisms for
visually assessing the performance of a recommender system's algorithms. A functional prototype of the
application is presented to validate the concept of the solution.
1 INTRODUCTION
Currently there is a high demand for internet services,
for both personal and professional reasons. This
growth in demand has also fueled the amount of information
made available to meet user needs. However, the
scale of the available data hinders easy access
to the relevant information. To tackle this problem,
Recommender Systems (RSs) have been developed to
personalize the user experience in the search for relevant
information. They reduce the user's query effort,
while making it easier for enterprises to promote
their products (Wei et al., 2007).
The success of RS depends on the user experience.
This experience is, among other factors, dependent on
the RS performance (Herlocker et al., 2004). There-
fore, researchers have developed several evaluation
metrics to estimate the RS performance in order to
approximate the user experience. Unfortunately, RSs
are not infallible and it is important to continuously
evaluate their performance in order to enforce
the required quality (Tintarev and Masthoff, 2011).
The evaluation process of RS is not a simple task,
since there are several problems that must be dealt
with. On one hand, it is essential to choose appropriate
data sampling strategies, algorithms, metrics
and evaluation procedures to ensure a valid evaluation.
On the other hand, it is also important to
evaluate the RS performance continuously, in order
to assess whether it deteriorates over time.
In order to do so, a multi-granularity approach is
proposed to extend the current global evaluation pro-
cedure towards a more fine-grained analysis. This
means that instead of solely providing the average
performance results that a recommendation algorithm
obtains on the entire dataset, we split the data into
different layers and report the performance values of
all metrics on said layers. The layers used are: (1)
the dataset, (2) the data batch and (3) the user-item
relationship. The data batch is derived from the data
sampling strategy used to evaluate the RS. Although
in this work the batches correspond to the folds of a
cross-validation procedure, the approach is also designed
to handle online evaluation, in which each new
increment of data is treated as a new batch used to
train and evaluate the models.
Despite the existence of several evaluation platforms
for RS, none was found that allows such an
"atomic" analysis of the recommendation problem,
and very few provide an intuitive graphical visualization
tool to further help the RS practitioner monitor
performance. Therefore, this paper also presents
the prototype developed to establish proof of concept.
This paper is organized as follows: Section 2
presents the related work on RS and platforms to
evaluate and monitor their performance. Section 3
presents the multi-granularity approach and the pro-
totype. In Section 4 the entire proposal is discussed
and Section 5 presents the conclusions of the study
and highlights tasks for future work.
2 RELATED WORK
2.1 Recommender Systems
RS emerged with the goal of helping users navigate
through high volumes of information by preparing
personalized recommendations, content and services
(Adomavicius and Tuzhilin, 2005). RS gather
data regarding the user preferences towards the
items (Bobadilla et al., 2013) and use Machine Learn-
ing methods in order to provide the recommenda-
tions (Bagchi, 2015). In this context, the item can be
anything that the system recommends (for instance,
movies, music, ...) (Tintarev and Masthoff, 2011).
In the recommendation process, it is imperative to
take into account (1) the type of data available, (2)
the suitable recommendation techniques for the col-
lected data, (3) the recommendation algorithms, (4)
the recommendation target and (5) the scope of eval-
uation desired (Bobadilla et al., 2013). In this work
the focus lies on the Collaborative Filtering (CF) rec-
ommendation technique. Thorough surveys are avail-
able for the remaining techniques (Adomavicius and
Tuzhilin, 2005; Bobadilla et al., 2013).
2.1.1 Data
The data used in RS are mostly related to the user, the
item and the relationships between them (Tintarev and
Masthoff, 2011). Although there is a large variety of
data useful for RS, in CF the data is simply a value of
preference that the user assigned to the item. This preference
can be explicit (a numerical rating) or implicit
(a unary or binary variable that ascertains the interest of
the user in the item) (Tintarev and Masthoff, 2011).
The data structure used in CF is known as the rating
matrix R. It is described as R = U × I, representing
a set of users U, where u ∈ {1, ..., N}, and a set of items
I, where i ∈ {1, ..., M}. Each element of this matrix is
the numerical feedback provided by a user u relative
to an item i, represented by r_ui.
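For illustration, a minimal sketch of this structure follows (in Python, with made-up ratings); a sparse dictionary-of-dictionaries is assumed, since in practice most entries of R are unknown.

    # Sparse representation of the rating matrix R, assuming explicit feedback.
    # The triples and identifiers below are illustrative only.
    triples = [  # (user u, item i, rating r_ui)
        (1, 10, 4.0),
        (1, 12, 2.5),
        (2, 10, 5.0),
        (3, 11, 3.0),
    ]

    R = {}
    for u, i, r in triples:
        R.setdefault(u, {})[i] = r  # R[u][i] = r_ui

    print(R[1][10])  # 4.0, the feedback of user 1 on item 10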
2.1.2 Strategy
CF methods are organized into memory-based and
model-based. While memory-based methods act only
on the rating matrix using heuristics to obtain the rec-
ommendations, model-based methods induce a model
with the data available to provide the recommenda-
tions (Su and Khoshgoftaar, 2009). Usually, memory-
based methods are based on nearest neighbors and
model-based methods on matrix factorization.
Nearest neighbor strategies have three main steps:
(1) calculate the similarity among users/items, (2) find
the k most similar neighbors and (3) predict the items
to be recommended according to the preference values
of the neighbors (Su and Khoshgoftaar, 2009). Several
similarity metrics can be used in the process;
common choices are Pearson correlation and
Cosine similarity (Sarwar et al., 2001).
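As an illustration of these three steps, the sketch below implements a simple user-based nearest neighbor predictor with cosine similarity, reusing the sparse rating matrix R of the previous sketch; it is not the implementation used by MyMediaLite.

    import math

    def cosine(ru, rv):
        # (1) Similarity between two users' rating vectors (co-rated items).
        common = set(ru) & set(rv)
        if not common:
            return 0.0
        num = sum(ru[i] * rv[i] for i in common)
        den = (math.sqrt(sum(x * x for x in ru.values()))
               * math.sqrt(sum(x * x for x in rv.values())))
        return num / den if den else 0.0

    def predict(R, u, i, k=2):
        # (2) Keep the k most similar users that rated item i and
        # (3) predict r_ui as a similarity-weighted average of their ratings.
        neighbours = sorted(((cosine(R[u], R[v]), v)
                             for v in R if v != u and i in R[v]),
                            reverse=True)[:k]
        num = sum(s * R[v][i] for s, v in neighbours)
        den = sum(abs(s) for s, _ in neighbours)
        return num / den if den else None

    predict(R, 2, 12)  # estimated rating of user 2 for item 12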
Matrix factorization methods approximate the rat-
ing matrix values by the multiplication of at least two
matrices with latent features that capture the under-
lying data patterns (Koren, 2010). The computation
is iterative and optimizes an error metric. There are
several matrix factorization methods (Singular Value
Decomposition, Alternating Least Squares, Stochas-
tic Gradient Descent) (Hu et al., 2008).
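The following sketch illustrates the iterative optimization with stochastic gradient descent on the same rating triples; the hyperparameters are arbitrary and the code is only meant to show the principle.

    import random

    def sgd_mf(triples, n_factors=2, lr=0.01, reg=0.05, epochs=100, seed=0):
        # Approximate each r_ui by the dot product of latent vectors p_u and q_i,
        # minimizing the regularized squared error with stochastic updates.
        rnd = random.Random(seed)
        P, Q = {}, {}
        for _ in range(epochs):
            for u, i, r in triples:
                p = P.setdefault(u, [rnd.gauss(0, 0.1) for _ in range(n_factors)])
                q = Q.setdefault(i, [rnd.gauss(0, 0.1) for _ in range(n_factors)])
                err = r - sum(pf * qf for pf, qf in zip(p, q))
                for f in range(n_factors):
                    p[f], q[f] = (p[f] + lr * (err * q[f] - reg * p[f]),
                                  q[f] + lr * (err * p[f] - reg * q[f]))
        return P, Q

    P, Q = sgd_mf([(1, 10, 4.0), (1, 12, 2.5), (2, 10, 5.0), (3, 11, 3.0)])
    approx = sum(pf * qf for pf, qf in zip(P[1], Q[10]))  # reconstruction of r_1,10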
2.1.3 Algorithms
There are several recommendation frameworks available.
Due to the extensive number and diversity of its CF
methods, we chose the MyMediaLite (MML) framework
(Gantner et al., 2011). The algorithms are or-
ganized into two major CF tasks: Rating Prediction
(RP) and Item Recommendation (IR). Although the
data used is the same, the recommendation process is
different. The discussion of the algorithms falls out-
side the scope of this paper.
Nearest neighbor methods are available for both
RP and IR and are called ItemKNN and UserKNN.
The similarity metrics available are Pearson’s corre-
lation, Cosine similarity, Jaccard coefficient, Condi-
tional probability, Bi-conditional probability and Co-occurrence.
Matrix Factorization methods are divided into RP
and IR. MML presents a total of 7 algorithms for RP
and another 7 for IR. Despite sharing the same nature,
the processes used are different. These methods are
not discussed in this document, but they are listed
in Table 1.
The baseline algorithms included in MML for RP
are GlobalAverage, UserAverage, ItemAverage
and UserItemBaseline. For IR, the only baseline
is MostPopular.
2.1.4 Evaluation
As was stated previously, the evaluation process in RS
is crucial and it must happen in several stages of the
system’s life cycle. During its development, the RS
must be validated to ensure that the best recommenda-
tion algorithm was chosen. This is achieved through
offline evaluation procedures, where several algo-
rithms are compared on the same dataset (Tintarev
and Masthoff, 2011). Afterwards, it is also critical to
evaluate the RS performance when it is online. Here,
Table 1: Matrix Factorization methods in MyMediaLite.
Algorithm Type
Matrix Factorization RP
BiasedMatrixFactorization RP
LatentFeatureLogLinearModel RP
SVDPlusPlus RP
SigmoidItemAsymmetricFactor RP
SigmoidUserAsymmetricFactor RP
SigmoidCombinedAsymmetricFactor RP
BPRMF IR
BPRSLIM IR
LeastSquareSLIM IR
MultiCoreBPRMF IR
SoftMarginRankingMF IR
WeightedBPRMF IR
WRMF IR
real users' feedback is used to compute online evalua-
tion metrics. Both evaluation strategies should ideally
be performed, since good offline performance does not
necessarily yield good online performance (Tintarev
and Masthoff, 2011). However, this is sometimes
unattainable due to the effort required to collect online
user feedback. In this work the focus lies on offline
evaluation, but the approach is also prepared for
online evaluation.
In offline evaluation, the dataset must be parti-
tioned into training and test datasets. This is achieved
by splitting the rating matrix into different sets of
users and using their observations to either train or test
the RS model (Hahsler, 2011). There are several data
partitioning methods available: split, hold-out, leave-
one-out and k-fold cross-validation.
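As a hedged sketch of such a partitioning, the code below assigns the users of the rating matrix R (from the earlier sketch) to k folds; real procedures, including the one in MML, may split the data differently.

    import random

    def user_folds(R, k=5, seed=0):
        # Assign each user to one of k folds; each fold in turn supplies the
        # test users, while the observations of the remaining users are used
        # for training.
        users = sorted(R)
        random.Random(seed).shuffle(users)
        for f in range(k):
            test_users = set(users[f::k])
            train = {u: R[u] for u in users if u not in test_users}
            test = {u: R[u] for u in test_users}
            yield f, train, test

    for fold, train, test in user_folds(R, k=2):
        print(fold, len(train), len(test))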
Afterwards, the algorithm is trained on the train-
ing data and its predictions are compared to the hid-
den values from the testing dataset. This comparison
is performed by using several evaluation metrics. The
evaluation metrics available are: (1) predictive preci-
sion, (2) classification precision and (3) ranking pre-
cision (Herlocker et al., 2004).
Predictive precision measures how close the prediction
is to the actual user rating. It is based on
error metrics, such as Mean Absolute Error (MAE)
and Root Mean Squared Error (RMSE). Classification
precision measures the frequency of correct decisions
made by the RS regarding the item utility. The metrics
are based on standard classification precision metrics
such as precision, recall and accuracy. Ranking preci-
sion metrics are used to measure an algorithm's ability
to produce an ordered list of items that matches the
order the user would have chosen (Herlocker et al.,
2004). The standard metrics for this task are Mean
Average Precision (MAP), Normalized Discounted Cumulative
Gain (NDCG) and Mean Reciprocal Rank
(MRR). Classification precision metrics can also be
adapted to ranking problems by selecting the top N
elements of both the training and testing datasets, in-
stead of the entire set of items available (Herlocker
et al., 2004).
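Illustrative implementations of a few of these metrics are sketched below; they follow the usual textbook definitions and are not taken from any specific framework.

    import math

    def mae(pairs):
        # pairs: list of (real rating, predicted rating)
        return sum(abs(r - p) for r, p in pairs) / len(pairs)

    def rmse(pairs):
        return math.sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))

    def precision_recall_at_n(recommended, relevant, n=10):
        # Top-N adaptation of classification precision metrics.
        top = recommended[:n]
        hits = len(set(top) & set(relevant))
        return hits / len(top), hits / len(relevant)

    print(mae([(4.0, 3.6), (2.5, 3.1)]))                        # predictive precision
    print(precision_recall_at_n([10, 12, 11], [10, 11], n=2))   # top-N precision/recall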
2.2 Recommendation Platforms
From the large number of recommendation platforms,
this work highlights the most popular: Apache Ma-
hout (Owen et al., 2011), LensKit (Ekstrand et al.,
2011) and MyMediaLite (MML) (Gantner et al.,
2011). Despite containing approximately the same
recommendation algorithms, there are differences re-
garding the data partitioning and evaluation proce-
dures.
Apache Mahout contains a large number of recommendation
algorithms and provides distributed pro-
cessing for a few of them. However, the evaluation
procedure is poor: there is no cross-validation proce-
dure and the evaluation metrics are limited to the clas-
sification precision category. LensKit provides only
basic algorithms and its metrics are based on predictive
precision. However, it allows a more suitable
evaluation procedure by providing cross-validation.
Lastly, MML emerges as the most complete platform:
it contains the largest number and variety of recommendation
algorithms, several data partitioning tech-
niques (including cross-validation) and provides suit-
able metrics for each CF task: predictive precision
metrics for RP and ranking precision metrics for IR.
A comparison of these three platforms is available
(Said and Bellogín, 2014). The study concludes that,
despite containing the most computationally demanding
algorithms, MML provides superior performance.
Further information regarding the comparison of these
platforms is presented in Table 2.
Table 2: Comparison of recommendation frameworks.
Mahout MML LensKit
Release date 04/2007 10/2010 03/2011
Last update 05/2015 12/2015 11/2015
Last version 0.10.1 3.11 2.2.1
Language Java C# Java
Platform JVM .NET JVM
Algorithms Classical SotA Classical
Distributed Partial No No
2.3 Monitoring Platforms
Despite the existence of several recommendation platforms
with integrated evaluation procedures, there are few
monitoring platforms for RS.
A widely used tool to store, distribute and ana-
lyze Data Mining experimental data is OpenML (Van-
schoren et al., 2014). The repository is important
mainly for offline evaluation procedures, since sev-
eral researchers can provide their experimental data
for the same problem and compare the results with
others. There are also several visualization mecha-
nisms to perform an analysis of the gathered exper-
imental data and it allows an open and collaborative
discussion of the results obtained. However, this platform
is not specifically focused on RS and it is not prepared
for proper online monitoring of the recommendation
process.
One tool in the literature addresses RS monitoring
for music recommendation (Félix et al., 2014).
The RS provides recommendations to the users,
who can either blacklist them or add the music to
their library. Based on this feed-
back, the system performs online evaluation proce-
dures. The system monitors the positive and nega-
tive feedback provided and analyses it through several
facets: date, time of day, user gender and geographi-
cal distribution. These results are provided via visual-
ization mechanisms to better understand the previous
facets. The mechanisms allow the RS practitioner to
adjust the RS in order to maximize the effectiveness
of the recommendation process. The main difference
to the work proposed in this paper is the level of anal-
ysis that the system allows. The multi-granularity ap-
proach proposed will allow a deeper analysis of the
recommendation problem.
A platform for Website automation and monitor-
ing is also available (Domingues et al., 2008). The
platform aims to generically maintain Web pages via
Web adapters. The case study presented uses one of
these adapters as a RS and its performance is mon-
itored continuously. The developed monitoring tool,
EdMate, takes advantage of online evaluation metrics
and displays the information via numerical and graph-
ical mechanisms. The tool reports statistics regard-
ing the website usage, the recommendation adhesion
from the users, the recommendation efficacy and the
time spent by the users on specific pages. Despite providing
an extensive overview of the recommendation
process, this tool focuses on analyzing several indicators.
The approach proposed in this paper focuses
instead on providing a deeper analysis of the standard
evaluation procedure.
3 AToMRS
3.1 Multi-granularity Approach
The approach used in this work to enable a deeper
analysis of the RS evaluation procedure has its foun-
dation in Machine Learning data splitting strategies
used to train and test models. Figure 1 presents the
several levels that can be analyzed.
The process starts in the dataset level, with all the
available instances. Then, data is split to create the
training and testing datasets. The data splitting strat-
egy creates data batches, which can be used to an-
alyze different sub-problems of the original recom-
mendation problem. In the approach designed here,
the batches are folds in the cross-validation. Lastly,
the user-item < u, i > interaction level presents a single
instance of the original dataset. This instance differs
depending on the CF task: in RP, the association is
defined by a triplet of user, item and rating value
(i.e., < u, i, r >), while in IR a list l = [i_1, i_2, ..., i_N]
of ordered items is assigned to each user (i.e., < u, l >).
All these different facets of the original dataset
provide the different levels of analysis of the evalu-
ation procedure: the entire dataset, the data batch and
the user-item interaction.
Figure 1: Different levels of the multi-granularity approach.
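As an illustration of how per-interaction results roll up into these levels, the hypothetical records below are keyed by batch, so that filtering by nothing, by batch, or by a specific < u, i > pair yields the dataset, batch and interaction views, respectively; field names are illustrative, not the actual AToMRS structures.

    # Hypothetical per-interaction results for the RP task.
    records = [
        # (batch, user, item, real rating, predicted rating)
        (0, 1, 10, 4.0, 3.6),
        (0, 2, 10, 5.0, 4.4),
        (1, 1, 12, 2.5, 3.1),
    ]

    def select(records, batch=None, user_item=None):
        # dataset level: no filter; batch level: filter by batch;
        # user-item level: filter by a single <u, i> pair.
        return [r for r in records
                if (batch is None or r[0] == batch)
                and (user_item is None or (r[1], r[2]) == user_item)]

    dataset_view = select(records)
    batch_view = select(records, batch=0)
    interaction_view = select(records, batch=0, user_item=(1, 10))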
Despite modeling the data batch as a fold of the of-
fline cross-validation procedure, it is straightforward
to adapt the concept to online evaluation: the data
batches can be defined by incremental data. Each new
incremental dataset is considered a new batch of the
problem. Here, the focus is to understand how new
data changes the RS performance at several levels of
analysis, instead of analyzing the different sub-datasets
used to train and validate the model. However, both
tasks are of the utmost importance and the approach
developed enables both seamlessly.
3.2 Data Model
Since the multi-granularity approach developed is
based simply on data splitting techniques, its implementation
focuses on the design of a suitable database
structure. The developed structure is presented in Figure 2. Please
note that the schema follows the NoSQL methodol-
ogy provided by Apache Cassandra.
For the CF problem, three tables were required due
to the differences between the RP and IR tasks. One main
table, Experiments, describes the general information
regarding the CF experiments, and two tables hold the
detailed data for the RP and IR tasks (UItem Rating
for RP and UItem Recom for IR).
Figure 2: Data model for multi-granularity approach.
The Experiments table stores the global perfor-
mance results, the start and end dates, the dataset, the
algorithm and its parameters. In this table, the experiment
and algorithm identifier codes are used as the primary
key. The remaining tables, which are related to
the Experiments table, define the batch and the active
user identifier codes.
However, due to the data structures used in
each CF task, there are differences in the remaining
fields. While UItem Rating contains the item identi-
fier code and both the real and predicted ratings, in
UItem Recom the data stored are simply two lists: the
original and predicted item preference lists.
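A possible reconstruction of this schema in CQL, issued through the DataStax Python driver, is sketched below; the keyspace, column names and types are assumptions based on the description above, not the actual AToMRS definitions, and only the RP detail table is shown.

    from cassandra.cluster import Cluster  # DataStax Python driver

    session = Cluster(["127.0.0.1"]).connect("atomrs")  # hypothetical keyspace

    session.execute("""
        CREATE TABLE IF NOT EXISTS experiments (
            experiment_id uuid, algorithm_id text,
            dataset text, parameters map<text, text>,
            start_date timestamp, end_date timestamp,
            global_results map<text, double>,
            PRIMARY KEY (experiment_id, algorithm_id))""")

    session.execute("""
        CREATE TABLE IF NOT EXISTS uitem_rating (
            experiment_id uuid, algorithm_id text,
            batch int, user_id text, item_id text,
            real_rating double, predicted_rating double,
            PRIMARY KEY ((experiment_id, algorithm_id), batch, user_id, item_id))""")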
Data is stored in such a way as to enable access to
each level of the multi-granularity approach. This
means that if the target is the user-item interactions,
then these can be directly loaded from the database.
If, for instance, the goal is to analyze a batch, then
data is queried for that specific batch and joined to produce
the final results. Lastly, to analyze an entire dataset,
the query retrieves all instances of that dataset.
This allows the evaluation metrics to be applied dynamically
to each set of instances upon request in the
prototype. The evaluation metric implementations are
shared by the experimental work and the prototype's
visualization mechanisms.
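Continuing the sketch above, a batch-level analysis could then be a single query filtered by batch, with the metric applied to the returned rows; again, the names are assumptions.

    import math

    # Load all <u, i> results of one batch for a given experiment/algorithm
    # and apply the metric on demand (experiment_id holds an existing uuid).
    rows = session.execute(
        "SELECT real_rating, predicted_rating FROM uitem_rating "
        "WHERE experiment_id = %s AND algorithm_id = %s AND batch = %s",
        (experiment_id, "BiasedMatrixFactorization", 0))

    pairs = [(row.real_rating, row.predicted_rating) for row in rows]
    batch_rmse = math.sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))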
3.3 Prototype
The prototype was developed as proof of concept for
the multi-granularity approach proposed. It has two
main modules: the recommendation framework and
the application. The architecture diagram is presented
in Figure 3.
Figure 3: System architecture.
The recommendation framework provides an interface
to MML and allows the creation of CF experiments.
This interface was created by manipulating
the original MML source code. To create an experiment,
a few items must be addressed: the dataset,
the algorithms and the respective parameters.
At this point, only cross-validation is enabled, but
future work will include other validation strategies.
The evaluation procedure is performed using in-
ternal MML data structures and methods. The results
from the experimental work are continuously stored
in the database and can later be queried by the application
to enable the analysis of results.
The application provides a dashboard and several
visualization mechanisms that give access to the
levels of the multi-granularity approach. The
dashboard presents management options for experiments
(list, create and show). When a specific experiment
is selected, all the dataset level metrics are
provided. By selecting a different metric, the results
are updated accordingly. Figure 4 presents this interface.
Figure 4: Prototype dashboard.
The dataset level performance is displayed via a
bar chart. The values are shown for a specific
metric, allowing the comparison of the average performance
of all algorithms used in the experiment. Figure
5 presents the respective visualization mechanism.
Here, the horizontal axis presents the several algorithms
evaluated on a specific dataset and the vertical
axis the corresponding performance metric value.
It is also possible to analyze the results at the batch
level. One must select which algorithm is the target
for evaluation, and the performance of all batches for
that algorithm is shown. Here, each bar in the chart
represents a batch and the vertical axis presents the
respective performance metric value. This makes it
possible to understand how the performance varies
depending on the batch. This representation is also
useful for the online evaluation paradigm. Figure 6
presents this visualization mechanism.
At the atomic level, i.e. the user-item interaction,
the visualization mechanism is different: instead of
a bar chart, the performance values are presented
via a scatter plot with three axes (x, y, z) representing
user, item and performance value. In this representation,
if a value exceeds a pre-specified threshold,
a different color is assigned to better distinguish the
good from the bad results. Figure 7 displays this
visualization mechanism.
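A minimal recreation of this view with matplotlib is sketched below; it is not the prototype's actual rendering code, and the data and threshold are made up.

    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D  # noqa: registers the 3d projection

    # Per <u, i> performance values (e.g. absolute prediction errors) for one batch.
    users = [1, 1, 2, 3]
    items = [10, 12, 10, 11]
    values = [0.4, 0.6, 0.6, 1.2]
    threshold = 1.0

    colors = ["green" if v <= threshold else "red" for v in values]
    ax = plt.figure().add_subplot(projection="3d")
    ax.scatter(users, items, values, c=colors)
    ax.set_xlabel("user"); ax.set_ylabel("item"); ax.set_zlabel("performance")
    plt.show()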
4 DISCUSSION
The presented approach provides a deeper analysis of
the CF evaluation procedure, when compared to other
monitoring platforms. Furthermore, the work focuses
on a new perspective of data analysis, so far overlooked
by related work in RS. This section discusses several
aspects of both the multi-granularity approach and the
prototype.
In terms of the proposed approach to analyze the
recommendation problem on several levels, there are
two main points to discuss: the data batch structure
and the database usage.
While the data batch structure served its purpose
and enabled all the work developed, it has low scientific
value; its merits lie in a more practical perspective.
However, it is important to note that this
structure can be re-used (and even improved) for other
Data Mining tasks that require low-level analysis.
Secondly, the proposed method requires substantial
storage resources, especially if the
dataset scales dramatically. Also, in terms of computational
time, the approach introduces a significant
overhead. This overhead is worthwhile if the dataset
used is small, because of the deeper analysis power
it enables, but it may become overwhelming and impracticable
if the data scale increases beyond reasonable
values.
Regarding the prototype, the discussion points
highlighted are the visualization mechanisms and the
adaptation towards online evaluation.
The visualization mechanisms mostly use bar charts
to represent the performance values. The authors
believe these visualization mechanisms are suitable
for the task at hand. On one hand, they present
an intuitive way to evaluate the performance results
and, on the other, they fit directly any level described
by the data model. Therefore, reusing the same visualization
mechanism promotes stability and does
not introduce a learning curve for the user. However,
this is a debatable position: it is possible that many
other visualizations with added value could also be included,
but the trade-off between usability and ease
of analysis needs to be taken into account.
The online evaluation adaptation, although taken
into consideration when designing the solution, is not
yet implemented. This requires a functional RS that
can receive feedback from real users. However, it is
the authors' belief that, from a practical perspective, the
problem can be modeled in a similar fashion. If so,
then the same solution can be directly applied.
5 CONCLUSIONS
This work proposed a multi-granularity approach to
allow a deeper analysis of RS performance. The
approach makes it possible to analyze the recommendation
problem at three levels: the dataset, the batch and the user-item
interaction. This was achieved via a database
model that enables the storage of and access to this hierarchical data.
Figure 5: Global performance values on dataset level.
Figure 6: Batch level performance values across batches.
Figure 7: User-item level performance values for a specific batch.
In this work's case study, the approach
was applied to offline evaluation, although it was designed
to also handle online evaluation procedures.
A prototype was also developed to demonstrate
the applicability of the approach designed. The pro-
totype is responsible for experiment management op-
erations of the recommendation framework MyMedi-
aLite. It also displays the performance results of
the multi-granularity approach using several mecha-
nisms.
Tasks for future work include optimizing the experiment
management options (setting algorithms' parameters,
performing grid search, scheduling experiments),
extending the work on visualization mechanisms
(studying other appropriate visual representations
for the developed multi-granularity approach)
and creating a case study for online evaluation
to validate the assumptions stated in this work.
ACKNOWLEDGEMENTS
This work is financed by the ERDF Fund through
the Operational Programme for Competitiveness and
Internationalization - COMPETE 2020 of Portugal
2020 through the National Innovation Agency (ANI)
as part of the project 3506 and also through project
POCI-01-0145-FEDER-006961 via National Funds
through the FCT Fundação para a Ciência e a Tecnologia
as part of project UID/EEA/50014/2013. The
research was also funded by the ECSEL Joint Undertaking,
under the Framework Programme for Research
and Innovation Horizon 2020 (2014-2020), grant
agreement 662189-MANTIS-2014-1.
REFERENCES
Adomavicius, G. and Tuzhilin, A. (2005). Toward the
next generation of recommender systems: A survey
of the state-of-the-art and possible extensions. IEEE
Transactions on Knowledge and Data Engineering,
17(6):734–749.
Bagchi, S. (2015). Performance and Quality Assessment of
Similarity Measures in Collaborative Filtering Using
Mahout. Procedia Computer Science, 50:229–234.
Bobadilla, J., Ortega, F., Hernando, A., and Gutiérrez, A.
(2013). Recommender systems survey. Knowledge-
Based Systems, 46:109–132.
Domingues, M. A., Leal, J. P., Jorge, A. M., Soares, C., and
Machado, P. (2008). A platform to support web site
adaptation & monitoring of its effects: a case study.
Proceedings of the 6th Workshop on intelligent tech-
niques for web personalization & recommender sys-
tems (ITWP 08), pages 29–36.
Ekstrand, M., Ludwig, M., Konstan, J., and Riedl, J. (2011).
Rethinking the Recommender Research Ecosystem:
Reproducibility, Openness, and LensKit. In ACM
Conference on Recommender Systems, pages 133–
140.
Félix, C., Soares, C., Jorge, A., and Vinagre, J. (2014).
Monitoring Recommender Systems: A Business In-
telligence Approach. Computational Science and Its
Applications–ICCSA 2014, pages 277–288.
Gantner, Z., Rendle, S., Freudenthaler, C., and Schmidt-
Thieme, L. (2011). MyMediaLite: A Free Recom-
mender System Library. In ACM Conference on Rec-
ommender Systems, pages 305–308.
Hahsler, M. (2011). recommenderlab: A Framework
for Developing and Testing Recommendation Algo-
rithms. Nov, pages 1–37.
Herlocker, J. L., Konstan, J. a., Terveen, L. G., and Riedl,
J. T. (2004). Evaluating collaborative filtering recom-
mender systems. ACM Transactions on Information
Systems, 22(1):5–53.
Hu, Y., Koren, Y., and Volinsky, C. (2008). Collaborative
Filtering for Implicit Feedback Datasets. In IEEE In-
ternational Conference on Data Mining, pages 263–272.
Koren, Y. (2010). Factor in the Neighbors: Scalable and
Accurate Collaborative Filtering. ACM Transactions
on Knowledge Discovery from Data, 4(1):1–24.
Owen, S., Anil, R., Dunning, T., and Friedman, E. (2011).
Mahout in Action. Manning Publications Co., Green-
wich, CT, USA.
Said, A. and Bellogín, A. (2014). RiVal – A Toolkit to Fos-
ter Reproducibility in Recommender System Evalua-
tion. ACM Conference on Recommender Systems.
Sarwar, B., Karypis, G., Konstan, J., and Riedl, J. (2001).
Item-Based Collaborative Filtering Recommendation
Algorithms. In International Conference on World
Wide Web, pages 285–295.
Su, X. and Khoshgoftaar, T. M. (2009). A Survey of Col-
laborative Filtering Techniques. Advances in Artificial
Intelligence, 2009:1–19.
Tintarev, N. and Masthoff, J. (2011). Recommender Systems
Handbook, volume 54.
Vanschoren, J., van Rijn, J. N., Bischl, B., and Torgo,
L. (2014). OpenML: networked science in machine
learning. ACM SIGKDD Explorations Newsletter,
15(2):49–60.
Wei, K., Huang, J., and Fu, S. (2007). A survey of e-
commerce recommender systems. In International
Conference on Service Systems and Service Manage-
ment, pages 734–738.