how much the article is relevant to the requesting user.
The relevance score is computed based on the behav-
ior of the requesting user, that is, by keeping track of
the articles read by the user, as well as of the feedback
the user provides to the recommendations received.
The popularity scores and the relevance scores are
then combined, obtaining a final score for each can-
didate article. Those articles with the highest final
scores are recommended to the user.
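The combination step can be sketched as follows. The text does not state how the two scores are combined, so the linear mix, the weight ALPHA, and the names FinalScoreRanker, finalScore and recommend are illustrative assumptions, not the system's actual formula.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: combine popularity and relevance, then keep the top k. */
final class FinalScoreRanker {

    // Assumed mixing weight; the source does not specify the combination rule.
    static final double ALPHA = 0.5;

    /** Assumed linear combination of the two scores. */
    static double finalScore(double popularity, double relevance) {
        return ALPHA * popularity + (1.0 - ALPHA) * relevance;
    }

    /** Returns the ids of the k candidates with the highest final score, best first. */
    static List<String> recommend(Map<String, Double> popularity,
                                  Map<String, Double> relevance,
                                  int k) {
        List<String> ids = new ArrayList<>(popularity.keySet());
        ids.sort((a, b) -> {
            double scoreA = finalScore(popularity.get(a), relevance.getOrDefault(a, 0.0));
            double scoreB = finalScore(popularity.get(b), relevance.getOrDefault(b, 0.0));
            int byScore = Double.compare(scoreB, scoreA); // descending by final score
            return byScore != 0 ? byScore : a.compareTo(b); // ties broken by id
        });
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }
}
```

A candidate with no relevance entry is treated as having relevance 0 in this sketch.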
The basic relevance algorithm works as follows.
For each user, we keep track of the set of articles
she explicitly likes. When a user asks for personalized
news, we construct an appropriate data structure
which, based on the set of articles explicitly liked by
the user, is able to model the current interests of the
user. This data structure is compared with the content
of each candidate article, for which a relevance score
is computed.
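A minimal sketch of this idea is given below. The concrete data structure is not described at this point in the text, so we assume a simple word profile (for each word, the number of liked articles containing it) and a score that averages the profile weights over the distinct words of a candidate. The class BasicRelevance and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a profile-based relevance score. */
final class BasicRelevance {

    /** Assumed profile: word -> number of liked articles that contain the word. */
    static Map<String, Integer> buildProfile(List<String> likedArticles) {
        Map<String, Integer> profile = new HashMap<>();
        for (String text : likedArticles) {
            for (String word : distinctWords(text)) {
                profile.merge(word, 1, Integer::sum);
            }
        }
        return profile;
    }

    /** Assumed score: average profile weight over the distinct words of the candidate. */
    static double relevance(Map<String, Integer> profile, String candidate) {
        Set<String> words = distinctWords(candidate);
        if (words.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        for (String word : words) {
            hits += profile.getOrDefault(word, 0);
        }
        return (double) hits / words.size();
    }

    /** Lower-cases the text and splits it on non-word characters. */
    static Set<String> distinctWords(String text) {
        Set<String> words = new HashSet<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }
}
```

In this sketch, scoring one candidate visits each of its distinct words once, so scoring all candidates costs time proportional to the number of candidates times the average number of distinct words per article.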
The basic relevance algorithm gives very high-quality
results, but its time complexity is high, since
it is proportional to the product of the number of
candidate articles and the average number of distinct
words contained in an article. The practical consequence
is that the basic relevance algorithm is not
scalable: it is slow when the number of candidate
articles is high.
To address this scalability problem, we devised an
optimized relevance algorithm that gives the same results
as the basic relevance algorithm, but has a faster
online response at the cost of minimal offline computations.
The (online) time complexity of the optimized
relevance algorithm is proportional to the product
of the number of candidate articles and the number
of articles explicitly liked by the user.
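To illustrate how an online cost of this shape can arise, the sketch below assumes that article-to-article similarities have been precomputed offline and stored in a lookup table. Scoring a candidate then takes one lookup per liked article. This is only one possible reading of the complexity statement, not a description of the actual algorithm; PrecomputedRelevance and the layout of the similarity table are hypothetical.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: online scoring against a similarity table computed offline. */
final class PrecomputedRelevance {

    /**
     * similarity.get(c).get(l) is assumed to hold the offline-computed similarity
     * between candidate c and liked article l; missing entries count as 0.
     */
    static double relevance(Map<String, Map<String, Double>> similarity,
                            String candidate,
                            List<String> likedArticles) {
        if (likedArticles.isEmpty()) {
            return 0.0;
        }
        Map<String, Double> row = similarity.getOrDefault(candidate, Map.of());
        double sum = 0.0;
        for (String liked : likedArticles) { // one lookup per liked article
            sum += row.getOrDefault(liked, 0.0);
        }
        return sum / likedArticles.size();
    }
}
```

Calling relevance once per candidate gives a total online cost proportional to the number of candidates times the number of liked articles.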
The optimized relevance algorithm has a
faster online response than the basic relevance
algorithm. However, in practice the optimized relevance
algorithm is still not scalable: it is still slow
when the number of candidate articles is high.
To obtain a scalable algorithm, we show how the
optimized relevance algorithm can be turned into an
approximated relevance algorithm that is much faster
and needs the same minimal offline computations.
The (online) time complexity of the approximated relevance
algorithm is proportional to the number of articles
explicitly liked by the user. Consequently, the
approximated relevance algorithm is scalable, since
its response time is not sensitive to the number of can-
didate articles.
We have implemented a prototype of our recommender
system using the Java programming language.
This prototype is currently fed by specific tags
posted on one of the largest European online newspapers.
2 RELATED WORK
Several recommender systems for news articles have
been developed. NewsWeeder (Lang, 1995) is a
content-based recommender system for newsgroup
articles, which allows the user to rate articles on a
scale from 1 to 5. Using such ratings, NewsWeeder
builds a user model for predicting the rating of the
user on unseen articles. The unseen articles with the
highest predicted rating are recommended to the user.
The model is built by applying a combination of naive
Bayes classifiers with linear regression. NewsWeeder
needs to rebuild the user model every night with an
offline computation.
Krakatoa Chronicles (Bharat et al., 1998) is a
content-based recommender system for news articles
delivered as a Java applet. Based on the content of
the articles and past user ratings, Krakatoa Chroni-
cles computes, for each unseen article, a user score
and a community score. A weighted average of the
user score and community score produces a recom-
mendation score. The unseen articles with the highest
recommendation scores are recommended to the user.
The community score of an article is the average of
all the user scores of the article. When the number
of users is on the order of millions, as is commonly the
case, Krakatoa’s computation of the community score
is computationally expensive.
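The scoring scheme described above can be sketched as follows. The weight userWeight and the names KrakatoaStyleScore, communityScore and recommendationScore are our own illustrative choices, since the weights used by Krakatoa Chronicles are not given here.

```java
/** Hypothetical sketch of the user/community score combination described above. */
final class KrakatoaStyleScore {

    /** Community score: the average of all user scores of an article (0 if there are none). */
    static double communityScore(double[] userScores) {
        if (userScores.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double score : userScores) { // one pass over every user's score
            sum += score;
        }
        return sum / userScores.length;
    }

    /** Weighted average of user and community score; userWeight in [0, 1] is an assumed parameter. */
    static double recommendationScore(double userScore, double communityScore, double userWeight) {
        return userWeight * userScore + (1.0 - userWeight) * communityScore;
    }
}
```

The loop in communityScore touches one score per user for each article, which is why the computation becomes expensive with millions of users.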
PersoNews (Banos et al., 2006) is a news reader
which filters unseen articles using a naive Bayes clas-
sifier. The classifier, which can be trained by user
feedback, labels articles as “interesting” or “not inter-
esting”. Interesting articles are recommended to the
user, while not interesting articles are not. PersoNews
also allows the user to monitor topics, which are modelled
as sets of keywords. An article belongs to a topic
if it contains at least one of the keywords of the topic.
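The topic membership rule can be sketched as below. We assume single-word, lower-case keywords and a simple tokenization on non-word characters; TopicMatcher and belongsTo are hypothetical names, not part of PersoNews.

```java
import java.util.Set;

/** Hypothetical sketch of keyword-based topic membership. */
final class TopicMatcher {

    /** Returns true if the article text contains at least one keyword of the topic. */
    static boolean belongsTo(String articleText, Set<String> topicKeywords) {
        // Assumption: keywords are single lower-case words.
        for (String token : articleText.toLowerCase().split("\\W+")) {
            if (topicKeywords.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```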
Hermes (Borsje et al., 2008) is an ontology-based
news recommender system. A complex ontology
classifies articles in concept categories. The system
recommends to the user those articles belonging to
the concept categories selected by the user.
Google News (Das et al., 2007) is a news aggre-
gator and recommender system. By entering a list of
keywords, the user can retrieve a set of articles matching
the keywords. Furthermore, the user can ask for
personalized news, which is computed using algorithms
based on collaborative filtering.
3 ARTICLES
We denote by A the set of all articles. We assume
that A is finite.
ICAART 2011 - 3rd International Conference on Agents and Artificial Intelligence