A RECOMMENDATION ALGORITHM FOR PERSONALIZED
ONLINE NEWS BASED ON COLLECTIVE INTELLIGENCE AND
CONTENT
Giovanni Giuffrida
Department of Social Sciences, University of Catania, Catania, Italy
Calogero G. Zarba
Neodata Intelligence s.r.l., Catania, Italy
Keywords:
Recommender systems, Text mining, Data mining.
Abstract:
We present a recommendation algorithm for online news based on collective intelligence and content. When
a user asks for personalized news, our algorithm recommends news articles that (i) are popular among the
members of the online community (the collective intelligence part), and (ii) are similar in content to the news
articles the user has read in the past (the content part).
Our algorithm computes its recommendations based on the collective behavior of the online users, as well as on
the feedback the users provide to the algorithm’s recommendations. The users’ feedback can moreover be used
to measure the effectiveness of our recommendation algorithm in terms of the information retrieval concepts of
precision and recall.
The cornerstone of our recommendation algorithm is a basic relevance algorithm that computes how relevant
a news article is to a given user. This basic relevance algorithm can be optimized in order to obtain a faster
online response at the cost of minimal offline computations. Moreover, it can be turned into an approximated
algorithm for an even faster online response.
1 INTRODUCTION
Online newspapers are flourishing because the online
medium offers advantages that are simply not avail-
able in print. Online news can be delivered in real
time. Online news can be paid for individually with
micropayments. Online news can be personalized.
With personalization, a user visiting an online
newspaper website can go to a personalized webpage.
Different users will see different versions of the per-
sonalized webpage. The content of the personalized
webpage is generated by using a recommender sys-
tem, which uses a recommendation algorithm (or a
combination thereof) in order to select articles specif-
ically targeted to the interests of the user.
In this paper, we present a recommendation algo-
rithm based on collective intelligence (Segaran, 2007)
and content (Pazzani and Billsus, 2007). When a user
asks for personalized news, we recommend those arti-
cles that (i) are popular among the members of the on-
line community (the collective intelligence part), and
(ii) are similar in content to the articles the user has
read in the past (the content part).
After our recommendation algorithm computes
the recommendations, the user may provide feedback
by stating, for each recommended article, whether she
likes the article or not. This feedback allows us to
measure the effectiveness of our recommendation al-
gorithm in terms of the information retrieval concepts
of precision and recall.
In more detail, our recommendation algorithm
works as follows. For each candidate article, we keep
an up-to-date popularity score, which measures the
current level of interest of the online community in
the article. The popularity score is computed based
on the collective behavior of the online users, that is,
by keeping track of which articles are read by the on-
line users in the recent past.
Each time a user asks for personalized news, we
use a basic relevance algorithm that computes, for
each candidate article, a relevance score that measures
how much the article is relevant to the requesting user.
The relevance score is computed based on the behav-
ior of the requesting user, that is, by keeping track of
the articles read by the user, as well as of the feedback
the user provides to the recommendations received.
The popularity scores and the relevance scores are
then combined, obtaining a final score for each can-
didate article. Those articles with the highest final
scores are recommended to the user.
The basic relevance algorithm works as follows.
For each user, we keep track of the set of articles
she explicitly likes. When a user asks for personal-
ized news, we construct an appropriate data structure
which, based on the set of articles explicitly liked by
the user, is able to model the current interests of the
user. This data structure is compared with the content
of each candidate article, for which a relevance score
is computed.
The basic relevance algorithm gives very high
quality results, but its time complexity is high, since
it is proportional to the product of the number of can-
didate articles with the average number of distinct
words contained in an article. The practical conse-
quence is that the basic relevance algorithm is not
scalable: it is slow when the number of candidate ar-
ticles is high.
To address this scalability problem, we devised an
optimized relevance algorithm that gives the same re-
sults of the basic relevance algorithm, but has a faster
online response at the cost of minimal offline com-
putations. The (online) time complexity of the opti-
mized relevance algorithm is proportional to the prod-
uct of the number of candidate articles with the num-
ber of articles explicitly liked by the user.
The optimized relevance algorithm has a
faster online response than the basic relevance
algorithm. However, in practice the optimized rele-
vance algorithm is still not scalable: it is still slow
when the number of candidate articles is high.
To obtain a scalable algorithm, we show how the
optimized relevance algorithm can be turned into an
approximated relevance algorithm that is much faster
and needs the same minimal offline computations.
The (online) time complexity of the approximated rel-
evance algorithm is proportional to the number of ar-
ticles explicitly liked by the user. Consequently, the
approximated relevance algorithm is scalable, since
its response time is not sensitive to the number of can-
didate articles.
We have implemented a prototype of our rec-
ommender system using the Java programming lan-
guage. The prototype is currently fed by specific tags
posted on one of the largest European online newspa-
pers.
2 RELATED WORK
Several recommender systems for news articles have
been developed. NewsWeeder (Lang, 1995) is a
content-based recommender system for newsgroup
articles, which allows the user to rate articles on a
scale from 1 to 5. Using such ratings, NewsWeeder
builds a user model for predicting the rating of the
user on unseen articles. The unseen articles with the
highest predicted rating are recommended to the user.
The model is built by applying a combination of naive
Bayes classifiers with linear regression. NewsWeeder
needs to rebuild the user model every night with an
offline computation.
Krakatoa Chronicles (Bharat et al., 1998) is a
content-based recommender system for news articles
delivered as a Java applet. Based on the content of
the articles and past user ratings, Krakatoa Chroni-
cles computes, for each unseen article, a user score
and a community score. A weighted average of the
user score and community score produces a recom-
mendation score. The unseen articles with the highest
recommendation scores are recommended to the user.
The community score of an article is the average of
all the user scores of the article. When the number
of users is on the order of millions, as is commonly the
case, Krakatoa’s computation of the community score
is computationally expensive.
PersoNews (Banos et al., 2006) is a news reader
which filters unseen articles using a naive Bayes clas-
sifier. The classifier, which can be trained by user
feedback, labels articles as “interesting” or “not inter-
esting”. Interesting articles are recommended to the
user, while not interesting articles are not. PersoNews
also allows the user to monitor topics, which are mod-
elled as sets of keywords. An article belongs to a topic
if it contains one of the keywords of the topic.
Hermes (Borsje et al., 2008) is an ontology-based
news recommender system. A complex ontology
classifies articles in concept categories. The system
recommends to the user those articles belonging to
the concept categories selected by the user.
Google News (Das et al., 2007) is a news aggre-
gator and recommender system. By entering a list of
keywords, the user can retrieve a set of articles match-
ing the keywords. Furthermore, the user can ask for
personalized news, which are computed using algo-
rithms based on collaborative filtering.
3 ARTICLES
We denote with A the set of all articles. We assume
that A is finite.
We model time with the set N of natural numbers. Given an article a, we denote with date(a) the instant of time at which a is published. If t ≥ date(a), then we let age_t(a) = t − date(a).

Let t be an instant of time. We denote with A_t the set of articles published at the instant of time t or before. Formally, A_t = {a ∈ A | date(a) ≤ t}.
3.1 Content of Articles
We denote with W the set of all words. We assume
that W is finite.
Let w be a word, and let a be an article. The TERM
FREQUENCY tf(w,a) of w in a is the number of times
w occurs in a.
Let B ⊆ A be a set of articles, and let w be a word. The INVERSE DOCUMENT FREQUENCY idf(w, B) of w with respect to B is

idf(w, B) = log( |B| / (1 + |B_w|) ),

where B_w is the set of articles of B in which w occurs.

Let w be a word, let a be an article, and let B be a set of articles. The TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY tfidf(w, a, B) of w in a with respect to B is

tfidf(w, a, B) = tf(w, a) · idf(w, B).
Let a be an article. We model the content of the article a with the function f_a : W → [0, 1] obtained by computing the tf-idf weights of its words and normalizing them to unit Euclidean length:

f'_a(w) = tfidf(w, a, A_{date(a)})

and

f_a(w) = f'_a(w) / |f'_a|,

where

|f'_a| = √( Σ_{w' ∈ W} f'_a(w')² ).
3.2 Similarity between Articles
We model the similarity between articles with the function σ : A × A → [0, 1] defined by

σ(a, b) = f_a · f_b = Σ_{w ∈ W} f_a(w) f_b(w).

Notice that 0 ≤ σ(a, b) ≤ 1. Thus, when σ(a, b) is close to 1, articles a and b are similar. When instead σ(a, b) is close to 0, articles a and b are not similar.
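To make the definitions of Sections 3.1 and 3.2 concrete, the following Java sketch computes normalized tf-idf content vectors and the similarity σ between articles represented as lists of words. It is a minimal illustration under names of our own choosing (ContentVectors, tfidfVector, similarity), not the code of our prototype.

import java.util.*;

// Minimal sketch of Sections 3.1-3.2: normalized tf-idf content vectors and their similarity.
public class ContentVectors {

    // Raw term frequencies tf(w, a) of an article given as a list of words.
    static Map<String, Double> termFrequencies(List<String> article) {
        Map<String, Double> tf = new HashMap<>();
        for (String w : article) tf.merge(w, 1.0, Double::sum);
        return tf;
    }

    // Normalized tf-idf vector f_a, with idf(w, B) = log(|B| / (1 + |B_w|)).
    static Map<String, Double> tfidfVector(List<String> article, List<List<String>> corpus) {
        Map<String, Double> tf = termFrequencies(article);
        Map<String, Double> f = new HashMap<>();
        double norm = 0.0;
        for (Map.Entry<String, Double> e : tf.entrySet()) {
            long docCount = corpus.stream().filter(a -> a.contains(e.getKey())).count();
            double idf = Math.log((double) corpus.size() / (1 + docCount));
            double v = e.getValue() * idf;
            f.put(e.getKey(), v);
            norm += v * v;
        }
        final double n = Math.sqrt(norm);
        if (n > 0) f.replaceAll((w, v) -> v / n);
        return f;
    }

    // Similarity sigma(a, b) = sum over w of f_a(w) * f_b(w).
    static double similarity(Map<String, Double> fa, Map<String, Double> fb) {
        double s = 0.0;
        for (Map.Entry<String, Double> e : fa.entrySet())
            s += e.getValue() * fb.getOrDefault(e.getKey(), 0.0);
        return s;
    }
}

With unit-length vectors, two identical articles have similarity 1, while articles sharing no words have similarity 0.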
4 USERS
In this section we present a model for describing the
behavior of online users, and for determining a set
of articles that are liked by a single user. The set of
articles liked by a single user will be then used by our
recommendation algorithm in order to compute the
recommendations.
We denote with U the set of all users. We assume
that U is finite.
Users may perform several actions while browsing online. In particular, they may read articles and they may provide explicit feedback about the articles they have read or that have been suggested to them by a recommender system.
4.1 Reading Actions
At each instant of time t, a user may read an article a. More formally, there is a function reading : U × A × N → {0, 1} such that reading(u, a, t) = 1 if and only if the user u reads the article a at the instant of time t.
4.2 Feedback Actions
At each instant of time t, a user u may provide feedback about an article a. More formally, there is a function feedback : U × A × N → {−1, 0, 1} such that:

- if feedback(u, a, t) = −1 then, at the instant of time t, the user u states that she does not like article a;
- if feedback(u, a, t) = 0 then, at the instant of time t, the user u does not state any preference about article a;
- if feedback(u, a, t) = 1 then, at the instant of time t, the user u states that she likes article a.
4.3 Liking Set
From the reading actions and the feedback actions, it is possible to define, for each user u and each instant of time t, the set liking_{u,t} of all articles that are explicitly liked by u at the instant of time t; a small code sketch of this rule is given after the definition.

Formally, we have a ∈ liking_{u,t} if and only if there exists an instant of time t' ≤ t such that:

- feedback(u, a, t') = 1 or reading(u, a, t') = 1;
- feedback(u, a, t'') ≠ −1, for all instants of time t'' such that t' < t'' ≤ t.
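The rule above requires only a few comparisons per (user, article) pair once the events are logged. The following Java sketch is an illustration under assumed names (LikingSet, isLiked); it takes the timestamps of positive events (readings or feedback = 1) and negative events (feedback = −1) up to time t as sorted sets.

import java.util.*;

// Sketch of the liking-set rule of Section 4.3: an article is liked at time t if some positive
// event (reading or feedback = 1) is not followed by a later negative feedback (feedback = -1).
public class LikingSet {

    static boolean isLiked(SortedSet<Long> positiveEvents,   // readings or feedback = +1, times <= t
                           SortedSet<Long> negativeEvents) { // feedback = -1, times <= t
        if (positiveEvents.isEmpty()) return false;
        long lastPositive = positiveEvents.last();
        // Liked iff no negative feedback occurred strictly after the most recent positive event.
        return negativeEvents.isEmpty() || negativeEvents.last() <= lastPositive;
    }
}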
4.4 Profile
The liking sets can be used in order to construct pro-
files of users. These profiles are used by our recom-
mendation algorithm in order to compute the recom-
mendations.
Let u be a user, and let t be an instant of time. We model the profile of the user u relative to the instant of time t with the function g_{u,t} : W → [0, 1] defined by letting

g'_{u,t}(w) = Σ_{a ∈ liking_{u,t}} f_a(w)

and

g_{u,t}(w) = g'_{u,t}(w) / |g'_{u,t}|,

where

|g'_{u,t}| = √( Σ_{w' ∈ W} g'_{u,t}(w')² ).
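The profile g_{u,t} can be built by summing the content vectors of the liked articles and normalizing the result, as in the following illustrative Java sketch (it assumes the normalized vectors f_a are available as word-to-weight maps, for instance from the tf-idf sketch of Section 3; the names are ours).

import java.util.*;

// Sketch of the user profile of Section 4.4.
public class UserProfile {

    // g_{u,t}: sum the content vectors of the liked articles, then normalize to unit length.
    static Map<String, Double> profile(Collection<Map<String, Double>> likedArticleVectors) {
        Map<String, Double> g = new HashMap<>();
        for (Map<String, Double> fa : likedArticleVectors)
            fa.forEach((w, v) -> g.merge(w, v, Double::sum));
        double norm = Math.sqrt(g.values().stream().mapToDouble(v -> v * v).sum());
        if (norm > 0) g.replaceAll((w, v) -> v / norm);
        return g;
    }
}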
5 RECOMMENDER SYSTEMS
Intuitively, at each instant of time t, a recommender system recommends to a user u a set of articles in A. Formally, we model a recommender system as a computable function RS : U × N → 2^A such that RS(u, t) ⊆ A_t.
In this section, we describe one such computable
function, that is, we describe our recommendation al-
gorithm. Furthermore, we show how the effectiveness
of a recommender system can be measured in terms
of the information retrieval concepts of precision and
recall.
5.1 The Algorithm
At each instant of time t, our recommendation algorithm recommends articles from a set C_t ⊆ A_t of candidate articles.

Given a user u and an instant of time t, our recommendation algorithm performs the following steps (a code sketch follows the list):

Popularity Step. For each candidate article a ∈ C_t, compute popularity_t(a).

Relevance Step. For each candidate article a ∈ C_t, compute relevance_t(a, u).

Score Step. Normalize all popularity_t(a) and all relevance_t(a, u) so that they lie between 0 and 1. Then, for fixed weights p and r, compute score_t(a, u) = p · popularity_t(a) + r · relevance_t(a, u), for each candidate article a ∈ C_t.

Recommendation Step. Recommend to user u the top n candidate articles in C_t with the highest score.
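The following Java sketch illustrates the score and recommendation steps, assuming the popularity and relevance scores have already been computed for each candidate article; the class and method names, and the linear min-max normalization, are our own illustrative choices, since the algorithm only requires the scores to be rescaled into [0, 1].

import java.util.*;
import java.util.stream.Collectors;

// Sketch of the Score and Recommendation steps of Section 5.1.
public class Recommender {

    // Rescale raw scores linearly into [0, 1] (one possible normalization).
    static Map<String, Double> normalize(Map<String, Double> raw) {
        double min = Collections.min(raw.values()), max = Collections.max(raw.values());
        Map<String, Double> out = new HashMap<>();
        raw.forEach((a, v) -> out.put(a, max > min ? (v - min) / (max - min) : 0.0));
        return out;
    }

    // score_t(a, u) = p * popularity_t(a) + r * relevance_t(a, u); return the top n article ids.
    static List<String> recommend(Map<String, Double> popularity, Map<String, Double> relevance,
                                  double p, double r, int n) {
        Map<String, Double> pop = normalize(popularity);
        Map<String, Double> rel = normalize(relevance);
        return pop.keySet().stream()
                  .sorted(Comparator.comparingDouble(
                          (String a) -> -(p * pop.get(a) + r * rel.getOrDefault(a, 0.0))))
                  .limit(n)
                  .collect(Collectors.toList());
    }
}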
5.2 Precision
To measure the precision of a recommender system RS, we perform the following test repeatedly. At some instant of time t, for a user u, we compute RS(u, t). Then, at the instant of time t + 1 we monitor the feedback given by the user u, that is, we ask for feedback(u, a, t + 1), for each article a ∈ RS(u, t). Formally, the precision of the recommender system RS with respect to the user u and relative to the instant of time t is given by

precision(RS, u, t) = |RS(u, t) ∩ liking_{u,t+1}| / |RS(u, t)|.
5.3 Recall
To measure the recall of a recommender system RS, we perform the following test repeatedly. At some instant of time t, for a user u, we compute RS(u, t), and we compare the result obtained with the set liking_{u,t}. Formally, the recall of the recommender system RS with respect to the user u and relative to the instant of time t is given by

recall(RS, u, t) = |RS(u, t) ∩ liking_{u,t}| / |liking_{u,t}|.
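Both measures reduce to intersections of sets of article identifiers, as in the following illustrative Java sketch.

import java.util.*;

// Sketch of the precision and recall measures of Sections 5.2 and 5.3.
public class Evaluation {

    // precision(RS, u, t) = |RS(u,t) ∩ liking_{u,t+1}| / |RS(u,t)|
    static double precision(Set<String> recommended, Set<String> likedAfterwards) {
        if (recommended.isEmpty()) return 0.0;
        Set<String> hits = new HashSet<>(recommended);
        hits.retainAll(likedAfterwards);
        return (double) hits.size() / recommended.size();
    }

    // recall(RS, u, t) = |RS(u,t) ∩ liking_{u,t}| / |liking_{u,t}|
    static double recall(Set<String> recommended, Set<String> liked) {
        if (liked.isEmpty()) return 0.0;
        Set<String> hits = new HashSet<>(recommended);
        hits.retainAll(liked);
        return (double) hits.size() / liked.size();
    }
}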
6 POPULARITY
Given an article a and an instant of time t, we want to compute a number popularity_t(a) that measures the level of interest of the community in article a at the instant of time t.
We have developed three measures of popularity:
decaying popularities, discounted popularities, and
sliding window popularities.
6.1 Decaying Popularities
A reading event e consists of a user u reading an article a at an instant of time t. The instant of time at which the event e occurs is denoted with date(e). If t ≥ date(e), then we let age_t(e) = t − date(e).

We denote with R_{a,t} the set of all events e in which a user u reads the article a at the instant of time t or before.

We assume that, for each instant of time t, a user can read at most one article. It follows that R_{a,t} is a finite set. We also assume that an article may be read only after it is published, that is, R_{a,t} = ∅ if t ≤ date(a).
The decaying popularity π_{h,t}(a) of an article a at the instant of time t with respect to a decay half-time h > 0 is defined by

π_{h,t}(a) = Σ_{e ∈ R_{a,t}} (1/2)^{age_t(e)/h}.
6.2 Discounted Popularities
The discounted popularity π_t(a) of an article a at the instant of time t is defined by

π_t(a) = |R_{a,t}| / age_t(a).
6.3 Sliding Window Popularities
The sliding window popularity π^δ_t(a) of an article a at the instant of time t with respect to a sliding window δ > 0 is defined by

π^δ_t(a) = |R_{a,t} \ R_{a,t−δ}| / min(δ, age_t(a)).
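The three popularity measures can all be computed from a log of reading timestamps. The following Java sketch is illustrative: it recomputes each measure from scratch for a single article, whereas a deployed system would maintain the counts incrementally; the names are ours.

import java.util.*;

// Sketch of the popularity measures of Section 6 for one article, given its reading timestamps.
public class Popularity {

    // Decaying popularity: each reading event weighs (1/2)^{age/h}.
    static double decaying(List<Long> readTimes, long now, double halfTime) {
        double sum = 0.0;
        for (long t : readTimes) sum += Math.pow(0.5, (now - t) / halfTime);
        return sum;
    }

    // Discounted popularity: number of readings divided by the age of the article
    // (readings are assumed to occur strictly after publication, so the age is positive).
    static double discounted(List<Long> readTimes, long now, long publicationTime) {
        return (double) readTimes.size() / (now - publicationTime);
    }

    // Sliding-window popularity: readings in the last delta instants, divided by min(delta, age).
    static double slidingWindow(List<Long> readTimes, long now, long publicationTime, long delta) {
        long recent = readTimes.stream().filter(t -> t > now - delta).count();
        return (double) recent / Math.min(delta, now - publicationTime);
    }
}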
7 RELEVANCE
Given a user u and an instant of time t, we want to devise an algorithm that computes a number relevance_t(a, u), for each candidate article a ∈ C_t. The number relevance_t(a, u) measures how relevant the candidate article a is to user u at the instant of time t.

We compute relevance_t(a, u) using the age age_t(a) of article a at the instant of time t, the function f_a representing the content of article a, and the function g_{u,t} representing the interests of user u at the instant of time t.
We have developed three measures of relevance:
basic relevance, optimized relevance, and approxi-
mated relevance.
The optimized relevance is equal, modulo a posi-
tive factor that depends only on u and t, to the basic
relevance. The optimized relevance can be computed
faster than the basic relevance at the cost of some of-
fline computations. The approximated relevance can
be computed even faster and, as the name suggests, approximates the optimized relevance.
7.1 Basic Relevance
The DECAY FACTOR of an article a at the instant of time t with respect to a decay half-time k is defined as

ω_{k,t}(a) = (1/2)^{age_t(a)/k}.

Let a be an article, let u be a user, and let t be an instant of time. The BASIC RELEVANCE ρ_t(a, u) of article a with respect to user u at the instant of time t is

ρ_t(a, u) = (f_a · g_{u,t}) ω_{k,t}(a) = ( Σ_{w ∈ W} f_a(w) g_{u,t}(w) ) ω_{k,t}(a).
Notice that 0 ≤ ρ_t(a, u) ≤ 1. Thus, when ρ_t(a, u) is close to 1, article a is relevant to user u at the instant of time t. When instead ρ_t(a, u) is close to 0, article a is not relevant to user u at the instant of time t.
Next, we analyze the time complexity of the basic relevance algorithm, that is, the time complexity needed for computing the basic relevances of all articles in the candidate set C_t. This computation can be performed in two steps: first g_{u,t} is computed, and then ρ_t(a, u) is computed for all candidate articles a ∈ C_t.

Denote with µ the average number of distinct words occurring in an article. Then, the first step takes average time O(|liking_{u,t}| µ). Given a candidate article a, computing ρ_t(a, u) takes average time O(µ), so the second step takes average time O(|C_t| µ). Therefore, the average case time complexity of the basic relevance algorithm is O((|liking_{u,t}| + |C_t|) µ). Since in practice |liking_{u,t}| < |C_t|, this complexity can be simplified to O(|C_t| µ).
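A direct Java sketch of the basic relevance follows (illustrative; the content vector f_a and the profile g_{u,t} are assumed to be available as word-to-weight maps).

import java.util.Map;

// Sketch of the basic relevance of Section 7.1.
public class BasicRelevance {

    // Decay factor omega_{k,t}(a) = (1/2)^{age_t(a)/k}.
    static double decayFactor(long articleAge, double k) {
        return Math.pow(0.5, articleAge / k);
    }

    // rho_t(a, u) = (f_a . g_{u,t}) * omega_{k,t}(a).
    static double relevance(Map<String, Double> fa, Map<String, Double> gut,
                            long articleAge, double k) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : fa.entrySet())
            dot += e.getValue() * gut.getOrDefault(e.getKey(), 0.0);
        return dot * decayFactor(articleAge, k);
    }
}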
7.2 Optimized Relevance
Let a be an article, let u be a user, and let t be an instant of time. The OPTIMIZED RELEVANCE ρ'_t(a, u) of article a with respect to user u at the instant of time t is

ρ'_t(a, u) = ( Σ_{b ∈ liking_{u,t}} σ(a, b) ) ω_{k,t}(a).

The optimized relevance is equal to the basic relevance, modulo a factor that depends only on u and t. More precisely, it is easy to verify that

ρ_t(a, u) = β ρ'_t(a, u),

where β = |g'_{u,t}|^{−1}.
Next, we analyze the time complexity of the optimized relevance algorithm, that is, the time complexity needed for computing the optimized relevances of all articles in the candidate set C_t. Optimized relevances can be computed efficiently online if the similarities σ(a, b) between articles have already been computed offline. Indeed, given an article a, the computation of ρ'_t(a, u) can then be done in time O(|liking_{u,t}|). Therefore, the time complexity of the optimized relevance algorithm is O(|C_t| · |liking_{u,t}|).

In practice, |liking_{u,t}| < µ, and therefore the optimized relevance algorithm is faster than the basic relevance algorithm.
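The following Java sketch is illustrative; it assumes the pairwise similarities σ(a, b) have been precomputed offline and stored, here as a nested map. The online cost of one evaluation is then |liking_{u,t}| constant-time lookups, which matches the bound above.

import java.util.*;

// Sketch of the optimized relevance of Section 7.2, with offline-precomputed similarities.
public class OptimizedRelevance {

    // rho'_t(a, u) = (sum over liked articles b of sigma(a, b)) * omega_{k,t}(a).
    static double relevance(String article, Set<String> likedArticles,
                            Map<String, Map<String, Double>> similarity, // a -> (b -> sigma(a,b))
                            long articleAge, double k) {
        Map<String, Double> row = similarity.getOrDefault(article, Collections.emptyMap());
        double sum = 0.0;
        for (String b : likedArticles) sum += row.getOrDefault(b, 0.0);
        return sum * Math.pow(0.5, articleAge / k);
    }
}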
7.3 Approximated Relevance
In order to compute approximated relevances, we fix a small positive constant m, so that m ≪ |C_t|. At each instant of time t, for each article b ∈ A_t, we keep a set N^m_t(b) containing the m articles a in A_t with the highest value of σ(a, b) ω_{k,t}(a).

Let a be an article, let u be a user, and let t be an instant of time. The m-APPROXIMATED RELEVANCE ρ^m_t(a, u) of article a with respect to user u at the instant of time t is

ρ^m_t(a, u) = ( Σ_{b ∈ liking_{u,t} : a ∈ N^m_t(b)} σ(a, b) ) ω_{k,t}(a).
Note that, as m tends to infinity, the m-approximated relevance tends to the optimized relevance, that is,

lim_{m → ∞} ρ^m_t(a, u) = ρ'_t(a, u).
Next, we analyze the time complexity of the m-approximated relevance algorithm, that is, the time complexity needed for computing the m-approximated relevances of all articles in the candidate set C_t. Clearly, this complexity is O(m |liking_{u,t}|). If m is assumed to be a small constant, this complexity reduces to O(|liking_{u,t}|).

It follows that the m-approximated relevance algorithm is much faster than both the optimized relevance algorithm and the basic relevance algorithm.
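The following Java sketch is illustrative; it computes ρ^m_t(a, u) for all candidate articles at once, assuming each neighbor set N^m_t(b) is maintained offline as a map from a neighbor article a to σ(a, b), and that the decay factors ω_{k,t}(a) are available. Candidates that appear in no neighbor list of a liked article keep relevance 0, and the total work is O(m |liking_{u,t}|).

import java.util.*;

// Sketch of the m-approximated relevance of Section 7.3, over precomputed neighbor lists.
public class ApproximatedRelevance {

    static Map<String, Double> relevances(Set<String> likedArticles,
                                          Map<String, Map<String, Double>> neighbors, // b -> (a -> sigma(a,b))
                                          Map<String, Double> decayFactors) {         // a -> omega_{k,t}(a)
        Map<String, Double> sums = new HashMap<>();
        for (String b : likedArticles)
            neighbors.getOrDefault(b, Collections.emptyMap())
                     .forEach((a, sim) -> sums.merge(a, sim, Double::sum));
        // Apply the decay factor of each candidate article to the accumulated similarity.
        sums.replaceAll((a, s) -> s * decayFactors.getOrDefault(a, 1.0));
        return sums;
    }
}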
8 CONCLUSIONS
We have presented a recommendation algorithm for
online news based on collective intelligence and con-
tent. When a user asks for personalized news, we rec-
ommend those articles that (i) are popular among the
members of the online community (the collective in-
telligence part), and (ii) are similar in content to the
articles the user has read in the past (the content part).
Our algorithm computes its recommendations based on the collective behavior of the online users, as well as on the feedback the users provide to the algorithm's recommendations. The users' feedback can moreover be used to measure the effectiveness of our recommendation algorithm in terms of the information retrieval concepts of precision and recall.
The cornerstone of our recommendation system is
a basic relevance algorithm that computes how rel-
evant an article is to a given user. The basic rele-
vance algorithm gives very high quality results, but
is not scalable because it takes time proportional to
the product of the number of candidate articles with
the average number of distinct words contained in an
article.
To address the scalability problem, we devised an
optimized relevance algorithm that gives the same re-
sults of the basic relevance algorithm at the cost of
minimal offline computations. The optimized rele-
vance algorithm takes time proportional to the product
of the number of candidate articles with the number
of articles explicitly liked by the user. It is therefore
faster than the basic relevance algorithm, but not yet
scalable.
To obtain a scalable algorithm, we have developed
an m-approximated relevance algorithm, where m is a
fixed constant. The m-approximated relevance algo-
rithm is scalable, and it takes time proportional to the
number of articles explicitly liked by the user.
REFERENCES
Banos, E., Katakis, I., Bassiliades, N., Tsoumakas, G., and
Vlahavas, I. P. (2006). PersoNews: A personalized
news reader enhanced by machine learning and se-
mantic filtering. In Ontologies, DataBases, and Ap-
plications of Semantics, pages 975–982.
Bharat, K., Kamba, T., and Albers, M. (1998). Personal-
ized, interactive news on the web. Multimedia Sys-
tems, 6(5):349–358.
Borsje, J., Levering, L., and Frasincar, F. (2008). Hermes: A
semantic web-based news decision support system. In
Symposium on Applied Computing, pages 2415–2420.
Das, A., Datar, M., Garg, A., and Rajaram, S. (2007).
Google news personalization: Scalable online collab-
orative filtering. In International Conference on World
Wide Web, pages 271–280.
Lang, K. (1995). Newsweeder: Learning to filter netnews.
In International Conference on Machine Learning,
pages 331–339.
Pazzani, M. J. and Billsus, D. (2007). Content-based rec-
ommendation systems. In The Adaptive Web, pages
325–341.
Segaran, T. (2007). Programming Collective Intelligence:
Building Smart Web 2.0 Applications. O’Reilly.