A RECOMMENDATION ALGORITHM FOR PERSONALIZED
ONLINE NEWS BASED ON COLLECTIVE INTELLIGENCE AND
CONTENT
Giovanni Giuffrida
Department of Social Sciences, University of Catania, Catania, Italy
Calogero G. Zarba
Neodata Intelligence s.r.l., Catania, Italy
Keywords:
Recommender systems, Text mining, Data mining.
Abstract:
We present a recommendation algorithm for online news based on collective intelligence and content. When
a user asks for personalized news, our algorithm recommends news articles that (i) are popular among the
members of the online community (the collective intelligence part), and (ii) are similar in content to the news
articles the user has read in the past (the content part).
Our algorithm computes its recommendations based on the collective behavior of the online users, as well as on
the feedback the users provide to the algorithm’s recommendations. The users’ feedback can moreover be used
to measure the effectiveness of our recommendation algorithm in terms of the information retrieval concepts of
precision and recall.
The cornerstone of our recommendation algorithm is a basic relevance algorithm that computes how relevant
a news article is to a given user. This basic relevance algorithm can be optimized in order to obtain a faster
online response at the cost of minimal offline computations. Moreover, it can be turned into an approximated
algorithm for an even faster online response.
1 INTRODUCTION
Online newspapers are flourishing because the online
medium offers advantages that are simply not avail-
able in print. Online news can be delivered in real
time. Online news can be paid for individually with
micropayments. Online news can be personalized.
With personalization, a user visiting an online
newspaper website can go to a personalized webpage.
Different users will see different versions of the per-
sonalized webpage. The content of the personalized
webpage is generated by using a recommender sys-
tem, which uses a recommendation algorithm (or a
combination thereof) in order to select articles specif-
ically targeted to the interests of the user.
In this paper, we present a recommendation algo-
rithm based on collective intelligence (Segaran, 2007)
and content (Pazzani and Billsus, 2007). When a user
asks for personalized news, we recommend those arti-
cles that (i) are popular among the members of the on-
line community (the collective intelligence part), and
(ii) are similar in content to the articles the user has
read in the past (the content part).
After our recommendation algorithm computes
the recommendations, the user may provide feedback
by stating, for each recommended article, whether she
likes the article or not. This feedback allows us to
measure the effectiveness of our recommendation al-
gorithm in terms of the information retrieval concepts
of precision and recall.
In more detail, our recommendation algorithm
works as follows. For each candidate article, we keep
an up-to-date popularity score, which measures the
current level of interest of the online community in
the article. The popularity score is computed based
on the collective behavior of the online users, that is,
by keeping track of which articles are read by the on-
line users in the recent past.
Each time a user asks for personalized news, we
use a basic relevance algorithm that computes, for
each candidate article, a relevance score that measures
how much the article is relevant to the requesting user.
The relevance score is computed based on the behav-
ior of the requesting user, that is, by keeping track of
the articles read by the user, as well as of the feedback
the user provides to the recommendations received.
The popularity scores and the relevance scores are
then combined, obtaining a final score for each can-
didate article. Those articles with the highest final
scores are recommended to the user.
The basic relevance algorithm works as follows.
For each user, we keep track of the set of articles
she explicitly likes. When a user asks for personal-
ized news, we construct an appropriate data structure
which, based on the set of articles explicitly liked by
the user, is able to model the current interests of the
user. This data structure is compared with the content
of each candidate article, for which a relevance score
is computed.
The basic relevance algorithm gives very high
quality results, but its time complexity is high, since
it is proportional to the product of the number of can-
didate articles with the average number of distinct
words contained in an article. The practical conse-
quence is that the basic relevance algorithm is not
scalable: it is slow when the number of candidate ar-
ticles is high.
To address this scalability problem, we devised an
optimized relevance algorithm that gives the same re-
sults of the basic relevance algorithm, but has a faster
online response at the cost of minimal offline com-
putations. The (online) time complexity of the opti-
mized relevance algorithm is proportional to the prod-
uct of the number of candidate articles with the num-
ber of articles explicitly liked by the user.
The optimized relevance algorithm has a
faster online response than the basic relevance
algorithm. However, in practice the optimized rele-
vance algorithm is still not scalable: it is still slow
when the number of candidate articles is high.
To obtain a scalable algorithm, we show how the
optimized relevance algorithm can be turned into an
approximated relevance algorithm that is much faster
and needs the same minimal offline computations.
The (online) time complexity of the approximated rel-
evance algorithm is proportional to the number of ar-
ticles explicitly liked by the user. Consequently, the
approximated relevance algorithm is scalable, since
its response time is not sensitive to the number of can-
didate articles.
We have implemented a prototype of our rec-
ommender system using the Java programming lan-
guage. The prototype is currently fed by specific tags
posted on one of the largest European online newspa-
pers.
2 RELATED WORK
Several recommender systems for news articles have
been developed. NewsWeeder (Lang, 1995) is a
content-based recommender system for newsgroup
articles, which allows the user to rate articles on a
scale from 1 to 5. Using such ratings, NewsWeeder
builds a user model for predicting the rating of the
user on unseen articles. The unseen articles with the
highest predicted rating are recommended to the user.
The model is built by applying a combination of naive
Bayes classifiers with linear regression. NewsWeeder
needs to rebuild the user model every night with an
offline computation.
Krakatoa Chronicles (Bharat et al., 1998) is a
content-based recommender system for news articles
delivered as a Java applet. Based on the content of
the articles and past user ratings, Krakatoa Chroni-
cles computes, for each unseen article, a user score
and a community score. A weighted average of the
user score and community score produces a recom-
mendation score. The unseen articles with the highest
recommendation scores are recommended to the user.
The community score of an article is the average of
all the user scores of the article. When the number
of users is on the order of millions, as is commonly the
case, Krakatoa’s computation of the community score
is computationally expensive.
PersoNews (Banos et al., 2006) is a news reader
which filters unseen articles using a naive Bayes clas-
sifier. The classifier, which can be trained by user
feedback, labels articles as “interesting” or “not inter-
esting”. Interesting articles are recommended to the
user, while not interesting articles are not. PersoNews
also allows the user to monitor topics, which are mod-
elled as sets of keywords. An article belongs to a topic
if it contains one of the keywords of the topic.
Hermes (Borsje et al., 2008) is an ontology-based
news recommender system. A complex ontology
classifies articles in concept categories. The system
recommends to the user those articles belonging to
the concept categories selected by the user.
Google News (Das et al., 2007) is a news aggre-
gator and recommender system. By entering a list of
keywords, the user can retrieve a set of articles match-
ing the keywords. Furthermore, the user can ask for
personalized news, which are computed using algo-
rithms based on collaborative filtering.
3 ARTICLES
We denote with A the set of all articles. We assume
that A is finite.
We model time with the set N of natural numbers. Given an article a, we denote with date(a) the instant of time at which a is published. If t ≥ date(a), then we let age_t(a) = t − date(a).

Let t be an instant of time. We denote with A_t the set of articles published at the instant of time t or before. Formally, A_t = {a ∈ A | date(a) ≤ t}.
3.1 Content of Articles
We denote with W the set of all words. We assume
that W is finite.
Let w be a word, and let a be an article. The TERM
FREQUENCY tf(w,a) of w in a is the number of times
w occurs in a.
Let B ⊆ A be a set of articles, and let w be a word. The INVERSE DOCUMENT FREQUENCY idf(w, B) of w with respect to B is

idf(w, B) = log( |B| / (1 + |B_w|) ),

where B_w is the set of articles of B in which w occurs.

Let w be a word, let a be an article, and let B be a set of articles. The TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY tfidf(w, a, B) of w in a with respect to B is

tfidf(w, a, B) = tf(w, a) · idf(w, B).
Let a be an article. We model the content of the article a with the function f_a : W → [0, 1] obtained by computing the tf-idf weights of its words and normalizing them to unit Euclidean length:

f'_a(w) = tfidf(w, a, A_{date(a)})

and

f_a(w) = f'_a(w) / |f'_a|,

where

|f'_a| = √( Σ_{w' ∈ W} f'_a(w')² ).
3.2 Similarity between Articles
We model the similarity between articles with the function σ : A × A → [0, 1] defined by

σ(a, b) = f_a · f_b = Σ_{w ∈ W} f_a(w) f_b(w).

Notice that 0 ≤ σ(a, b) ≤ 1. Thus, when σ(a, b) is close to 1, articles a and b are similar. When instead σ(a, b) is close to 0, articles a and b are not similar.
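To make the definitions of Sections 3.1 and 3.2 concrete, the following Java sketch computes normalized tf-idf content vectors and the similarity σ between articles represented as lists of words. It is a minimal illustration under names of our own choosing (ContentVectors, tfidfVector, similarity), not the code of our prototype.

import java.util.*;

// Minimal sketch of Sections 3.1-3.2: normalized tf-idf content vectors and their similarity.
public class ContentVectors {

    // Raw term frequencies tf(w, a) of an article given as a list of words.
    static Map<String, Double> termFrequencies(List<String> article) {
        Map<String, Double> tf = new HashMap<>();
        for (String w : article) tf.merge(w, 1.0, Double::sum);
        return tf;
    }

    // Normalized tf-idf vector f_a, with idf(w, B) = log(|B| / (1 + |B_w|)).
    static Map<String, Double> tfidfVector(List<String> article, List<List<String>> corpus) {
        Map<String, Double> tf = termFrequencies(article);
        Map<String, Double> f = new HashMap<>();
        double norm = 0.0;
        for (Map.Entry<String, Double> e : tf.entrySet()) {
            long docCount = corpus.stream().filter(a -> a.contains(e.getKey())).count();
            double idf = Math.log((double) corpus.size() / (1 + docCount));
            double v = e.getValue() * idf;
            f.put(e.getKey(), v);
            norm += v * v;
        }
        final double n = Math.sqrt(norm);
        if (n > 0) f.replaceAll((w, v) -> v / n);
        return f;
    }

    // Similarity sigma(a, b) = sum over w of f_a(w) * f_b(w).
    static double similarity(Map<String, Double> fa, Map<String, Double> fb) {
        double s = 0.0;
        for (Map.Entry<String, Double> e : fa.entrySet())
            s += e.getValue() * fb.getOrDefault(e.getKey(), 0.0);
        return s;
    }
}

With unit-length vectors, two identical articles have similarity 1, while articles sharing no words have similarity 0.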
4 USERS
In this section we present a model for describing the
behavior of online users, and for determining a set
of articles that are liked by a single user. The set of
articles liked by a single user will be then used by our
recommendation algorithm in order to compute the
recommendations.
We denote with U the set of all users. We assume
that U is finite.
Users may perform several actions while browsing online. In particular, they may read articles and they may provide explicit feedback about the articles they have read or that have been suggested to them by a recommender system.
4.1 Reading Actions
At each instant of time t, a user may read an article a. More formally, there is a function reading : U × A × N → {0, 1} such that reading(u, a, t) = 1 if and only if the user u reads the article a at the instant of time t.
4.2 Feedback Actions
At each instant of time t, a user u may provide feedback about an article a. More formally, there is a function feedback : U × A × N → {−1, 0, 1} such that:

- if feedback(u, a, t) = −1 then, at the instant of time t, the user u states that she does not like article a;
- if feedback(u, a, t) = 0 then, at the instant of time t, the user u does not state any preference about article a;
- if feedback(u, a, t) = 1 then, at the instant of time t, the user u states that she likes article a.
4.3 Liking Set
From the reading actions and the feedback actions, it is possible to define, for each user u and each instant of time t, the set liking_{u,t} of all articles that are explicitly liked by u at the instant of time t; a small code sketch of this rule is given after the definition.

Formally, we have a ∈ liking_{u,t} if and only if there exists an instant of time t' ≤ t such that:

- feedback(u, a, t') = 1 or reading(u, a, t') = 1;
- feedback(u, a, t'') ≠ −1, for all instants of time t'' such that t' < t'' ≤ t.
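The rule above requires only a few comparisons per (user, article) pair once the events are logged. The following Java sketch is an illustration under assumed names (LikingSet, isLiked); it takes the timestamps of positive events (readings or feedback = 1) and negative events (feedback = −1) up to time t as sorted sets.

import java.util.*;

// Sketch of the liking-set rule of Section 4.3: an article is liked at time t if some positive
// event (reading or feedback = 1) is not followed by a later negative feedback (feedback = -1).
public class LikingSet {

    static boolean isLiked(SortedSet<Long> positiveEvents,   // readings or feedback = +1, times <= t
                           SortedSet<Long> negativeEvents) { // feedback = -1, times <= t
        if (positiveEvents.isEmpty()) return false;
        long lastPositive = positiveEvents.last();
        // Liked iff no negative feedback occurred strictly after the most recent positive event.
        return negativeEvents.isEmpty() || negativeEvents.last() <= lastPositive;
    }
}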
4.4 Profile
The liking sets can be used in order to construct pro-
files of users. These profiles are used by our recom-
mendation algorithm in order to compute the recom-
mendations.
Let u be a user, and let t be an instant of time. We model the profile of the user u relative to the instant of time t with the function g_{u,t} : W → [0, 1] defined by letting

g'_{u,t}(w) = Σ_{a ∈ liking_{u,t}} f_a(w)

and

g_{u,t}(w) = g'_{u,t}(w) / |g'_{u,t}|,

where

|g'_{u,t}| = √( Σ_{w' ∈ W} g'_{u,t}(w')² ).
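The profile g_{u,t} can be built by summing the content vectors of the liked articles and normalizing the result, as in the following illustrative Java sketch (it assumes the normalized vectors f_a are available as word-to-weight maps, for instance from the tf-idf sketch of Section 3; the names are ours).

import java.util.*;

// Sketch of the user profile of Section 4.4.
public class UserProfile {

    // g_{u,t}: sum the content vectors of the liked articles, then normalize to unit length.
    static Map<String, Double> profile(Collection<Map<String, Double>> likedArticleVectors) {
        Map<String, Double> g = new HashMap<>();
        for (Map<String, Double> fa : likedArticleVectors)
            fa.forEach((w, v) -> g.merge(w, v, Double::sum));
        double norm = Math.sqrt(g.values().stream().mapToDouble(v -> v * v).sum());
        if (norm > 0) g.replaceAll((w, v) -> v / norm);
        return g;
    }
}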
5 RECOMMENDER SYSTEMS
Intuitively, at each instant of time t, a recommender system recommends to a user u a set of articles in A. Formally, we model a recommender system as a computable function RS : U × N → 2^A such that RS(u, t) ⊆ A_t.
In this section, we describe one such computable
function, that is, we describe our recommendation al-
gorithm. Furthermore, we show how the effectiveness
of a recommender system can be measured in terms
of the information retrieval concepts of precision and
recall.
5.1 The Algorithm
At each instant of time t, our recommendation algorithm recommends articles from a set C_t ⊆ A_t of candidate articles.

Given a user u and an instant of time t, our recommendation algorithm performs the following steps (a code sketch follows the list):

Popularity Step. For each candidate article a ∈ C_t, compute popularity_t(a).

Relevance Step. For each candidate article a ∈ C_t, compute relevance_t(a, u).

Score Step. Normalize all popularity_t(a) and all relevance_t(a, u) so that they lie between 0 and 1. Then, for fixed weights p and r, compute score_t(a, u) = p · popularity_t(a) + r · relevance_t(a, u), for each candidate article a ∈ C_t.

Recommendation Step. Recommend to user u the top n candidate articles in C_t with the highest score.
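The following Java sketch illustrates the score and recommendation steps, assuming the popularity and relevance scores have already been computed for each candidate article; the class and method names, and the linear min-max normalization, are our own illustrative choices, since the algorithm only requires the scores to be rescaled into [0, 1].

import java.util.*;
import java.util.stream.Collectors;

// Sketch of the Score and Recommendation steps of Section 5.1.
public class Recommender {

    // Rescale raw scores linearly into [0, 1] (one possible normalization).
    static Map<String, Double> normalize(Map<String, Double> raw) {
        double min = Collections.min(raw.values()), max = Collections.max(raw.values());
        Map<String, Double> out = new HashMap<>();
        raw.forEach((a, v) -> out.put(a, max > min ? (v - min) / (max - min) : 0.0));
        return out;
    }

    // score_t(a, u) = p * popularity_t(a) + r * relevance_t(a, u); return the top n article ids.
    static List<String> recommend(Map<String, Double> popularity, Map<String, Double> relevance,
                                  double p, double r, int n) {
        Map<String, Double> pop = normalize(popularity);
        Map<String, Double> rel = normalize(relevance);
        return pop.keySet().stream()
                  .sorted(Comparator.comparingDouble(
                          (String a) -> -(p * pop.get(a) + r * rel.getOrDefault(a, 0.0))))
                  .limit(n)
                  .collect(Collectors.toList());
    }
}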
5.2 Precision
To measure the precision of a recommender system RS, we perform the following test repeatedly. At some instant of time t, for a user u, we compute RS(u, t). Then, at the instant of time t + 1 we monitor the feedback given by the user u, that is, we ask for feedback(u, a, t + 1), for each article a ∈ RS(u, t). Formally, the precision of the recommender system RS with respect to the user u and relative to the instant of time t is given by

precision(RS, u, t) = |RS(u, t) ∩ liking_{u,t+1}| / |RS(u, t)|.
5.3 Recall
To measure the recall of a recommender system RS, we perform the following test repeatedly. At some instant of time t, for a user u, we compute RS(u, t), and we compare the result obtained with the set liking_{u,t}. Formally, the recall of the recommender system RS with respect to the user u and relative to the instant of time t is given by

recall(RS, u, t) = |RS(u, t) ∩ liking_{u,t}| / |liking_{u,t}|.
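Both measures reduce to intersections of sets of article identifiers, as in the following illustrative Java sketch.

import java.util.*;

// Sketch of the precision and recall measures of Sections 5.2 and 5.3.
public class Evaluation {

    // precision(RS, u, t) = |RS(u,t) ∩ liking_{u,t+1}| / |RS(u,t)|
    static double precision(Set<String> recommended, Set<String> likedAfterwards) {
        if (recommended.isEmpty()) return 0.0;
        Set<String> hits = new HashSet<>(recommended);
        hits.retainAll(likedAfterwards);
        return (double) hits.size() / recommended.size();
    }

    // recall(RS, u, t) = |RS(u,t) ∩ liking_{u,t}| / |liking_{u,t}|
    static double recall(Set<String> recommended, Set<String> liked) {
        if (liked.isEmpty()) return 0.0;
        Set<String> hits = new HashSet<>(recommended);
        hits.retainAll(liked);
        return (double) hits.size() / liked.size();
    }
}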
6 POPULARITY
Given an article a and an instant of time t, we want to compute a number popularity_t(a) that measures the level of interest of the community in article a at the instant of time t.
We have developed three measures of popularity:
decaying popularities, discounted popularities, and
sliding window popularities.
6.1 Decaying Popularities
A reading event e consists of a user u reading an article a at an instant of time t. The instant of time at which the event e occurs is denoted with date(e). If t ≥ date(e), then we let age_t(e) = t − date(e).

We denote with R_{a,t} the set of all events e in which a user u reads the article a at the instant of time t or before.

We assume that, for each instant of time t, a user can read at most one article. It follows that R_{a,t} is a finite set. We also assume that an article may be read only after it is published, that is, R_{a,t} = ∅ if t ≤ date(a).
The decaying popularity π_{h,t}(a) of an article a at the instant of time t with respect to a decay half-time h > 0 is defined by

π_{h,t}(a) = Σ_{e ∈ R_{a,t}} (1/2)^{age_t(e)/h}.
6.2 Discounted Popularities
The discounted popularity π_t(a) of an article a at the instant of time t is defined by

π_t(a) = |R_{a,t}| / age_t(a).
6.3 Sliding Window Popularities
The sliding window popularity π^δ_t(a) of an article a at the instant of time t with respect to a sliding window δ > 0 is defined by

π^δ_t(a) = |R_{a,t} \ R_{a,t−δ}| / min(δ, age_t(a)).
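The three popularity measures can all be computed from a log of reading timestamps. The following Java sketch is illustrative: it recomputes each measure from scratch for a single article, whereas a deployed system would maintain the counts incrementally; the names are ours.

import java.util.*;

// Sketch of the popularity measures of Section 6 for one article, given its reading timestamps.
public class Popularity {

    // Decaying popularity: each reading event weighs (1/2)^{age/h}.
    static double decaying(List<Long> readTimes, long now, double halfTime) {
        double sum = 0.0;
        for (long t : readTimes) sum += Math.pow(0.5, (now - t) / halfTime);
        return sum;
    }

    // Discounted popularity: number of readings divided by the age of the article
    // (readings are assumed to occur strictly after publication, so the age is positive).
    static double discounted(List<Long> readTimes, long now, long publicationTime) {
        return (double) readTimes.size() / (now - publicationTime);
    }

    // Sliding-window popularity: readings in the last delta instants, divided by min(delta, age).
    static double slidingWindow(List<Long> readTimes, long now, long publicationTime, long delta) {
        long recent = readTimes.stream().filter(t -> t > now - delta).count();
        return (double) recent / Math.min(delta, now - publicationTime);
    }
}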
7 RELEVANCE
Given a user u and an instant of time t, we want to devise an algorithm that computes a number relevance_t(a, u), for each candidate article a ∈ C_t. The number relevance_t(a, u) measures how relevant the candidate article a is to user u at the instant of time t.

We compute relevance_t(a, u) using the age age_t(a) of article a at the instant of time t, the function f_a representing the content of article a, and the function g_{u,t} representing the interests of user u at the instant of time t.
We have developed three measures of relevance:
basic relevance, optimized relevance, and approxi-
mated relevance.
The optimized relevance is equal, modulo a posi-
tive factor that depends only on u and t, to the basic
relevance. The optimized relevance can be computed
faster than the basic relevance at the cost of some of-
fline computations. The approximated relevance can
be computed even faster and, as the name suggests, approximates the optimized relevance.
7.1 Basic Relevance
The DECAY FACTOR of an article a at the instant of time t with respect to a decay half-time k is defined as

ω_{k,t}(a) = (1/2)^{age_t(a)/k}.

Let a be an article, let u be a user, and let t be an instant of time. The BASIC RELEVANCE ρ_t(a, u) of article a with respect to user u at the instant of time t is

ρ_t(a, u) = (f_a · g_{u,t}) ω_{k,t}(a) = ( Σ_{w ∈ W} f_a(w) g_{u,t}(w) ) ω_{k,t}(a).
Notice that 0 ≤ ρ_t(a, u) ≤ 1. Thus, when ρ_t(a, u) is close to 1, article a is relevant to user u at the instant of time t. When instead ρ_t(a, u) is close to 0, article a is not relevant to user u at the instant of time t.
Next, we analyze the time complexity of the basic relevance algorithm, that is, the time complexity needed for computing the basic relevances of all articles in the candidate set C_t. This computation can be performed in two steps: first g_{u,t} is computed, and then ρ_t(a, u) is computed for all candidate articles a ∈ C_t.

Denote with µ the average number of distinct words occurring in an article. Then, the first step takes average time O(|liking_{u,t}| µ). Given a candidate article a, computing ρ_t(a, u) takes average time O(µ), so the second step takes average time O(|C_t| µ). Therefore, the average case time complexity of the basic relevance algorithm is O((|liking_{u,t}| + |C_t|) µ). Since in practice |liking_{u,t}| < |C_t|, this complexity can be simplified to O(|C_t| µ).
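A direct Java sketch of the basic relevance follows (illustrative; the content vector f_a and the profile g_{u,t} are assumed to be available as word-to-weight maps).

import java.util.Map;

// Sketch of the basic relevance of Section 7.1.
public class BasicRelevance {

    // Decay factor omega_{k,t}(a) = (1/2)^{age_t(a)/k}.
    static double decayFactor(long articleAge, double k) {
        return Math.pow(0.5, articleAge / k);
    }

    // rho_t(a, u) = (f_a . g_{u,t}) * omega_{k,t}(a).
    static double relevance(Map<String, Double> fa, Map<String, Double> gut,
                            long articleAge, double k) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : fa.entrySet())
            dot += e.getValue() * gut.getOrDefault(e.getKey(), 0.0);
        return dot * decayFactor(articleAge, k);
    }
}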
7.2 Optimized Relevance
Let a be an article, let u be a user, and let t be an instant of time. The OPTIMIZED RELEVANCE ρ'_t(a, u) of article a with respect to user u at the instant of time t is

ρ'_t(a, u) = ( Σ_{b ∈ liking_{u,t}} σ(a, b) ) ω_{k,t}(a).

The optimized relevance is equal to the basic relevance, modulo a factor that depends only on u and t. More precisely, it is easy to verify that

ρ_t(a, u) = β ρ'_t(a, u),

where β = |g'_{u,t}|^{−1}.
Next, we analyze the time complexity of the optimized relevance algorithm, that is, the time complexity needed for computing the optimized relevances of all articles in the candidate set C_t. Optimized relevances can be computed efficiently online if the similarities σ(a, b) between articles have already been computed offline. Indeed, given an article a, the computation of ρ'_t(a, u) can then be done in time O(|liking_{u,t}|). Therefore, the time complexity of the optimized relevance algorithm is O(|C_t| · |liking_{u,t}|).

In practice, |liking_{u,t}| < µ, and therefore the optimized relevance algorithm is faster than the basic relevance algorithm.
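The following Java sketch is illustrative; it assumes the pairwise similarities σ(a, b) have been precomputed offline and stored, here as a nested map. The online cost of one evaluation is then |liking_{u,t}| constant-time lookups, which matches the bound above.

import java.util.*;

// Sketch of the optimized relevance of Section 7.2, with offline-precomputed similarities.
public class OptimizedRelevance {

    // rho'_t(a, u) = (sum over liked articles b of sigma(a, b)) * omega_{k,t}(a).
    static double relevance(String article, Set<String> likedArticles,
                            Map<String, Map<String, Double>> similarity, // a -> (b -> sigma(a,b))
                            long articleAge, double k) {
        Map<String, Double> row = similarity.getOrDefault(article, Collections.emptyMap());
        double sum = 0.0;
        for (String b : likedArticles) sum += row.getOrDefault(b, 0.0);
        return sum * Math.pow(0.5, articleAge / k);
    }
}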
7.3 Approximated Relevance
In order to compute approximated relevances, we fix a small positive constant m, so that m ≪ |C_t|. At each instant of time t, for each article b ∈ A_t, we keep a set N^m_t(b) containing the m articles a in A_t with the highest value of σ(a, b) ω_{k,t}(a).

Let a be an article, let u be a user, and let t be an instant of time. The m-APPROXIMATED RELEVANCE ρ^m_t(a, u) of article a with respect to user u at the instant of time t is

ρ^m_t(a, u) = ( Σ_{b ∈ liking_{u,t} : a ∈ N^m_t(b)} σ(a, b) ) ω_{k,t}(a).
Note that, as m tends to infinity, the m-approximated relevance tends to the optimized relevance, that is,

lim_{m → ∞} ρ^m_t(a, u) = ρ'_t(a, u).
Next, we analyze the time complexity of the m-approximated relevance algorithm, that is, the time complexity needed for computing the m-approximated relevances of all articles in the candidate set C_t. Clearly, this complexity is O(m |liking_{u,t}|). If m is assumed to be a small constant, this complexity reduces to O(|liking_{u,t}|).

It follows that the m-approximated relevance algorithm is much faster than both the optimized relevance algorithm and the basic relevance algorithm.
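The following Java sketch is illustrative; it computes ρ^m_t(a, u) for all candidate articles at once, assuming each neighbor set N^m_t(b) is maintained offline as a map from a neighbor article a to σ(a, b), and that the decay factors ω_{k,t}(a) are available. Candidates that appear in no neighbor list of a liked article keep relevance 0, and the total work is O(m |liking_{u,t}|).

import java.util.*;

// Sketch of the m-approximated relevance of Section 7.3, over precomputed neighbor lists.
public class ApproximatedRelevance {

    static Map<String, Double> relevances(Set<String> likedArticles,
                                          Map<String, Map<String, Double>> neighbors, // b -> (a -> sigma(a,b))
                                          Map<String, Double> decayFactors) {         // a -> omega_{k,t}(a)
        Map<String, Double> sums = new HashMap<>();
        for (String b : likedArticles)
            neighbors.getOrDefault(b, Collections.emptyMap())
                     .forEach((a, sim) -> sums.merge(a, sim, Double::sum));
        // Apply the decay factor of each candidate article to the accumulated similarity.
        sums.replaceAll((a, s) -> s * decayFactors.getOrDefault(a, 1.0));
        return sums;
    }
}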
8 CONCLUSIONS
We have presented a recommendation algorithm for
online news based on collective intelligence and con-
tent. When a user asks for personalized news, we rec-
ommend those articles that (i) are popular among the
members of the online community (the collective in-
telligence part), and (ii) are similar in content to the
articles the user has read in the past (the content part).
Our algorithm computes its recommendations based on the collective behavior of the online users, as well as on the feedback the users provide to the algorithm's recommendations. The users' feedback can moreover be used to measure the effectiveness of our recommendation algorithm in terms of the information retrieval concepts of precision and recall.
The cornerstone of our recommendation system is
a basic relevance algorithm that computes how rel-
evant an article is to a given user. The basic rele-
vance algorithm gives very high quality results, but
is not scalable because it takes time proportional to
the product of the number of candidate articles with
the average number of distinct words contained in an
article.
To address the scalability problem, we devised an
optimized relevance algorithm that gives the same re-
sults of the basic relevance algorithm at the cost of
minimal offline computations. The optimized rele-
vance algorithm takes time proportional to the product
of the number of candidate articles with the number
of articles explicitly liked by the user. It is therefore
faster than the basic relevance algorithm, but not yet
scalable.
To obtain a scalable algorithm, we have developed
an m-approximated relevance algorithm, where m is a
fixed constant. The m-approximated relevance algo-
rithm is scalable, and it takes time proportional to the
number of articles explicitly liked by the user.
REFERENCES
Banos, E., Katakis, I., Bassiliades, N., Tsoumakas, G., and
Vlahavas, I. P. (2006). PersoNews: A personalized
news reader enhanced by machine learning and se-
mantic filtering. In Ontologies, DataBases, and Ap-
plications of Semantics, pages 975–982.
Bharat, K., Kamba, T., and Albers, M. (1998). Personal-
ized, interactive news on the web. Multimedia Sys-
tems, 6(5):349–358.
Borsje, J., Levering, L., and Frasincar, F. (2008). Hermes: A
semantic web-based news decision support system. In
Symposium on Applied Computing, pages 2415–2420.
Das, A., Datar, M., Garg, A., and Rajaram, S. (2007).
Google news personalization: Scalable online collab-
orative filtering. In International Conference on World
Wide Web, pages 271–280.
Lang, K. (1995). Newsweeder: Learning to filter netnews.
In International Conference on Machine Learning,
pages 331–339.
Pazzani, M. J. and Billsus, D. (2007). Content-based rec-
ommendation systems. In The Adaptive Web, pages
325–341.
Segaran, T. (2007). Programming Collective Intelligence:
Building Smart Web 2.0 Applications. O’Reilly.