ADAPTIVE PREDICTIONS IN A USER-CENTERED RECOMMENDER SYSTEM

Anne Boyer and Sylvain Castagnos

LORIA, Université Nancy 2

Campus Scientifique B.P. 239

54506 Vandoeuvre-lès-Nancy, France

Keywords: Distributed Collaborative Filtering, Recommender Systems, Personalization, Grid Computing, Scalability, Privacy.

Abstract:

The amount of data available on the Internet is growing faster than the human ability to process it. It is therefore becoming increasingly difficult to identify the most relevant information, even for skilled people using efficient search engines. One way to cope with this problem is to automatically recommend data in accordance with users' preferences. Among other techniques, collaborative filtering helps users find interesting items by comparing them with users who have the same tastes. This paper describes a new user-centered recommender system relying on collaborative filtering and grid computing. Our model has been implemented in a Peer-to-Peer architecture. It has been specifically designed to deal with problems of scalability and privacy. Moreover, it adapts its prediction computations to the density of the user neighborhood.

1 INTRODUCTION

To face the exponential growth of data on the Internet, intelligent information retrieval assistants are becoming more and more popular and improve the interactions between users and software. These assistants, such as recommender systems relying on collaborative filtering, efficiently help people find interesting items by modeling their preferences and comparing them with users who have the same tastes.

Nevertheless, there are many problems to deal with when implementing a collaborative filtering algorithm. In this paper, we pay particular attention to the following significant limitations for industrial use:

• scalability and system reactivity: recommender

systems must take into account real industrial con-

straints. There are potentially several thousand

users and items to manage. Despite the large num-

ber of parameters, algorithms must compute vir-

tual communities of interests in real time;

• intrusions into privacy: by modeling user actions

and preferences in order to compute recommenda-

tions, intelligent systems access intimate informa-

tion about users. So, we have to be careful to be

as unintrusive as possible and at least to guaran-

tee the anonymity of users. Moreover, because of

the conﬁdential nature of some data, users must be

aware of the prediction computation process and

explicitly choose the part of their proﬁle to take

into account;

• novelty in predictions: according to the context,

users want to have more or less new recommen-

dations. This is why we introduce an adaptive

minimum-correlation threshold of neighborhood

which evolves in accordance with user expecta-

tions.

We propose an algorithm based on an analysis of usage. It relies on a distributed user-based collaborative filtering technique. Our model has been integrated into a document sharing system called "SofoS" (the acronym for "Sharing Our Files On the System").

Our algorithm is implemented on a Peer-to-Peer architecture because of the document platform context. In many companies, documents are referenced using a common codification that may require a central server, but they are stored on users' devices. The distribution of computations and contents matches the constraints of scalability and reactivity.

Boyer A. and Castagnos S. (2007). ADAPTIVE PREDICTIONS IN A USER-CENTERED RECOMMENDER SYSTEM. In Proceedings of the Third International Conference on Web Information Systems and Technologies - Web Interfaces and Applications, pages 51-58. DOI: 10.5220/0001274300510058. SciTePress.

In this paper, we first present the related work on collaborative filtering approaches. We then introduce our Peer-to-Peer user-centered model, which offers the advantage of being fully distributed. We called this model "Adaptive User-centered Recommender Algorithm" (AURA). It provides a service which builds a virtual community of interests centered on the active user by selecting his/her nearest neighbors. As the model is ego-centered, the active user can define the expected prediction quality by specifying the minimum-correlation threshold. AURA is an anytime algorithm which furthermore requires very little computation time and memory space. As we want to constantly improve our model and the document sharing platform, we are developing them incrementally and modularly on a JXTA platform.

2 STATE-OF-THE-ART

In centralized collaborative filtering approaches, finding the closest neighbors among several thousands of candidates in real time may be unrealistic (Sarwar et al., 2001). By contrast, decentralizing the data makes it practical to comply with privacy rules, as long as anonymity is guaranteed (Canny, 2002). This is why more and more researchers investigate various means of distributing collaborative filtering algorithms. Distribution also presents the advantage of giving the ownership of profiles to users, so that they can be re-used in several applications. We can mention research on P2P architectures, multi-agent systems and decentralized models (client/server, shared databases).

There are several ways to classify collaborative filtering algorithms. In (Breese et al., 1998), the authors identified, among existing techniques, two major classes of algorithms: memory-based and model-based algorithms. Memory-based techniques offer the advantage of being very reactive, by immediately integrating modifications of users' profiles into the system. They also guarantee the quality of recommendations. However, Breese et al. (Breese et al., 1998) consider that their scalability is problematic: even if these methods work well with small-sized examples, it is difficult to scale them to situations characterized by a great number of documents or users. Indeed, the time and space complexities of the algorithms are serious considerations for big databases. According to Pennock et al. (Pennock et al., 2000), model-based algorithms constitute an alternative to the problem of combinatorial complexity. Furthermore, these models highlight some correlations in data, thus proposing an intuitive reason for recommendations or simply making the hypotheses more explicit. However, these methods are not dynamic enough and they react badly to the insertion of new contents into the database. Moreover, they require a penalizing learning phase for the user.

Footnote (common codification): this makes it possible to assign document IDs and to identify documents easily.

Footnote (JXTA platform): http://www.jxta.org/

Footnote (profile ownership): as the owner of the profile, the user can apply it to different pieces of software. In centralized approaches, there must be as many profiles as services for one user.

Another way to classify collaborative filtering techniques is to consider user-based methods as opposed to item-based algorithms. For example, we have explored a distributed user-based approach within a client/server context in (Castagnos and Boyer, 2006). In this model, implicit criteria are used to generate explicit ratings. These votes are anonymously sent to the server. An offline clustering algorithm is then applied and group profiles are sent to clients. The identification phase is done on the client side in order to preserve privacy. This model also deals with sparsity and scalability. The authors highlight the added value of a user-based approach in situations where users are relatively stable, whereas the set of items may often vary considerably. By contrast, Miller et al. (Miller et al., 2004) show the great potential of distributed item-based algorithms. They propose a P2P version of the item-item algorithm. In this way, they address the problems of portability (even on mobile devices), privacy and security with a high quality of recommendations. Their model can adapt to different P2P configurations.

Beyond the different possible implementations, we can see that many open questions are raised by the industrial use of collaborative filtering. Canny (Canny, 2002) concentrates on ways to provide powerful privacy protection by computing a "public" aggregate for each community without disclosing individual users' data. Furthermore, his approach is based on homomorphic encryption to protect personal data and on a probabilistic factor analysis model which handles missing data without requiring default values for them. Privacy protection is provided by a P2P protocol. Berkovsky et al. (Berkovsky et al., 2006) also deal with privacy concerns in P2P recommender systems. They address the problem by electing super-peers whose role is to compute an average profile of a sub-population. Standard peers have to contact all these super-peers and to exploit these average profiles to compute predictions. In this way, they never access the public profile of a particular user. We can also cite the work of Han et al. (Han et al., 2004), which addresses the problem of privacy protection and scalability in a distributed collaborative filtering algorithm called PipeCF. Both user database management and prediction computation are split between several devices. This approach has been implemented on Peer-to-Peer overlay networks through a distributed hash table method.

WEBIST 2007 - International Conference on Web Information Systems and Technologies

In this paper, we introduce a new hybrid method called AURA. It combines the reactivity of memory-based techniques with the data correlation of model-based approaches by using an iterative clustering algorithm. Moreover, AURA is a user-based model which is completely distributed on the user scale. It has been integrated in the SofoS document platform and relies on a P2P architecture in order to distribute prediction computations, content or profiles. We designed our model to tackle, among others, the problems of scalability and privacy.

3 MODEL AND IMPLEMENTATION

SofoS is a document platform that uses a recommender system to provide users with content. Once it is installed, users can share and/or search for documents, as they do on P2P applications like Napster. We designed it to be as open as possible to the different existing kinds of data: hypertext files, documents, music, videos, etc. The goal of SofoS is also to help users find the most relevant sources of information efficiently. This is why we added the AURA recommender module to the system. We assume that users can get pieces of information either by using our system or by surfing the web. SofoS consequently makes it possible to take visited websites into account in the prediction computations.

We are implementing SofoS in a generic environment for Peer-to-Peer services called JXTA. This choice is motivated by the fact that it is widely used in our research community.

In (Miller et al., 2004), the authors highlight the fact that there are several types of possible architectures for P2P systems. We can cite those with a central server (such as Napster), random discovery ones (such as Gnutella or KaZaA), transitive traversal architectures, content addressable structures and secure blackboards.

Footnote: Some of these architectures are totally distributed. Others mix centralized and distributed approaches but elect super-peers whose role is to partially manage subgroups of peers in the system.

We designed our model with the idea that it could be adapted to different types of architectures. However, in this paper, we illustrate our claims by basing our examples on the random discovery approach, even if others may have an added value. The following subsection presents the AURA algorithm.

3.1 Recommender Module

3.1.1 Privacy

We presume that each peer in SofoS corresponds to

a single user on a given device.

For this reason, we

have conceived the platform in such a way that users

have to open a session with a login and a password be-

fore using the application. In this way, several persons

can use the same computer (for example, the different

members of a family) without disrupting their respec-

tive proﬁles. That is why each user on a given peer

of the system has his/her own proﬁle and a single ID.

The session data remain on the local machine in order

to enhance privacy. There is no central server required

since sessions are only used to distinguish users on a

given peer.

For each user, we use a hash function requiring the

IP address and the login in order to generate his/her

ID on his/her computer. This use of a hash function H

is suitable, since it has the following features:

• non-reversible: knowing "y", it is hard to find "x" such that H(x) = y;

• no collision: it is hard to find distinct "x" and "y" such that H(x) = H(y);

• knowing ”x” and ”H”, it is easy to compute H(x);

• H(x) has a ﬁxed size.

In this way, an ID does not allow identification of the name or IP address of the corresponding user. The communication module uses an IP multicast address to broadcast the packets containing addressees' IDs. In order to reduce the information flow, we can optionally elect a super-peer which keeps a list of the IDs whose session is active: before sending a message, a peer can ask whether the addressee is connected. If the super-peer has received no signal from a peer for a while, it removes the corresponding ID from the list.
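
The ID generation described in this subsection can be sketched as follows. The paper does not name the hash function: SHA-256 from Python's standard `hashlib`, the `|` separator and the function name `make_peer_id` are assumptions made only for illustration.

```python
import hashlib


def make_peer_id(ip_address: str, login: str) -> str:
    """Derive a fixed-size, hard-to-invert ID from the IP address and login.

    SHA-256 is an assumption: the text only requires a hash function H that
    is non-reversible, collision-resistant, cheap to compute and fixed-size.
    """
    # A separator avoids ambiguous concatenations such as
    # "1.2.3.4" + "5bob" versus "1.2.3.45" + "bob".
    material = f"{ip_address}|{login}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


peer_id = make_peer_id("192.0.2.17", "alice")
print(len(peer_id))  # 64 hexadecimal characters, whatever the input size
```

Because the ID is computed locally from data that never leaves the device, no table linking IDs and identities needs to exist anywhere on the network.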

3.1.2 User-centered Predictions

Users can both share items on the platform and give feedback about the websites they consult. Each item has a profile on the platform. In addition to the available documents, each peer owns 7 pieces of information: a personal profile, a public profile, a group profile and 4 lists of IDs (list "A" for IDs of peers belonging to its group, list "B" for those which exceed the minimum-correlation threshold s as explained below, list "C" for the black-listed IDs and list "O" for IDs of peers which have added the active user profile to their group profile). An example of the system run is shown in figure 1.

Footnote: We can easily distinguish devices since SofoS has to be installed on users' computers.

Adapted Chan formula:

\[
\mathrm{Interest}(item) = 1 + 2 \cdot \mathrm{IsFavorite}(item) + \mathrm{Recent}(item) + \mathrm{Frequency}(item) \cdot \mathrm{Duration}(item) \tag{1}
\]

With:

\[
\mathrm{Recent}(item) = \frac{\mathrm{date}(\text{last visit}) - \mathrm{date}(\text{log beginning})}{\mathrm{date}(\text{present}) - \mathrm{date}(\text{log beginning})}
\]

And:

\[
\mathrm{Duration}(item) = \max_{\text{consultations}} \left( \frac{\text{time spent reading item}}{\text{size of the item}} \right)
\]

Interest(item) must be rounded up to the nearest integer.

IsFavorite(item) equals 1 if the item has been explicitly and positively voted by the user (non-numerical vote) and 0 otherwise.

Frequency(item) . Duration(item) must be normalized so that the maximum is 1.

Figure 1: Exchanges of profiles p between peers U with different IDs.

In order to build the personal profile of the active user u_a, we use both explicit and implicit criteria. The active user can always check the list of items that he/she shares or has consulted. He/She can explicitly rate each of these items on a scale from 1 to 5. The active user can also initialize his/her personal profile with a set of criteria proposed in the interface in order to partially address the cold start problem. This offers the advantage of completing the profile with more consistency and of finding similarities with other users more quickly, since everyone can fill in the same criteria rating form.

Footnote (set of criteria): ideally, the set of items in the criteria set should cover all the implicit categories that users can find on the platform.

We assume that, despite the explicit voluntary completion of profiles, there are a lot of missing data. We consequently add to AURA a user modeling function based on the Chan formula (cf. formula 1). This function relies on an analysis of usage. It temporarily collects information about the active user's actions (frequency and duration of consultations for each item, etc.) and transforms them into numerical votes. In order to preserve privacy, all data regarding the user's actions remain on his/her peer. The explicit ratings and the estimated numerical votes constitute the active user's personal profile. The public profile is the part of the personal profile that the active user agrees to share with others.
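
As a rough illustration of how formula 1 turns usage logs into a vote, here is a minimal Python sketch. The function names, the `(seconds, size)` representation of a consultation and the use of `math.ceil` for "rounded up" are assumptions. Since the text does not define Frequency(item), the already normalized product Frequency . Duration is passed in as a parameter rather than computed.

```python
import math
from datetime import datetime


def recent(last_visit: datetime, log_beginning: datetime, present: datetime) -> float:
    """Recent(item): position of the last visit within the log period, in [0, 1]."""
    span = (present - log_beginning).total_seconds()
    if span <= 0:
        return 0.0  # empty log period: no recency information (assumption)
    return (last_visit - log_beginning).total_seconds() / span


def duration(consultations) -> float:
    """Duration(item): max over consultations of reading time / item size."""
    return max(seconds / size for seconds, size in consultations)


def interest(is_favorite: bool, recent_value: float, freq_duration_norm: float) -> int:
    """Interest(item) as in formula 1.

    freq_duration_norm is Frequency(item) . Duration(item), already
    normalized so that its maximum over all items is 1.
    """
    raw = 1 + 2 * int(is_favorite) + recent_value + freq_duration_norm
    return math.ceil(raw)  # "rounded up" is read here as a ceiling


r = recent(datetime(2007, 1, 6), datetime(2007, 1, 1), datetime(2007, 1, 11))
print(r)                          # 0.5: last visit halfway through the log
print(interest(True, 1.0, 1.0))   # 5: the top of the rating scale
print(interest(False, 0.0, 0.0))  # 1: the bottom of the rating scale
```

With these bounds, the estimated votes fall on the same 1 to 5 scale as the explicit ratings, so both kinds of votes can be mixed in the personal profile.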

The algorithm also has to build a group proﬁle. It

represents the preferences of a virtual community of

interests, and has been especially designed to be as

close as possible to the active user’s expectations. In

order to do that, the peer of the active user asks for the

public proﬁles of all the peers it can reach through the

platform. Then, for each of these proﬁles, it computes

a similarity measure with the personal proﬁle of the

active user. The active user can indirectly deﬁne a

minimum-correlation threshold which corresponds to

the radius of his/her trust circle.

If the result is lower than this ﬁxed threshold

which is speciﬁc to each user, the ID of the peer is

added to the list ”A” and the corresponding proﬁle is

included in the group proﬁle of the active user, using

the procedure of table 1.

Table 1: Add a public profile to the group profile.

Proc AddToGroupProfile(public profile of u_n)
    W = W + |w(u_a, u_n)|
    for each item i do
        u_{l,i} = u_{l,i} * (W − |w(u_a, u_n)|)
        u_{l,i} = (u_{l,i} + w(u_a, u_n) * u_{n,i}) / W
    end for

With: u_{l,i} the rating for item i in the group profile;
u_{n,i} the rating of user n for item i;
W the sum of |w(u_a, u_i)|, which is stored;
w(u_a, u_n) the correlation coefficient between the active user u_a and u_n.
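
The incremental procedures of table 1 and table 2 can be sketched in Python as below. This is only an illustration under stated assumptions: the class name `GroupProfile`, the dictionary representation of profiles, the initial value 0 for unseen items and the handling of a zero weight are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GroupProfile:
    total_weight: float = 0.0                      # W, the stored sum of |w|
    ratings: Dict[str, float] = field(default_factory=dict)  # item -> u_{l,i}

    def add(self, profile: Dict[str, float], w: float) -> None:
        """Table 1: fold a neighbor's public profile into the group profile."""
        if w == 0:
            return  # a zero correlation contributes nothing (assumption)
        self.total_weight += abs(w)
        for item, rating in profile.items():
            # Undo the previous division by W to recover the weighted sum.
            weighted_sum = self.ratings.get(item, 0.0) * (self.total_weight - abs(w))
            self.ratings[item] = (weighted_sum + w * rating) / self.total_weight

    def remove(self, profile: Dict[str, float], w: float) -> None:
        """Table 2: take a previously added profile out of the group profile."""
        if w == 0:
            return
        self.total_weight -= abs(w)
        if self.total_weight <= 1e-12:
            # No neighbor left: reset instead of dividing by zero (assumption).
            self.total_weight = 0.0
            self.ratings.clear()
            return
        for item, rating in profile.items():
            weighted_sum = self.ratings.get(item, 0.0) * (self.total_weight + abs(w))
            self.ratings[item] = (weighted_sum - w * rating) / self.total_weight


group = GroupProfile()
group.add({"doc1": 4, "doc2": 2}, 0.5)
group.add({"doc1": 2, "doc2": 4}, 0.5)
print(group.ratings)   # both items average to 3.0
group.remove({"doc1": 2, "doc2": 4}, 0.5)
print(group.ratings)   # back to the first neighbor's ratings
```

Only W and the current group ratings are kept between calls, which is why a peer does not have to store its neighbors' individual profiles.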

We used the Pearson correlation coefficient to compute similarity, since the literature shows it works well (Shardanand and Maes, 1995). Of course, if this similarity measure is higher than the threshold, we add the ID of the peer to the list "B". The list "C" is used to systematically ignore some peers. It makes it possible to improve trust (that is to say the confidence that users have in the recommendations) by identifying malicious users. The trust increasing process will not be considered in this paper.

Figure 2: Example of user interactions.
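
For reference, a Pearson correlation between two profiles can be computed as in the sketch below. The paper does not spell out its exact variant: computing the means over the co-rated items only, and returning 0 when fewer than two items are shared or when one user's co-rated votes are constant, are assumptions of this sketch.

```python
import math
from typing import Dict


def pearson(profile_a: Dict[str, float], profile_b: Dict[str, float]) -> float:
    """Pearson correlation over the items rated in both profiles."""
    common = set(profile_a) & set(profile_b)
    if len(common) < 2:
        return 0.0  # not enough evidence to correlate (assumption)
    mean_a = sum(profile_a[i] for i in common) / len(common)
    mean_b = sum(profile_b[i] for i in common) / len(common)
    numerator = sum((profile_a[i] - mean_a) * (profile_b[i] - mean_b) for i in common)
    var_a = sum((profile_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((profile_b[i] - mean_b) ** 2 for i in common)
    denominator = math.sqrt(var_a * var_b)
    if denominator == 0:
        return 0.0  # one user gave identical votes to every shared item
    return numerator / denominator


alice = {"doc1": 1, "doc2": 2, "doc3": 3}
bob = {"doc1": 2, "doc2": 4, "doc3": 6, "doc4": 5}
print(pearson(alice, bob))  # 1.0: same ordering of preferences
```

The cost of one comparison is linear in the number of items the two users have both rated.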

When his/her personal profile changes, the active user has the possibility to update his/her public profile. In this case, the active peer has to contact every peer whose ID is in the list "O". Each of these peers re-computes the similarity measure. If it exceeds the threshold, the active user's public profile has to be removed from the group profile, using the procedure of table 2. Otherwise, it has to be updated in the group profile, that is to say the peer must remove the old profile and add the new one.

Table 2: Remove a public profile from the group profile.

Proc RemoveToGroupProfile(old profile of u_n)
    W = W − |w(u_a, u_n)|
    for each item i do
        u_{l,i} = u_{l,i} * (W + |w(u_a, u_n)|)
        u_{l,i} = (u_{l,i} − w(u_a, u_n) * u_{n,i}) / W
    end for

By convention, we use the notation < id, p > for the peer-addition packet, that is to say for new arrivals. < id, p, s > corresponds to the packet of a peer which is already connected and sends data to a new arrival; "s" is the threshold value. There is no need to specify the threshold value in the peer-addition packet, since there is a default value (|correlation| >= 0). Finally, < id, p_{t-1}, p_t, s > is the notation for the update packet. In each of these packets, the first parameter corresponds to the ID of the source of the message. In order to simplify the notation, we do not include the addressees' IDs in figure 2.

Footnote (update packet): a packet is broadcast with a header containing peers' IDs, the old profile and the new public profile.
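
The three packet notations can be modeled with small immutable records, as in the following sketch. The class and field names are hypothetical; only the content of each packet (source ID, profile or profiles, optional threshold) and the default threshold of 0 come from the text.

```python
from dataclasses import dataclass
from typing import Dict

Profile = Dict[str, int]  # item identifier -> vote on the 1 to 5 scale

DEFAULT_THRESHOLD = 0.0   # default value: |correlation| >= 0


@dataclass(frozen=True)
class PeerAdditionPacket:   # < id, p >, sent by a new arrival
    source_id: str
    profile: Profile


@dataclass(frozen=True)
class ReplyPacket:          # < id, p, s >, sent by an already connected peer
    source_id: str
    profile: Profile
    threshold: float


@dataclass(frozen=True)
class UpdatePacket:         # < id, p_{t-1}, p_t, s >
    source_id: str
    old_profile: Profile
    new_profile: Profile
    threshold: float


def sender_threshold(packet) -> float:
    """Threshold to assume for the sender; new arrivals use the default."""
    return getattr(packet, "threshold", DEFAULT_THRESHOLD)


hello = PeerAdditionPacket("id-new", {"doc1": 4})
reply = ReplyPacket("id-old", {"doc1": 5}, threshold=0.4)
print(sender_threshold(hello), sender_threshold(reply))  # 0.0 0.4
```

Carrying the old profile in the update packet lets receivers apply the removal procedure without having stored the sender's previous ratings.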

Figure 2 illustrates how the system works. In this

example, we consider 3 of the 5 users from ﬁgure 1.

We show the registers of the active user u

and the

user u

. At time t

, the active user u

tries to con-

tact, for the ﬁrst time, other peers by sending his/her

public proﬁle and his/her ID to neighbors. This is the

packet < id

, p

>. u

receives the packet and an-

swers at t

. u

computes the distance between the

public proﬁles p

and p

. As the Pearson coefﬁ-

cient is inevitably within the default threshold limit,

adds id

to his/her list ”A”. If the computed cor-

relation coefﬁcient is higher than ”s

” which is the

threshold of u

, u

adds id

to his/her list ”O”. Mean-

while, some of the reached peers will add p

to their

list ”A” if the correlation is higher than their thresh-

old (this is the case for u

). At time t

, u

arrives

on the platform and sends a packet to u

. At time t

replies to u

and sends the packet of u

to peers

that he/she already knows. u

receives it and adds

to his/her list ”A”. He/She also adds id

to the

list ”O”, since u

is a new arrival and has a default

threshold. At time t

, u

consequently gives his/her

public proﬁle to u

. At the same time, u

has changed

his/her threshold and considers that u

is too far in

the user/item representation space, that is to say the

correlation coefﬁcient between u

and u

exceeds the

limit. Thus, u

adds id

in the list ”B”. In the packet

< id

, p

>, ”s

” allows u

to know that he/she

must complete the list ”O” with id

. At last, u

up-

dates his/her public proﬁle. Afterwards, he/she noti-

ﬁes the change to the IDs in the list ”O”. This is the

packet < id

, p

4,t

, p

4,t

>. p

4,t

and p

4,t

are re-

spectively the old and new public proﬁles of u

. When

receives this packet, he/she updates the list ”O” by

removing id

since s

is too high for him/her.


3.2 Adaptive Minimum-correlation Threshold

As shown in the previous subsection, the active user can indirectly define the minimum-correlation threshold that other people must reach in order to be a member of his/her community. Concretely, a high correlation threshold means that the users taken into account in prediction computations are very close to the active user. Recommendations will consequently be extremely similar to his/her own preferences. Conversely, a low correlation threshold expresses the will of the active user to stay aware of generalist information by integrating distant users' preferences. In this way, the user avoids frozen suggestions by accepting novelty. In the SofoS interface, a slide bar allows the active user to ask for personalized or generalist recommendations. This allows AURA to know the degree to which it can modify the threshold. The default threshold value is 0, which means that we take all the peers into account. The default threshold step is 0.1, but it can be adapted to the density of the population.

As shown in figure 3, we split the interval of the Pearson coefficient's possible values [−1;+1] into subsets. For each subset, we keep the count of peers which have got in touch with the active user and whose correlation coefficient is contained in the interval corresponding to the subset. Thus, when a user sends a packet to the active user, the Pearson coefficient is computed in order to know whether the active user's group profile has to be updated according to the threshold value. At the same time, we update the corresponding values in the population distribution histogram. For example, if the active user receives an update packet and the Pearson coefficient changes from 0.71 to 0.89, we decrement the register of the interval [0.7;0.8) and we increment the register of the interval [0.8;0.9). In this way, we constantly have the population density for each interval.
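
The histogram bookkeeping can be sketched as follows. The bin width of 0.1 matches the default threshold step; the list representation, the function names and the choice to place a coefficient of exactly +1 in the last bin are assumptions of this sketch.

```python
import math

BIN_WIDTH = 0.1
N_BINS = 20  # 20 bins of width 0.1 cover [-1, +1]


def bin_index(r: float) -> int:
    """Index of the half-open interval [lower, lower + 0.1) containing r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a Pearson coefficient lies in [-1, +1]")
    # Rounding before flooring absorbs floating-point noise at bin edges.
    index = math.floor(round((r + 1.0) / BIN_WIDTH, 9))
    return min(index, N_BINS - 1)  # r == +1 goes into the last bin


def record(histogram, new_r, old_r=None):
    """Count a peer in its new interval, leaving its previous one if any."""
    if old_r is not None:
        histogram[bin_index(old_r)] -= 1
    histogram[bin_index(new_r)] += 1


histogram = [0] * N_BINS
record(histogram, 0.71)              # first contact: counted in [0.7; 0.8)
record(histogram, 0.89, old_r=0.71)  # update packet: moved to [0.8; 0.9)
print(histogram[17], histogram[18])  # 0 1
```

Each incoming packet costs a constant number of operations on the histogram, so the population density per interval stays available at any time.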

When the total number of users whose Pearson coefficient is higher than (threshold + 0.1) exceeds a given limit (dashed line in figure 3), we increase the threshold. If there are too many users in the next subset, the threshold increase is lower. For the moment, the maximum threshold value is 0.2 for users who want a high degree of novelty and 0.9 for those who expect recommendations close to their preferences. These values have been arbitrarily chosen. We plan to run statistical tests to automatically determine the ideal thresholds according to the context.

Footnote (threshold): by "threshold", we mean the minimum absolute value of Pearson coefficients to consider in the group profile computation. For example, if the system sets the threshold to 0.1, it means that only peers u_i whose correlation coefficient |w(u_a, u_i)| is higher than 0.1 will be included in the group profile of the active user.

Footnote (recommendations close to their preferences): that is to say they want to retrieve items that they have rated highly.

Figure 3: Adaptive threshold based on density.
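
A simplified version of this adaptation rule is sketched below. The function name and parameters are hypothetical, the coefficients are passed as a plain list rather than read from the histogram, and the refinement that lowers the increase when the next subset is too crowded is deliberately left out.

```python
def adapt_threshold(coefficients, threshold, limit, max_threshold, step=0.1):
    """Return the threshold after at most one increase.

    The threshold is raised by `step` when more than `limit` peers have a
    Pearson coefficient above (threshold + step), without ever exceeding
    `max_threshold` (0.2 for novelty seekers, 0.9 for users who want
    recommendations close to their preferences).
    """
    candidate = round(threshold + step, 10)  # avoid floating-point drift
    if candidate > max_threshold:
        return threshold
    peers_above = sum(1 for r in coefficients if r > candidate)
    return candidate if peers_above > limit else threshold


neighbors = [0.15, 0.25, 0.35, -0.5]
print(adapt_threshold(neighbors, 0.0, limit=2, max_threshold=0.9))  # 0.1
print(adapt_threshold(neighbors, 0.1, limit=2, max_threshold=0.9))  # 0.1
```

In the second call only two peers lie above 0.2, which does not exceed the limit, so the threshold stays where it is.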

4 DISCUSSION

In order to define the degree of privacy of our recommender system, we refer to 4 axes of personalization (Cranor, 2005). Cranor assumes that an ideal system should be based on an explicit data collection method, transient profiles, user-initiated involvement and non-invasive predictions. In our system, the users have complete access to their preferences. They control what to share with others and when. Only numerical votes are exchanged and the logs of user actions are transient. Even when the active user does not want to share his/her preferences, it is possible to compute predictions since the public profiles of other peers are temporarily available on the active user's device. Each user has a single ID, but anonymity is ensured by the fact that there is no table linking IDs and identities. This privacy-enhanced process requires more network traffic than in (Berkovsky et al., 2006), but it allows the system to perform user-centered rather than community-centered predictions.

As regards scalability, our model no longer suffers from limitations: the algorithms used to compute group profiles and predictions are in O(b), where b is the number of items rated by both users, since computations are made incrementally in a stochastic context. In return, AURA requires quite a lot of network traffic. This is particularly true if we use a random discovery architecture. Other P2P structures can improve communications (Miller et al., 2004).

Furthermore, we assume that the quality of predictions in real situations should be better, provided that enough neighbors are found, since the virtual community of interests on each peer is centered on the active user. We can influence the degree of personalization by adjusting the threshold according to the density of the active user's neighborhood. The system just has to increase the threshold in order to ensure that users retrieve the items that they have rated highly among their recommendations. To highlight this phenomenon, we generated a rating matrix of 1,000 users and 1,000 items. The votes follow a Gaussian distribution, and the average number of neighbors as a function of the Pearson coefficient is shown in figure 4. We randomly removed 20% of these votes and applied the AURA algorithm. Then, we computed the Recall, which measures how often a list of recommendations contains an item that the user has already rated in his/her top 10. When the threshold increases, this measure becomes higher.

Figure 4: On the left, average distribution of users as regards Pearson coefficient. On the right, recall as threshold grows.

We have also evaluated our model in terms of prediction relevancy. We used the Mean Absolute Error (MAE). MAE is a widely used metric which shows the deviation between predictions and real user-specified values. Consequently, we computed the average error between the predictions and 100,000 ratings of the GroupLens test set, as shown in formula 2.

\[
MAE = \frac{\sum_{i=1}^{n} |p_i - q_i|}{n} \tag{2}
\]

where p_i is the predicted rating, q_i the corresponding real rating and n the number of ratings compared.
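
As a small worked instance of formula 2, the sketch below computes the MAE for three hypothetical predictions. The function name and the sample values are illustrative only and are not taken from the GroupLens data.

```python
def mean_absolute_error(predictions, ratings):
    """Average absolute deviation between predicted and real ratings."""
    if len(predictions) != len(ratings) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - q) for p, q in zip(predictions, ratings)) / len(predictions)


# Deviations are 1, 0 and 2, so the MAE is (1 + 0 + 2) / 3.
print(mean_absolute_error([4, 3, 5], [5, 3, 3]))  # 1.0
```

A lower MAE means that predictions are, on average, closer to the votes users actually gave.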

We simulate arrivals of peers by progressively adding new profiles. As shown in figure 5, we get predictions as good as those of the PocketLens algorithm (Miller et al., 2004). PocketLens relies on a distributed item-based approach. This comparison consequently shows that AURA provides results as relevant as those of an efficient item-based collaborative filtering algorithm.

Figure 5: MAE as neighborhood size grows.

http://www.grouplens.org/

Finally, we compared our recommender system with two centralized algorithms (Item-Item (Sarwar et al., 2001) and the Correlation-based Collaborative Filter CorrCF (Resnick et al., 1994)) to illustrate the added value of the distributed approach. In order to determine the computation times of these algorithms, we generated random public profiles with different numbers of items. In this simulation, the votes of each user follow a Gaussian distribution centered on the middle of the representation space. Moreover, only 1% of the data in the generated profiles is missing. Since Item-Item and CorrCF are centralized, we first aggregate the profiles in a vote matrix.

The results of the tests in terms of computation time are shown in table 3. The reported times for the AURA algorithm do not include the time required to scan the network in search of public profiles. Of course, the difference between AURA and the two others is mainly due to the fact that we use as many peers as users for computations. However, these results illustrate the considerable gain in comparison with centralized techniques. AURA makes real-time predictions possible. There is no need for offline computations since we can take into account 10,000 profiles and 150 items in less than half a second. Moreover, the system does not have to wait until all similarity measures are computed. As the algorithm is incremental, we can stop considering other peers at any moment.

5 CONCLUSION

SofoS is a document sharing platform including a

recommender system. To cope with numerous prob-

lems speciﬁc to information retrieval, we proposed a

Peer-to-Peer collaborative ﬁltering model which is to-

tally distributed. It allows real-time personalization

and manages the degree of personalization that users

want. We implement it on a JXTA platform which

has been used by researchers all over the world. We

show in this paper that we can deal with important

problems such as scalability, privacy and quality. We

highlight the beneﬁts of our system by doing ofﬂine

performance analysis. We plan on validating these

points by testing our model with real users in real con-

ditions.

Our algorithm is anytime and incremental. Contrary to PocketLens, our model is user-based because we consider that the set of items can change. Even if an item is deleted, we can continue to exploit its ratings in the prediction computations. Moreover, the stochastic context of our model allows the system to update the modified profiles instead of resetting all the knowledge about neighbors. Finally, our model consumes very little memory because it does not need to store any neighbors' ratings, similarity matrix, dot product matrix and so on. It only requires the sum of Pearson coefficients and four lists of user IDs.

Footnote (1% of missing data): having only 1% of missing data is not realistic, but it can potentially increase the computation time, which is what is of interest in this case.

Table 3: Computation times of three collaborative filtering algorithms.

          |      100 items        |      150 items        |      1000 items
Users     | AURA   CorrCF  It-It  | AURA   CorrCF  It-It  | AURA   CorrCF  It-It
200       | 0"01   2"60    2"14   | 0"01   3"17    2"71   | 0"07   11"09   52"74
400       | 0"02   6"09    3"87   | 0"02   7"62    5"29   | 0"12   32"24   1'22"
600       | 0"02   11"78   5"59   | 0"03   15"21   7"34   | 0"18   1'04"   2'05"
800       | 0"03   19"98   7"23   | 0"04   25"67   10"53  | 0"27   1'52"   2'33"
1,000     | 0"03   30"22   8"56   | 0"05   40"68   12"84  | 0"30   3'06"   3'25"
1,400     | 0"04   1'00"   11"50  | 0"06   1'17"   18"10  | 0"42   6'04"   4'29"
10,000    | 0"31   7:30'   1'22"  | 0"48   -       2'05"  | 1"90   -       49'28"
100,000   | 3"04   -       -      | -      -       -      | -      -       -

Currently, we are developing our protocols further

to cope with other limitations, such as trust and secu-

rity aspects by using speciﬁc communication proto-

cols as in (Polat and Du, 2004).

REFERENCES

Berkovsky, S., Eytani, Y., Kuflik, T., and Ricci, F. (2006). Hierarchical neighborhood topology for privacy enhanced collaborative filtering. In CHI 2006 Workshop on Privacy-Enhanced Personalization (PEP2006), Montreal, Canada.

Breese, J. S., Heckerman, D., and Kadie, C. (1998). Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-98), San Francisco, CA.

Canny, J. (2002). Collaborative filtering with privacy. In IEEE Symposium on Security and Privacy, pages 45-57, Oakland, CA.

Castagnos, S. and Boyer, A. (2006). A client/server user-based collaborative filtering algorithm: Model and implementation. In Proceedings of the 17th European Conference on Artificial Intelligence (ECAI2006), Riva del Garda, Italy.

Cranor, L. F. (2005). Hey, that's personal! In the International User Modeling Conference (UM05).

Han, P., Xie, B., Yang, F., Wang, J., and Shen, R. (2004). A novel distributed collaborative filtering algorithm and its implementation on p2p overlay network. In Proc. of the Eighth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD04), Sydney, Australia.

Miller, B. N., Konstan, J. A., and Riedl, J. (2004). PocketLens: Toward a personal recommender system. In ACM Transactions on Information Systems, volume 22.

Pennock, D. M., Horvitz, E., Lawrence, S., and Giles, C. L. (2000). Collaborative filtering by personality diagnosis: a hybrid memory- and model-based approach. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI-2000), San Francisco, USA. Morgan Kaufmann Publishers.

Polat, H. and Du, W. (2004). SVD-based collaborative filtering with privacy. In Proc. of ACM Symposium on Applied Computing, Cyprus.

Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., and Riedl, J. (1994). GroupLens: An open architecture for collaborative filtering of netnews. In Proceedings of ACM 1994 Conference on Computer Supported Cooperative Work, pages 175-186, Chapel Hill, North Carolina. ACM.

Sarwar, B. M., Karypis, G., Konstan, J. A., and Riedl, J. (2001). Item-based collaborative filtering recommendation algorithms. In World Wide Web, pages 285-295.

Shardanand, U. and Maes, P. (1995). Social information filtering: Algorithms for automating "word of mouth". In Proceedings of ACM CHI'95 Conference on Human Factors in Computing Systems, volume 1, pages 210-217.
