EFFICIENT NEIGHBOURHOOD ESTIMATION FOR

RECOMMENDATION MAKING

Li-Tung Weng, Yue Xu, Yuefeng Li and Richi Nayak

Faculty of Information Technology, Queensland University of Technology, 4001 Queensland, Australia

Keywords: Recommender System, Neighbourhood Formation, Taxonomic Information.

Abstract: Recommender systems produce personalized product recommendations during a live customer interaction, and they have achieved widespread success in e-commerce. For many recommender systems, especially those based on collaborative filtering, neighbourhood formation is an essential algorithmic component, because a collaborative-filtering recommender must first form a set of users who share similar interests with the target user before it can make a recommendation. “Best-k-neighbours” is a popular neighbourhood formation technique. However, with the tremendous growth of customers and products in recent years, computation efficiency has become one of the key challenges for recommender systems: forming a neighbourhood by going through all users in the dataset is not desirable for large datasets containing millions of items and users. In this paper, we present a novel neighbourhood estimation method which is both memory and computation efficient. Moreover, the proposed technique also alleviates the common “fixed-n-neighbours” problem of standard “best-k-neighbours” techniques, and therefore allows better recommendation quality. We combined the proposed technique with a taxonomy-driven product recommender, and in our experiment both the time efficiency and the recommendation quality of the recommender are improved.

1 INTRODUCTION

Recommender systems are designed to improve people's information-seeking experience by recommending information according to their needs. User-based collaborative filtering is the most fundamental and widely applied recommendation technique (Schafer et al., 2000). It generates recommendations by finding items that are commonly preferred by the neighbourhood of the target user, where a target user's neighbourhood is a set of users sharing similar preferences with the target user (Awerbuch et al., 2005). Neighbourhood formation in collaborative filtering requires comparing the target user's preferences with the preferences of all users in the dataset, and this comparison can become a major computational bottleneck for recommenders. For large datasets, neighbourhood formation requires a large amount of I/O to retrieve user profiles, and each user profile may be represented by a very high dimensional vector, so the similarity computation between the vectors can be very expensive.

Our main contribution in this paper is a novel neighbourhood estimation method called “relative distance filtering” (RDF). It is based on pre-computing a small set of relative distances between users, and using the pre-computed distances to eliminate most of the unnecessary similarity comparisons between users. The proposed RDF method is also capable of handling frequent data updates dynamically: whenever user preferences in the dataset are added, deleted or modified, the pre-computed structure cache can be updated efficiently.

Part of our research is to develop a novel

collaborative filtering based recommender that

utilizes the item taxonomy information for its user

preference representation. Our work is based on a

well-known taxonomy recommender, namely

taxonomy product recommender (TPR), proposed by

Ziegler (Ziegler et al., 2004) which utilizes the

taxonomy information of the products to solve the

data sparsity and cold-start problems. TPR

outperforms standard collaborative filtering systems

with respect to the recommendation accuracy when

producing recommendations for sites with data

sparsity. However, the time efficiency of TPR drops significantly when dealing with a huge number of users, because the user preferences in TPR are represented by high dimensional vectors. We applied the proposed RDF technique to TPR, and the experimental results show that, by utilizing the proposed technique, both the accuracy and the efficiency of TPR are significantly improved.

Weng L., Xu Y., Li Y. and Nayak R. (2008).
EFFICIENT NEIGHBOURHOOD ESTIMATION FOR RECOMMENDATION MAKING.
In Proceedings of the Tenth International Conference on Enterprise Information Systems - SAIC, pages 12-19
DOI: 10.5220/0001695000120019
SciTePress

2 RELATED WORK

Neighbourhood formation is a process required by

most collaborative filtering based recommenders to

find users with similar interests to the target user.

Sarwar et al. (2002) proposed an efficient neighbourhood selection method that pre-computes clusters of users. However, clustering is an expensive process and can only be done offline. Because datasets keep changing over time, the quality of neighbourhoods based on existing clusters degrades until the next clustering update. Moreover, clustering based neighbourhood selection favours target users near cluster centres; for users located near cluster edges, the quality of the resulting neighbourhoods is usually poor because their actual neighbours are very likely to be in other clusters (Sarwar et al., 2002). There are also several neighbourhood formation algorithms developed specifically for high dimensional data, such as RTree (Manolopoulos et al., 2005) and kd-Tree (Bentley, 1990). The basic idea behind these algorithms is to index the high dimensional data in a search tree, in which each node holds one cluster space and the child nodes subdivide the cluster held by their parent into finer clusters. The search efficiency of these algorithms is very impressive, because the search space shrinks by a constant factor at each tree level (i.e. a search takes O(log N) steps). However, they suffer from the same problem as cluster based neighbourhood search, namely “loss of precision”. In fact, these algorithms usually produce worse results than clustering based methods. Moreover, because the internal tree structures for indexing the data are fairly complex, these algorithms are usually memory intensive and slow to initialize. The proposed RDF technique is not as good as these tree-structure based methods in terms of computation efficiency; however, it is still more efficient than cluster based search. In terms of accuracy, the proposed method produces much better results than the tree-structure based methods because it does not constrain the neighbourhood search to local clusters. The internal structure of the proposed RDF technique can be updated dynamically in real time and requires only a very small amount of physical memory.

3 TAXONOMY PRODUCT

RECOMMENDER

An overview of taxonomy-driven product

recommender (TPR) proposed by Ziegler (Ziegler et

al., 2005, Ziegler et al., 2004) is given in this

section.

3.1 Item Taxonomy Model

We envision a world with a finite set of users $U = \{u_1, u_2, \ldots, u_N\}$ and a finite set of items $T = \{t_1, t_2, \ldots, t_M\}$. Each user $u_i \in U$ is associated with a set of corresponding implicit ratings $R_i$, where $R_i \subseteq T$. Unlike explicit ratings, in which users are asked to supply their perceptions of items explicitly on a numeric scale, implicit ratings such as transaction histories, browsing histories, etc., are more common and obtainable for e-commerce sites and communities.

In standard collaborative filtering recommenders, user profiles are represented by $M$-dimensional vectors, where $M = |T|$ and each dimension represents an explicit item rating. However, for many systems, $M$ can be very large and the number of ratings made by each user can be very small. This problem is often referred to as the cold-start problem or the data sparsity problem.

The data sparsity problem is relieved with TPR because, instead of using product-rating vectors with $|T|$ dimensions as user profiles, TPR uses taxonomy vectors with $d$ dimensions, where $d$ is the number of topics in the product taxonomy space. Specifically, we denote the taxonomy vector for $u_i$ as $\vec{v}_i = (v_{i,1}, v_{i,2}, \ldots, v_{i,d})$, and each dimension of $\vec{v}_i$ indicates the degree of $u_i$'s interest in the corresponding topic. The taxonomy vector in TPR has three advantages over the standard product rating vector. Firstly, for most e-commerce sites $d$ is much smaller than $|T|$, and therefore it can yield better computational performance. Secondly, because the taxonomy vector records the user's taxonomy preferences instead of item preferences, and different items can share their descriptors entirely or partially, the profiles of users with no common item interests can still be correlated. Thirdly, the taxonomy vector can be constructed from implicit ratings alone, which effectively relieves the data sparsity problem.

3.2 Recommendation Generation

In this paper, the distance between two user taxonomy vectors is the Euclidean distance:

$$dist(u_i, u_j) = \sqrt{\sum_{k=1}^{d} (v_{i,k} - v_{j,k})^2} \qquad (1)$$

where $v_{i,k}$ is the $k$-th component of $u_i$'s taxonomy vector and $d$ is the number of taxonomy topics.

Based on the distance measure, target user $u_i$'s neighbourhood $clique(u_i)$ can be formed by selecting the $n$ users from $U \setminus \{u_i\}$ with the shortest distances to $u_i$. By extracting the items implicitly rated by the neighbourhood, a candidate item list $B_i$ is formed for $u_i$'s personalized recommendation list, formally:

$$B_i = \bigcup \{ R_j \mid u_j \in clique(u_i) \} \setminus R_i \qquad (2)$$

The items in the candidate list $B_i$ need to be ranked according to their closeness to the target user's personal interest. The ranking equation that weights $u_i$'s possible interest in an item $t_k \in B_i$ is shown below:

$$w_i(t_k) = -1 \cdot \frac{dist(u_i, ut(t_k)) \cdot \sum_{u_j \in A_i(t_k)} dist(u_i, u_j)}{|A_i(t_k)|} \qquad (3)$$

where $A_i(t_k) = \{ u_j \in clique(u_i) \mid t_k \in R_j \}$.

In equation (3), the computed score is negated because the proximity measure is distance based (i.e. a small value indicates strong similarity); by negating the score, larger values of $w_i(t_k)$ indicate higher item interest. $ut(t_k)$ creates a dummy user for item $t_k$, so that the proximity of the taxonomy vectors between $u_i$ and $t_k$ can be measured. The conversion simply creates a user $u_\theta$ with $R_\theta = \{t_k\}$.

Finally, after the candidate item weights are

computed, the top m items with highest weight

values are recommended to the target user.
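The neighbourhood and candidate-list steps of this section can be sketched in a few lines of Python. This is only an illustrative sketch, not Ziegler's TPR implementation: the function names, the dictionary-based data layout and the toy data are assumptions, and the ranking step of equation (3) is left out.

```python
import math

def euclidean(v, w):
    """Euclidean distance between two equal-length taxonomy vectors (equation (1))."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))

def best_n_neighbours(target, vectors, n):
    """Return the ids of the n users closest to `target`, excluding the target itself."""
    others = [u for u in vectors if u != target]
    others.sort(key=lambda u: euclidean(vectors[target], vectors[u]))
    return others[:n]

def candidate_items(target, clique, ratings):
    """Items rated by the neighbourhood but not by the target (equation (2))."""
    rated = set()
    for u in clique:
        rated |= ratings[u]
    return rated - ratings[target]

# Hypothetical toy data: four users with two-topic taxonomy vectors.
vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.8, 0.3]}
ratings = {"a": {"t1"}, "b": {"t1", "t2"}, "c": {"t3"}, "d": {"t4"}}

clique = best_n_neighbours("a", vectors, n=2)
candidates = candidate_items("a", clique, ratings)
```

Note that `best_n_neighbours` computes a full distance to every other user, which is exactly the cost that the approach proposed in the next section tries to avoid.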

4 PROPOSED APPROACH

In this paper, we identify two aspects of TPR that can be improved.

Firstly, even though the product rating vectors are compressed into taxonomy vectors with fewer dimensions, for datasets with a large number of users and extensive taxonomy structures the neighbourhood formation becomes one of the computational bottlenecks in TPR: it requires an extensive amount of I/O to retrieve user profiles (i.e. taxonomy vectors) from the database, and the proximity computation (i.e. equation (1)) for high dimensional vectors is expensive as well.

Secondly, in Ziegler's TPR implementation (Ziegler et al., 2004), “best-n-neighbours” is applied as the neighbourhood selection method, since it performs better than “correlation-threshold” for sparse datasets (Ziegler et al., 2004). However, because the value of $n$ is pre-specified in “best-n-neighbours”, the resulting neighbourhoods will be biased for users with fewer than $n$ true neighbours (Li et al., 2003). This issue is particularly noticeable for users with unusual tastes, as it is likely that part of the neighbourhood formed by “best-n-neighbours” contains users that are dissimilar to them. For example, if a user with distinct tastes shares similar tastes with only 2 other users, the recommendation result for this user might be biased if a neighbourhood of 20 users is used.

In this paper, we propose a novel neighbourhood estimation method which is both memory and computation efficient. By replacing the standard “best-n-neighbours” in TPR with the proposed technique, the following two improvements are achieved:

- The computation efficiency of TPR is greatly improved.
- The recommendation quality of TPR is also improved, as the impact of the “fixed n neighbours” problem is reduced. That is, the proposed technique helps TPR locate the true neighbours of a given target user (the number of true neighbours might be smaller than $n$), so that only the truly close neighbours of the target user are included in the computation.

4.1 Relative Distance Filtering

Forming the neighbourhood for a given user $u_t \in U$ with the standard “best-n-neighbours” technique involves computing the distances between $u_t$ and all other users and selecting the top $n$ neighbours with the shortest distances to $u_t$. However, unless the distances between all users can be pre-computed offline or the number of users in the dataset is small, forming neighbourhoods dynamically can be an expensive operation.

Clearly, for the standard neighbourhood formation technique described above, there is a significant amount of overhead in computing distances for users that are obviously far away (i.e., dissimilar users). The performance of the neighbourhood formation can be drastically improved if we exclude most of these very dissimilar users from the detailed distance computation. In the proposed RDF method, this exclusion or filtering process is achieved with a simple geometrical implication: if two points are very close to each other in a space, then their distances to a given randomly selected point in the space should be similar.

ICEIS 2008 - International Conference on Enterprise Information Systems

In Figure 1, a user set $U$ is projected onto a two-dimensional plane where each user is depicted as a dot on the plane. In the figure, $u_t$ is the target user, and the dots embraced by small circles are the top neighbours of $u_t$. The RDF method starts by randomly selecting a reference user $r_1$ in the user set, and then $r_1$'s distances to all other users are computed and sorted ($r_2$ and $r_3$ in the figure are also reference users).

Figure 1: Projected user profiles.

Based on the triangle inequality, which gives $|dist(u_i, r_1) - dist(u_t, r_1)| \le dist(u_i, u_t)$, it is easy to observe that all of $u_t$'s neighbours have similar distances to $r_1$. This means that, in the process of forming $u_t$'s neighbourhood, we only need to compute distances between $u_t$ and the users in $S_1$, which is defined as:

$$S_1 = \{ u_i \in U \mid |d_i^1 - d_t^1| \le \varepsilon \} \qquad (4)$$

where $d_i^1$ is an abbreviated denotation for $dist(u_i, r_1)$, and $|d_i^1 - d_t^1|$ is the difference of the distances from $u_i$ and $u_t$ to $r_1$. According to the Modus Tollens inference rule (i.e., if the consequent of an implication is false, then the antecedent of the implication must be false), it follows from the geometrical implication mentioned above that if $|d_i^1 - d_t^1|$ is large, then $u_i$ and $u_t$ are not close to each other. $\varepsilon$ is a distance threshold: if $|d_i^1 - d_t^1|$ is larger than the threshold, user $u_i$ can be excluded from $u_t$'s neighbourhood. If $\varepsilon$ is set to a larger value, the distance threshold is relaxed and thus more users can be included in the neighbourhood. In this case, the performance will decrease because more users will be included in the actual distance computations. In our experiment, $\varepsilon$ is set to one tenth of the distance between the reference user and its furthest neighbour.

To further optimize the neighbourhood estimation, we can select more reference users (for example $r_2$ and $r_3$) into the estimation process to obtain more estimated searching spaces (i.e. $S_2$ and $S_3$). With multiple estimated searching spaces, the final estimated searching space can be drastically reduced by intersecting these spaces (i.e. $S_1 \cap S_2 \cap S_3$). It can be observed in Figure 2 that the intersected searching space is much smaller than the entire set and, most importantly, it covers $u_t$'s closest users. Only the users in the intersection need to be checked to determine $u_t$'s neighbourhood. The actual I/O and distance computations only need to be conducted within the intersected space, thus the efficiency is greatly improved.

Figure 2: Estimated searching space with three reference users.

4.2 Reference User Selection

The reference user selection is important for RDF. In order to optimize the performance of TPR, the final estimated searching space (i.e. $S_1 \cap S_2 \cap S_3$) needs to be as small as possible for any given target user. In order to achieve this, the distances between the reference users need to be as large as possible. This is because, if the reference users are close to each other, the filtering borders of their search spaces will largely overlap (since they all have similar centres and radiuses). Moreover, the number of reference users should be kept small (we only use 3 reference users for all our experiments), because when the number of reference users increases, the time required for the

offline reference user initialization and the memory required for caching the sorted distances increase too.

In our implementation, the reference users are initialized with a simple technique. The first reference user $r_1$ is chosen randomly, and next we compute its distances to all other users in $U$. With the computed distances we can obtain the second reference user $r_2$ such that

$$r_2 = \arg\max_{u_i \in U} dist(u_i, r_1).$$

Finally, we again use an argmax to find the user $r_3$ that is furthest from $r_1$ and $r_2$, and set $r_3$ as the third reference user. With this method, it is ensured that the initialization process is kept simple and efficient, and the resulting reference users are also very distant from each other.

4.3 Proposed RDF Implementation

This section describes in detail the implementation of the proposed RDF method discussed in sections 4.1 and 4.2. With the proposed implementation the power of RDF is maximized.

First of all, it is important to note that the distances between users and reference users are not meant to be computed online, because computing them online would be more expensive than the one-by-one search. Instead, these distances are computed, structured and indexed offline into a data structure called the “RDF searching cache”, and the searching cache is loaded into memory in the initialization stage of the online recommendation process. This pre-computed searching cache is shared by all neighbourhood formation processes. The detailed structure is depicted in Figure 3.

In the searching cache, each user is associated with a data structure called a “user node”. For any user $u_i$, $node(u_i)$ denotes $u_i$'s user node. A user node basically stores two types of information for a user:

- User ID: Instead of fitting the entire user profiles or the user taxonomy vectors into memory, only the user ids are required to be stored in the cache. The user ids are used to identify and retrieve the actual user profiles in the database.
- Distances to the Reference Users: The distances from the user node's corresponding user to the reference users are stored in a vector. In our implementation, we have only three reference users $r_1$, $r_2$ and $r_3$, and therefore the distance vector for user node $node(u_i)$ has three components. We denote the distance vector of $node(u_i)$ as $(d_i^1, d_i^2, d_i^3)$, where $d_i^1$ corresponds to $dist(u_i, r_1)$, $d_i^2$ corresponds to $dist(u_i, r_2)$ and $d_i^3$ corresponds to $dist(u_i, r_3)$ respectively.

In order to efficiently retrieve the estimated searching space as described in equation (4), a binary tree structure is used to index and sort the user nodes. The index keys used for each user node are the distances between the user and the reference users, that is, the index keys for $node(u_i)$ are $d_i^1$, $d_i^2$ and $d_i^3$. With the three different index keys, the user nodes can be efficiently sorted with different index key settings, that is, the user nodes can be sorted by any one of the three index keys.

Figure 3: Structure for the RDF searching cache.

Because the user nodes are stored in this binary tree structure, the computation efficiency for equation (4) is optimized to $O(\log N)$, where $N = |U|$. Note, this estimated user space retrieval process is very efficient, not only because the whole computation can be done within a small amount of memory (thus no database I/O is required), but also because each index key lookup involves only a comparison of two double values. Finally, because the distances between the target user and the reference users are needed during the neighbourhood formation process, the user profiles of the reference users are required to be stored in the cache. The memory requirement for the reference user profiles is trivial, because there are only three reference users.
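The offline part of RDF (reference user selection and the searching cache) can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, the third reference user is assumed to maximize the summed distance to the first two (the exact criterion is not recoverable from the text), and a sorted list per index key stands in for the binary tree index.

```python
import math
import random

def dist(v, w):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))

def select_reference_users(vectors, seed=0):
    """Pick three mutually distant reference users.

    Assumption: the third user maximizes the summed distance to the first
    two; the source only says it is the furthest from both.
    """
    users = sorted(vectors)
    r1 = random.Random(seed).choice(users)          # first one is random
    others = [u for u in users if u != r1]
    r2 = max(others, key=lambda u: dist(vectors[u], vectors[r1]))
    rest = [u for u in others if u != r2]
    r3 = max(rest, key=lambda u: dist(vectors[u], vectors[r1])
                                 + dist(vectors[u], vectors[r2]))
    return [r1, r2, r3]

def build_cache(vectors, refs):
    """Offline step: one distance vector per user, one sorted index per key.

    Only user ids and reference distances are kept, not the user profiles.
    Sorted lists are used here in place of the binary tree index.
    """
    dvec = {u: tuple(dist(v, vectors[r]) for r in refs)
            for u, v in vectors.items()}
    index = [sorted((dvec[u][k], u) for u in vectors)
             for k in range(len(refs))]
    return dvec, index

# Hypothetical toy data with five users.
vectors = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0],
           "d": [5.0, 5.0], "e": [0.2, 0.1]}
refs = select_reference_users(vectors, seed=1)
dvec, index = build_cache(vectors, refs)
```

With three reference users the cache holds three floating point numbers and one id per user, which is why its memory footprint stays small even when the taxonomy vectors themselves are high dimensional.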

Given that the RDF searching cache is properly

initialized, the detailed RDF procedure is described

below:

RDF Algorithm

1) Let $u_t$ be the target user and $n$ be the pre-specified number of neighbours for $u_t$.

2) Use the indexed tree structure to locate the minimal user node set within the given boundary:

$$S_x = \{ u_i \in U \mid |d_i^x - d_t^x| \le \varepsilon \}$$

where $d_i^x$ denotes the pre-computed distance between user $u_i$ and the $x$-th reference user, $\varepsilon$ is the distance threshold, and $x \in \{1, 2, 3\}$ is the reference user index which achieves the minimal search space. Note, the actual implementation of $S_x$'s computation can be very efficient. By utilizing the pre-computed searching cache, the estimation of the user node set size does not involve looping through the user nodes one by one.

3) Based on step 2, $d^x$ is the primary index key used to sort and retrieve $S_x$, and it is one of $d^1$, $d^2$ and $d^3$. The remaining two index keys (also in $\{d^1, d^2, d^3\}$) are denoted as $d^y$ and $d^z$.

4) Refine the searching space $S_x$ by using reference users $r_y$ and $r_z$. This process is similar to finding the intersected space $S_1 \cap S_2 \cap S_3$ as described in section 4.2.

FOR each $u_i \in S_x$
  IF $|d_i^y - d_t^y| > \varepsilon$ or $|d_i^z - d_t^z| > \varepsilon$ THEN
    remove $u_i$ from $S_x$
  END IF
END FOR

5) Do the standard “best-n-neighbours” search against the estimated searching space $S_x$, and return the resulting neighbourhood for $u_t$.
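The five steps above can be sketched in Python as shown below. This is a hedged illustration, not the authors' code: `rdf_neighbours` and its parameters are hypothetical names, the searching cache is rebuilt inside the function only to keep the example self-contained (in the described design it is pre-computed offline), and binary search over sorted key lists (the standard `bisect` module) is swapped in for the binary tree index.

```python
import bisect
import math

def dist(v, w):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))

def rdf_neighbours(target, vectors, refs, eps, n):
    """Estimate the neighbourhood of `target` with relative distance filtering."""
    # Searching cache: distance of every user to every reference user.
    dvec = {u: [dist(v, vectors[r]) for r in refs] for u, v in vectors.items()}
    order, keys = [], []
    for k in range(len(refs)):
        ids = sorted(vectors, key=lambda u: dvec[u][k])   # users sorted by key k
        order.append(ids)
        keys.append([dvec[u][k] for u in ids])
    dt = dvec[target]

    def band(k):
        """Index range of users whose k-th key lies within eps of the target's."""
        lo = bisect.bisect_left(keys[k], dt[k] - eps)
        hi = bisect.bisect_right(keys[k], dt[k] + eps)
        return lo, hi

    # Step 2: pick the reference user with the smallest band (binary search only).
    x = min(range(len(refs)), key=lambda k: band(k)[1] - band(k)[0])
    lo, hi = band(x)

    # Steps 3-4: retrieve the band for key x, then filter with the other keys.
    space = [u for u in order[x][lo:hi]
             if u != target
             and all(abs(dvec[u][k] - dt[k]) <= eps
                     for k in range(len(refs)) if k != x)]

    # Step 5: exact best-n search, restricted to the estimated searching space.
    space.sort(key=lambda u: dist(vectors[u], vectors[target]))
    return space[:n]

# Hypothetical toy data: two users near the target, one moderately far, three far.
vectors = {"t": [0.0, 0.0], "a": [0.1, 0.0], "b": [0.0, 0.1], "m": [3.0, 0.0],
           "far1": [10.0, 10.0], "far2": [-10.0, 5.0], "far3": [8.0, -9.0]}
result = rdf_neighbours("t", vectors, ["far1", "far2", "far3"], eps=0.5, n=3)
```

In this toy run three neighbours are requested but only the two users that survive the filter are returned, which illustrates how the threshold softens the “fixed n neighbours” problem: a dissimilar user such as `m` is not pulled in just to fill the quota.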



5 EXPERIMENTS

This section presents empirical results obtained from

our experiment.

5.1 Experiment Setup

The dataset used in this experiment is the “Book-

Crossing” dataset (http://www.informatik.uni-

freiburg.de/~cziegler/BX/), which contains 278,858

users providing 1,149,780 ratings about 271,379

books. Because TPR uses only implicit user ratings, we further removed all explicit user ratings from the dataset and kept the remaining 716,109 implicit ratings for the experiment.

The goal of our experiment in this paper is to

compare the recommendation performance and

computation efficiency between standard TPR

(Ziegler et al., 2004) and the RDF-based TPR

proposed in this paper.

The k-folding technique is applied (where k is set to 5 in our setting) for the recommendation performance evaluation. With k-folding, every user $u_i$'s implicit rating list $R_i$ is divided into 5 equal size portions. One of these portions is selected as $u_i$'s training set $R_i^x$, and the remaining 4 portions are combined into a test set $T_i^x = R_i \setminus R_i^x$. In total we have five combinations $(R_i^x, T_i^x)$, $1 \le x \le 5$, for user $u_i$. In the experiment, the recommenders use the training set $R_i^x$ to learn $u_i$'s interest, and the recommendation list $P_i^x$ generated for $u_i$ is then evaluated according to $T_i^x$. Moreover, the neighbourhood size is set to 20 and the number of items within each recommendation list is also set to 20.
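The folding scheme described above can be sketched as follows. The function name and the round-robin way of cutting the portions are assumptions made for illustration; the text only states that the rating list is divided into 5 equal size portions, with one portion used for training and the other four for testing.

```python
def fold_splits(ratings, k=5):
    """Split one user's implicit rating set into k (training, test) pairs.

    Fold x takes portion x as the training set and the remaining k-1
    portions as the test set, as described in the text.
    """
    items = sorted(ratings)                        # fixed order for reproducibility
    portions = [items[x::k] for x in range(k)]     # near-equal portions
    return [(set(p), set(items) - set(p)) for p in portions]

# Hypothetical rating list with ten items.
ratings = {"t%d" % i for i in range(1, 11)}
splits = fold_splits(ratings, k=5)
```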

For the computation efficiency evaluation, we implemented four different versions of TPR, each equipped with a different neighbourhood formation algorithm. The four TPR versions are:

- Standard TPR: the neighbourhood formation method is based on comparing the target user to all users in the dataset.
- RDF based TPR: the proposed RDF method is used to find the neighbourhood.
- RTree based TPR: the RTree (Manolopoulos et al., 2005) is used to find the neighbourhood. RTree is a tree structure based neighbourhood formation method, and it has been widely applied in many applications.
- Random TPR: this TPR forms its neighbourhood with randomly chosen users. It is used as the baseline for the recommendation quality evaluation.

The average time required by the standard, RTree based and RDF based TPRs to make a recommendation is compared. We incrementally increase the number of users in the dataset (1000, 2000, 3000 and so on, up to 14000), and observe how the computation times are affected by the increments.

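In this paper, the precision and recall metrics are used for the evaluation of TPR, and their formulas are listed below:

$$Recall = 100 \times \frac{|T_i^x \cap P_i^x|}{|T_i^x|} \qquad (5)$$

$$Precision = 100 \times \frac{|T_i^x \cap P_i^x|}{|P_i^x|} \qquad (6)$$

where $T_i^x$ is the test set and $P_i^x$ is the recommendation list of user $u_i$ in fold $x$.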

5.2 Result Analysis

Figure 4 shows the performance comparison between the standard TPR and the proposed RDF based TPR using the precision and recall metrics. The horizontal axis of both the precision and recall charts indicates the minimum number of ratings in the user's profile (i.e. $|R_i|$). Therefore larger horizontal coordinates imply that fewer users are considered for the evaluation. It can be observed that the proposed RDF based TPR outperformed the standard TPR for both recall and precision. The result confirms that when the dissimilar users are removed from the neighbourhood, the quality of the resulting recommendations becomes better. The RTree based TPR performs much worse than both the RDF based TPR and the standard TPR, as it is unable to accurately allocate neighbours for target users.

Figure 4: Recommendation precision and recall.

The efficiency evaluation is shown in Figure 5. It can be seen from Figure 5 that the time efficiency of the standard TPR drops drastically when the number of users in the dataset increases. For a dataset with 15000 users, the system needs about 14 seconds to produce a recommendation for a user, which is not acceptable for most commercial systems. By comparison, the RDF based TPR is much more efficient, as it needs less than 4 seconds to produce a recommendation for a dataset with 15000 users. The RTree based TPR greatly outperforms the proposed method when the number of users in the dataset is under 8000. However, as the number of users in the dataset increases, the difference between the RDF and RTree based TPRs becomes smaller, and RDF starts to outperform RTree on the larger datasets. This is because RTree is only efficient when the tree level is small. As the tree level increases (i.e. when the number of users increases), RTree's performance drops drastically because the chance of high dimensional vector comparison increases quadratically in accordance with the number of tree levels. The proposed RDF method outperforms the RTree method because its indexing strategy is single value based, and it reduces the possibility of high dimensional vector correlation computation.

Figure 5: Average recommendation time.

6 CONCLUSIONS

In this paper, we presented a novel neighbourhood estimation method for recommenders, namely RDF. By embedding RDF within a TPR based recommender, not only is the computation efficiency of the system improved, the recommendation quality is also improved. The RDF method is different from the clustering based neighbourhood formation methods that use offline computed clusters as the neighbourhoods. Instead, our method forms the neighbourhood for any given target user dynamically from scratch (and is thus more accurate than cluster based approaches) in an efficient manner. In our experiment, it is shown that the proposed method improves both recommendation quality and computation efficiency for the standard TPR recommender.

REFERENCES

Awerbuch, B., Patt-Shamir, B., Peleg, D. & Tuttle, M. (2005) Improved recommendation systems. Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms. Vancouver, British Columbia.

Bentley, J. L. (1990) K-d Trees for Semidynamic Point Sets. 6th Annual Symposium on Computational Geometry. Berkeley, California, United States, ACM Press.

Li, B., Yu, S. & Lu, Q. (2003) An Improved k-Nearest Neighbor Algorithm for Text Categorization. Proceedings of the 20th International Conference on Computer Processing of Oriental Languages. Shenyang, China.

Manolopoulos, Y., Nanopoulos, A., Papadopoulos, A. N. & Theodoridis, Y. (2005) R-Trees: Theory and Applications, Springer.

Sarwar, B., Karypis, G., Konstan, J. & Riedl, J. (2002) Recommender systems for large-scale e-commerce: Scalable neighborhood formation using clustering. Proceedings of the 5th International Conference on Computer and Information Technology.

Schafer, J. B., Konstan, J. A. & Riedl, J. (2000) E-Commerce Recommendation Applications. Journal of Data Mining and Knowledge Discovery, 5, 115-152.

Ziegler, C.-N., Lausen, G. & Schmidt-Thieme, L. (2004) Taxonomy-driven Computation of Product Recommendations. International Conference on Information and Knowledge Management. Washington D.C., USA.

Ziegler, C.-N., McNee, S. M., Konstan, J. A. & Lausen, G. (2005) Improving Recommendation Lists Through Topic Diversification. Proceedings of the 14th International World Wide Web Conference. Chiba, Japan.
