significantly when dealing with a huge number of
users, because the user preferences in TPR are
represented by high-dimensional vectors. We
applied the proposed RDF technique to TPR, and
the experimental results show that, with the
proposed technique, both the accuracy and
efficiency of TPR are significantly improved.
2 RELATED WORK
Neighbourhood formation is a process required by
most collaborative filtering based recommenders to
find users with similar interests to the target user.
Sarwar et al. (2002) proposed an efficient
neighbourhood selection method that groups users
into clusters in advance. However, clustering is an
expensive process and can only be done offline.
Because datasets keep changing over time, the
overall quality of the neighbourhoods derived from
existing clusters degrades until the next
clustering update. Moreover, clustering-based
neighbourhood selection favours target users near
cluster centres; for users located near cluster
edges, the quality of the resulting neighbourhoods
is usually poor because their actual neighbours
are very likely to lie in other clusters
(Sarwar et al., 2002). There are also several
neighbourhood formation algorithms developed
specifically for high-dimensional data, such as the
R-Tree (Manolopoulos et al., 2005) and the kd-Tree
(Bentley, 1990). The basic idea behind these
algorithms is to index the high-dimensional data
in a search tree structure: at each level, the
child nodes subdivide the cluster held by their
parent node into finer clusters, and each tree node
holds one of the cluster spaces. The search
efficiency of these algorithms is very impressive,
because the search space shrinks by a constant
factor at each tree level (giving O(log N) search
time). However, they suffer from a problem similar
to that of cluster-based neighbourhood search,
namely “loss of precision”. In fact, these
algorithms usually produce worse results than
clustering-based methods. Moreover, because the
internal tree structures for indexing the data are
fairly complex, these algorithms are usually memory
intensive and slow to initialize.
The proposed RDF technique is not as
good as these tree-structure-based methods in terms
of computational efficiency; however, it is still
more efficient than cluster-based search. In terms
of accuracy, the proposed method produces much
better results than the tree-structure-based methods
because it does not constrain the neighbourhood
search to local clusters. The internal structure of
the proposed RDF technique can be updated
dynamically in real time and requires only a very
small amount of physical memory.
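The edge effect attributed to clustering-based
neighbourhood selection can be illustrated with a
minimal Python sketch. The toy coordinates, the two
fixed cluster centres (assumed to come from an
earlier offline clustering run), and the function
names are hypothetical; the sketch does not
implement the method of Sarwar et al. or the
proposed RDF technique.

```python
import math

# Hypothetical toy data: six users in a 2-D preference space.
users = [
    (0.0, 0.0),   # user 0
    (0.5, 0.0),   # user 1
    (-1.0, 0.2),  # user 2
    (4.0, 0.0),   # user 3
    (2.2, 0.0),   # user 4
    (4.5, 0.3),   # user 5
]

# Assumed cluster centres, fixed since the last offline clustering update.
centres = [(0.0, 0.0), (4.0, 0.0)]


def nearest_centre(point):
    """Index of the cluster centre closest to `point`."""
    return min(range(len(centres)), key=lambda c: math.dist(point, centres[c]))


def exhaustive_neighbours(target, k):
    """k nearest users found by scanning every user."""
    ranked = sorted(range(len(users)), key=lambda i: math.dist(target, users[i]))
    return ranked[:k]


def cluster_neighbours(target, k):
    """k nearest users found only inside the target's own cluster."""
    own = nearest_centre(target)
    members = [i for i in range(len(users)) if nearest_centre(users[i]) == own]
    ranked = sorted(members, key=lambda i: math.dist(target, users[i]))
    return ranked[:k]


# A target user located near the edge between the two clusters.
target = (1.9, 0.0)
print(exhaustive_neighbours(target, 2))  # [4, 1]
print(cluster_neighbours(target, 2))     # [1, 0]
```

In this toy case the target falls into the first
cluster, so the cluster-restricted search never
considers user 4, which is its closest neighbour
but belongs to the second cluster.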
3 TAXONOMY PRODUCT RECOMMENDER
An overview of the taxonomy-driven product
recommender (TPR) proposed by Ziegler (Ziegler et
al., 2004; Ziegler et al., 2005) is given in this
section.
3.1 Item Taxonomy Model
We envision a world with a finite set of users
$U = \{u_1, u_2, \ldots, u_n\}$ and a finite set of
items $T = \{t_1, t_2, \ldots, t_m\}$. Each user
$u_i \in U$ is associated with a set of
corresponding implicit ratings $R_i$, where
$R_i \subseteq T$. Unlike explicit ratings, in
which users are asked to state their opinions of
items explicitly on a numeric scale, implicit
ratings such as transaction histories and browsing
histories are more common and easier to obtain for
e-commerce sites and communities.
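As a minimal sketch of this model, each user's
implicit ratings can be held as the set of items
the user has interacted with. The transaction log,
the identifiers, and the function name below are
hypothetical and serve only as an illustration.

```python
from collections import defaultdict

# Hypothetical transaction log of (user id, item id) pairs, e.g. purchases.
transactions = [
    ("u1", "t1"),
    ("u1", "t3"),
    ("u2", "t2"),
    ("u1", "t1"),  # a repeated purchase adds no new implicit rating
    ("u3", "t3"),
]


def implicit_ratings(log):
    """Map each user to the set of items he or she has interacted with."""
    ratings = defaultdict(set)
    for user, item in log:
        ratings[user].add(item)
    return dict(ratings)


R = implicit_ratings(transactions)
print(sorted(R["u1"]))  # ['t1', 't3']
```

A set is used here because an implicit rating only
records that an interaction took place, not a
numeric score.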
In standard collaborative filtering
recommenders, user profiles are represented by
$m$-dimensional vectors, where $m = |T|$ is the
number of items and each dimension represents an
explicit item rating. However, for many systems,
$m$ can be very large and the number of ratings
made by each user can be very small. This problem
is often referred to as the cold-start problem or
the data sparsity problem.
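The effect of sparsity on item-based profiles can
be seen in a small Python sketch. The users, items,
and helper names are hypothetical, and binary
vectors stand in for rating vectors for simplicity.

```python
import math

# Hypothetical catalogue and per-user rated items.
items = ["t1", "t2", "t3", "t4", "t5", "t6"]
ratings = {
    "alice": {"t1", "t2"},
    "bob": {"t3"},
    "carol": {"t2", "t6"},
}


def item_vector(rated):
    """One dimension per item: 1.0 if the user rated it, else 0.0."""
    return [1.0 if t in rated else 0.0 for t in items]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sparsity(all_ratings):
    """Fraction of empty cells in the user-item matrix."""
    filled = sum(len(r) for r in all_ratings.values())
    return 1.0 - filled / (len(all_ratings) * len(items))


alice, bob = item_vector(ratings["alice"]), item_vector(ratings["bob"])
print(cosine(alice, bob))  # 0.0, because the two users share no items
print(round(sparsity(ratings), 3))  # 0.722
```

With no co-rated items the similarity is exactly
zero, so such user pairs cannot be compared at all
on item-based profiles.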
The data sparsity problem is relieved with TPR
because, instead of using product-rating vectors
with $|T|$ dimensions (one per item) as user
profiles, TPR uses taxonomy vectors with $d$
dimensions, where $d$ is the number of topics in
the product taxonomy space. Specifically, we denote
the taxonomy vector for user $u_i$ as
$\vec{v}_i = (v_{i,1}, v_{i,2}, \ldots, v_{i,d})$,
and each dimension of $\vec{v}_i$ indicates the
degree of $u_i$'s interest in the corresponding
topic. The taxonomy vector in TPR has three
advantages over the standard product-rating
vector. Firstly, for most e-commerce sites $d$ is
much smaller than $|T|$, and therefore it can yield
better computational performance. Secondly, because
the taxonomy vector records the user's taxonomy
preferences instead of item preferences, and
different items can share their descriptors
entirely or partially, the profiles of users with
no common item interests can still be correlated.
Thirdly, the
construction of the taxonomy vector can be done