CONTENT-BOOSTED COLLABORATIVE FILTERING USING

SEMANTIC SIMILARITY MEASURE

Ugur Ceylan

Department of Computer Engineering, Baskent University, Ankara, Turkey

Aysenur Birturk

Department of Computer Engineering, METU, Ankara, Turkey

Keywords:

Content-boosted collaborative ﬁltering, Semantic similarity, Ontology, Sparsity, Item cold-start.

Abstract:

Collaborative ﬁltering is one of the most used recommendation approaches in recommender systems. How-

ever, collaborative ﬁltering systems have some major problems such as sparsity, scalability and cold-start

problems. In this paper we focus on the sparsity and item cold-start problems in collaborative ﬁltering in order

to improve the quality of recommendations. We propose an approach that uses semantic similarities between

items based on a priori deﬁned ontology-based metadata in the movie domain. According to the semantic sim-

ilarities between items and past user preferences, recommendations are made. The results of the evaluation

phase show that our approach improves the quality of recommendations.

1 INTRODUCTION AND

RELATED WORK

The aim of recommender systems is to predict the

valuable information/items for a user and recommend

these items. Some examples of items that are rec-

ommended by recommender systems are web pages,

movies, music, books, restaurants, etc.

One of the most commonly used recommendation approaches in recommender systems is collaborative filtering. The idea behind collaborative filtering is that similar users tend to have similar opinions about an item. Collaborative filtering systems try to find the similarities between the active user and the other users in the system, and then recommend items to the active user by taking these similarities into account. However, collaborative filtering systems suffer from problems such as sparsity and item cold-start (Melville et al., 2002; Claypool et al., 1999).

Content-based filtering is another widely used recommendation approach. In content-based filtering, the system tries to recommend items whose content is similar to that of the items the user has liked. Content-based filtering systems also have some major problems (Balabanović and Shoham, 1997). In domains where extracting the content of items is difficult, or where the content is insufficient to describe the items, content-based filtering is not a suitable recommendation approach. Another problem of the content-based approach is that it tends to recommend items that are similar to those the user has already rated highly. This is called the over-specialization problem.

Several techniques are used to cope with the sparsity and item cold-start problems. The simplest technique for overcoming the sparsity problem is called default voting (Breese et al., 1998). In this technique, when two users are compared, a default rating is inserted for the items that only one of the two has rated. Thus, the number of items rated by both users is increased. Another technique for dealing with the sparsity problem is dimensionality reduction, such as Singular Value Decomposition (SVD) (Sarwar et al., 2000). By applying SVD, the user-item rating matrix can be mapped to a lower-dimensional, less sparse representation.
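As an illustration of default voting, the following minimal sketch aligns the ratings of two users over the union of the items they rated. The function name, the dictionary layout and the default value 3.0 are assumptions made for this example; they are not taken from (Breese et al., 1998).

```python
def default_voting_vectors(ratings_a, ratings_b, default=3.0):
    """Align two users' ratings over the union of their rated items.

    Items rated by only one of the two users receive `default`
    for the other user, so both vectors have the same length.
    """
    items = sorted(set(ratings_a) | set(ratings_b))
    vec_a = [ratings_a.get(item, default) for item in items]
    vec_b = [ratings_b.get(item, default) for item in items]
    return items, vec_a, vec_b


# Hypothetical ratings: only "m2" is rated by both users.
alice = {"m1": 5, "m2": 3}
bob = {"m2": 4, "m3": 2}
items, vec_a, vec_b = default_voting_vectors(alice, bob)
print(items, vec_a, vec_b)
```

After alignment, the two vectors overlap on three items instead of one, which gives a user-user similarity measure more data to work with.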

A hybrid recommendation approach is generally implemented by combining collaborative and content-based filtering to cope with the drawbacks of these two filtering methods (Balabanović and Shoham, 1997). Some hybrid approaches are as follows. One approach is to run the two methods separately: the system generates recommendations by using content-based and collaborative filtering approaches and then combines these


Ceylan U. and Birturk A.
CONTENT-BOOSTED COLLABORATIVE FILTERING USING SEMANTIC SIMILARITY MEASURE.
DOI: 10.5220/0003402403660371
In Proceedings of the 7th International Conference on Web Information Systems and Technologies (WEBIST-2011), pages 366-371
ISBN: 978-989-8425-51-5
Copyright 2011 SCITEPRESS (Science and Technology Publications, Lda.)

independent recommendations (Pazzani, 1999). Another approach assigns weights to the collaborative and content-based predictions and takes the weighted sum of the predicted ratings; the items to recommend are then selected according to this weighted sum (Claypool et al., 1999). In a third approach, content-based filtering is used to fill in the missing data in the user-item rating matrix, and collaborative filtering is then applied to recommend items to users. This hybrid approach is called content-boosted collaborative filtering (Melville et al., 2002).

In this paper, we propose a hybrid approach based on the content-boosted collaborative filtering presented in (Melville et al., 2002). The purpose of our approach is to cope with the sparsity and item cold-start problems of collaborative filtering. Our approach utilizes both content-based and collaborative filtering. The crucial point of this study is that the content-based filtering in our approach uses semantic similarity measures on ontology-based metadata, based on the studies in (Maedche and Zacharias, 2002) and (Lula and Paliwoda-Pekosz, 2008), instead of the naive Bayesian classifier (Mitchell, 1997) used in (Melville et al., 2002). Collaborative filtering is then performed on the enhanced data to overcome the over-specialization problem of content-based filtering.

2 PROPOSED APPROACH

The flow diagram of our approach, called SEMCBCF, is shown in Figure 1. SEMCBCF consists of three phases:

1. Generating ontology-based metadata

2. Finding enhanced user-item rating matrix by us-

ing content-based ﬁltering

3. Using collaborative ﬁltering on enhanced user-

item matrix

2.1 Generating Ontology-based

Metadata

In order to ﬁnd semantic similarities between items,

ontology model and metadata model have to be de-

ﬁned. Ontology and metadata models are deﬁned as

follows (Maedche and Zacharias, 2002):

$$O := \{\mathcal{C}, \mathcal{P}, \mathcal{A}, H^{\mathcal{C}}, prop, att\} \qquad (1)$$

$$MD := \{O, \mathcal{I}, \mathcal{L}, inst, instl, instr\} \qquad (2)$$

For the ontology model, $\mathcal{C}$, $\mathcal{P}$ and $\mathcal{A}$ are sets which consist of the identifiers of concepts, relations and attributes respectively. $H^{\mathcal{C}}$ is called the concept taxonomy, which defines the hierarchical relations between concepts. prop and att are functions that define non-taxonomical relations. For the metadata model, $\mathcal{I}$ and $\mathcal{L}$ are sets which consist of instances and literal values respectively. inst, instr and instl are functions that define concept instantiation, relation instantiation and attribute instantiation. The predicates used in the following sections and their meanings are shown in Table 1.

[Figure 1: System Overview. Components: IMDb, Movie Ontology, Ontology-based Metadata Creation, Ontology-based Metadata, User-Item Rating Matrix, Content-based Filtering using Semantic Similarity, Enhanced User-Item Rating Matrix, Active User Ratings Vector, Collaborative Filtering, Recommendations.]

Table 1: Predicates and Meanings.

Predicate                        Meaning
$H^{\mathcal{C}}(C_1, C_2)$      $C_1$ is a subconcept of $C_2$
$P(C_1, C_2)$                    $P$ is a relation with domain $C_1$ and range $C_2$
$A(C_1)$                         $A$ is an attribute of $C_1$
$C(I)$                           $I$ is an instance of concept $C$
$P(I_1, I_2)$                    Instance $I_1$ has a $P$ relation to instance $I_2$
$A(I_1, L)$                      Instance $I_1$ has an $A$ attribute with value $L$

To define the movie ontology, we use Protege (http://protege.stanford.edu), a free, open-source ontology editor and knowledge-base framework. Using Protege, we create a number of different movie ontologies manually. The metadata are then generated from the defined movie ontology and the content of movies extracted from IMDb (The Internet Movie Database, http://www.imdb.com). The content of a movie is represented in 10 dimensions: cast, director, writer, language, genre, runtime, release date, country, color and the average rating given by IMDb users. A feature of a dimension can be an instance, a concept or a literal in the ontology.

In order to evaluate our approach, four different ontologies are used. Ontology1 is the basic ontology used in our approach. All features of the genre dimension are sub-concepts of the Movie concept. For each of the remaining dimensions, a concept exists in Ontology1, and the features of a dimension are instances of its corresponding concept. Ontology2 is similar to Ontology1; the only difference is that the concepts representing the dimensions other than genre have more sub-concepts in Ontology2 than in Ontology1. For example, the concept that represents the runtime dimension has a number of sub-concepts that define runtime intervals. The only difference between Ontology1 and Ontology3 is that the runtime, release date and average rating dimensions are represented as attributes in Ontology3. In Ontology4, the features of the genre dimension are grouped into six sets, each of which is represented by a concept; a feature of the genre dimension is a sub-concept of the concept of its set.

2.2 SEMCBF: Content-based Filtering

using Semantic Similarity

In order to recommend items, a similarity measure between items has to be defined first. Then, using the similarities between items and the active user model, which is a vector of the user's ratings, we predict the ratings that the active user would give to unrated items. In SEMCBF, to calculate similarities between items described by ontology-based metadata, we use three types of similarity measures (Maedche and Zacharias, 2002): taxonomy similarity (TS), relation similarity (RS), and attribute similarity (AS).

Taxonomy similarity between two instances (TS) is based on the positions of their corresponding concepts in the concept taxonomy ($H^{\mathcal{C}}$) defined in the ontology model. The idea behind taxonomy similarity is that concepts that are closer in the taxonomy are more similar.

An instance can be an instance of more than one concept in the ontology. So, to find the taxonomy similarity between instances (TS), the taxonomy similarity between concepts (TSC) has to be defined first.

In order to calculate TSC, four different methods are used: the TSC measure of (Maedche and Zacharias, 2002), $TSC_{Wu\&Palmer}$ (Wu and Palmer, 1994), $TSC_{Lin}$ (Lin, 1998) and $TSC_{Mclean}$ (Li et al., 2003).
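To make one of these measures concrete, the sketch below computes the Wu and Palmer similarity, which is twice the depth of the lowest common ancestor of two concepts divided by the sum of their depths. The toy taxonomy, the concept names and the convention that the root has depth 1 are assumptions made for this example; they do not reproduce any of the four movie ontologies.

```python
# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "Movie": None,
    "Fiction": "Movie",
    "SciFi": "Fiction",
    "Fantasy": "Fiction",
    "Documentary": "Movie",
}


def path_to_root(concept):
    """Return the concept followed by all of its ancestors up to the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth in the taxonomy, counting the root as depth 1 (an assumption)."""
    return len(path_to_root(concept))


def lowest_common_ancestor(c1, c2):
    """Deepest concept that is an ancestor of (or equal to) both concepts."""
    ancestors_of_c1 = set(path_to_root(c1))
    for candidate in path_to_root(c2):
        if candidate in ancestors_of_c1:
            return candidate
    raise ValueError("concepts do not share a root")


def tsc_wu_palmer(c1, c2):
    """Wu and Palmer similarity: 2 * depth(lca) / (depth(c1) + depth(c2))."""
    lca = lowest_common_ancestor(c1, c2)
    return 2 * depth(lca) / (depth(c1) + depth(c2))


print(tsc_wu_palmer("SciFi", "Fantasy"))      # siblings under Fiction
print(tsc_wu_palmer("SciFi", "Documentary"))  # related only through the root
```

In this toy taxonomy the two sub-genres of Fiction score 2/3, while SciFi and Documentary score 0.4, which matches the intuition that closer concepts are more similar.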

After finding the taxonomy similarity between concepts, calculating the taxonomy similarity between instances is reduced to calculating the similarity of two sets. So TS is defined as follows:

$$TS(I_1, I_2) = SSIM(CSET(I_1), CSET(I_2)) \qquad (3)$$

where $CSET(I) = \{C \in \mathcal{C} \mid C(I)\}$.

Similarity between two sets can be found using

the similarities between their elements, in this case

TSC of concepts, and using different methods. These

methods are mentioned later in this section.

The second type of similarity measure using ontology-based metadata is relation similarity. Relation similarity (RS) between two instances is based on their relations to other instances in the ontology-based metadata. We use a modified version of the relation similarity measure in (Maedche and Zacharias, 2002). RS between $I_1$ and $I_2$ is calculated as follows:

$$RS(I_1, I_2) = \frac{\sum_{p \in P_{co-I}} OR(I_1, I_2, p, IN)}{|P_{co-I}| + |P_{co-O}|} + \frac{\sum_{p \in P_{co-O}} OR(I_1, I_2, p, OUT)}{|P_{co-I}| + |P_{co-O}|} \qquad (4)$$

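$P_{co-I}$ stands for 'incoming relations' and is the set of relations that allow $UC(C(I_1), H^{\mathcal{C}})$ and $UC(C(I_2), H^{\mathcal{C}})$ as range, where $UC(C_i, H^{\mathcal{C}}) = \{C_j \in \mathcal{C} \mid H^{\mathcal{C}}(C_i, C_j) \lor C_j = C_i\}$. $P_{co-O}$ stands for 'outgoing relations' and is the set of relations that allow $UC(C(I_1), H^{\mathcal{C}})$ and $UC(C(I_2), H^{\mathcal{C}})$ as domain.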

$OR(I_1, I_2, p, DIR)$ stands for the similarity for relation $p$ and direction $DIR$ between instances $I_1$ and $I_2$, where $DIR \in \{IN, OUT\}$. It is calculated by considering the associated instances of $I_1$ and $I_2$ with respect to relation $p$ and direction $DIR$. The associated instances ($A_s$) of an instance $I_i$ with respect to relation $P$ and direction $DIR$ are as follows:

$$A_s(P, I_i, DIR) = \begin{cases} \{I_j : I_j \in \mathcal{I} \land P(I_j, I_i)\}, & \text{if } DIR = IN \\ \{I_j : I_j \in \mathcal{I} \land P(I_i, I_j)\}, & \text{if } DIR = OUT \end{cases} \qquad (5)$$

After defining $A_s$, calculating $OR(I_1, I_2, p, DIR)$ is reduced to calculating the similarity of the two sets that contain the associated instances. So OR is defined as follows:

$$OR(I_1, I_2, p, DIR) = SSIM(A_s(p, I_1, DIR), A_s(p, I_2, DIR)) \qquad (6)$$

If $A_s(p, I_1, DIR) = \emptyset$ or $A_s(p, I_2, DIR) = \emptyset$, then $OR(I_1, I_2, p, DIR)$ is 0. Note that RS is used to calculate the semantic similarity (SS, defined in Equation 12) of instances, while the SS values of the associated instances are used to calculate RS. In order to avoid infinite cycles, a maximum depth of recursion has to be defined.

WEBIST 2011 - 7th International Conference on Web Information Systems and Technologies


The advantage of using relation similarity is that the similarities between features are taken into account. Suppose that, in a system, movies have only one feature, which is an actor who played in the movie, and we try to find the similarity between two movies, MovieX and MovieY. MovieX has the feature ActorA and MovieY has the feature ActorB. If a user has rated only movies in which only ActorA played, the rating of MovieY cannot be predicted by a naive Bayesian classifier. In SEMCBF, however, the similarity of MovieX and MovieY depends on the similarity between ActorA and ActorB. In a recursive manner, the similarity of ActorA and ActorB depends on the similarity of the other instances that have relations with ActorA and ActorB. Thus, we can calculate a similarity value between these two movies and a prediction can be made.
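The MovieX/MovieY example can be traced with a small depth-limited sketch of Equations (4)-(6). Several simplifications are assumptions of this example rather than part of SEMCBF: the metadata is a hypothetical list of triples with an extra MovieZ in which both actors play, the concept taxonomy is flat, the set similarity is average-link, the recursion uses RS alone instead of the full SS, identical instances have similarity 1, and different instances have similarity 0 once the depth limit is reached.

```python
from itertools import product

# Hypothetical metadata: (subject, relation, object) triples.
TRIPLES = [
    ("MovieX", "hasActor", "ActorA"),
    ("MovieY", "hasActor", "ActorB"),
    ("MovieZ", "hasActor", "ActorA"),
    ("MovieZ", "hasActor", "ActorB"),
]
SCHEMA = {"hasActor": ("Movie", "Actor")}  # relation -> (domain, range)
CONCEPT = {"MovieX": "Movie", "MovieY": "Movie", "MovieZ": "Movie",
           "ActorA": "Actor", "ActorB": "Actor"}


def associated(relation, instance, direction):
    """Equation (5): instances linked to `instance` through `relation`."""
    if direction == "IN":
        return {s for s, p, o in TRIPLES if p == relation and o == instance}
    return {o for s, p, o in TRIPLES if p == relation and s == instance}


def shared_relations(i1, i2, direction):
    """Relations allowing both instances as range (IN) or domain (OUT)."""
    position = 1 if direction == "IN" else 0
    return [p for p, signature in SCHEMA.items()
            if signature[position] == CONCEPT[i1] == CONCEPT[i2]]


def rs(i1, i2, depth):
    """Equation (4), with recursion cut off after `depth` levels."""
    def sim(x, y):
        if x == y:
            return 1.0  # assumption: identical instances are fully similar
        return rs(x, y, depth - 1) if depth > 0 else 0.0

    by_direction = {d: shared_relations(i1, i2, d) for d in ("IN", "OUT")}
    count = sum(len(relations) for relations in by_direction.values())
    if count == 0:
        return 0.0
    total = 0.0
    for direction, relations in by_direction.items():
        for p in relations:
            a1 = associated(p, i1, direction)
            a2 = associated(p, i2, direction)
            if a1 and a2:  # OR is 0 when either set is empty
                pairs = list(product(a1, a2))
                total += sum(sim(x, y) for x, y in pairs) / len(pairs)
    return total / count


print(rs("MovieX", "MovieY", depth=0))  # 0.0
print(rs("MovieX", "MovieY", depth=1))  # 0.25
```

With no recursion the two movies share no actor and score 0. With one level of recursion, ActorA and ActorB become partly similar because both played in MovieZ, so MovieX and MovieY obtain a non-zero similarity.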

Attribute similarity is the third similarity measure used to calculate semantic similarities in ontology-based metadata. AS between instances $I_1$ and $I_2$ is as follows:

$$AS(I_1, I_2) = \frac{\sum_{a \in P_A} OA(I_1, I_2, a)}{|P_A|} \qquad (7)$$

where $P_A$ represents the set of attributes that are attributes of both $UC(C(I_1), H^{\mathcal{C}})$ and $UC(C(I_2), H^{\mathcal{C}})$. $OA(I_1, I_2, a)$ is the similarity for attribute $a$ between instances $I_1$ and $I_2$. As with OR, $OA(I_1, I_2, a)$ is calculated by considering the associated literals of $I_1$ and $I_2$ with respect to the attribute $a$. The associated literal ($A_l$) of $I_i$ with respect to the attribute $A$ is the following:

$$A_l(A, I_i) = \begin{cases} L_x, & \text{if } L_x \in \mathcal{L} \land A(I_i, L_x) \\ \emptyset, & \text{otherwise} \end{cases} \qquad (8)$$

The difference between $A_s$ and $A_l$ is that $A_l$ can contain at most one literal, unlike $A_s$. Thus, rather than calculating the similarity of two sets, OA is calculated from the similarity between the attribute values:

$$OA(I_1, I_2, a) = LSIM(L_1, L_2, a) \qquad (9)$$

where $L_1 = A_l(a, I_1)$ and $L_2 = A_l(a, I_2)$. If $L_1 = \emptyset$ or $L_2 = \emptyset$, then $OA(I_1, I_2, a)$ is 0. In our approach, all of the attributes used are numeric features of the instances, such as release date (as a year) and runtime. So, the similarity between two numeric values ($L_1$ and $L_2$) of an attribute ($a$) is as follows:

$$LSIM(L_1, L_2, a) = 1 - \frac{|L_1 - L_2|}{MDIF(a)} \qquad (10)$$

where

$$MDIF(A) = \max\{(L_1 - L_2) : A(I_1, L_1) \land A(I_2, L_2) \land I_1, I_2 \in \mathcal{I}\} \qquad (11)$$
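The numeric attribute similarity of Equations (9)-(11) can be sketched as follows. The release years and function names are hypothetical, and the sketch assumes that the attribute takes at least two distinct values, so that MDIF is non-zero.

```python
def mdif(values):
    """Equation (11): largest difference between any two values of an attribute."""
    values = list(values)
    return max(values) - min(values)


def lsim(l1, l2, max_diff):
    """Equation (10): similarity of two numeric literals of one attribute."""
    return 1 - abs(l1 - l2) / max_diff


def oa(i1, i2, attribute_values):
    """Equation (9): attribute similarity, 0 if either instance has no literal."""
    l1 = attribute_values.get(i1)
    l2 = attribute_values.get(i2)
    if l1 is None or l2 is None:
        return 0.0
    return lsim(l1, l2, mdif(attribute_values.values()))


# Hypothetical release years; MovieW has no release year in the metadata.
RELEASE_YEAR = {"MovieX": 1990, "MovieY": 2000, "MovieZ": 2010}

print(oa("MovieX", "MovieY", RELEASE_YEAR))  # 0.5
print(oa("MovieX", "MovieZ", RELEASE_YEAR))  # 0.0
print(oa("MovieX", "MovieW", RELEASE_YEAR))  # 0.0
```

Dividing by MDIF keeps the result in the range [0, 1] regardless of the unit of the attribute, which is what allows years and minutes of runtime to be averaged together in AS.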

In order to calculate TS and RS, we have to define the similarity between sets of elements. The elements of these sets are concepts for the TS calculation and instances for the RS calculation; the similarities of the elements are TSCs for TS and SSs for RS. The first three SSIM methods used for calculating the similarity between sets are the methods in (Maedche and Zacharias, 2002), (Tintarev and Masthoff, 2006) and (Bach and Kuntz, 2005) respectively.

The other SSIM methods are based on the methods used for calculating the distance between a pair of clusters in hierarchical clustering algorithms: single-link, complete-link and average-link (Maimon, 2005).

So far, the taxonomy, relation and attribute similarities between instances have been defined. We now combine these measures by assigning weights to them. Semantic similarity (SS) between two instances is defined as follows:

$$SS(I_1, I_2) = \frac{a \, TS(I_1, I_2) + b \, RS(I_1, I_2) + c \, AS(I_1, I_2)}{a + b + c} \qquad (12)$$

The last step of SEMCBF is the prediction of the unknown ratings of items. To predict the unknown ratings, our approach uses the calculated similarities between items and the user model, which consists of the ratings given by the user, in a neighborhood-based method (Herlocker et al., 1999; Sarwar et al., 2001). After selecting the k most similar items, one of two prediction functions can be used. The first function calculates the predicted rating of user u on item i as the average of the ratings given by u to the k items most similar to i. The second function calculates it as the weighted average of those ratings, where the weight of each rating is the similarity between the rated item and i.
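The two prediction functions can be sketched as follows. The names pred1 and pred2 follow Table 2; everything else is an assumption of this example: the k neighbours are chosen among the items the user has already rated, the user has rated at least one item, and pred2 falls back to the plain average when all weights are zero.

```python
def k_most_similar(item, user_ratings, sim, k):
    """The k rated items most similar to `item`, most similar first."""
    candidates = [j for j in user_ratings if j != item]
    return sorted(candidates, key=lambda j: sim(item, j), reverse=True)[:k]


def pred1(item, user_ratings, sim, k):
    """Plain average of the user's ratings on the k most similar items."""
    neighbours = k_most_similar(item, user_ratings, sim, k)
    return sum(user_ratings[j] for j in neighbours) / len(neighbours)


def pred2(item, user_ratings, sim, k):
    """Similarity-weighted average of the user's ratings on the k most similar items."""
    neighbours = k_most_similar(item, user_ratings, sim, k)
    weights = [sim(item, j) for j in neighbours]
    total = sum(weights)
    if total == 0:
        # Assumption: with no similarity information, use the plain average.
        return pred1(item, user_ratings, sim, k)
    return sum(w * user_ratings[j] for w, j in zip(weights, neighbours)) / total


# Hypothetical semantic similarities of three rated movies to the target "m4".
SIM_TO_M4 = {"m1": 0.9, "m2": 0.5, "m3": 0.1}


def toy_sim(target, other):
    return SIM_TO_M4[other]


user = {"m1": 5, "m2": 3, "m3": 1}
print(pred1("m4", user, toy_sim, k=2))  # 4.0
print(round(pred2("m4", user, toy_sim, k=2), 3))  # 4.286
```

The weighted variant moves the prediction towards the rating of the most similar item, which is the behaviour the second function is meant to add over the first.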

Using the user-item rating matrix, SEMCBF creates an enhanced user-item rating matrix; in other words, the sparsity of the user-item rating matrix is reduced. Moreover, even if an item has no explicit rating from any user in the system, SEMCBF predicts a rating for that item for every user.
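The construction of the enhanced matrix can be sketched as a simple fill-in step. The nested-dictionary layout and the stand-in predictor `mean_rating` are assumptions of this example; in SEMCBF the predictor would be one of the content-based prediction functions.

```python
def enhance(ratings, items, predict):
    """Return a dense copy of `ratings`; missing entries come from `predict`.

    ratings: {user: {item: rating}} with only the actual ratings.
    items:   every item in the system, including items nobody has rated.
    predict: function (user_ratings, item) -> predicted rating.
    """
    enhanced = {}
    for user, user_ratings in ratings.items():
        enhanced[user] = {
            item: user_ratings[item] if item in user_ratings
            else predict(user_ratings, item)
            for item in items
        }
    return enhanced


def mean_rating(user_ratings, item):
    """Stand-in predictor: ignores `item` and returns the user's mean rating."""
    return sum(user_ratings.values()) / len(user_ratings)


# Hypothetical data: "m4" is a cold-start item with no ratings at all.
ratings = {"u1": {"m1": 4, "m2": 2}, "u2": {"m3": 5}}
enhanced = enhance(ratings, ["m1", "m2", "m3", "m4"], mean_rating)
print(enhanced["u1"])
```

Actual ratings are kept unchanged and only the gaps are filled, so the enhanced matrix has a value for every user and item, including the cold-start item.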

2.3 Collaborative Filtering

In the third phase of our approach, a neighborhood-based (Herlocker et al., 1999) collaborative filtering algorithm is applied to the enhanced user-item matrix and the active user ratings vector, which consists only of the actual ratings given by the active user. The algorithm first computes the similarity between the active user and the other users in the enhanced user-item matrix using Pearson correlation. The n most similar users are then selected. An unknown rating of the active user is predicted by calculating the adjusted weighted sum of the ratings of these nearest neighbors.

At the end of a recommendation process, the sys-

tem recommends a number of unrated items which

have the highest predicted rating to the active user.
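The collaborative filtering phase can be sketched with Pearson correlation and the adjusted weighted sum commonly used in neighborhood-based methods: the active user's mean rating plus the correlation-weighted deviations of the neighbours from their own means. The data, the function names and the handling of degenerate cases are assumptions of this example; in SEMCBCF the neighbour rows would come from the enhanced matrix.

```python
from math import sqrt


def mean(ratings):
    return sum(ratings.values()) / len(ratings)


def pearson(x, y):
    """Pearson correlation over the items rated in both dictionaries."""
    common = [i for i in x if i in y]
    if len(common) < 2:
        return 0.0
    mx = sum(x[i] for i in common) / len(common)
    my = sum(y[i] for i in common) / len(common)
    num = sum((x[i] - mx) * (y[i] - my) for i in common)
    den = sqrt(sum((x[i] - mx) ** 2 for i in common)
               * sum((y[i] - my) ** 2 for i in common))
    return num / den if den else 0.0


def predict(active, others, item, n):
    """Adjusted weighted sum over the n users most correlated with `active`."""
    scored = [(pearson(active, row), row) for row in others if item in row]
    neighbours = sorted(scored, key=lambda pair: pair[0], reverse=True)[:n]
    norm = sum(abs(w) for w, _ in neighbours)
    if norm == 0:
        return mean(active)  # assumption: fall back to the user's mean
    adjustment = sum(w * (row[item] - mean(row)) for w, row in neighbours)
    return mean(active) + adjustment / norm


# Hypothetical ratings; the active user has not rated "m4".
active = {"m1": 5, "m2": 3, "m3": 4}
u1 = {"m1": 4, "m2": 2, "m3": 3, "m4": 5}
u2 = {"m1": 1, "m2": 5, "m3": 2, "m4": 1}

print(pearson(active, u1))                 # 1.0
print(predict(active, [u1, u2], "m4", 1))  # 5.5
```

The prediction can leave the rating scale, as it does here, so a practical system would typically clip it to the valid range before ranking items.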

3 EXPERIMENTAL EVALUATION

The performance of the proposed approach was evaluated in the movie domain using the publicly available MovieLens 100k dataset (http://www.grouplens.org). We apply 5-fold cross-validation using the disjoint test sets (20% of the rating data) and their corresponding training sets (80% of the rating data) that are provided with the MovieLens 100k dataset. In the experimental evaluation, we use the mean absolute error (MAE), precision, recall and F-measure, which are commonly used to evaluate the performance of recommender systems (Herlocker et al., 1999). The evaluation of our approach consists of two phases. In the first phase, we try to find the most appropriate values for the SEMCBF parameters. In the second phase, the results of SEMCBF and SEMCBCF are compared with those of some other approaches.

SEMCBF has a number of parameters, as mentioned in Section 2. These parameters and their possible values are shown in Table 2. The performance of SEMCBF depends on the values of these parameters. To find the most appropriate value of a parameter, the performance of SEMCBF is evaluated using all possible combinations of this parameter, O and k, while the values of the other parameters remain constant. The value determined as the most appropriate for a parameter in one evaluation is used in the later evaluations.

At the beginning of the evaluation, the parameters rd, a, b and c are set to 1, 0.4, 0.3 and 0.3 respectively, and TSC and the SSIM methods for TS and RS are set to initial methods from Table 2. The parameters are analyzed in the following order: PF, TSC, the two SSIM methods, a, b, c, rd. SEMCBCF gives the best result with Ontology4 for O, 10 for k, pred2 for PF, $TSC_{Lin}$ for TSC, the same SSIM method for both TS and RS, 0.1 for a, 0.1 for b, 0.8 for c and 2 for rd.

Table 2: Parameters and Possible Values of SEMCBF.

Parameter                                                  Values
Ontology (O)                                               Ontology1, Ontology2, Ontology3, Ontology4
Max. Recursive Depth (rd)                                  0, 1, 2, 3, 4, 5, 6, 7, 8
Weight of TS (a), Weight of RS (b), Weight of AS (c)       (a + b + c) = 1
Measure for Taxonomy Similarity Between Concepts (TSC)     TSC of (Maedche and Zacharias, 2002), $TSC_{Wu\&Palmer}$, $TSC_{Lin}$, $TSC_{Mclean}$
SSIM Method for TS                                         one of the SSIM methods of Section 2.2
SSIM Method for RS                                         one of the SSIM methods of Section 2.2
Number of Nearest Neighbors (k)                            5, 10, 15, 20, 30, 50, 100, 200
Prediction Function (PF)                                   pred1, pred2

SEMCBF is used for enhancing the user-item matrix in SEMCBCF, so the performance of SEMCBCF depends on the performance of SEMCBF. For this reason, in this evaluation phase, the performance of both SEMCBF and SEMCBCF is compared with some other approaches. Table 3 gives the precision, recall and F-measure results of SEMCBF, SEMCBCF and some approaches whose results are obtained from (Karaman, 2010). CBCF (Melville et al., 2002) is also implemented using the same dataset to make a fair comparison. As can be seen from Table 3, SEMCBF and SEMCBCF outperform the other approaches in terms of F-measure.

Table 3: Comparison of SEMCBF with Other Approaches.

Approach                       Prec. (%)   Rec. (%)   F-Measure (%)
MovieLens                      66          74         69.8
MovieMagician Feature-Based    61          75         67.3
MovieMagician Clique-Based     74          73         73.5
MovieMagician Hybrid           73          56         63.4
OPENMORE                       75.2        73.7       74.4
ReMovender                     72          78         74.9
CBCF                           60          95.2       73.6
SEMCBF                         63.4        92.3       75.2
SEMCBCF                        63.7        93.1       75.6
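The F-measure column of Table 3 is consistent with the balanced F-measure, the harmonic mean of precision and recall. That reading is an assumption, since the formula is not stated in the text; the sketch below recomputes three rows under it.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall values (in percent) taken from Table 3.
print(round(f_measure(63.7, 93.1), 1))  # SEMCBCF -> 75.6
print(round(f_measure(63.4, 92.3), 1))  # SEMCBF  -> 75.2
print(round(f_measure(60.0, 95.2), 1))  # CBCF    -> 73.6
```

The recomputed values match the reported ones, and they show that the F-measure advantage of SEMCBF and SEMCBCF comes mainly from high recall, with precision below that of several other approaches.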

4 CONCLUSIONS

This paper presents a hybrid approach, which uses both content-based and collaborative filtering, in order to overcome the sparsity and item cold-start problems of collaborative filtering. The presented approach is based on the content-boosted collaborative filtering presented in (Melville et al., 2002). Our hybrid approach (SEMCBCF) first uses content-based filtering (SEMCBF) to enhance the user-item rating matrix, then performs collaborative filtering using this enhanced user-item matrix. The contribution of our approach is that it uses semantic similarity measures on ontology-based metadata to calculate the similarities of items in content-based filtering. Our hypothesis was that using semantic similarity measures rather than the naive Bayesian classifier (Mitchell, 1997) used in (Melville et al., 2002) would improve the quality of recommendations.

In the evaluation phase, SEMCBF was first fine-tuned by determining the values of its parameters. Then, using the determined values, SEMCBF and SEMCBCF were evaluated. The results showed that SEMCBF and SEMCBCF outperform the content-boosted collaborative filtering presented in (Melville et al., 2002) and some other approaches in terms of F-measure.

The characteristics of the ontology, such as the taxonomy of concepts and the representation of features, significantly affect the performance of SEMCBF. Future research will focus on ontology refinement to improve SEMCBF. SEMCBF will also be improved by assigning weights to the relations and attributes in the ontology.

REFERENCES

Bach, T. L. and Kuntz, R. D. (2005). Measuring similarity of elements in OWL DL ontologies. In Proceedings of the AAAI'05 Workshop on Contexts and Ontologies: Theory, Practice and Applications, pages 96–99.

Balabanović, M. and Shoham, Y. (1997). Fab: Content-based, collaborative recommendation. Communications of the ACM, 40(3):66–72.

Breese, J. S., Heckerman, D., and Kadie, C. (1998). Em-

pirical analysis of predictive algorithms for collabora-

tive ﬁltering. In Proceedings of the Fourteenth Con-

ference on Uncertainty in Artiﬁcial Intelligence (UAI-

98), pages 43–52, San Francisco. Morgan Kaufmann.

Claypool, M., Gokhale, A., Miranda, T., Murnikov, P.,

Netes, D., and Sartin, M. (1999). Combining content-

based and collaborative ﬁlters in an online newspaper.

In Proceedings of ACM SIGIR Workshop on Recom-

mender Systems.

Herlocker, J. L., Konstan, J. A., Borchers, A., and Riedl,

J. (1999). An algorithmic framework for perform-

ing collaborative ﬁltering. In Proceedings of the

22nd annual international ACM SIGIR conference on

Research and development in information retrieval,

pages 230–237, New York, NY, USA. ACM Press.

Karaman, H. (2010). Content based movie recommendation

system empowered by collaborative missing data pre-

diction. M.Sc. Thesis in Computer Engineering De-

partment of Middle East Technical University.

Li, Y., Bandar, Z. A., and McLean, D. (2003). An approach

for measuring semantic similarity between words us-

ing multiple information sources. IEEE Transactions

on Knowledge and Data Engineering, 15:871–882.

Lin, D. (1998). An information-theoretic deﬁnition of sim-

ilarity. In ICML ’98: Proceedings of the Fifteenth In-

ternational Conference on Machine Learning, pages

296–304, San Francisco, CA, USA. Morgan Kauf-

mann Publishers Inc.

Lula, P. and Paliwoda-Pekosz, G. (2008). An ontology-

based cluster analysis framework. In 7th International

Semantic Web Conference (ISWC2008).

Maedche, A. and Zacharias, V. (2002). Clustering ontology-

based metadata in the semantic web. In Elomaa, T.,

Mannila, H., and Toivonen, H., editors, Proceedings

of the 6th European Conference on Principles of Data

Mining and Knowledge Discovery (PKDD 2002), Au-

gust 19-23, 2002, Helsinki, Finland, volume 2431 of

Lecture Notes in Computer Science, pages 383–408.

Springer, Berlin–Heidelberg, Germany.

Maimon, O. (2005). Decomposition Methodology For

Knowledge Discovery And Data Mining: Theory And

Applications (Machine Perception and Artiﬁcial Intel-

ligence). World Scientiﬁc Publishing Co., Inc., River

Edge, NJ, USA.

Melville, P., Mooney, R. J., and Nagarajan, R. (2002).

Content-boosted collaborative ﬁltering for improved

recommendations. In Proceedings of the Eighteenth

National Conference on Artiﬁcial Intelligence, pages

187–192.

Mitchell, T. M. (1997). Machine Learning. McGraw-Hill

International Edit.

Pazzani, M. J. (1999). A framework for collaborative,

content-based and demographic ﬁltering. Artiﬁcial In-

telligence Review, 13:393–408.

Sarwar, B., Karypis, G., Konstan, J., and Riedl, J. (2001). Item-based collaborative filtering recommendation algorithms. In WWW '01: Proceedings of the 10th international conference on World Wide Web, pages 285–295, New York, NY, USA. ACM.

Sarwar, B. M., Karypis, G., Konstan, J. A., and Riedl, J. T. (2000). Application of dimensionality reduction in recommender systems: A case study. In WebKDD Workshop at the ACM SIGKDD.

Tintarev, N. and Masthoff, J. (2006). Similarity for news

recommender systems. In Proceedings of the AH06

Workshop on Recommender Systems and Intelligent

User Interfaces.

Wu, Z. and Palmer, M. (1994). Verb semantics and lex-

ical selection. In Proc. of the 32nd annual meeting

on Association for Computational Linguistics, pages

133–138.
