TAG RECOMMENDATION BASED ON USER’S BEHAVIOR

IN COLLABORATIVE TAGGING SYSTEMS

Nagehan Ilhan and Şule Gündüz Öğüdücü

Istanbul Technical University, Department of Computer Engineering, Maslak, Istanbul, 34469 Turkey

Keywords:

Collaborative tagging, Recommender systems, Tag suggestions, Social network analysis.

Abstract:

Social bookmarking Web sites allow users to submit resources and label them with arbitrary keywords, called tags, to create folksonomies. Tag recommendation is an important element of collaborative tagging systems; it aims to provide relevant information to users by proposing a set of tags for each newly posted resource. In this paper, we focus on the task of recommending tags when a user examines a document, based on the user’s tagging behavior. We explore the use of the semantic relationship between tags and document content in modeling the user’s tagging behavior. The experiments are performed on a data set obtained from a social bookmarking site. Our experimental results show that our method is efficient in modeling users’ tagging behavior and that it can be used to recommend tags for resources.

1 INTRODUCTION

Collaborative tagging systems are popular tools for

creating, collecting and sharing huge amounts of so-

cial data over the Web (Golder and Huberman, 2006).

Social bookmarking services allow Web users to an-

notate the resources with freely chosen keywords

called tags. The tags given by a user to a resource

reﬂect the interest of the user in the resource as well

as the understanding of the content of the resource.

Most of the social bookmarking Web sites assist users

during the labeling process by recommending tags.

Recommending tags can serve various purposes, such as increasing the probability that a resource gets annotated or reminding the user what a resource is about. Numerous social bookmarking Web sites provide these services, the most popular being Delicious (http://del.icio.us.com/), a widely used social bookmarking service devoted to tagging URLs. The aim of this work is to model the tagging behavior of users in order to recommend personalized tags related to the document they are interested in.

In this paper, we propose a method to enrich the model of tagging behavior in a folksonomy by adding semantics based on the WordNet hierarchy of concepts (Fellbaum, 1998). We focus on modeling users’ tagging behavior effectively, which in turn will increase the recommendation accuracy. Our model considers not only the users’ previous bookmarks but also the content of the document. This feature also helps to handle cold-start situations. Our first objective is to extract the tagging patterns of users by analyzing the similarity between user tags and the content of the document, in order to represent the relationship between folksonomy tags and the content. In this study, the content of a document is divided into five components called document sections (e.g. page title, main content, heading 1). We determine the effect of each document section on a user’s tagging behavior while she/he is bookmarking a Web page. Then, for each user, we calculate scores that reflect the probability that the user chooses tags that appear in a particular section of the document. We generate our recommendation set by considering these scores.

The rest of the paper is organized as follows. Section 2 reviews related work. Our proposed method is introduced in Section 3. We then present our experiments and discuss the results in Section 4. Finally, Section 5 concludes the paper.

2 RELATED WORK

Statistical investigations of the usage dynamics and tagging patterns of tag collections are reported in (Golder and Huberman, 2005; Kipp and Campbell, 2007).


Ilhan N. and Gündüz-Öğüdücü Ş.

TAG RECOMMENDATION BASED ON USER’S BEHAVIOR IN COLLABORATIVE TAGGING SYSTEMS.

DOI: 10.5220/0003151005700573

In Proceedings of the 3rd International Conference on Agents and Artificial Intelligence (ICAART-2011), pages 570-573

ISBN: 978-989-8425-40-9

© 2011 SCITEPRESS (Science and Technology Publications, Lda.)

An example of content-based tag recommendation that uses a graph representation is presented in (Lee and Chun, 2007). Their system recommends tags extracted from the content of a blog using an artificial neural network that uses WordNet and word frequencies in the training step.

The authors in (Tatu et al., 2008) utilize information from resource content and the folksonomic structure of the graph. They use the graph to create a set of tags related to the resource and a set of tags related to the user. The system then enriches the vocabularies of these tag sets with a WordNet-based search for words that represent the same concept, and recommends the result to the user. A method that creates resource-related tags from the keywords found in the resource’s title and extends them with the tags that co-occur with these base tags in the system is presented in (Lipczak et al., 2009). Existing tag recommendation studies use tags that have previously been assigned to the resource by other users. Thus, they become insufficient when a new resource appears. Our recommendation model utilizes the content of the Web document; hence, whether a resource is new or frequently tagged does not affect our recommendation success.

3 PROPOSED METHOD

3.1 Analysis of Tagging Behavior

It can be assumed that Web pages can be represented by their text. In this study, this text is separated into five different sections: (1) main content for long texts in the body part of the document (C); (2) page title (P); (3) heading 1 (H1); (4) heading 2 (H2); and (5) the anchor text in the links (A). There are six heading tags available in HTML, and H1 is the largest, being at the top of the heading hierarchy. In the remainder of this paper, $d_i^x$ denotes one of these five sections of a document $d_i$. A preprocessing step is performed which includes stop word removal and stemming of terms. The main content of a Web page is then represented by the top-k terms that have the highest frequency among the terms in the body part of the document. The terms in a section of the document are combined into a single vector:

$$\vec{d_i^x} = (w_1^x, f_1), (w_2^x, f_2), \ldots, (w_p^x, f_p) \qquad (1)$$

where $w_1^x, w_2^x, \ldots, w_p^x$ are the terms that appear in the corresponding section $d_i^x$ and $f_1, f_2, \ldots, f_p$ are the frequencies of these terms. Thus, a Web document can be represented by 5 term vectors. Instead of the commonly used TF-IDF (Term Frequency/Inverse Document Frequency) weighting scheme, we use TF weighting in the vector representations.
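The section-level term vectors with TF weighting can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the stop-word list, the whitespace tokenizer and the names `STOP_WORDS` and `term_vector` are assumptions, and stemming is omitted for brevity.

```python
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller list
# and would also stem the remaining terms (omitted here).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for"}


def term_vector(text, k=None):
    """Return {term: frequency} for one document section (plain TF weighting).

    If k is given, only the k most frequent terms are kept, as is done
    for the main content section.
    """
    tokens = [t for t in text.lower().split()
              if t.isalpha() and t not in STOP_WORDS]
    counts = Counter(tokens)
    if k is not None:
        counts = Counter(dict(counts.most_common(k)))
    return dict(counts)


# Toy document with two of the five sections: page title (P) and content (C).
sections = {
    "P": "Python tutorial for beginners",
    "C": "python is a language python code is readable code python",
}
vectors = {name: term_vector(text, k=2 if name == "C" else None)
           for name, text in sections.items()}
```

Keeping each section as a dictionary from term to raw frequency makes the later similarity and ranking steps simple dictionary operations.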

The tags assigned to a Web document are combined into a single tag vector:

$$\vec{tt_i} = (t_1, f_1), (t_2, f_2), \ldots, (t_q, f_q) \qquad (2)$$

where $t_1, t_2, \ldots, t_q$ are the tags assigned by users to document $d_i$ and $f_1, f_2, \ldots, f_q$ are the frequencies of the corresponding tags in that document.

As stated earlier, the aim of this step is to find a relationship between the terms that appear in the document and the tags assigned to it. For this reason, the similarity between each term vector and the tag vector of the document is computed using the cosine similarity measure:

$$sim(\vec{d_i^x}, \vec{tt_i}) = \frac{\vec{d_i^x} \cdot \vec{tt_i}}{\|\vec{d_i^x}\| \, \|\vec{tt_i}\|} \qquad (3)$$
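The cosine measure of Eq. 3 can be sketched for sparse vectors stored as dictionaries. The function name `cosine_similarity` and the toy vectors are assumptions made for illustration; returning 0 for an empty vector is also a choice of this sketch rather than something stated in the paper.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse {term: frequency} vectors."""
    dot = sum(w * vec_b.get(term, 0) for term, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


# Toy page-title term vector and tag vector of the same document.
title_terms = {"python": 1, "tutorial": 1}
tag_vector = {"python": 3, "programming": 4}
sim = cosine_similarity(title_terms, tag_vector)
```

Only terms shared by both vectors contribute to the dot product, so a section whose vocabulary never overlaps with the tags scores 0.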

The second step of the tag analysis consists of determining the semantic relationship between the scope of a document and the tags of this document using WordNet. Each term in each term vector of a document is converted into its hypernym and hyponym versions using WordNet. A term’s hypernym is a more general term, whereas a hyponym is a more specific one. The frequency $f_t$ of a term $t$ in a term vector of $d_i$ is mapped to its hypernyms/hyponyms $\{h_1, \ldots, h_r\}$. The frequencies of synonym terms are determined in the same way as in the hypernym/hyponym case. The similarity between each synonym term vector and the tag vector is computed based on the cosine measure.
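The mapping of term frequencies onto related terms can be sketched as follows. To keep the example self-contained, a small hand-written dictionary stands in for the WordNet lookup; its entries are hypothetical and are not real WordNet output. Summing the frequencies of terms that share a related term, and dropping terms without any related term, are assumptions of this sketch.

```python
from collections import Counter

# Toy stand-in for a WordNet hypernym lookup (hypothetical entries).
TOY_HYPERNYMS = {
    "python": ["language"],
    "java": ["language"],
    "tutorial": ["lesson"],
}


def expand_vector(term_vector, related):
    """Map each term's frequency onto its related terms.

    `related` maps a term to its hypernyms, hyponyms or synonyms.
    Frequencies of terms sharing a related term are summed (assumption).
    """
    expanded = Counter()
    for term, freq in term_vector.items():
        for rel in related.get(term, []):
            expanded[rel] += freq
    return dict(expanded)


hypernym_vector = expand_vector(
    {"python": 3, "java": 1, "tutorial": 2, "code": 5}, TOY_HYPERNYMS
)
```

The resulting vector has the same {term: frequency} shape as the original term vectors, so it can be compared with the tag vector using the same cosine measure.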

3.2 Personalized Tag Recommendations

We are given a set of users $U = \{u_1, u_2, \ldots, u_{|U|}\}$, a set of Web pages $R = \{d_1, d_2, \ldots, d_{|R|}\}$ and a set of tags $T = \{t_1, t_2, \ldots, t_{|T|}\}$. In this paper, we use the following notation:

• $tags(u_j) \subseteq T$ is the set of tags used by user $u_j$

• $tags(u_j, d_i) \subseteq tags(u_j)$ is the set of tags given by user $u_j$ to a Web page $d_i$

• $tags(d_i) \subseteq T$ is the set of tags given to Web page $d_i$

• $tags(d_i^x) \subseteq tags(d_i)$ is the set of tags of Web page $d_i$ that appear in the $d_i^x$ part of that page. Note that $d_i^x$ can be one of the five different sections of the document: main content, page title, h1, h2 or anchor text.


For each user $u_j$, a score is calculated to determine whether the user selects tags related to the content of the document and, if so, from which part of the document, or whether (s)he assigns tags from her/his own vocabulary independent of the content of the document. First, a score value is computed for each document section-user pair, which is the probability that the user chooses tags that appear in section $d_i^x$ of a document:

$$score_{x,j} = \frac{|tags(u_j, d_i) \cap tags(d_i^x)|}{|tags(u_j, d_i)|} \qquad (4)$$
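The ratio in Eq. 4 can be sketched with Python sets. The function names and the toy posts are assumptions. Eq. 4 is written for a single document; averaging the ratio over all posts of a user, as `mean_section_score` does, is one possible way to obtain a per-user score and is an assumption of this sketch.

```python
def section_score(user_tags, section_terms):
    """Share of the user's tags for a document that appear in one section."""
    user_tags = set(user_tags)
    if not user_tags:
        return 0.0
    return len(user_tags & set(section_terms)) / len(user_tags)


def mean_section_score(posts, section):
    """Average the per-document ratio over a user's posts (assumption)."""
    scores = [section_score(tags, secs.get(section, ()))
              for tags, secs in posts]
    return sum(scores) / len(scores) if scores else 0.0


# Each post: (tags the user assigned, {section name: terms in that section}).
posts = [
    ({"python", "tutorial"},
     {"P": {"python", "tutorial"}, "C": {"python", "code"}}),
    ({"web", "css"},
     {"P": {"design"}, "C": {"css", "layout"}}),
]
title_score = mean_section_score(posts, "P")
content_score = mean_section_score(posts, "C")
```

A user who mostly invents tags from her/his own vocabulary would obtain low scores for every section, which is the case the score is meant to detect.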

Each document section $d_i^x$ contributes $n_{x,j}$ tags to the final set of tag recommendations, which is proportional to the score value of this section. Let the final set of recommendations consist of $k$ tags. The number of tags in the final recommendation set that come from $d_i^x$ is:

$$n_{x,j} = \frac{score_{x,j}}{\sum_{y} score_{y,j}} \times k \qquad (5)$$
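The proportional split of Eq. 5 generally yields fractional values, and the paper does not state how they are rounded. The sketch below uses largest-remainder rounding so that the counts add up to exactly k; this rounding rule, the function name `allocate_slots` and the example scores are assumptions.

```python
import math


def allocate_slots(scores, k):
    """Split k recommendation slots across sections in proportion to scores.

    Fractions are resolved with largest-remainder rounding (assumption).
    """
    total = sum(scores.values())
    if total == 0:
        return {section: 0 for section in scores}
    exact = {section: s / total * k for section, s in scores.items()}
    slots = {section: math.floor(v) for section, v in exact.items()}
    leftover = k - sum(slots.values())
    # Hand the remaining slots to the sections with the largest fractions.
    by_remainder = sorted(scores, key=lambda s: exact[s] - slots[s],
                          reverse=True)
    for section in by_remainder[:leftover]:
        slots[section] += 1
    return slots


slots = allocate_slots({"C": 0.5, "P": 0.375, "H1": 0.125}, k=10)
```

Here the exact shares are 5, 3.75 and 1.25, so the single leftover slot goes to the page title section.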

A recommendation set $R(u_j, d_i^x)$ is formed for user $u_j$ from the $n_{x,j}$ terms that have the highest frequency in the term vector of $d_i^x$. Finally, user $u_j$ is provided with a set of $k$ recommended tags $R(u_j, d_i)$ for a particular Web document $d_i$:

$$R(u_j, d_i) = \bigcup_{x} R(u_j, d_i^x) \qquad (6)$$
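The union in Eq. 6 can be sketched by taking the most frequent terms of each section and merging them. The function name `recommend`, the alphabetical tie-breaking and the toy vectors are assumptions. Because a term may appear in several sections, the merged set can contain fewer than k distinct tags; how the paper handles this case is not stated.

```python
def recommend(section_vectors, slots):
    """Merge the top-n terms of each section into one recommendation list.

    `section_vectors` maps a section name to {term: frequency};
    `slots` maps a section name to the number of tags it contributes.
    """
    recommended = []
    for section, n in slots.items():
        terms = section_vectors.get(section, {})
        # Highest frequency first; ties broken alphabetically (assumption).
        top = sorted(terms, key=lambda t: (-terms[t], t))[:n]
        for term in top:
            if term not in recommended:
                recommended.append(term)
    return recommended


section_vectors = {
    "C": {"python": 5, "code": 3, "loop": 1},
    "P": {"python": 1, "tutorial": 1},
}
tags = recommend(section_vectors, {"C": 2, "P": 2})
```

In this toy case "python" is selected from both sections, so four slots produce only three distinct recommended tags.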

4 EXPERIMENTAL RESULTS

4.1 Data Preparation

The experiments are performed on two different

datasets which are collected from the Delicious Web

site. The details of the datasets are given in Table 1.

Table 1: Dataset Information.

                   URLs    Users   Tags
Dataset1           1013    45654   42169
Dataset2 (train)   25122   1020    82626
Dataset2 (test)    25880   1020    85321

Each Web document in each dataset is parsed to remove the HTML markup. The same preprocessing step is performed on each Web document and the set of user tags by applying stop word removal and Porter’s stemming algorithm (Jones and Willet, 1997). Each Web document is divided into 5 sections, and each section is represented by a term vector as explained in Section 3.1. Hypernym, hyponym and synonym vectors of each term vector of each document are constructed using WordNet (Fellbaum, 1998). Then the cosine similarity between each (hypernym/hyponym/synonym) term vector and the tag vector of the document is calculated.

Figure 1: Similarity values between term and tag vectors of documents.

Figure 2: Similarity values between hypernym term and tag vectors of documents.

For simplicity, we define the following experimental settings, S1-S3. In S1, the cosine similarity between each term vector $\vec{d_i^x}$ and the tag vector $\vec{tt_i}$ of $d_i$ is calculated using Eq. 3. In S2, the cosine similarity between the hypernyms of the term vectors and the tag vectors of the documents is calculated. In S3, the synonym term vector is constructed and the cosine similarity is calculated between the synonym term vectors and the tag vectors. In each setting, the similarity values are averaged over the entire set of documents in Dataset1.

Figs. 1, 2 and 3 show the similarity results for S1, S2 and S3, respectively. The similarity between the term vector obtained from the main content and the tag vector is higher than the similarities between the remaining term vectors and the tag vector. The similarity value obtained using the page title is close to that obtained using the content term vector.

Based on these results, a hybrid recommendation set is generated for users by calculating the users’ tagging scores only on the page title and the content of Web documents. The recommendation set consists of 10 tags (k), a value determined empirically. The recommendation results given in Figure 4 support our earlier observations on the similarities between tags and document content. The recommendation set generated using only the most frequent content terms outperforms the set generated using the most frequent page title terms. However, our hybrid recommendation set performs better than both.

Figure 3: Similarity values between synonym term and tag vectors of documents.

Figure 4: Recommendation Results.

5 CONCLUSIONS

In this paper, we considered the content of a resource as a tag source in creating the recommendation set. We investigated the similarity between different parts of the content of a resource and the tags assigned to it. Our main aim was to determine which part of the document contains valuable tags and can be a potential tag source. We also examined whether the semantically related terms of the content can be used as a tag source. Results indicate that users tend to

choose terms that appear in the content of the docu-

ment rather than selecting terms that are semantically