An Ontology-based Method for Sparsity Problem in Tag
Recommendation
Endang Djuana
1
, Yue Xu
1
, Yuefeng Li
1
, Audun Josang
2
and Clive Cox
3
1
School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane, Austalia
2
Department of Informatics, University of Oslo, Oslo, Norway
3
Rummble Ltd, London, U.K.
Keywords:
Collaborative Tagging, Tag Recommendation, Domain Ontology, Folksonomy, Sparsity Problem.
Abstract:
Tags or personal metadata for annotating web resources have been widely adopted in Web 2.0 sites. However,
as tags are freely chosen by users, the vocabularies are diverse, ambiguous and sometimes only meaningful to
individuals. Tag recommenders may assist users during tagging process. Its objective is to suggest relevant
tags to use as well as to help consolidating vocabulary in the systems. In this paper we discuss our approach for
providing personalized tag recommendation by making use of existing domain ontology generated from folk-
sonomy. Specifically we evaluated the approach in sparse situation. The evaluation shows that the proposed
ontology-based method has improved the accuracy of tag recommendation in this situation.
1 INTRODUCTION
Tags or personally supplied keywords for describing
web resources have been widely adopted in Web 2.0
sites. This facility can be found in social bookmark-
ing such as Bibsonomy
1
, multimedia sharing such as
YouTube
2
, e-commerce such as Amazon
3
and more
recently micro-blogs such as Twitter
4
as well.
Tags are freely chosen words which act as annota-
tion or metadata for describing web resources which
can be used for personal organization, easy retrieval
or finding related resources (Marlow et al., 2006). As
users build up their tags collection, the aggregates of
all users vocabulary may exhibit an informal taxon-
omy system which is known as folksonomy or folks
taxonomy (Mathes, 2004).
However, since tags are chosen freely by users,
tags vocabularies are diverse, potentially ambiguous
and sometimes only meaningful to individuals. Be-
sides variations in format such as plurality, pre- and
suf-(fixes), and case variations, these tags may also
have polysemy (multiple meaning), synonymy and
generality problems (Golder and Huberman, 2006)
(Liang et al., 2010).
1
http://www.bibsonomy.org
2
http://www.youtube.com
3
http://www.amazon.com
4
https://twitter.com/
Tag recommendersytems are a specialized recom-
mender system for suggesting tags for annotating web
resources. Specifically, a tag recommender system
will recommend for a given user and a given item,
a set of tags for annotating the item. Its objectives
were to provide relevant tags and help consolidate the
annotation vocabulary in the systems (J¨aschke et al.,
2008).
In this paper we present a tag recommender ap-
proach which aims to solve problems with ambigu-
ities and generality during recommendation process.
We utilize an existing domain ontology generated
from folksonomy to represent tags as concepts with
their relationships (Djuana et al., 2012). For a given
user and a given item, by using the user-based collab-
orative filtering technique, a set of candidate tags can
be produced. In this paper, we propose a method to
expand the candidate tags by including more general
or specific concepts based on the domain ontology.
Moreover, we conduct experimentsto evaluate perfor-
mance of the proposed approach in sparse situation in
which tag recommendation needs to be produced for
users with less items, items with less tags and tags
with less users.
This paper is structured as follows. We introduce
some key concepts in Section 2. Then we present
problem formulation and related works in Section 3.
Section 4 discusses the adopted domain ontology as
467
Djuana E., Xu Y., Li Y., Josang A. and Cox C..
An Ontology-based Method for Sparsity Problem in Tag Recommendation.
DOI: 10.5220/0004439904670474
In Proceedings of the 15th International Conference on Enterprise Information Systems (ICEIS-2013), pages 467-474
ISBN: 978-989-8565-60-0
Copyright
c
2013 SCITEPRESS (Science and Technology Publications, Lda.)
basis for recommendation framework. Section 5 dis-
cusses the proposed tag recommendation approach.
Section 6 discusses evaluation and experiment results.
And in Section 7 we conclude this paper and discuss
some ideas for future work.
2 KEY CONCEPTS
In this section we introduce collaborativetagging sys-
tem, tag recommendation, and ontology from folk-
sonomy.
2.1 Collaborative Tagging Systems
A collaborativetagging system contains three entities:
users, tags, and items, which are described below:
Users U={u
1
,u
2
, ...u
|U|
} contains all users in an
online community who have used tags to organize
their items.
Tags T={t
1
,t
2
, . ..t
|T|
} contains all tags used by
the users in U. Tags are typically arbitrary strings
which could be a single word or short phrase. In
this paper, a tag is defined as a sequence of terms.
Fort T,t = hterm
1
,term
2
,.. .term
m
i A function
is defined to return the terms in a tag: tagset(t) =
{term
1
,term
2
,.. .term
m
}
Items I={i
1
,i
2
, ...i
|I|
} contains all domain-
relevant items or resources. What is considered
by an item depends on the type of user tagging
collection, for instance, in Bibsonomy the items
are mainly bookmarks and publications.
Based on these three entities, a collaborative tag-
ging system is formulated as Folksonomy which con-
sists of 4-tuple: F = (U, T, I,Y) where U,T, I are fi-
nite sets, whose elements are the users, tags and items,
respectively. Y is a ternary relation between them, i.e.,
Y U × T × I, whose elements are called tag assign-
ments or taggings. An element (u,t,i) Y represents
that user u collected item i using tag t. A function
Ft(u,i) is defined to return a set of tags that a user u
has assigned to an item i whereby Ft(u,i) = {t T |
(u,t,i) Y} for all u U and i I.
2.2 Tag Recommendation
A tag recommender system is a specific kind of rec-
ommender systems in which the goal is to recommend
a set of tags to use for a particular item. Based on
previous formulation of Folksonomy, the task of a tag
recommender system is to recommend, for a given
user u U and a given item i I which has not been
tagged by the user or Ft(u,i) =
/
0, a set
˜
T(u,i) T
of tags. In many cases
˜
T(u, i) T is computed by
first generating a ranking on the set of tags accord-
ing to some criterion, for instance by a collaborative
filtering, content based, or other recommendation al-
gorithms, from which then the top n tags are selected.
2.3 Ontology from Folksonomy
Ontology is formal description and explicit specifi-
cation of a shared conceptualization (Gruber, 1993).
Depending on the types of stored knowledge, ontol-
ogy can be categorized in two types: domain ontol-
ogy and general ontology (Navigli et al., 2003). Gen-
eral ontology defines concepts that are general for all
domains. Domain ontology forms the core of any
knowledge specifically for the domain.
Folksonomy which is emerging from collabo-
rative tagging has been acknowledged as potential
source for constructing ontology. As it captures vo-
cabulary of users which may be aggregated to produce
emergent semantics, people may develop lightweight
ontologies (Mika, 2007).
3 MOTIVATION
In this section we discuss main motivation for this
work by formulating main problem to solve and re-
view related works for the proposed method and prob-
lem solution.
3.1 Problem Formulation
Sparsity problem (Adomavicius and Tuzhilin, 2005)
refers to users who have rated very few items or items
which have received very few ratings. In tag recom-
mendation context, sparsity refers to users who tag a
few or very few resources, and in some situation only
one resource. It could also mean there are resources
which received very few annotations and there are
some tags which are only used by very few users.
Cold start (Schein et al., 2002) (Adomavicius and
Tuzhilin, 2005) is a specific situation in recommender
systems context when a new user arrives or new re-
source exists. In tag recommendation, each post con-
sists of a user, a resource and all tags that this user
has assigned to that resource. In this regards, cold-
start problem may be formulated as (1) new user who
posted for the first time in the system or in other
words, all posts of a new user are in the test set or
(2) new resource, on the other hand, refers to a re-
source that has never been tagged before by any other
user (Preisach et al., 2010).
ICEIS2013-15thInternationalConferenceonEnterpriseInformationSystems
468
For these situations, most of the state of the art tag
recommendation methods perform poorly (Preisach
et al., 2010). In this paper we aim to present a tag
recommendation method which may alleviate these
problems by utilizing domain ontology generated
from folksonomy.
3.2 Related Works
Tag recommender systems are broadly divided into
three classes: content-based, collaborative filtering,
and graph-based approaches (Musto et al., 2010).
One early content-based tag recommender is the
work by (Brooks and Montanez, 2006). The state of
the art works in this class include the approach by
(Tatu et al., 2008) which mapped textual contents in
Bibsonomy bookmarks, not just the tags, to concepts
in WordNet and a similar approach by (Lipczak et al.,
2009) which explored resource content as well as re-
source and user profiles. However, there is a draw-
back that these works relied on extended textual con-
tents provided by Bibsonomy which are not always
available in other collaborative tagging systems.
The baseline tag recommender system in col-
laborative class is the user-based CF (Marinho and
Schmidt-Thieme, 2008). There is also a notable work
by (Sigurbj¨ornsson and van Zwol, 2008) which is
based on tag co-occurrences. Although this work has
achieved good result, it didn’t rely on actual mean-
ing of tags which may miss the semantic relationships
among tags.
The most notable works in graph-based ap-
proaches are the work by (J¨aschke et al., 2008) which
utilized a graph-based tag ranking method named
FolkRank (Hotho et al., 2006) and the work by
(Symeonidis et al., 2008) which proposed the Tensor
Dimensionality Reduction method.
There wasn’t much work done in using domain
ontology for tag recommendation. Beside the work
proposed in this paper there is a work by (Baruzzo
et al., 2009) which used existing domain ontology to
recommend new tags by analyzing textual content of
a resource needed to be tagged. However, they didn’t
provide quantitative evaluation.
Most of the state of the art works in tag recom-
mendation are evaluated on dense datasets and rarely
on sparse datasets. The work we described in this
paper is a tag recommender approach which com-
bines collaborative filtering and graph-based method
but not utilizing content-based methods. Although
content-based methods may achieve good results for
cold-start situation, they may not be applicable to all
collaborative tagging systems because they rely on
extra information on resources which are not always
available. It also may not be practical since for dif-
ferent content type they will need different version of
the algorithm (Preisach et al., 2010).
4 ONTOLOGY SPECIFICATION
In this section we specify ontology specification
which we are going to utilize in the tag recommen-
dation approach proposed in this paper. Specifically
this ontology specification discusses one particular
approach that we have chosen for semantic and per-
sonalization capability of the generated domain on-
tology. Readers are referred to (Djuana et al., 2012)
for more detailed discussion.
In order to use existing domain ontology gen-
erated from folksonomy, we specify the criteria in
which the ontology has to conform to. This ontol-
ogy of tags should come from a general ontology with
good coverage. In particular this general ontology
needs to have synonym terms (synset) or the like in
their concept, and also general category or taxonomic
grouping system such as WordNet (Fellbaum, 1998).
It was expected by conducting ontology learning
process, domain ontology which represents a partic-
ular tag collection can be generated. There are 3
stages in domain ontology generation process which
are: mapping tags to concepts, mapping disambigua-
tion and relationships extraction.
It is possible that a tag can map directly to one of
synonym terms of a concept in the backbone ontol-
ogy. In other cases, only part of a tag that can map to
one of synonym terms. These cases where handled by
three mapping approaches which are (1) whole map-
ping (2) partial mapping and (3) term mapping.
After all possible mappings are found, the next
stage was mapping disambiguation to choose the most
appropriate concept from mapped concepts to rep-
resent the meaning of a tag for this particular tag
collection. Two disambiguation strategies were per-
formed which are (1) disambiguation by frequency
which comes from an expert point of view about gen-
eral meaning of tags. This mapping strength comes
from frequency in a representative corpus of doc-
uments which indicate how frequent one particular
synonym term would be used to represent the mean-
ing of concept that contains these terms; (2) disam-
biguation by tag relevance which comes from users
point of view about a personal meaning in tags collec-
tion. This mapping strength comes from tag relevance
in relation to similar users understanding and usage
of tags. Given a related tags that has been used for an
item, this mapping is chosen according to relevance to
other tags. After mapping disambiguation, each tag t
AnOntology-basedMethodforSparsityProbleminTagRecommendation
469
will map to one and only one concept. In the end,
confirmed mappings according to two disambigua-
tion strategies were: M
frequency
(t) and M
relevance
(t).
Based on tag to concept mapping, available relation-
ships (”is-a” relation) among concepts in general on-
tology were extracted to form domain ontology.
5 PROPOSED APPROACH
The proposed recommendation approach consists of
two parts. The first part is the user-based collab-
orative filtering (CF) tag recommendation approach
(user-based CF) (J¨aschke et al., 2008) (Marinho and
Schmidt-Thieme, 2008). This part will also serve as
a baseline tag recommender for evaluation purpose.
However, this approach may not be able to solve am-
biguities problem since it can only recommend pre-
viously used tags and may not be able to recommend
semantically related tags. Therefore, for the second
part we proposed the ontology-based concept expan-
sion tag recommendation approach. In this approach
we utilize concepts and relationships which have been
consolidated in the existing domain ontology. We at-
tempt to improve the user-based CF by utilizing ex-
panding vocabulary of recommended tags by mak-
ing use of synonym terms and semantic relationships
among related concepts in the ontology.
5.1 User-based CF Method
In the traditional user-based CF recommender sys-
tems for recommending items, user profiles are rep-
resented in a |U| × |I| user-item matrix X, where |U|
represents number of users and |I| represents number
of items. For each row vector:
x
u
= [x
u,1
,.. .x
u,|I|
], for
u = 1, ... ,|U|, x
u,i
indicates that user u rated item i by
a rating value. Each row vector
x
u
corresponds thus
to a user profile representing the users preferences to
the items.
Based on the profile matrix X, the neighbourhood
of the most similar k users to the user u can be com-
puted as follows:
N
k
u
= argmax
k
vU
sim(
x
u
,
x
v
) (1)
where sim(
x
u
,
x
v
) is the similarity between user u and
another user v. It can be calculated using a similarity
calculation method such as cosine similarity.
However, because of the ternary relational na-
ture of user tagging system, the traditional user-item
matrix X cannot be applied directly in tag recom-
menders, unless the ternary relation Y is reduced to
a lower dimensional space (Marinho and Schmidt-
Thieme, 2008).
In order to apply the user-based CF, the ternary
relation Y can be used to generate a |U| × |I| matrix
X
UI
= [
x
1
...
x
|I|
], called user-item(tag) matrix, with
x
u
= [x
u,1
,.. .x
u,|I|
], for u = 1,.. .,|U|, x
u,i
{0,1} in-
dicating that, there exists tags used by user u to tag
item i if x
u,i
= 1, otherwise no tags have been used by
user u to tag this item.
In the experiment, we implemented the user-item
(tag) projection as the user profile matrix for calculat-
ing user neighbourhood. The user-item (tag) matrix is
a binary matrix. Jaccards coefficient is used to mea-
sure the similarity of two binary vectors.
In this user-based CF method in order to recom-
mend tags to a target user for tagging a particular
item, it first generates a set of candidate tags which
have been used by other users (usually neighbour
users) to tag the item that target user is concerned.
It then ranks the candidate tags based on the similar-
ity between target user and other users to decide top n
tags as the final recommendations.
Let CT(u,i) be a set of tags which have been used
by us neighbors to tag item i. CT(u,i) are the candi-
date tags to be selected to generate recommendations
to u for tagging i. For a candidate tag t in CT(u, i), its
ranking can be calculated by the following equation:
w(u,t,i) =
vN
k
u
sim(
x
u
,
x
v
) δ(v,t, i),
δ(v,t,i) =
1 (v,t,i) Y
0 otherwise
(2)
where δ(v,t,i) = 1 indicates if the user v has used this
tag t to tag the item i, N
k
u
is the neighborhood of user
u. The top n tags can be determined based on the
ranking:
T(u,i) = arg
n
max
tT
w(u,t,i) (3)
5.2 Ontology-based Expansion Method
In this paper,we propose a method to improvethe per-
formance of the user-based CF (J¨aschke et al., 2008)
(Marinho and Schmidt-Thieme, 2008) described in
Section 5.1 (also serves as baseline recommender).
In the proposed method, we generate candidate
tags by utilizing the synonym set (synset) informa-
tion captured in the tag ontology and rank candidate
tags based on both user similarity and tag popular-
ity. The recommendations generated by baseline rec-
ommender and tag ontology based recommender de-
scribed in this Section are compared to evaluate the
improvement achieved by the expansion method. The
experiments and evaluation are provided in Section 6.
It is a well known insight to explore the possibility
of using a more general or more specific tag in rec-
ommending a new vocabulary to user. It is related to
ICEIS2013-15thInternationalConferenceonEnterpriseInformationSystems
470
a characteristic known as the basic level variations or
generality in collaborative tagging (Golder and Hu-
berman, 2006) in which certain users tend to use a
more general vocabulary while other users tend to use
a more specific vocabulary.
Therefore in an expansion to current approach we
introduce candidate tag expansion method which ex-
pands candidate tags to include the parent (more gen-
eral) concept and the children (more specific) con-
cepts as well as the basic level concepts. Each of these
concepts will need to be ranked according to seman-
tic relatedness measure to determine the closeness to
current mapped concepts. These methods will be de-
scribed below.
5.2.1 Candidate Tag Expansion
Let CT(u,i) be the set of candidate tags generated
based on neighbour users preferences. For each can-
didate tag t in CT(u,i), by using the disambiguation
mapping methods as described in Section 4, t can be
mapped to concepts M
frequency
(t) or M
relevance
(t) in
the tag ontology, respectively. For this step we will
have 2 different sources of candidate tags as follows:
1. Basic Level Tag Expansion In this strategy, for
mapped concepts which we identify as basic level
expanded concepts, from the synset terms of these
concepts, two expanded sets of candidate tags can
be generated based on the two methods:
Exp
CT
basic
frequency
(u,i) =
[
synset(M
frequency
(t))
tCT(u,i)
(4)
Exp
CT
basic
relevance
(u,i) =
[
synset(M
relevance
(t))
tCT(u,i)
(5)
2. Parent-Children Level Tag Expansion
In this strategy, for parent and children concepts
which we identify as more general and more spe-
cific concepts, we define two functions for retriev-
ing those concepts. Let c be a concept, parent(c)
be the parent concept of c, and children(c) be the
set of children concepts of c. For a tag t, the set
of its parent and children concepts are defined be-
low:
PC
frequency
(t) = {parent(M
frequency
(t))}
[
children(M
frequency
(t)) (6)
PC
relevance
(t) = {parent(M
relevance
(t))}
[
children(M
relevance
(t)) (7)
From these parent and children concepts, another
two expanded sets of candidate tags can also be
generated:
Exp
CT
PC
frequency
(u,i) =
[ [
synset(M
frequency
(t))
tCT(u,i)
cPC
frequency
(t)
(8)
Exp
CT
PC
relevance
(u,i) =
[ [
synset(M
relevance
(t))
tCT(u,i) cPC
relevance
(t)
(9)
5.2.2 Recommendation Ranking
For this step we will also have 2 different cases for
ranking calculation as follows:
1. Basic Level Tag Expansion
For each of the candidate tag t in CT(u,i),
Exp
CT
basic
frequency
(u,i) or Exp CT
basic
relevance
(u,i), its
ranking is calculated by the following equation:
w
γ
(u,t,i) =
vN
k
u
sim(
x
u
,
x
v
) δ(v,t, i) t CT(u,i)
vN
k
u
sim(
x
u
,
x
v
) δ(v,t, i) P (t) t / CT(u,i),
tExp
CT
basic
γ
(u,i)
(10)
where γ { frequency,relevance} and P (t) is the
popularity of tag t, which is calculated as:
P (t) =| UI
t
| / max
t
i
T
| UI
t
i
|.
P (t) is the ratio between | UI
t
| and the maximum
number of times that a tag has been used to tag
items in this tagging community. As has been
defined in (Djuana et al., 2012), | UI
t
| contains
(user, item) pairs representing the tag assignments
using tagt. | UI
t
| is the number of times that t has
been used to tag items. The higher the | UI
t
|, the
more popular tag t is.
2. Parent-Children Level Tag Expansion
On the other hand for each t of candidate tags
which are not original candidate tags in CT(u,i)
or the expanded basic tags in Exp
CT
basic
γ
(u,i),
AnOntology-basedMethodforSparsityProbleminTagRecommendation
471
t must be a parent or a child of a original can-
didate tag, i.e., t Exp CT
PC
frequency
(u,i) or t
Exp
CT
PC
relevance
(u,i). The ranking of tag t is cal-
culated by the following equation:
w
γ
(u,t,i) =
vN
k
u
sim(
x
u
,
x
v
) δ(v,t, i) P (t) S (t,t
o
)
t / CT(u, i),t Exp
CT
pc
γ
(u,i)
(11)
where S (t,t
o
) is the normalized similarity value
between tag t and its original candidate tag based
on WordNet similarity measures (semantic dis-
tance). In this approach we use Jiang-Conrath
similarity measures which are based on informa-
tion content in the glosses (Jiang and Conrath,
1997). We use the implementation in WordNet
Similarity package (Pedersen et al., 2004). The
more they closer in semantic distance, the higher
the similarity value will be.
6 EVALUATION
In this section, first we discuss experiments setup then
we present experiment results and discussion.
6.1 Experiments Setup
We have conducted experiments mainly using the
dataset for ECML PKDD Discovery Challenge 2009
which is summarized in (J¨aschke et al., 2012).
The dataset originated from Bibsonomy contains
two versions of training data: (1) snapshot of almost
all dumps of Bibsonomy and (2) dense part of the
snapshot. The dense part contains training data which
has been filtered to include only users, resources or
tags that appear in at least two posts. This is also
known as post core calculation (Batagelj and Zaver-
snik, 2002) at level 2.
The dataset also contains two separate test data
for (1) Content-based method whereby test data con-
tained posts whose user, resource or tag were not con-
tained in the dense part of training data and (2) Graph-
based method otherwise. Table 1 and 2 summarized
the statistics of the dataset.
We have simulated two situations in tag recom-
mendation context which are tag recommendation
using dense dataset and tag recommendation using
sparse dataset.
For dense dataset we use the dense part of snap-
shot data in Table 1 for training data, and Graph based
data in Table 2 for testing data. In this simulation,
Table 1: Training data statistics.
Statistics (until Dec. Overall Dense part
31
st
2008) Snaphot of Snapshot
#posts 421,928 64,120
#resources 378,378 22,389
#users 3,617 1,185
#tags 93,756 13,252
Table 2: Testing data statistics.
Statistics (Jan 1
st
- Task 1 Task 2
June 30
th
2009) Content-based Graph-based
#posts 43,002 778
#resources 40,729 667
#users 1,591 136
#tags 34,051 862
all users, items and tags in test dataset are all con-
tained in training data. For sparse dataset we use the
entire snapshot data in Table 1 for training data, and
Content based data in Table 2 for testing data. In this
simulation, users, items or tags in test data were not
contained in the dense part of training data which sim-
ulate sparse users and may contain cold start users or
items.
Top N tags are recommended to each target user
for one random user’s items in testing set. The rec-
ommended tags are compared to target users actual
tags of items in testing dataset. If a recommended tag
matches with an actual tag, we calculate this as a hit.
The standard precision and recall calculation are used
to evaluate the accuracy of tag recommendations.
We have conducted following runs to compare
performance between the baseline recommender and
the proposed methods.
User-CF: this is the user-based CF tag recom-
mender system which is the baseline (section 5.1)
Exp
Freq User Syn: this is the basic level synset
expansion based on frequency over user-based CF
(section 5.2)
Exp
Rel User Syn: this is the basic level synset
expansion based on tag relevance over user-based
CF (section 5.2)
Freq&Rel
User Syn: this is the combination of
Exp Freq User Syn and Exp Rel User Syn
Exp
Freq User PC: this is the parent chil-
dren synset expansion based on frequency over
Exp
Freq User Syn (section 5.2)
Exp
Rel User PC: this is the parent children
synset expansion based on tag relevance over
Exp
Rel User Syn (section 5.2)
Freq&Rel
User PC: this is the combination of
Exp Freq User PC and Exp Rel User PC
ICEIS2013-15thInternationalConferenceonEnterpriseInformationSystems
472
Folkrank TR: this is the state of the art graph-
based tag recommender (J¨aschke et al., 2008).
6.2 Results and Discussion
For the recommendation using dense dataset the re-
sults are depicted in Table 3 and Table 4 while for the
recommendation using sparse dataset the results are
depicted in Table 5 and Table 6.
Table 3: Precision for Dense Recommendation.
N 5 10 15 20
User-CF 0.183 0.103 0.070 0.052
Exp Freq User Syn 0.217 0.139 0.101 0.078
Exp Rel User Syn 0.217 0.140 0.103 0.079
Freq&Rel User Syn 0.218 0.142 0.104 0.081
Exp Freq User PC 0.221 0.144 0.103 0.079
Exp Rel User PC 0.221 0.145 0.104 0.080
Freq&Rel User PC 0.222 0.147 0.104 0.081
Folkrank TR 0.241 0.150 0.108 0.084
Table 4: Recall for Dense Recommendation.
N 5 10 15 20
User-CF 0.435 0.474 0.479 0.479
Exp Freq User Syn 0.477 0.503 0.507 0.512
Exp Rel User Syn 0.479 0.505 0.509 0.515
Freq&Rel User Syn 0.482 0.509 0.514 0.518
Exp Freq User PC 0.515 0.562 0.574 0.587
Exp Rel User PC 0.518 0.570 0.582 0.591
Freq&Rel User PC 0.523 0.578 0.588 0.593
Folkrank TR 0.576 0.685 0.726 0.750
Table 5: Precision for Sparse Recommendation.
N 5 10 15 20
User-CF 0.074 0.059 0.053 0.051
Exp Freq User PC 0.207 0.124 0.073 0.055
Exp Rel User PC 0.207 0.125 0.075 0.056
Freq&Rel User PC 0.208 0.126 0.077 0.058
Folkrank TR 0.205 0.121 0.066 0.051
For the recommendation using dense dataset we
are mainly observing how expansion by basic level
and parent-child synset expansion as well as com-
bined method may improve significantly over the
baseline recommender.
For the recommendation using sparse dataset we
are mainly observing whether or not the proposed ex-
pansion methods can outperform the state of the art
recommenders which are normally perform well in
dense situation but not in sparse situation.
The results of recommendation in dense dataset
in Table 3 and Table 4 show that the basic level
candidate tag expansion has improved the user-based
CF quite significanty in precision and in recall while
the parent-children expansion has improved further
in precision and in recall over basic level expansion
Table 6: Recall for Sparse Recommendation.
N 5 10 15 20
User-CF 0.169 0.238 0.302 0.340
Exp Freq User PC 0.513 0.572 0.582 0.586
Exp Rel User PC 0.514 0.574 0.583 0.588
Freq&Rel User PC 0.516 0.576 0.587 0.589
Folkrank TR 0.491 0.561 0.574 0.585
alone. Although these results are still lower than
FolkRank results, the gap is getting closer for preci-
sion in higher number of N which shows potential of
ontology based concept expansion for covering gaps
in collaborative filtering methods.
As we predicted Folkrank didn’t perform that well
for sparse dataset as shown in Table 5 and Table 6. In
this situation, the combination of all expansion meth-
ods has performed better than FolkRank in precision
and in recall. If we look more closely at the results
then the improvement to precision is more apparent
for higher number of N while improvement to recall
is more apparent for lower number of N. These results
shows that proposed expansion method help avoid
sudden decline in precision curve (maintaining accu-
racy) and help boost recall in first few recommended
tag which are mostly more relevant tags.
Based on these results we may conclude that the
proposed methods are quite effective for alleviating
sparsity situation.
7 CONCLUSIONS
In this paper we have presented a tag recommen-
dation approach which utilizes an existing domain
ontology generated from Folksonomy for improving
user-based CF method by expanding original candi-
date tags. We have presented an ontology-based ex-
pansion method which expands basic level tags and
includes more general and more specific tags. We
found that the expansion method based on basic level
expansion improved the baseline method quite signif-
icantly. Specifically, in sparse and cold-start situa-
tions, the combination of all expansion methods has
improved the accuracy even better than the state of
the art graph based recommendation method which
shows the potential of effectiveness in sparse situa-
tion.
ACKNOWLEDGEMENTS
This work is part of ARC Linkage Project
(LP0776400) supported by the Australian Research
Council.
AnOntology-basedMethodforSparsityProbleminTagRecommendation
473
REFERENCES
Adomavicius, G. and Tuzhilin, A. (2005). Toward the
next generation of recommender systems: A survey
of the state-of-the-art and possible extensions. IEEE
Transactions on Knowledge and Data Engineering,
17(6):734–749.
Baruzzo, A., Dattolo, A., Pudota, N., and Tasso, C. (2009).
Recommending new tags using domain-ontologies.
In Web Intelligence/IAT Workshops, pages 409–412.
IEEE.
Batagelj, V. and Zaversnik, M. (2002). Generalized cores.
cite arxiv:cs.DS/0202039.
Brooks, C. H. and Montanez, N. (2006). Improved anno-
tation of the blogosphere via autotagging and hierar-
chical clustering. In Proceedings of the 15th interna-
tional conference on World Wide Web, pages 625–632.
ACM.
Djuana, E., Xu, Y., Li, Y., and Cox, C. (2012). Person-
alization in tag ontology learning for recommendation
making. In 14th International Conference on Informa-
tion Integration and Web-based Applications & Ser-
vices (iiWAS 2012), pages 368–377. ACM.
Fellbaum, C. (1998). WordNet: An Electronic Lexical
Database. Language, Speech, and Communication
Series. MIT Press.
Golder, S. A. and Huberman, B. A. (2006). Usage patterns
of collaborative tagging systems. Journal of Informa-
tion Science, 32(2):198–208.
Gruber, T. R. (1993). A translation approach to portable
ontology specifications. Knowledge Acquisition,
5(2):199–220.
Hotho, A., Jschke, R., Schmitz, C., and Stumme, G. (2006).
Information retrieval in folksonomies: Search and
ranking. In The Semantic Web: Research and Appli-
cations, pages 411–426. Springer Berlin Heidelberg.
J¨aschke, R., Hotho, A., Mitzlaff, F., and Stumme, G. (2012).
Challenges in tag recommendations for collaborative
tagging systems. In Recommender Systems for the So-
cial Web, pages 65–87. Springer Berlin Heidelberg.
J¨aschke, R., Marinho, L., Hotho, A., Schmidt-Thieme, L.,
and Stumme, G. (2008). Tag recommendations in so-
cial bookmarking systems. AI Commun., 21(4):231–
247.
Jiang, J. J. and Conrath, D. W. (1997). Semantic similarity
based on corpus statistics and lexical taxonomy. In
Proceedings of International Conference Research on
Computational Linguistics (ROCLING X’1997).
Liang, H., Xu, Y., Li, Y., Nayak, R., and Tao, X. (2010).
Connecting users and items with weighted tags for
personalized item recommendations. In Proceedings
of the 21st ACM conference on Hypertext and hyper-
media, HT ’10, pages 51–60. ACM.
Lipczak, M., Hu, Y., Kollet, Y., and Milios, E. (2009). Tag
sources for recommendation in collaborative tagging
systems. In ECML PKDD Discovery Challenge 2009
(DC09), volume 497 of CEUR-WS.org, pages 157–
172.
Marinho, L. and Schmidt-Thieme, L. (2008). Collabora-
tive tag recommendations. In Data Analysis, Machine
Learning and Applications, pages 533–540. Springer
Berlin Heidelberg.
Marlow, C., Naaman, M., Boyd, D., and Davis, M. (2006).
Ht06, tagging paper, taxonomy, flickr, academic arti-
cle, to read. In Proceedings of the seventeenth con-
ference on Hypertext and hypermedia, pages 31–40.
ACM.
Mathes, A. (2004). Folksonomies - cooperative classifi-
cation and communication through shared metadata.
http://www.adammathes.com/academic/computer-me
diated-communication/folksonomies.html. Accessed:
2013-01-25.
Mika, P. (2007). Ontologies are us: A unified model of so-
cial networks and semantics. Web Semantics: Science,
Services and Agents on the World Wide Web, 5(1):5–
15.
Musto, C., Narducci, F., Lops, P., and Gemmis, M.
(2010). Combining collaborative and content-based
techniques for tag recommendation. In E-Commerce
and Web Technologies, pages 13–23. Springer Berlin
Heidelberg.
Navigli, R., Velardi, P., and Gangemi, A. (2003). Ontology
learning and its application to automated terminology
translation. IEEE Intelligent Systems, 18(1):22–31.
Pedersen, T., Patwardhan, S., and Michelizzi, J. (2004).
Wordnet: : Similarity - measuring the relatedness of
concepts. In AAAI, pages 1024–1025. AAAI Press.
Preisach, C., Balby Marinho, L., and Schmidt-Thieme, L.
(2010). Semi-supervised tag recommendation - using
untagged resources to mitigate cold-start problems. In
Advances in Knowledge Discovery and Data Mining,
pages 348–357. Springer Berlin Heidelberg.
Schein, A. I., Popescul, A., Ungar, L. H., and Pennock,
D. M. (2002). Methods and metrics for cold-start rec-
ommendations. In Proceedings of the 25th annual in-
ternational ACM SIGIR conference on Research and
development in information retrieval, pages 253–260.
ACM.
Sigurbj¨ornsson, B. and van Zwol, R. (2008). Flickr tag
recommendation based on collective knowledge. In
Proceedings of the 17th international conference on
World Wide Web, pages 327–336. ACM.
Symeonidis, P., Nanopoulos, A., and Manolopoulos, Y.
(2008). Tag recommendations based on tensor dimen-
sionality reduction. In Proceedings of the 2008 ACM
conference on Recommender systems, pages 43–50.
ACM.
Tatu, M., Srikanth, M., and D’Silva, T. (2008). Rsdc’08:
Tag recommendations using bookmark content. In
Proceedings of the ECML/PKDD 2008 Discovery
Challenge Workshop, pages 96–107.
ICEIS2013-15thInternationalConferenceonEnterpriseInformationSystems
474