tag suggestions, from a dynamically evolving individual tag space (assisted tagging).
Our long-term focus is on efficient (semi-)automated tagging and tag suggestions based on this integrative, interoperable approach, and on the analysis and application of the resulting tag spaces for optimized navigability, independent of the specific tagging services in the background. Users should be able to work with one consistent, virtual tag space instead of depending on service-specific restrictions.
2 RELATED WORKS
This paper covers specific topics related to the comparability, integration, and interoperability of social classification and tagging, and analyses leading tagging services that differ in popularity, growth, and thematic focus. For a general overview and research motivation, refer to the community discussions in (Mathes, 2004) and (Shirky, 2005). For recent research on tagging motivations, see (Ames and Naaman, 2007) or (Zollers, 2007). Quantitative evaluations of static and dynamic features, as well as of emerging structures in tag spaces, are presented in (Cattuto et al., 2006), (Cattuto, 2007), (Golder and Huberman, 2005), and (Lambiotte and Ausloos, 2006). (Zhang et al., 2006) compare the motivations, advantages, and drawbacks of traditional top-down and emerging bottom-up semantics for Web resources, and present results from an analysis of del.icio.us. A BibSonomy overview is given in (Hotho et al., 2006).
Comparison, Integration and Interoperability Studies. (Gruber, 2005) proposes an approach for defining an ontology that would enable the exchange of tag data and the construction of tagging systems that can compositionally interact with other systems. (Veres, 2006) evaluates semantic intersections and interoperable features between different tagging services (flickr, del.icio.us), but lacks an in-depth quantitative evaluation. The relation between blog post texts and their associated tags is analysed in (Berendt and Hanser, 2007); inter-relations between different tag spaces are not considered. (Bhagat et al., 2007) analyse how different information networks (e.g. web, chat, email, blog, instant messenger) interact with each other, e.g. correlations between blog and blog, blog and web, or blog and messenger. (Schmitz et al., 2007) analyse and compare co-occurrence network properties of del.icio.us data (from 2004-2005) and BibSonomy data (as of July 2006).
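To make the notion of a tag co-occurrence network concrete, the following is a minimal Python sketch; it is not the construction used by (Schmitz et al., 2007). It assumes that each post is given as a set of tags, links two tags whenever they appear in the same post, and weights each edge by the number of such posts. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(posts):
    """Build a weighted, undirected tag co-occurrence network.

    posts: iterable of tag collections, one per post (assumed input format).
    Returns a Counter mapping sorted tag pairs (a, b) to the number of
    posts in which both tags occur together.
    """
    edges = Counter()
    for tags in posts:
        # Sorting makes each undirected edge appear under a single key.
        for a, b in combinations(sorted(set(tags)), 2):
            edges[(a, b)] += 1
    return edges


def degrees(edges):
    """Unweighted node degree: number of distinct co-occurring tags."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


posts = [{"python", "web"}, {"python", "web", "django"}, {"design"}]
edges = cooccurrence_network(posts)
print(edges[("python", "web")])  # weight of the python-web edge
print(degrees(edges))
```

A tag that never shares a post with another tag (here `design`) gets no edges and therefore does not appear in the degree counts.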
Distribution, Growth, and Stability. Feed-based analysis of del.icio.us data is performed in (Shaw, 2005), (Begelman et al., 2006), and on the deli.ckoma¹ web site. The latter presents current statistics derived from recent RSS feeds and evaluates data retrieval coverage and error probability. (Halpin et al., 2007) analyse whether coherent and stable categorization schemes can emerge from unsupervised tagging, and evaluate their dynamics over time, including the corresponding power laws in del.icio.us tag distributions for resources of different popularity. A brief CiteULike analysis, including power-law distributions, is given in (Capocci and Caldarelli, 2007).
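A quick way to check for power-law behaviour in a tag distribution is to fit a straight line to the rank-frequency data on a log-log scale; the slope approximates the (negative) exponent. The sketch below assumes that tags are given as a flat list of tag assignments, and its function names are hypothetical. Least-squares fitting on log-log axes is only a rough heuristic; maximum-likelihood estimators are generally preferred for rigorous exponent estimates.

```python
import math
from collections import Counter


def rank_frequency(tags):
    """Tag frequencies sorted in descending order (rank 1 = most frequent)."""
    return sorted(Counter(tags).values(), reverse=True)


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    Requires at least two distinct ranks; a slope near -s suggests a
    Zipf-like distribution f(r) ~ r^(-s).
    """
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic example: the r-th most frequent tag occurs 1200 / r times.
tags = [f"t{r}" for r in range(1, 7) for _ in range(1200 // r)]
freqs = rank_frequency(tags)
print(freqs)
print(loglog_slope(freqs))
```

For this ideal Zipf-like input the frequencies are 1200, 600, 400, 300, 240, 200 and the fitted slope is -1 up to floating-point error.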
Tag Space Navigability and Efficiency. (Chi and Mytkowicz, 2007) analyse early data (from 2004-2005) from the large-scale service del.icio.us, using (conditional) entropy as a measure of navigation efficiency, and show that efficiency decreases over time. Efficiency analysis with an entropy measure is also applied in (Zhang et al., 2006) and (Li et al., 2007). (Santos-Neto et al., 2007) analyse whether usage patterns can be exploited to improve navigability in a growing tagsonomy. They study the smaller-scale services BibSonomy and CiteULike to reveal the distribution of tagging activity, and define metrics to uncover similarities in user interests.
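The entropy-based efficiency measures mentioned above can be illustrated with a small sketch. H(T) is the Shannon entropy of the tag distribution, and H(R|T) is the conditional entropy of resources given a tag, i.e. the uncertainty about the resource that remains once a tag is known. A higher H(R|T) means that a tag narrows down the set of resources less, which is one way to read a decrease in navigation efficiency. This minimal sketch assumes that tag assignments are given as (tag, resource) pairs and that probabilities are estimated from assignment counts; it is not the exact formulation of (Chi and Mytkowicz, 2007), and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy in bits of a distribution given by raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def tag_entropy(pairs):
    """H(T): entropy of the tag distribution over all assignments."""
    return entropy(list(Counter(tag for tag, _ in pairs).values()))


def conditional_entropy(pairs):
    """H(R|T) = sum over tags t of p(t) * H(R | T = t)."""
    by_tag = defaultdict(Counter)
    for tag, resource in pairs:
        by_tag[tag][resource] += 1
    total = len(pairs)
    return sum(
        (sum(res_counts.values()) / total) * entropy(list(res_counts.values()))
        for res_counts in by_tag.values()
    )


pairs = [("python", "r1"), ("python", "r2"), ("web", "r3"), ("web", "r3")]
print(tag_entropy(pairs))          # two equally frequent tags
print(conditional_entropy(pairs))  # "web" identifies r3, "python" does not
```

In the example, "web" always points to the same resource and contributes no uncertainty, while "python" splits evenly between two resources, so H(R|T) is 0.5 bits.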
(Brooks and Montanez, 2006) analyse how effectively tags describe blog contents (technorati² REST API). The authors suggest that tags are more useful for assigning blogs to broad category clusters than for indicating particular resource content. Hence, they exploit the text contents to automatically extract relevant keywords (TF-IDF) for use as tags, and compare different combinations of these approaches.
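The TF-IDF keyword extraction mentioned above can be sketched as follows: a term scores high in a document if it is frequent there (term frequency) but rare across the collection (inverse document frequency). This minimal sketch assumes a simple lowercase alphanumeric tokenizer, term counts normalized by document length, and idf = log(N / df). These choices and the function names are assumptions for illustration, not the weighting used by (Brooks and Montanez, 2006).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_keywords(docs, k=3):
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    result = []
    for toks in tokenized:
        tf = Counter(toks)
        length = len(toks) or 1
        scores = {
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Ties are broken alphabetically so the output is deterministic.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result.append(ranked[:k])
    return result


docs = ["Python code python", "python web", "web design"]
print(top_keywords(docs, k=1))
```

Terms that occur in every document receive an idf of zero, so such generic words never rank above terms that distinguish a post from the rest of the collection.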
Review of the State of the Art. Existing research approaches introduce metrics and measures for tag-related similarities, growth, stability, and efficiency. They apply them to essentially comparable data sets: mostly the popular broad folksonomy del.icio.us and, in some cases, the less frequently used services CiteULike or BibSonomy. However, the results of these publications cannot be compared effectively because of differing time scopes, evaluation targets, amounts of data, and data retrieval concepts, and because a comprehensive analysis architecture following an integrative approach is missing. It is therefore difficult to evaluate, compare, and rank tag or resource spaces (e.g. for efficient tag suggestions) and to derive conclusions for optimizing tagging processes. What is needed is an evaluation approach based on comparable, current data sets from the same time span and on uniform data retrieval; this is the scope of this paper.
¹ http://deli.ckoma.net/stats
² http://www.technorati.com/
COMPARATIVE STUDIES OF SOCIAL CLASSIFICATION SYSTEMS USING RSS FEEDS