be observed a stable pattern in tag proportions with-
out global control.
Social tagging produces a kind of tag taxonomy.
In contrast to existing ontologies, e.g.,
the tree-like Dewey decimal classification, social
tagging induces graphs which are constantly chang-
ing. Furthermore, folksonomies do not force unam-
biguous categorizations, but realize multi-label clas-
sifications. A prototypical example is the category
system of the Wikipedia (Voss, 2006) which is an
open-ended social ontology enhanced by a community
not only by publishing and interlinking articles,
but also by enabling users to categorize documents
(Gleim, Mehler, 2006).
This paper proposes a web-based application
which combines social tagging, enhanced visual
representation of a document and the alignment to
an open-ended social ontology. More precisely, we
introduce an approach for automatic extraction of
topic labels for indexing and content representation
as an add-on to social ontologies. That is, we per-
form automatic document classification in the
framework of a social ontology based on the
Wikipedia category taxonomy. This paper has two
main goals: to describe the method of automatic
tagging of digital documents and to provide an over-
view of the algorithmic patterns of lexical chaining
that can be applied to topic tracking and labelling
of digital documents. We first explain the
general architecture of the system in Section 2. Then,
in Section 3, we present a formal model of the lexical
chaining algorithm used. In Section 4, we outline
the alignment with the Wikipedia category system.
Finally, we conclude and outline future work.
2 RELATED WORK
The method proposed in this paper belongs to the
domain of content classification, in particular the
tagging of content through meta-information and the
alignment of documents to a social ontology.
Braun et al. (2007) presented an application
(SOBOLEO) for aligning collaborative tagging with
a lightweight ontology. This approach enables
users to add hyperlinks, so-called ‘social
bookmarks’, to an online repository by assigning
tags to them. Furthermore, each bookmark can be
categorized by referring to a terminological
ontology. The employed ontology can be specialised
by adding new concepts. In this case, both tagging
and categorization of content have to be done
manually. In contrast, our focus is on an automatic,
non-manual approach to tagging and categorization.
Mika (2005) presented an application for the
extraction of community-based lightweight ontologies
from web pages, in particular the creation of an
actor-concept ontology by generating associations
between an actor (e.g. a person) and a concept
(e.g. a label). This is done by submitting a search
query that combines the two terms and measuring the
resultant page count. This approach is similar to the
classical lexical chaining approach in that it uses
a lexical network (in this case a search engine) as
a resource for generating associations between two
terms. However, an integrated structure- and
content-based text model is left out, since only
tags already assigned to the content are used.
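The page-count idea can be illustrated with a minimal sketch. The text above does not state which association measure is applied to the counts, so the Jaccard-style coefficient used here is an assumption. The helper name `page_count_association` is hypothetical, and the counts are invented numbers standing in for search-engine results.

```python
def page_count_association(count_a: int, count_b: int, count_both: int) -> float:
    """Jaccard-style association from page counts (an assumed measure).

    count_a, count_b: pages returned for each term queried alone.
    count_both: pages returned for the combined query.
    """
    union = count_a + count_b - count_both
    if union <= 0:
        return 0.0
    return count_both / union


# Invented page counts for illustration only (not real search results).
actor_hits = 1200    # query: actor name
concept_hits = 5000  # query: concept label
joint_hits = 300     # query: actor name combined with concept label

strength = page_count_association(actor_hits, concept_hits, joint_hits)
```

A higher value indicates that the two terms appear together on a larger share of the pages on which either of them appears.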
3 ARCHITECTURE MODEL
The main concept behind automatic content tagging
and topic tracking is an integrated structure- and
content-based text model. This means, in the first
place, that the task of tracking semantically related
tokens based on a lexical reference system is
combined with a detailed structural analysis of the
text. The idea behind this is that each content
element of a text (content and structure) is always
semantically related to another segment in the same
text. Therefore, we can span associations between
tokens, sentences, paragraphs and divisions based on
their semantic relatedness. This is done by
introducing a Generic Lexical Network Model,
exemplified by a snapshot of the German Wikipedia
project.
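The idea of spanning associations between text segments through semantically related tokens can be sketched as follows. This is not the Generic Lexical Network Model itself: the small dictionary `LEXICAL_NETWORK` is a hypothetical stand-in for a real lexical resource, and the greedy chaining strategy is a simplifying assumption chosen for brevity.

```python
from collections import defaultdict

# Toy lexical network: term -> directly related terms (hypothetical data).
LEXICAL_NETWORK = {
    "tag": {"label", "folksonomy"},
    "label": {"tag", "category"},
    "category": {"label", "ontology"},
    "ontology": {"category", "taxonomy"},
    "taxonomy": {"ontology"},
    "folksonomy": {"tag"},
    "river": {"water"},
    "water": {"river"},
}


def related(a: str, b: str) -> bool:
    """Two tokens are related if identical or adjacent in the network."""
    return a == b or b in LEXICAL_NETWORK.get(a, set())


def build_chains(segments):
    """Greedily group (segment index, token) pairs into lexical chains.

    A token joins the first chain that already contains a related token;
    otherwise it starts a new chain.
    """
    chains = []
    for seg_idx, tokens in enumerate(segments):
        for token in tokens:
            for chain in chains:
                if any(related(token, member) for _, member in chain):
                    chain.append((seg_idx, token))
                    break
            else:
                chains.append([(seg_idx, token)])
    return chains


def segment_links(chains):
    """Count, for each pair of segments, how many chains connect them."""
    links = defaultdict(int)
    for chain in chains:
        seg_ids = sorted({seg for seg, _ in chain})
        for i, a in enumerate(seg_ids):
            for b in seg_ids[i + 1:]:
                links[(a, b)] += 1
    return dict(links)


# Three segments (e.g. sentences), already tokenized.
segments = [["tag", "river"], ["label", "category"], ["water", "ontology"]]
chains = build_chains(segments)
links = segment_links(chains)
```

In this toy input, two chains emerge (one around tagging vocabulary, one around water vocabulary), and segments 0 and 2 are connected by both of them. The same linking could be applied at the level of paragraphs or divisions by changing what counts as a segment.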
In addition, an alignment to an existing ontology is
computed by normalizing, labelling and categorizing
topic chains. Generally speaking, the application
procedure can be subdivided into three coordinated
main modules (see Figure 1), which provide an
integrated structure- and content-based text model
for topic tracking and automatic content tagging:
1. analysis of logical document structure
2. lexical content analysis and term extraction
3. ontology alignment and topic labelling
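The three modules listed above can be pictured as a simple pipeline. The following is only a rough sketch under strong assumptions: all function names, the stopword list and the term-to-category mapping are hypothetical, and each stage is reduced to a trivial stand-in (paragraph and sentence splitting, term counting, and a dictionary lookup in place of ontology alignment).

```python
import re
from collections import Counter


def analyse_structure(text):
    """Module 1 (simplified): split text into paragraphs, then sentences."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [
        [s.strip() for s in re.split(r"(?<=[.!?])\s+", p) if s.strip()]
        for p in paragraphs
    ]


def extract_terms(structure, stopwords):
    """Module 2 (simplified): count lowercase word tokens not in the stopword list."""
    counts = Counter()
    for paragraph in structure:
        for sentence in paragraph:
            for token in re.findall(r"[a-zäöüß]+", sentence.lower()):
                if token not in stopwords:
                    counts[token] += 1
    return counts


def label_topics(term_counts, term_to_category, top_n=2):
    """Module 3 (simplified): map terms to categories and rank categories by weight."""
    category_weight = Counter()
    for term, count in term_counts.items():
        category = term_to_category.get(term)
        if category is not None:
            category_weight[category] += count
    return [category for category, _ in category_weight.most_common(top_n)]


# Hypothetical inputs for illustration only.
STOPWORDS = {"the", "a", "of", "and", "is", "are", "by"}
TERM_TO_CATEGORY = {
    "tagging": "Knowledge representation",
    "tags": "Knowledge representation",
    "ontology": "Knowledge representation",
    "wikipedia": "Online encyclopedias",
}

text = (
    "Social tagging is popular. Tags are assigned by users.\n\n"
    "The ontology of Wikipedia is open-ended."
)
structure = analyse_structure(text)
terms = extract_terms(structure, STOPWORDS)
labels = label_topics(terms, TERM_TO_CATEGORY)
```

Keeping the stages as separate functions mirrors the modular subdivision: the output of the structure analysis feeds the term extraction, whose output in turn feeds the labelling step.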
WEBIST 2008 - International Conference on Web Information Systems and Technologies