the described statistical approach can be improved by
a larger text corpus.
5 CONCLUSIONS
An approach has been described that efficiently generates
semantic tagnets from given unstructured lists of tags
and an arbitrary text corpus. Such a semantic tagnet
can be used to estimate text similarities for tagged
texts in digital libraries, providing a more intuitive
way of literature research adapted to the user’s
cognitive model. In addition, the generated semantic
tagnet could be used to define ontologies, or users
could enhance the network through a suitable interface,
similar to the idea of user feedback proposed in
(Doan and McCann, 2003).
In a future version, the tagnet builder may be enhanced
with components that also extract synonym relations.
One idea for a rule-based approach has been proposed in
(Ananthanarayanan et al., 2008) for English words. This
enhancement would make it possible to store relations
between tags in different languages, creating a
multilingual semantic net that can be used for a digital
library that stores texts in different languages.
ACKNOWLEDGEMENTS
This research was funded in part by the DFG Cluster
of Excellence on Ultra-high Speed Information and
Communication (UMIC), German Research Founda-
tion grant DFG EXC 89, and by the German Research
Foundation grant for the Project Interdisciplinary Text
Production and Writing (ipTS; see footnote 12).
REFERENCES
Ananthanarayanan, R., Chenthamarakshan, V., Deshpande,
P. M., and Krishnapuram, R. (2008). Rule based syn-
onyms for entity extraction from noisy text. In AND
’08: Proceedings of the second workshop on Analyt-
ics for noisy unstructured text data, pages 31–38, New
York, NY, USA. ACM.
Barnett, B. (2009). Regular expressions,
http://www.grymoire.com/unix/regular.html.
Berners-Lee, T., Hendler, J., and Lassila, O. (2001). The se-
mantic web - a new form of web content that is mean-
ingful to computers will unleash a revolution of new
possibilities.
Footnote 12: http://www.ipts.rwth-aachen.de/
Chaffin, R. (1992). The concept of a semantic relation. In
Lehrer, A. and Kittay, E. F., editors, Frames, Fields and
Contrasts, pages 253–288. Lawrence Erlbaum, Hillsdale, N.J.
Collins, A. and Quillian, M. (1969). Retrieval time from
semantic memory. Journal of Verbal Learning and
Verbal Behavior, 8(2):240–247.
Doan, A. and McCann, R. (2003). Building data integration
systems: A mass collaboration approach. In IIWeb,
pages 183–188.
Fellbaum, C., editor (1998). WordNet: An Electronic Lex-
ical Database (Language, Speech, and Communica-
tion). The MIT Press.
Gaizauskas, R. and Humphreys, K. (1997). Using a se-
mantic network for information extraction. Nat. Lang.
Eng., 3(2):147–169.
Götten, D. (2009). Semantische Schlagwortnetze zur
effizienten Literaturrecherche. Master’s thesis, RWTH
Aachen University.
Harris, Z. (1985). In Katz, J. J., editor, The Philosophy of
linguistics, pages 26–47. Oxford University Press.
Harrison, M. A. (1978). Introduction to Formal Language
Theory. Addison-Wesley Longman Publishing Co.,
Inc., Boston, MA, USA.
Lee, M. D., Pincombe, B., and Welsh, M. (2005). An em-
pirical evaluation of models of text document simi-
larity. In Proceedings of the 27th Annual Conference
of the Cognitive Science Society, pages 1254–1259,
Mahwah, NJ. Erlbaum.
Löbner, S. (2003). Semantik. Eine Einführung.
Lovins, J. B. (1968). Development of a stemming algo-
rithm. Mechanical Translation and Computational
Linguistics, 11:22–31.
Perera, P. and Witte, R. (2005). A self-learning context-
aware lemmatizer for German. In Proceedings
of Human Language Technology Conference and
Conference on Empirical Methods in Natural Lan-
guage Processing, pages 636–643, Vancouver, British
Columbia, Canada. Association for Computational
Linguistics.
Porter, M. F. (1980). An algorithm for suffix stripping. Pro-
gram, 14(3):130–137.
Porter, M. F. (2009). German stemming algorithm.
Quillian, M. R. (1967). Word concepts: A theory and simu-
lation of some basic semantic capabilities. Behavioral
Science, 12:410–430.
Schmid, H. (1994). Probabilistic part-of-speech tagging us-
ing decision trees. In Proceedings of International
Conference on New Methods in Language Processing.
Schmid, H. (1995). Improvements in part-of-speech tagging
with an application to German. In Proceedings of
the ACL SIGDAT-Workshop, pages 47–50.
Sowa, J., editor (1991). Principles of Semantic Net-
works: Explorations in the Representation of Knowl-
edge (Morgan Kaufmann Series in Representation and
Reasoning). Morgan Kaufmann Pub.
Sowa, J. (2009). Semantic networks,
http://www.jfsowa.com/pubs/semnet.htm.
WEBIST 2010 - 6th International Conference on Web Information Systems and Technologies