Berenike Litz, Hagen Langer, Rainer Malaka



In this paper we propose a novel approach that combines syntactic and context information to identify lexical semantic relationships. From the first sentences of the German version of Wikipedia, we compiled semi-automatically and manually created training data as well as a test set for evaluation. We trained the Trigrams’n’Tags (TnT) tagger (Brants, 2000) with a semantically enhanced tagset. The experiments showed that the cleanliness of the data is far more important than its quantity. Furthermore, they showed that bootstrapping is a viable approach to improving the results. Our approach outperformed the competing lexico-syntactic patterns by 7%, reaching an F1-measure of .91.
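The abstract only summarizes the method: TnT is a second-order (trigram) hidden Markov model tagger, and here it is trained on tags that carry semantic information in addition to part of speech. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' implementation or tagset. The semantic tags HYPO and HYPER, the toy German sentences, the fixed interpolation weights and the flat score for unknown words are all hypothetical simplifications (TnT itself estimates the weights by deleted interpolation and handles unknown words with suffix statistics).

```python
import math
from collections import defaultdict

START, STOP = "<s>", "</s>"

# Hypothetical toy training data in the spirit of Wikipedia first sentences:
# STTS part-of-speech tags plus two hypothetical semantic tags, HYPO (the
# defined term) and HYPER (its hypernym).
CORPUS = [
    [("Ein", "ART"), ("Dackel", "HYPO"), ("ist", "VAFIN"), ("eine", "ART"),
     ("kleine", "ADJA"), ("Hunderasse", "HYPER"), (".", "$.")],
    [("Die", "ART"), ("Eiche", "HYPO"), ("ist", "VAFIN"), ("ein", "ART"),
     ("Baum", "HYPER"), (".", "$.")],
    [("Der", "ART"), ("Rhein", "HYPO"), ("ist", "VAFIN"), ("ein", "ART"),
     ("langer", "ADJA"), ("Fluss", "HYPER"), (".", "$.")],
]


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def train(corpus, lambdas=(0.1, 0.3, 0.6)):
    """Count tag n-grams and word/tag pairs; return scoring functions.

    The (unigram, bigram, trigram) weights are fixed by assumption; TnT
    estimates them from the training data by deleted interpolation.
    """
    uni, bi, tri = defaultdict(int), defaultdict(int), defaultdict(int)
    bi_ctx, tri_ctx = defaultdict(int), defaultdict(int)
    emit, tag_count = defaultdict(int), defaultdict(int)
    n = 0
    for sent in corpus:
        for word, tag in sent:
            emit[(tag, word)] += 1
            tag_count[tag] += 1
        tags = [START, START] + [tag for _, tag in sent] + [STOP]
        for t1, t2, t3 in zip(tags, tags[1:], tags[2:]):
            uni[t3] += 1
            bi[(t2, t3)] += 1
            bi_ctx[t2] += 1
            tri[(t1, t2, t3)] += 1
            tri_ctx[(t1, t2)] += 1
            n += 1
    vocab = {word for (_, word) in emit}
    l1, l2, l3 = lambdas

    def p_trans(t1, t2, t3):
        # Linear interpolation of unigram, bigram and trigram estimates.
        p_bi = bi.get((t2, t3), 0) / bi_ctx[t2] if bi_ctx.get(t2) else 0.0
        p_tri = (tri.get((t1, t2, t3), 0) / tri_ctx[(t1, t2)]
                 if tri_ctx.get((t1, t2)) else 0.0)
        return l1 * uni.get(t3, 0) / n + l2 * p_bi + l3 * p_tri

    def p_emit(tag, word):
        # Unknown words score the same under every tag, so the tag context
        # alone decides (TnT uses suffix statistics instead).
        if word not in vocab:
            return 1e-3
        return emit.get((tag, word), 0) / tag_count[tag]

    return p_trans, p_emit, list(tag_count)


def viterbi(words, tagset, p_trans, p_emit):
    """Second-order Viterbi search: each state is the pair of the last two tags."""
    best = {(START, START): (0.0, [])}  # state -> (log score, tag path)
    for word in words:
        new = {}
        for (t1, t2), (score, path) in best.items():
            for t3 in tagset:
                s = score + log(p_trans(t1, t2, t3)) + log(p_emit(t3, word))
                if (t2, t3) not in new or s > new[(t2, t3)][0]:
                    new[(t2, t3)] = (s, path + [t3])
        best = new
    # Close the sentence with the transition into the end symbol.
    _, (_, path) = max(
        best.items(), key=lambda kv: kv[1][0] + log(p_trans(*kv[0], STOP)))
    return path


def extract_relation(words, tags):
    """Read a (hyponym, hypernym) pair off the semantic tags, if both occur."""
    hypo = " ".join(w for w, t in zip(words, tags) if t == "HYPO")
    hyper = " ".join(w for w, t in zip(words, tags) if t == "HYPER")
    return (hypo, hyper) if hypo and hyper else None


p_trans, p_emit, tagset = train(CORPUS)
words = "Ein Pudel ist eine Hunderasse .".split()  # "Pudel" is unseen
tags = viterbi(words, tagset, p_trans, p_emit)
print(extract_relation(words, tags))  # ('Pudel', 'Hunderasse')
```

Because "Pudel" never occurs in the toy training data, its HYPO tag is chosen purely from the trigram context (ART before it, VAFIN after it). This kind of generalization from tag context is what makes a sequence tagger attractive for labeling relation arguments in unseen definition sentences.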


  1. Abney, S. (2002). Bootstrapping. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 360-367, Morristown, NJ, USA. Association for Computational Linguistics.
  2. Amsler, R. A. (1981). A taxonomy for English nouns and verbs. In Proceedings of the 19th Annual Meeting on Association for Computational Linguistics, pages 133-138, Morristown, NJ, USA. Association for Computational Linguistics.
  3. Brants, T. (2000). TnT - A statistical Part-of-Speech tagger. In Proceedings of the Sixth Applied Natural Language Processing (ANLP-2000), pages 224-231, Seattle, Washington.
  4. Choi, S. and Park, H. R. (2005). Finding taxonomical relation from an MRD for thesaurus extension. In Dale, R., Wong, K.-F., Su, J., and Kwong, O. Y., editors, Natural Language Processing - IJCNLP, volume 3651 of Lecture Notes in Computer Science, pages 357-365. Springer.
  5. Etzioni, O., Cafarella, M., Downey, D., Popescu, A.-M., Shaked, T., Soderland, S., Weld, D. S., and Yates, A. (2005). Unsupervised named-entity extraction from the web: an experimental study. Artif. Intell., 165(1):91-134.
  6. Hearst, M. A. (1992). Automatic acquisition of hyponyms from large text corpora. In Proceedings of COLING, Nantes, France.
  7. Kazama, J. and Torisawa, K. (2007). Exploiting Wikipedia as external knowledge for named entity recognition. In Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 698-707.
  8. Lafferty, J., McCallum, A., and Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML-01.
  9. Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77.
  10. Snow, R., Jurafsky, D., and Ng, A. Y. (2005). Learning syntactic patterns for automatic hypernym discovery. In Saul, L. K., Weiss, Y., and Bottou, L., editors, Advances in Neural Information Processing Systems 17, pages 1297-1304. MIT Press, Cambridge, MA.
  11. Tufis, D. and Mason, O. (1998). Tagging Romanian texts: a case study for QTAG, a language independent probabilistic tagger. In Proceedings of the 1st International Conference of Language Resources and Evaluation (LREC-98), Granada, Spain.
  12. Van Rijsbergen, C. J. K. (1979). Information Retrieval, 2nd edition. Dept. of Computer Science, University of Glasgow.

Paper Citation

in Harvard Style

Litz B., Langer H. and Malaka R. (2009). Trigrams’n’Tags for Lexical Knowledge Acquisition. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009), ISBN 978-989-674-011-5, pages 18-25. DOI: 10.5220/0002292100180025

in BibTeX Style

author={Berenike Litz and Hagen Langer and Rainer Malaka},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009)},

in EndNote Style

JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2009)
SN - 978-989-674-011-5
AU - Litz B.
AU - Langer H.
AU - Malaka R.
PY - 2009
SP - 18
EP - 25
DO - 10.5220/0002292100180025