Interlingual wordnets validation and word-sense disambiguation

Dan Tufiş, Radu Ion

Abstract

Understanding natural language presupposes, one way or another, the ability to select the appropriate meaning for each word in a text. Word-sense disambiguation is, by far, the most difficult part of the semantic processing required for natural language understanding. In a limited domain of discourse this problem is alleviated by considering only a few of the senses that a word would have listed in any general-purpose dictionary. Moreover, when multiple senses are considered for a lexical item, the granularity of these senses is very coarse, so that discriminating among them is much simpler than in the general case. Such a solution, although computationally motivated with respect to the universe of discourse considered, has the disadvantage of reduced portability and is fallible when the meanings of words cross the boundaries of the prescribed universe of discourse. A general semantic lexicon, such as Princeton WordNet 2.0 (henceforth PWN2.0), with word senses labeled for specialized domains, offers much more expressivity and power and reduces application dependency but, on the other hand, poses the hard and challenging problem of contextual word-sense disambiguation. We describe a multilingual environment for word-sense disambiguation in parallel texts, relying on several monolingual wordnets aligned to PWN2.0 via an interlingual index (ILI). The words of interest, irrespective of their language in the multilingual documents, are uniformly disambiguated using the same sense-inventory labels.
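As a rough illustration of how an interlingual index can give aligned words in different languages the same sense label, the sketch below intersects the ILI codes of two translation-equivalent words. It is only a minimal sketch under stated assumptions: the toy wordnets, the ILI labels, and the function names `shared_senses` and `label_pairs` are hypothetical and are not taken from the paper or from any real wordnet release.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy wordnets: each lemma maps to the set of ILI codes of its
# synsets. The labels are invented for illustration only.
EN_WORDNET: Dict[str, Set[str]] = {
    "bank": {"ILI-fin-institution", "ILI-river-side"},
    "plant": {"ILI-factory", "ILI-flora"},
}
RO_WORDNET: Dict[str, Set[str]] = {
    "bancă": {"ILI-fin-institution", "ILI-bench"},
    "mal": {"ILI-river-side"},
    "plantă": {"ILI-flora"},
}


def shared_senses(en_word: str, ro_word: str) -> Set[str]:
    """Return the ILI codes common to both words of a translation pair."""
    return EN_WORDNET.get(en_word, set()) & RO_WORDNET.get(ro_word, set())


def label_pairs(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str, List[str]]]:
    """Attach the (sorted) shared ILI codes to every aligned word pair."""
    return [(en, ro, sorted(shared_senses(en, ro))) for en, ro in pairs]


if __name__ == "__main__":
    # Assumed word alignments extracted from a parallel text.
    for en, ro, senses in label_pairs([("bank", "bancă"), ("bank", "mal")]):
        print(en, ro, senses)
```

In this toy setting, a single shared code labels both words of the pair with the same sense. An empty intersection could plausibly point to an alignment error or to a gap in one of the wordnets, although how such cases are actually handled is not stated in the abstract.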


Paper Citation


in Harvard Style

Tufiş D. and Ion R. (2004). Interlingual wordnets validation and word-sense disambiguation. In Proceedings of the 1st International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2004), ISBN 972-8865-05-8, pages 97-105. DOI: 10.5220/0002677900970105


in Bibtex Style

@conference{nlucs04,
author={Dan Tufiş and Radu Ion},
title={Interlingual wordnets validation and word-sense disambiguation},
booktitle={Proceedings of the 1st International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2004)},
year={2004},
pages={97-105},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002677900970105},
isbn={972-8865-05-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2004)
TI - Interlingual wordnets validation and word-sense disambiguation
SN - 972-8865-05-8
AU - Tufiş D.
AU - Ion R.
PY - 2004
SP - 97
EP - 105
DO - 10.5220/0002677900970105