Authors:
Dan Tufiş¹ and Radu Ion²
Affiliations:
¹ Institute for Artificial Intelligence; Faculty of Informatics, University “A.I. Cuza”, Romania
² Institute for Artificial Intelligence, Romania
Abstract:
Understanding natural language assumes, one way or another, the ability to select the appropriate meaning for each word in a text. Word-sense disambiguation is by far the most difficult part of the semantic processing required for natural language understanding. In a limited domain of discourse this problem is alleviated by considering only a few of the senses a word would have listed in a general-purpose dictionary. Moreover, when multiple senses are considered for a lexical item, the granularity of these senses is very coarse, so that discriminating among them is much simpler than in the general case. Such a solution, although computationally motivated with respect to the universe of discourse considered, has the disadvantage of reduced portability and fails when the meanings of words cross the boundaries of the prescribed universe of discourse. A general semantic lexicon such as Princeton WordNet 2.0 (henceforth PWN2.0), with word senses labeled for specialized domains, offers much more expressivity and power, reducing application dependency but, on the other hand, posing the hard and challenging problem of contextual word-sense disambiguation. We describe a multilingual environment, relying on several monolingual wordnets aligned to PWN2.0 via an interlingual index (ILI), for word-sense disambiguation in parallel texts. The words of interest, irrespective of the language in the multilingual documents, are uniformly disambiguated by using the same sense-inventory labels.
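The core idea of the interlingual index can be sketched as follows: each monolingual wordnet maps its own synset identifiers to a shared ILI code, so aligned words in a parallel text receive the same language-independent sense label. The synset IDs, ILI codes, and the `interlingual_label` helper below are purely illustrative assumptions, not actual PWN2.0 or BalkaNet data.

```python
# Hypothetical sketch of interlingual-index (ILI) sense alignment.
# All identifiers below are invented for illustration only.

# Each monolingual wordnet maps (language, synset ID) to a shared ILI code.
ili_map = {
    ("en", "ENG20-02512053-n"): "ILI-0001",  # e.g. "bank" (financial institution)
    ("ro", "ROM-00004312-n"):   "ILI-0001",  # Romanian synset for the same concept
    ("en", "ENG20-09213565-n"): "ILI-0002",  # "bank" (sloping land by a river)
    ("ro", "ROM-00007781-n"):   "ILI-0002",
}

def interlingual_label(lang: str, synset_id: str) -> str:
    """Return the language-independent sense label for a monolingual synset."""
    return ili_map[(lang, synset_id)]

# Aligned words in a parallel text get the same sense label, whatever the language.
assert interlingual_label("en", "ENG20-02512053-n") == \
       interlingual_label("ro", "ROM-00004312-n")
```

Under this scheme, disambiguating an English word and its Romanian translation equivalent amounts to choosing the ILI code they share, which is why the sense inventory is uniform across languages.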