Interlingual wordnets validation and word-sense disambiguation

Dan Tufiş, Radu Ion


Understanding natural language assumes, in one way or another, being able to select the appropriate meaning for each word in a text. Word-sense disambiguation is, by far, the most difficult part of the semantic processing required for natural language understanding. In a limited domain of discourse this problem is alleviated by considering only a few of the senses that a word would have listed in any general-purpose dictionary. Moreover, when multiple senses are considered for a lexical item, their granularity is very coarse, so discriminating among them is much simpler than in the general case. Such a solution, although computationally motivated with respect to the universe of discourse considered, has the disadvantage of reduced portability and fails when the meanings of words cross the boundaries of the prescribed universe of discourse. A general semantic lexicon, such as Princeton WordNet 2.0 (henceforth PWN2.0), with word senses labeled for specialized domains, offers much more expressivity and power and reduces application dependency but, on the other hand, poses the hard and challenging problem of contextual word-sense disambiguation. We describe a multilingual environment for word-sense disambiguation in parallel texts, relying on several monolingual wordnets aligned to PWN2.0 via an interlingual index (ILI). The words of interest, irrespective of their language in the multilingual documents, are uniformly disambiguated using the same sense-inventory labels.
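
The abstract does not spell out the disambiguation procedure, so the following is only a minimal sketch of one way ILI-aligned wordnets could yield a shared sense label. It assumes that words which translate each other in a parallel text share at least one ILI code, so intersecting their ILI sets narrows the candidate senses. The toy wordnets, the ILI labels (ILI-A, ILI-B, ILI-C) and the function name shared_senses are hypothetical and are not taken from PWN2.0 or from any real wordnet.

```python
from typing import Dict, Set

# Hypothetical toy data: for each language, a lemma maps to the ILI codes
# of its senses. Illustrative glosses only:
#   ILI-A = financial institution, ILI-B = river bank, ILI-C = bench
WORDNETS: Dict[str, Dict[str, Set[str]]] = {
    "en": {"bank": {"ILI-A", "ILI-B"}},
    "ro": {"bancă": {"ILI-A", "ILI-C"}, "mal": {"ILI-B"}},
}


def shared_senses(aligned: Dict[str, str]) -> Set[str]:
    """Return the ILI codes common to all words in a translation tuple.

    `aligned` maps a language code to the lemma used in that language,
    for words assumed to be translation equivalents in a parallel text.
    An empty result means the toy wordnets offer no common sense.
    """
    sense_sets = [
        WORDNETS.get(lang, {}).get(lemma, set())
        for lang, lemma in aligned.items()
    ]
    if not sense_sets:
        return set()
    return set.intersection(*sense_sets)


if __name__ == "__main__":
    print(shared_senses({"en": "bank", "ro": "bancă"}))
    print(shared_senses({"en": "bank", "ro": "mal"}))
```

Because the result is a set of ILI codes rather than language-specific sense numbers, the same label applies to every word in the translation tuple, which is the uniform labeling the abstract describes. A set with more than one element would still need further disambiguation, which this sketch does not attempt.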


