Table 2: Rate of disambiguation of ambiguous Arabic
words after pre-processing.
Method      Rate of disambiguation
LSA         72.3%
Harman      64.2%
Croft       64.5%
Okapi       59.4%
Lesk        76%
From our experiments we conclude that the
lowest rates of disambiguation are mainly due to an
insufficient number of contexts of use, so that not
all possible cases are covered. We also note that LSA
provides the best results among the information
retrieval methods (72.3%). Comparing these results
with other works is a difficult task, because we do
not work on the same corpus, the same language, or
with the same methods.
The method of Lesk (Lesk, 1986), which uses a
list of the words appearing in the definition of each
sense of the ambiguous word, achieved 50%-70%
correct disambiguation; our system achieved 76%
correct disambiguation. Karov and Edelman (Karov
and Edelman, 1998) propose an extension to
similarity-based methods, which gives 92% accurate
results on four test words.
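The definition-overlap idea behind the Lesk method can be illustrated with a minimal sketch. It is not the implementation evaluated here: the function name `lesk_overlap`, the English toy definitions, and the whitespace tokenization are all assumptions made for illustration, and no Arabic pre-processing is applied.

```python
def lesk_overlap(context_words, sense_definitions):
    """Return (sense, score) for the sense whose definition shares
    the most words with the context (simplified Lesk overlap)."""
    context = {w.lower() for w in context_words}
    best_sense, best_score = None, -1
    for sense, definition in sense_definitions.items():
        definition_words = {w.lower() for w in definition.split()}
        score = len(context & definition_words)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense, best_score


# Hypothetical toy data (English stand-in for dictionary definitions).
senses = {
    "pine_cone": "fruit of an evergreen tree with needle leaves",
    "ice_cream_cone": "crisp wafer holding a scoop of ice cream",
}
context = "the evergreen tree dropped its fruit".split()
best, score = lesk_overlap(context, senses)
```

In this toy case the context shares three words with the first definition and none with the second, so the first sense is selected. A realistic setup would remove stop words and reduce words to roots or stems before counting overlaps.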
4 CONCLUSIONS
We have proposed a system for the disambiguation of
words in Arabic. This system combines information
retrieval methods with the Lesk algorithm. The
information retrieval methods are used to calculate
the proximity between the current context (i.e. the
occurrence of the ambiguous word) and the different
contexts of use of the possible meanings of the
word, while the Lesk algorithm helps the system
choose the most appropriate sense among those
proposed by these methods.
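The two-stage idea summarized above can be sketched as follows. This is only an illustration under stated assumptions: plain bag-of-words cosine similarity stands in for the LSA, Harman, Croft and Okapi measures, the selection step is a simple signature overlap rather than the full Lesk procedure, and the names `rank_senses`, `choose_sense`, `top_k` and the English toy data are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_senses(current_context, contexts_of_use):
    """Score each sense by its best-matching stored context of use."""
    query = Counter(current_context.lower().split())
    scores = {
        sense: max(cosine(query, Counter(c.lower().split())) for c in contexts)
        for sense, contexts in contexts_of_use.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def choose_sense(current_context, contexts_of_use, signatures, top_k=2):
    """Keep the top_k senses by proximity, then pick the candidate
    whose signature words overlap the current context the most."""
    ranked = rank_senses(current_context, contexts_of_use)
    candidates = [sense for sense, _ in ranked[:top_k]]
    words = set(current_context.lower().split())
    return max(candidates, key=lambda s: len(words & signatures[s]))


# Hypothetical toy data (English stand-in for the Arabic database).
contexts_of_use = {
    "bank_river": ["we walked along the bank of the river",
                   "the river bank was muddy"],
    "bank_money": ["she deposited money at the bank",
                   "the bank approved the loan"],
}
signatures = {
    "bank_river": {"river", "water", "shore"},
    "bank_money": {"money", "loan", "account"},
}
current = "he asked the bank for a loan"
ranking = rank_senses(current, contexts_of_use)
chosen = choose_sense(current, contexts_of_use, signatures)
```

Taking the maximum over the stored contexts means a sense is retained as soon as one of its contexts of use resembles the current one; averaging would be an equally plausible choice. Stop words are left in for brevity, which inflates the similarity scores in this toy example.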
The results obtained are satisfactory. For a small
sample of 10 ambiguous words, the proposed system
correctly disambiguates 76% of ambiguous words. We
have tried to establish a sufficiently robust system
based on methods that have proved their success in
many word sense disambiguation systems. In addition,
to make the ambiguous Arabic words known to the
system during pre-processing, we proposed a database
containing, for each sense of an ambiguous word, the
possible contexts of use, synonyms, and signatures
identifying its meaning.
In future work, we propose to use the syntactic
level to disambiguate words.
REFERENCES
Al-Shalabi, R., Kanaan, G., and Al-Serhan, H., 2003. New
approach for extracting Arabic roots. Paper presented
at the International Arab Conference on Information
Technology (ACIT’2003), Egypt.
Black, W. J. and Elkateb, S., 2004. A Prototype English-
Arabic Dictionary Based on WordNet. Proceedings of
2nd Global WordNet Conference, GWC2004, Czech
Republic, pp. 67-74.
Croft, W., 1983. Experiments with representation in a
document retrieval system. Research and
Development, 2(1), pp. 1-21.
De Loupy, 2000. Assessing the contribution of linguistic
knowledge in semantic disambiguation and
information retrieval. Thesis, Université d'Avignon
et des Pays de Vaucluse.
Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer,
T.K. and Harshman, R., 1990. Indexing by Latent
Semantic Analysis. Journal of the American Society
for Information Science, 41(6), pp. 391-407.
Harman, D., 1986. An experimental study of factors
important in document ranking. Proceedings of the ACM
Conference on Research and Development in
Information Retrieval, Pisa, Italy.
Ide, N. and Véronis, J., 1998. Word Sense
Disambiguation: The State Of the Art. Computational
Linguistics, 24(1), pp. 1-40.
Karov, Y. and Edelman, S., 1998. Similarity-based word
sense disambiguation. Computational Linguistics, 24(1).
Lesk, M., 1986. Automatic sense disambiguation using
machine readable dictionaries: how to tell a pine cone
from an ice cream cone. Proceedings of the 5th Annual
International Conference on Systems Documentation,
ACM Special Interest Group for Design of
Communication, pp. 24-26. ISBN 0897912241.
Robertson, S., Walker, M., Hancock-Beaulieu and
Gatford, M., 1994. Okapi at TREC-3. Third Text
Retrieval Conference (TREC-3), NIST special
publication 500-225, pp. 109-126, Gaithersburg,
Maryland, USA.
Salton, G. and Buckley, C., 1988. Term-weighting
approaches in automatic text retrieval. Information
Processing and Management, 24(5), pp. 513-523.
Sawalha, M. and Atwell, E., 2008. Comparative Evaluation
of Arabic Language Morphological Analysers and
Stemmers. Coling 2008: Companion volume, Posters and
Demonstrations, pp. 107-110, Manchester, August
2008.
Vasilescu, F., 2003. Monolingual corpus disambiguation
by the approaches of Lesk. Thesis presented to the
Faculty of Graduate Studies for the degree of
Master of Science (MSc) in computer science, Faculty
of Arts and Sciences, University of Montreal.
Zouaghi A., Zrigui M. and Antoniadis G., 2008.
Understanding of the Arabic spontaneous speech: A
numeric modelisation, Revue TAL VARIA.