A Multi-Agent System for Detecting and Correcting
“Hidden” Spelling Errors in Arabic Texts
Chiraz Ben Othmane Zribi, Fériel Ben Fraj and Mohamed Ben Ahmed
RIADI Laboratory, ENSI, La Manouba University, La Manouba, Tunisia
Abstract.
In this paper, we address the problem of detecting and correcting hidden spelling errors in Arabic texts. Hidden spelling errors are morphologically valid words, and therefore they cannot be detected or corrected by conventional spell-checking programs. In the work presented here, we investigate this kind of error as it relates to the Arabic language. We start by proposing a classification of these errors into two main categories, syntactic and semantic; we then present our multi-agent system for detecting and correcting hidden spelling errors. The multi-agent architecture is justified by the need for collaboration, parallelism and competition, as well as for information exchange between the different analysis phases. Finally, we describe the testing framework used to evaluate the implemented system.
1 Introduction
Hidden errors are spelling errors that result in valid words. When such a word appears in a syntactic or semantic context where it does not fit, the whole sentence becomes unintelligible. For instance:
Example: تطلع الشّمس علينا من الشّوق (the sun rises upon us from desire)
In this example, the writer intended to write "الشّرق" (east), not "الشّوق" (desire), but a typographical error (replacing ر with و) yielded a sentence that does not make sense. Statistics given by Mitton (cited in Verberne, 2002) show that hidden errors account for 40% of all spelling errors. This high proportion demonstrates the need to study this kind of error.
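To make the definition concrete, the following minimal Python sketch (an illustration only, not part of the authors' system) uses a hypothetical six-word lexicon, `LEXICON`, and illustrative helper names, `non_words` and `edit_distance`. It shows that a checker based purely on dictionary lookup flags nothing in the erroneous sentence, even though the typed word is a single-letter substitution away from the intended one. Diacritics, including the shadda, are omitted for simplicity.

```python
# Illustrative sketch only: a lexicon-lookup spell checker cannot see a hidden error.

# Hypothetical mini-lexicon; a real checker would rely on a full Arabic
# dictionary or a morphological analyser. Diacritics are omitted.
LEXICON = {"تطلع", "الشمس", "علينا", "من", "الشرق", "الشوق"}


def non_words(sentence: str) -> list[str]:
    """Return the tokens a pure lexicon-lookup checker would flag."""
    return [tok for tok in sentence.split() if tok not in LEXICON]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (0 if equal)
        prev = curr
    return prev[-1]


intended = "تطلع الشمس علينا من الشرق"  # "... from the east"
typed = "تطلع الشمس علينا من الشوق"     # "... from desire" (hidden error)

for sentence in (intended, typed):
    print(non_words(sentence))  # [] for both: no token is flagged

print(edit_distance("الشرق", "الشوق"))  # 1: a single substitution (ر -> و)
```

Because every token of the erroneous sentence is a valid word, detection has to rely on the surrounding syntactic or semantic context rather than on the word in isolation.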
Several researchers have addressed this problem. Golding studied this kind of error for English and proposed several correction methods, such as the Bayesian method (Golding, 1995), the trigram-based method (Golding and Schabes, 1996) and the Winnow method (Golding and Roth, 1999). Chinese was studied by Xiaolong and Jianhua (2001), and Swedish was the subject of a similar study by Bigert and Knutsson (2002).
Even though Arabic has characteristics that increase the probability of such errors occurring, no research has yet addressed hidden errors in Arabic. In this paper, we describe a multi-agent system for detecting and correcting hidden errors in Arabic texts. Because of the complexity of the problem, we made some assumptions to restrict the scope of our investigation: first, we did not take into account the vowel markings in words and assumed that there is only one
hidden error per sentence. Second, we assumed that the error resulted from one ele-
Ben Othmane Zribi C., Ben Fraj F. and Ben Ahmed M. (2005).
A Multi-Agent System for Detecting and Correcting “Hidden” Spelling Errors in Arabic Texts.
In Proceedings of the 2nd International Workshop on Natural Language Understanding and Cognitive Science, pages 149-154
DOI: 10.5220/0002556601490154
Copyright © SciTePress