Table 6: Proposed approach vs. current approaches.
Approach                               Writing      Vocabulary size  Top1 (%)  Top2 (%)  Top3 (%)
Analytic (Kanoun et al., 2005)         Printed      1000             74        81.2      83.9
Analytic (Kammoun and Ennaji, 2004)    Printed      1423             81.3      95.7      99.7
Analytic (Touj et al., 2007)           Handwritten  25               88.7      -         -
Holistic (Our approach)                Printed      5757             88.1      96.9      98.2
The proposed HMM architecture is tailored to the task: it
embodies knowledge about the linguistic properties of the
construction of Arabic words around roots. This knowledge
guided the choice of states for consonants and affixes
(prefix, infix, and suffix) and the choice of transitions
between these states. Training was performed on a database
of over 11000 word samples. The HMM of a given root is
trained on hundreds of words derived from that root
following various schemes and different conjugations.
Satisfactory recognition rates (top1 = 88% and
top2 = 96.92%) were obtained in a test phase carried out on
more than 5700 samples.
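To make the top-N rates concrete, the following minimal sketch ranks the candidate roots of each test sample by the score of their HMM and counts a hit when the correct root appears among the N best. It assumes each root HMM has already produced a log-likelihood for the sample; the function name `top_k_rate`, the toy scores, and the transliterated root labels are hypothetical and are not taken from the system described here.

```python
def top_k_rate(scores, true_roots, k):
    """Return the percentage of samples whose true root is among the k best.

    scores     -- one dict per sample, mapping root label -> log-likelihood
                  (assumed to come from one HMM per root)
    true_roots -- the correct root label of each sample
    """
    hits = 0
    for sample_scores, truth in zip(scores, true_roots):
        # Rank root labels from most to least likely.
        ranked = sorted(sample_scores, key=sample_scores.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(true_roots)


# Toy data (hypothetical): four samples scored against three root models.
scores = [
    {"ktb": -10.2, "klb": -11.0, "drs": -15.3},
    {"ktb": -9.1, "klb": -9.0, "drs": -14.0},
    {"ktb": -20.0, "klb": -18.5, "drs": -12.1},
    {"ktb": -13.0, "klb": -12.0, "drs": -12.5},
]
true_roots = ["ktb", "ktb", "drs", "ktb"]

for k in (1, 2, 3):
    print(f"top{k} = {top_k_rate(scores, true_roots, k):.1f}%")
```

With these toy scores, the second sample is missed at top1 but recovered at top2, which is the kind of gap seen between the top1 and top2 rates.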
To improve our solution, we analyzed the collisions, which
allowed us to pinpoint the ambiguities between roots, with
the aim of resolving them in a post-processing phase. The
most striking problem affects similar roots that present
the same global primitives although their letters are
different. Hybridizing the approach with a local refinement
of these particular letters could significantly increase
the scores of the system. Finally, although hidden Markov
models are known for robustly absorbing writing variability
and are usually used in analytical approaches (with local
primitives), we used global primitives and were still able
to reach an interesting recognition rate, thanks to the
integration of morphological knowledge.
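A collision analysis of this kind can be sketched as a count of confused root pairs. The example below is only an illustration under assumed inputs (a list of true roots and a list of top1 predictions); the function name `collision_counts` and the toy labels are hypothetical.

```python
from collections import Counter


def collision_counts(true_roots, predicted_roots):
    """Count (true root, predicted root) pairs for misrecognized samples.

    Returns the pairs sorted from most to least frequent, so the root
    pairs most in need of local refinement come first.
    """
    pairs = Counter()
    for truth, predicted in zip(true_roots, predicted_roots):
        if truth != predicted:
            pairs[(truth, predicted)] += 1
    return pairs.most_common()


# Toy data (hypothetical): five samples, three of them misrecognized.
true_roots = ["ktb", "ktb", "klb", "drs", "ktb"]
predicted_roots = ["klb", "klb", "klb", "drs", "drs"]

for (truth, predicted), count in collision_counts(true_roots, predicted_roots):
    print(f"{truth} recognized as {predicted}: {count}")
```

The most frequent pairs are natural candidates for the local refinement step, since they are the roots whose global primitives coincide most often.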
Our approach could also be applied directly to
handwriting, since global primitives are easy to
extract. Then, in the medium term, since we have been
reassured about the linguistically based HMM
structures used with global features only, we
could switch to local primitives such as
densities, invariant moments, Hu moments, etc.
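As an illustration of one such local primitive, the sketch below computes the first Hu moment invariant, phi1 = eta20 + eta02, of a small binary image in plain Python. It is a generic textbook computation, not a feature extractor from the system described here; the function name `hu_phi1` and the toy images are hypothetical.

```python
def hu_phi1(image):
    """First Hu invariant of a binary image given as a list of pixel rows.

    phi1 = eta20 + eta02, where eta_pq = mu_pq / m00 ** (1 + (p + q) / 2)
    and mu_pq are the central moments of the image.
    """
    m00 = sum(v for row in image for v in row)
    if m00 == 0:
        raise ValueError("empty image: no foreground pixels")
    # Centroid of the foreground pixels (x is the column, y is the row).
    xc = sum(x * v for row in image for x, v in enumerate(row)) / m00
    yc = sum(y * v for y, row in enumerate(image) for v in row) / m00
    # Second-order central moments.
    mu20 = sum((x - xc) ** 2 * v for row in image for x, v in enumerate(row))
    mu02 = sum((y - yc) ** 2 * v for y, row in enumerate(image) for v in row)
    # For p + q = 2 the normalizing exponent is 1 + 2 / 2 = 2.
    return (mu20 + mu02) / m00 ** 2


# The same three-pixel shape at two different positions (toy data).
shape = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
shifted = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]

print(hu_phi1(shape), hu_phi1(shifted))
```

Both calls give the same value up to floating-point error, which shows the translation invariance that makes such moments attractive as features.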
REFERENCES
Avila, M. (1996). Optimisation de modèles markoviens
pour la reconnaissance de l'écrit. PhD Thesis,
University of Rouen.
Ben Amara, N., Belaïd, A., Ellouze, N. (2000). Utilisation
des modèles markoviens en reconnaissance de
l'écriture arabe : Etat de l'art. Colloque International
Francophone sur l'Écrit et le Document (CIFED).
Lyon, France, pp. 181-191.
Ben Cheikh, I., Kacem, A. and Belaïd, A. (2010). A
neural-linguistic approach for the recognition of a
wide Arabic word lexicon. 17th Document Recognition
and Retrieval Conference, part of the IS&T-SPIE
Electronic Imaging Symposium, San Jose, CA, USA,
January 17-22, 2010, SPIE Proceedings, pp. 1-10.
Ben Cheikh, I., Belaïd, A. and Kacem, A. (2008). A novel
approach for the recognition of a wide Arabic
handwritten word lexicon. 19th International
Conference on Pattern Recognition (ICPR), IEEE,
Tampa, Florida, USA, pp. 1-4.
Bejaoui, M. (1985). Etude et réalisation d'un système
expert appliqué à l'analyse morpho-syntaxique de
phrases en langue arabe: méthode ascendante. PhD
Thesis, University Paul Sabatier, Toulouse (Sciences),
France.
Ben Hamadou, A. (1993). Vérification et Correction
Automatiques par Analyse Affixale des Textes Ecrits
en Langage Naturel. PhD Thesis, Faculty of Sciences
of Tunis, Tunisia.
Cheriet, M., Beldjehem, M. (2006). Visual Processing of
Arabic Handwriting: Challenges and New Directions.
Summit Arabic and Chinese Handwriting Recognition,
Springer, India, September 27-28, pp. 1-21.
El Yacoubi, A. (1996). Modélisation markovienne de
l'écriture manuscrite, application à la reconnaissance
des Adresses postales. PhD Thesis, University of
Rennes 1.
Saon, G., Belaïd, A. (1997). High Performance
Unconstrained Word Recognition System Combining
HMMs and Markov Random Fields. International
Journal of Pattern Recognition and Artificial
Intelligence (IJPRAI), Volume 11(5), pp. 771-788.
Kammoun, W., Ennaji, A. (2004). Reconnaissance de
Textes Arabes à Vocabulaire Ouvert. Colloque
International Francophone sur l'Écrit et le Document
(CIFED), France.
Kanoun, S., Alimi, A., Lecourtier, Y. (2005). Affixal
Approach for Arabic Decomposable Vocabulary
Recognition: A Validation on Printed Word in Only
One Font. Eighth International Conference on
Document Analysis and Recognition (ICDAR), IEEE