HMM based Classifier for the Recognition of Roots of a Large Canonical Arabic Vocabulary

Imen Ben Cheikh, Zeineb Zouaoui

2013

Abstract

The complexity of the recognition process is strongly related to the language, the type of writing and the vocabulary size. Our work contributes to a system for recognizing a large canonical Arabic vocabulary of decomposable words derived from tri-consonantal roots. This system is based on the collaboration of three morphological classifiers specialized in the recognition of roots, schemes and conjugations. Our work deals with the first classifier: we propose a root classifier based on 101 Hidden Markov Models, used to classify 101 tri-consonantal roots. The models share the same architecture, which is endowed with Arabic linguistic knowledge. The proposed system currently handles a vocabulary of 5757 words. It has been trained and then tested on a total of more than 17000 samples of printed words. The results are satisfactory: the top-2 recognition rate reached 96%.
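The abstract does not detail the features, the model topology or the decision rule, so the following is only a minimal sketch of how a bank of one-HMM-per-root classifiers could rank candidates and produce a top-2 answer. It assumes discrete observation symbols, a maximum-likelihood decision over forward-algorithm scores, and toy two-state left-to-right models. The root labels (`ktb`, `drs`, `elm`), the parameter values and the function names are hypothetical and are not taken from the paper.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Log of a probability, mapping 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(obs, start, trans, emit):
    """Forward algorithm in log space for a discrete-observation HMM."""
    n = len(start)
    alpha = [safe_log(start[s]) + safe_log(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[p] + safe_log(trans[p][s]) for p in range(n)])
            + safe_log(emit[s][o])
            for s in range(n)
        ]
    return log_sum_exp(alpha)


def top_k_roots(obs, models, k=2):
    """Score the sequence with every root model and keep the k best roots."""
    scores = {root: log_likelihood(obs, *m) for root, m in models.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def top_k_rate(samples, models, k=2):
    """Fraction of (observations, true_root) pairs whose root is in the top k."""
    hits = sum(1 for obs, root in samples if root in top_k_roots(obs, models, k))
    return hits / len(samples)


# Hypothetical toy models: same left-to-right architecture, different emissions.
START = [1.0, 0.0]
TRANS = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "ktb": (START, TRANS, [[0.9, 0.1], [0.1, 0.9]]),  # symbol 0 first, then 1
    "drs": (START, TRANS, [[0.1, 0.9], [0.9, 0.1]]),  # symbol 1 first, then 0
    "elm": (START, TRANS, [[0.5, 0.5], [0.5, 0.5]]),  # uninformative emissions
}

if __name__ == "__main__":
    print(top_k_roots([0, 0, 1, 1], models))
```

Sharing one architecture across all models, as the abstract states, means only the parameters differ from root to root, which is why a single scoring routine can be reused for every model in the bank.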



Paper Citation


in Harvard Style

Ben Cheikh I. and Zouaoui Z. (2013). HMM based Classifier for the Recognition of Roots of a Large Canonical Arabic Vocabulary. In Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-8565-41-9, pages 244-252. DOI: 10.5220/0004335202440252


in Bibtex Style

@conference{icpram13,
author={Imen Ben Cheikh and Zeineb Zouaoui},
title={HMM based Classifier for the Recognition of Roots of a Large Canonical Arabic Vocabulary},
booktitle={Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2013},
pages={244-252},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004335202440252},
isbn={978-989-8565-41-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - HMM based Classifier for the Recognition of Roots of a Large Canonical Arabic Vocabulary
SN - 978-989-8565-41-9
AU - Ben Cheikh I.
AU - Zouaoui Z.
PY - 2013
SP - 244
EP - 252
DO - 10.5220/0004335202440252