Fast Arabic Glyph Recognizer based on Haar Cascade Classifiers

Ashraf AbdelRaouf, Colin A. Higgins, Tony Pridmore, Mahmoud I. Khalil

2014

Abstract

Optical Character Recognition (OCR) is an important technology, yet the Arabic language lacks both the variety of OCR systems and the depth of research available for Roman scripts. The Haar Cascade Classifier (HCC), a machine-learning approach, was introduced by Viola and Jones (Viola and Jones 2001) to achieve rapid object detection based on a boosted cascade of Haar-like features. Here, that approach is modified for the first time to suit Arabic glyph recognition. The HCC approach eliminates problematic steps in the pre-processing and recognition phases and, most importantly, the character segmentation stage. A recognizer was produced for each of the 61 Arabic glyphs that exist after the removal of diacritical marks. Each recognizer was trained and tested on some 2,000 images. The system was tested with real text images and achieved a recognition rate for Arabic glyphs of 87%. The proposed method is fast, with an average document recognition time of 14.7 seconds compared with 15.8 seconds for commercial software.
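The core idea behind a cascade of Haar-like features can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it builds an integral image (summed-area table), evaluates one two-rectangle Haar-like feature in constant time, and applies a toy cascade that rejects a window at the first failing stage. All function names, the stage format, and the thresholds are assumptions made for this example. A real cascade stage combines many boosted weak classifiers rather than a single feature.

```python
from typing import List, Tuple

Image = List[List[int]]
# Hypothetical stage format for this sketch: (x, y, width, height, threshold)
Stage = Tuple[int, int, int, int, int]


def integral_image(img: Image) -> Image:
    """Build a summed-area table with an extra leading row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Sum of the pixels in the rectangle at (x, y) of size w by h, using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Two-rectangle Haar-like feature: left half minus right half (w must be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


def cascade_accepts(ii: Image, stages: List[Stage]) -> bool:
    """Toy cascade: reject as soon as one stage's feature falls below its threshold."""
    for x, y, w, h, threshold in stages:
        if two_rect_feature(ii, x, y, w, h) < threshold:
            return False  # early rejection is what keeps a cascade fast
    return True


if __name__ == "__main__":
    # A 2x4 window that is bright on the left and dark on the right.
    window = [[1, 1, 0, 0],
              [1, 1, 0, 0]]
    ii = integral_image(window)
    print(two_rect_feature(ii, 0, 0, 4, 2))
    print(cascade_accepts(ii, [(0, 0, 4, 2, 3)]))
```

Because each rectangle sum costs four table lookups regardless of its size, most non-matching windows can be discarded after only a few cheap feature evaluations.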

References

  1. Abdelazim, H. Y. (2006). Recent Trends in Arabic Character Recognition. The sixth Conference on Language Engineering, Cairo - Egypt, The Egyptian Society of Language Engineering.
  2. AbdelRaouf, A., C. Higgins and M. Khalil (2008). A Database for Arabic printed character recognition. The International Conference on Image Analysis and Recognition-ICIAR2008, Póvoa de Varzim, Portugal, Springer Lecture Notes in Computer Science (LNCS) series.
  3. AbdelRaouf, A., C. Higgins, T. Pridmore and M. Khalil (2010). "Building a Multi-Modal Arabic Corpus (MMAC)." The International Journal of Document Analysis and Recognition (IJDAR) 13(4): 285-302.
  4. Adolf, F. (2003). "How-to build a cascade of boosted classifiers based on Haar-like features."
  5. Al-Marakeby, A., F. Kimura, M. Zaki and A. Rashid (2013). "Design of an Embedded Arabic Optical Character Recognition." Journal of Signal Processing Systems 70(3): 249-258.
  6. Alginahi, Y. M. (2013). "A survey on Arabic character segmentation." International Journal on Document Analysis and Recognition (IJDAR) 16(2): 105-126.
  7. Barros, R. C., M. P. Basgalupp, A. C. P. L. F. de Carvalho and A. A. Freitas (2011). "A Survey of Evolutionary Algorithms for Decision-Tree Induction." IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews pp(99): 1-22.
  8. Box, G. E. P. and M. E. Muller (1958). "A Note on the Generation of Random Normal Deviates." The Annals of Mathematical Statistics 29(2): 610-611.
  9. Bradski, G. (2000). "The OpenCV Library." Dr. Dobb's Journal of Software Tools.
  10. Bradski, G. and A. Kaehler (2008). Learning OpenCV: Computer Vision with the OpenCV Library, O'Reilly Media, Inc.
  11. Consortium, T. U. (2003). The Unicode Consortium. The Unicode Standard, Version 4.1.0, Boston, MA, Addison-Wesley: 195-206.
  12. Consortium, T. U. (2013). The Unicode Consortium. The Unicode Standard, Version 6.3, Boston, MA, Addison-Wesley: 195-206.
  13. Crow, F. C. (1984). "Summed-Area Tables for Texture Mapping." SIGGRAPH Computer Graphics 18(3): 207-212.
  14. IRIS (2004). Readiris Pro 10.
  15. Kasinski, A. and A. Schmidt (2010). "The architecture and performance of the face and eyes detection system based on the Haar cascade classifiers." Pattern Analysis and Applications 13(2): 197-211.
  16. Kohavi, R. and F. Provost (1998). "Glossary of Terms. Special Issue on Applications of Machine Learning and the Knowledge Discovery Process." Machine Learning 30: 271-274.
  17. Lienhart, R., A. Kuranov and V. Pisarevsky (2002). Empirical Analysis of Detection Cascades of Boosted Classifiers for Rapid Object Detection. 25th Pattern Recognition Symposium (DAGM03), Magdeburg, Germany.
  18. Lienhart, R. and J. Maydt (2002). An Extended Set of Haar-like Features for Rapid Object Detection. IEEE International Conference of Image Processing (ICIP 2002), New York, USA.
  19. Lorigo, L. M. and V. Govindaraju (2006). "Offline Arabic Handwriting Recognition: A Survey." IEEE Transactions on Pattern Analysis and Machine Intelligence 28(5): 712-724.
  20. Messom, C. and A. Barczak (2006). Fast and Efficient Rotated Haar-like Features using Rotated Integral Images. Australian Conference on Robotics and Automation (ACRA2006).
  21. Mohan, A., C. Papageorgiou and T. Poggio (2001). "Example-Based Object Detection in Images by Components." IEEE Transactions on Pattern Analysis and Machine Intelligence 23(4): 349-361.
  22. Naz, S., K. Hayat, M. I. Razzak, M. W. Anwar and H. Akbar (2013). Arabic script based character segmentation: A review. Computer and Information Technology (WCCIT), 2013 World Congress on, IEEE.
  23. OpenCV (2002). "Rapid Object Detection With A Cascade of Boosted Classifiers Based on Haar-like Features." OpenCV haartraining Tutorial.
  24. Papageorgiou, C. P., M. Oren and T. Poggio (1998). A General Framework for Object Detection. 6th International Conference on Computer Vision, Bombay, India: 555-562.
  25. Schapire, R. E. (2002). The Boosting Approach to Machine Learning, An Overview. MSRI Workshop on Nonlinear Estimation and Classification, 2002, Berkeley, CA, USA.
  26. Seo, N. (2008). "Tutorial: OpenCV haartraining (Rapid Object Detection With A Cascade of Boosted Classifiers Based on Haar-like Features)."
  27. Slimane, F., S. Kanoun, J. Hennebert, R. Ingold and A. M. Alimi (2013). Benchmarking Strategy for Arabic Screen-Rendered Word Recognition. Guide to OCR for Arabic Scripts. V. Märgner and H. El Abed (eds.), Springer London: 423-450.
  28. Sonka, M., V. Hlavac and R. Boyle (1998). Image Processing: Analysis and Machine Vision, Thomson Learning Vocational.
  29. Unicode (1991-2006). "Arabic Shaping." Unicode 5.0.0.
  30. Viola, P. and M. Jones (2001). Rapid Object Detection using a Boosted Cascade of Simple Features. IEEE Conference on Computer Vision and Pattern Recognition (CVPR01), Kauai, Hawaii.
  31. Wang, G.-h., J.-c. Deng and D.-b. Zhou (2013). "Face Detection Technology Research Based on AdaBoost Algorithm and Haar Features." 1223-1231.

Paper Citation


in Harvard Style

AbdelRaouf A., Higgins C. A., Pridmore T. and Khalil M. I. (2014). Fast Arabic Glyph Recognizer based on Haar Cascade Classifiers. In Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-018-5, pages 350-357. DOI: 10.5220/0004925803500357


in Bibtex Style

@conference{icpram14,
author={Ashraf AbdelRaouf and Colin A. Higgins and Tony Pridmore and Mahmoud I. Khalil},
title={Fast Arabic Glyph Recognizer based on Haar Cascade Classifiers},
booktitle={Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2014},
pages={350-357},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004925803500357},
isbn={978-989-758-018-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Fast Arabic Glyph Recognizer based on Haar Cascade Classifiers
SN - 978-989-758-018-5
AU - AbdelRaouf, A.
AU - Higgins, C. A.
AU - Pridmore, T.
AU - Khalil, M. I.
PY - 2014
SP - 350
EP - 357
DO - 10.5220/0004925803500357