Comparative Study of Two Segmentation Methods of Handwritten Arabic Text - MM-OIC and HT-MM

Fethi Ghazouani, Samia Snoussi Maddouri, Fadoua Bouafif Samoud

Abstract

We present a comparative study of two methods for segmenting handwritten Arabic text. The first method, MM-OIC, combines Mathematical Morphology (MM) with the algorithm for constructing the Outer Isothetic Cover (OIC) of a digital object. The second method, HT-MM, uses the Hough Transform (HT) together with MM to segment handwritten Arabic script. Both methods are applied at two levels of segmentation: text lines and pieces of Arabic words (PAWs). The two methods are evaluated and compared on a set of documents selected from three databases: the IFN/ENIT database (17 documents) and the BSB (16 documents) and KSU (30 documents) online databases. The average line segmentation rate is 75% for MM-OIC and 45% for HT-MM. The average PAW segmentation rate reaches 89% for MM-OIC and 70% for HT-MM. The better performance of MM-OIC is explained by its ability to extract the approximate shape of the writing, which sometimes lets it overcome problems specific to Arabic script, such as overlapping lines and diacritical marks.
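To give a rough sense of how a morphological step can help group handwriting into lines, the following minimal sketch dilates a binary image horizontally and then takes bounding boxes of the connected components. It is only an illustration of the general idea and is not the MM-OIC or HT-MM procedure evaluated in the paper: the function names `dilate_horizontal` and `bounding_boxes`, the toy `page` image, and the choice of a 1 x (2k+1) structuring element are all assumptions made for this example.

```python
from collections import deque

def dilate_horizontal(img, k):
    """Binary dilation with a horizontal 1 x (2k+1) structuring element."""
    h, w = len(img), len(img[0])
    return [[1 if any(img[y][max(0, x - k):x + k + 1]) else 0
             for x in range(w)] for y in range(h)]

def bounding_boxes(img):
    """Bounding boxes (top, left, bottom, right) of 4-connected components."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not img[y0][x0] or seen[y0][x0]:
                continue
            top, left, bottom, right = y0, x0, y0, x0
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

# Hypothetical toy page: two "text lines" made of separated strokes (1 = ink).
page = [
    [0, 1, 0, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 1, 0, 0],
]

strokes = bounding_boxes(page)                      # isolated strokes
lines = bounding_boxes(dilate_horizontal(page, 1))  # strokes merged per line
print(len(strokes), lines)
```

On this toy input the undilated image splits into six separate strokes, while the dilated image yields one box per row of writing. Real handwritten pages with overlapping lines and diacritical marks would not separate this cleanly, which is the kind of difficulty the abstract refers to.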



Paper Citation


in Harvard Style

Ghazouani F., Snoussi Maddouri S. and Bouafif Samoud F. (2014). Comparative Study of Two Segmentation Methods of Handwritten Arabic Text - MM-OIC and HT-MM. In Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-018-5, pages 528-535. DOI: 10.5220/0004764205280535


in Bibtex Style

@conference{icpram14,
author={Fethi Ghazouani and Samia Snoussi Maddouri and Fadoua Bouafif Samoud},
title={Comparative Study of Two Segmentation Methods of Handwritten Arabic Text - MM-OIC and HT-MM},
booktitle={Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2014},
pages={528-535},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004764205280535},
isbn={978-989-758-018-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Comparative Study of Two Segmentation Methods of Handwritten Arabic Text - MM-OIC and HT-MM
SN - 978-989-758-018-5
AU - Ghazouani F.
AU - Snoussi Maddouri S.
AU - Bouafif Samoud F.
PY - 2014
SP - 528
EP - 535
DO - 10.5220/0004764205280535