SKEW CORRECTION IN DOCUMENTS WITH SEVERAL DIFFERENTLY SKEWED TEXT AREAS

P. Saragiotis, N. Papamarkos

Abstract

In this paper, we propose a technique for detecting and correcting the skew of text areas in a document. The documents we work with may contain several text areas with different skew angles. In the first stage, a text localization procedure based on connected component analysis is applied. Specifically, the connected components of the document are extracted and filtered according to their size and geometric characteristics. Next, the candidate characters are grouped using a nearest-neighbour approach, first into words and then into text lines of arbitrary skew. Using linear regression, two lines are estimated for each text line, representing its top and bottom boundaries. Text lines that lie close together and have similar skew angles are then grown into text areas. Finally, each text area is rotated independently to a horizontal or vertical orientation. The technique has been tested on a wide variety of documents, including spreadsheets, book and magazine covers, and advertisements, and proved efficient and robust.
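As a rough illustration of the first stage (extracting connected components and filtering them by size and geometry), the sketch below labels 8-connected foreground regions of a binary image with a breadth-first flood fill and keeps only character-like components. The 8-connectivity, the `min_area`/`max_area`/`max_aspect` thresholds and the helper names are assumptions for illustration, not values or code from the paper.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Component:
    """Bounding box (inclusive) and pixel count of one connected component."""
    x0: int
    y0: int
    x1: int
    y1: int
    area: int

    @property
    def width(self) -> int:
        return self.x1 - self.x0 + 1

    @property
    def height(self) -> int:
        return self.y1 - self.y0 + 1


def connected_components(img):
    """Label 8-connected foreground (value 1) regions with a BFS flood fill."""
    h = len(img)
    w = len(img[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            x0 = x1 = x
            y0 = y1 = y
            area = 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                # Visit all 8 neighbours that are foreground and unvisited.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(Component(x0, y0, x1, y1, area))
    return comps


def filter_candidates(comps, min_area=2, max_area=5000, max_aspect=5.0):
    """Keep components whose pixel count and elongation look character-like.

    Hypothetical thresholds; the paper's actual criteria are not reproduced here.
    """
    kept = []
    for c in comps:
        aspect = max(c.width, c.height) / min(c.width, c.height)
        if min_area <= c.area <= max_area and aspect <= max_aspect:
            kept.append(c)
    return kept


img = [
    [1, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0, 0],
]
comps = connected_components(img)
candidates = filter_candidates(comps)
print(len(comps), [(c.x0, c.y0, c.x1, c.y1) for c in candidates])
# prints: 3 [(0, 0, 1, 1)]
```

All three regions of the toy image are labelled, but only the 2x2 blob survives filtering: the isolated pixel falls below `min_area` and the long horizontal stroke exceeds `max_aspect`.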
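The grouping of candidate characters into words can be illustrated with a simple nearest-neighbour linking rule: each character centroid is joined to its closest neighbour if that neighbour lies within a distance threshold, and the resulting chains form words. This is a generic sketch; the Euclidean distance, the `max_dist` threshold and the function name are assumptions, and the paper's own grouping criteria (including how words are chained into skewed text lines in the second step) may differ.

```python
import math


def group_by_nearest_neighbour(centroids, max_dist):
    """Link every point to its nearest neighbour when that neighbour is within
    max_dist, then return the connected groups of linked points (as index lists)."""
    n = len(centroids)
    parent = list(range(n))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, (xi, yi) in enumerate(centroids):
        best, best_d = None, math.inf
        for j, (xj, yj) in enumerate(centroids):
            if j != i:
                d = math.hypot(xi - xj, yi - yj)
                if d < best_d:
                    best, best_d = j, d
        if best is not None and best_d <= max_dist:
            parent[find(i)] = find(best)  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Character centroids: a three-letter word, then a two-letter word far to the right.
centroids = [(0, 0), (3, 0), (6, 1), (30, 0), (33, 1)]
print(group_by_nearest_neighbour(centroids, max_dist=5))  # [[0, 1, 2], [3, 4]]
```

Because each point links only to its single nearest neighbour, closely spaced characters still end up in one group through transitive links, while a gap larger than `max_dist` separates words.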
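The boundary estimation step can be sketched as two ordinary least-squares fits per text line: one through the top edges and one through the bottom edges of the character bounding boxes, each against the box centres. Taking the skew as the arctangent of the mean of the two slopes is an assumption made here for illustration; the abstract does not say how the two boundary lines are combined into a single angle.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def text_line_skew(boxes):
    """Estimate the skew of one text line from its character boxes.

    boxes: (x0, y0, x1, y1) tuples in image coordinates (y grows downward).
    Fits one line through the box tops and one through the box bottoms, and
    returns (top_line, bottom_line, angle_deg) using the mean of the two slopes.
    """
    xs = [(x0 + x1) / 2 for x0, _, x1, _ in boxes]
    top = fit_line(xs, [y0 for _, y0, _, _ in boxes])
    bottom = fit_line(xs, [y1 for _, _, _, y1 in boxes])
    slope = (top[0] + bottom[0]) / 2
    return top, bottom, math.degrees(math.atan(slope))


# Four 6-pixel-wide characters, 8 pixels tall, on a line with slope 0.1.
boxes = [(x - 3, 0.1 * x, x + 3, 0.1 * x + 8) for x in (3, 13, 23, 33)]
top, bottom, angle = text_line_skew(boxes)
print(round(angle, 2))  # 5.71
```

Fitting the top and bottom separately, rather than one line through the box centres, also yields the text-line height as the difference of the two intercepts (8 pixels in this example).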
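Growing text lines into text areas can be approximated by region growing: starting from an unassigned line, repeatedly absorb any line that is both spatially close and similarly skewed. The gap measure (largest axis-aligned separation between bounding boxes), the `max_gap` and `max_angle_diff` thresholds and the breadth-first strategy are illustrative assumptions, not the paper's published criteria.

```python
from collections import deque


def box_gap(a, b):
    """Gap between two axis-aligned boxes (x0, y0, x1, y1); 0 if they overlap."""
    dx = max(0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0, max(a[1], b[1]) - min(a[3], b[3]))
    return max(dx, dy)


def grow_text_areas(lines, max_gap=20, max_angle_diff=2.0):
    """Group text lines into areas by breadth-first region growing.

    lines: (x0, y0, x1, y1, angle_deg) per text line. Two lines are joined when
    their boxes are at most max_gap apart and their skew angles differ by at most
    max_angle_diff degrees (no wrap-around handling near +/-90 degrees).
    Returns a list of areas, each a sorted list of line indices.
    """
    n = len(lines)
    area_of = [-1] * n  # area index assigned to each line, -1 = unassigned
    areas = []
    for seed in range(n):
        if area_of[seed] != -1:
            continue
        area_of[seed] = len(areas)
        members, queue = [seed], deque([seed])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if (area_of[j] == -1
                        and abs(lines[i][4] - lines[j][4]) <= max_angle_diff
                        and box_gap(lines[i][:4], lines[j][:4]) <= max_gap):
                    area_of[j] = len(areas)
                    members.append(j)
                    queue.append(j)
        areas.append(sorted(members))
    return areas


lines = [
    (0, 0, 100, 10, 5.0),
    (0, 15, 100, 25, 5.5),
    (0, 30, 100, 40, 5.2),
    (200, 0, 300, 10, -10.0),  # nearby column with a different skew
    (0, 200, 100, 210, 5.0),   # same skew, but far below
]
print(grow_text_areas(lines))  # [[0, 1, 2], [3], [4]]
```

The three stacked lines with similar angles form one area; the differently skewed column and the distant line each remain separate areas, so each can later be rotated by its own angle.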
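The final step rotates each text area on its own. The sketch below chooses the rotation that brings an area's estimated skew to the nearest horizontal or vertical orientation and applies it to point coordinates (for example, the corners of the area) about a chosen centre. Snapping to whichever of 0 or +/-90 degrees is closer, with a 45-degree cut-off, is an assumption; a practical implementation would also resample the image pixels, typically by inverse mapping with interpolation, rather than only moving points.

```python
import math


def correction_angle(skew_deg):
    """Rotation in degrees that brings a line with the given skew angle to the
    nearest horizontal (0) or vertical (+/-90) orientation."""
    # Line orientations repeat every 180 degrees; normalise to [-90, 90).
    a = (skew_deg + 90.0) % 180.0 - 90.0
    target = 0.0 if abs(a) <= 45.0 else math.copysign(90.0, a)
    return target - a


def rotate_points(points, angle_deg, centre):
    """Rotate (x, y) points by angle_deg about centre using the standard 2-D
    rotation matrix (same coordinate convention as the skew angle)."""
    cx, cy = centre
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [(cx + (x - cx) * c - (y - cy) * s,
             cy + (x - cx) * s + (y - cy) * c) for x, y in points]


print(correction_angle(5.0), correction_angle(85.0), correction_angle(-80.0))
# -5.0 5.0 -10.0

# Two points on a baseline skewed by 5 degrees end up on a horizontal line.
pts = [(0.0, 0.0), (100.0, 100.0 * math.tan(math.radians(5.0)))]
straightened = rotate_points(pts, correction_angle(5.0), centre=(0.0, 0.0))
print(abs(straightened[1][1]) < 1e-9)  # True
```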


Paper Citation


in Harvard Style

Saragiotis P. and Papamarkos N. (2007). SKEW CORRECTION IN DOCUMENTS WITH SEVERAL DIFFERENTLY SKEWED TEXT AREAS. In Proceedings of the Second International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, ISBN 978-972-8865-73-3, pages 85-92. DOI: 10.5220/0002041800850092


in Bibtex Style

@conference{visapp07,
author={P. Saragiotis and N. Papamarkos},
title={SKEW CORRECTION IN DOCUMENTS WITH SEVERAL DIFFERENTLY SKEWED TEXT AREAS},
booktitle={Proceedings of the Second International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP},
year={2007},
pages={85-92},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002041800850092},
isbn={978-972-8865-73-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Second International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP
TI - SKEW CORRECTION IN DOCUMENTS WITH SEVERAL DIFFERENTLY SKEWED TEXT AREAS
SN - 978-972-8865-73-3
AU - Saragiotis P.
AU - Papamarkos N.
PY - 2007
SP - 85
EP - 92
DO - 10.5220/0002041800850092
ER -