AColDPS - Robust and Unsupervised Automatic Color Document Processing System

Louisa Kessi, Frank Lebourgeois, Christophe Garcia, Jean Duong

2015

Abstract

This paper presents the first fully automatic color analysis system suited for business documents. Our pixel-based approach relies mainly on color morphology and requires no training, manual assistance, prior knowledge or model. We developed a robust color segmentation system adapted to invoices and forms with significant color complexity and dithered backgrounds. The system performs several operations to segment color images automatically, separate text from noise and graphics, and provide information about text color. The contribution of our work is threefold. First, we use color morphology to segment both text and inverted text simultaneously. Our system processes inverted and non-inverted text automatically using conditional color dilation and erosion, even where the two overlap. Second, we extract geodesic measures using morphological convolution in order to separate text, noise and graphical elements. Third, we develop a method to disconnect characters that touch or overlap graphical elements. Our system can separate characters that touch straight lines, split overlapping characters of different colors, and separate characters from graphics when their colors differ. A color analysis stage automatically calculates the number of character colors. The proposed system is generic enough to process a wide range of digitized business document images from different origins. It outperforms the classical approach that uses binarization of greyscale images.
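
The abstract names conditional dilation without giving the operators. As a rough illustration of the general idea only, the sketch below applies conditional (geodesic) dilation to binary images with NumPy: a marker is grown repeatedly but is never allowed to leave a mask, so it fills exactly the mask components it touches. It is not the authors' color operator. The function names `dilate3x3` and `conditional_dilation`, the 3x3 structuring element and the binary setting are all assumptions made for this example.

```python
import numpy as np


def dilate3x3(img):
    """Binary dilation with a 3x3 square structuring element (assumed here)."""
    h, w = img.shape
    # Pad with False so pixels outside the image never contribute.
    padded = np.pad(img, 1, mode="constant", constant_values=False)
    out = np.zeros_like(img)
    # OR together the nine shifted copies of the image.
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def conditional_dilation(marker, mask):
    """Grow `marker` inside `mask` until nothing changes.

    Each step dilates the current set and intersects it with the mask,
    which is the binary form of reconstruction by dilation.
    """
    current = marker & mask
    while True:
        grown = dilate3x3(current) & mask
        if np.array_equal(grown, current):
            return current
        current = grown


# Two separate blobs in the mask; the marker touches only the left one.
mask = np.zeros((5, 7), dtype=bool)
mask[1:4, 0:2] = True
mask[1:4, 4:7] = True
marker = np.zeros_like(mask)
marker[2, 0] = True

result = conditional_dilation(marker, mask)
```

In this toy case `result` covers the left blob and leaves the right one empty, which is the behaviour that makes conditional operators useful for keeping one class of pixels from bleeding into another.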



Paper Citation


in Harvard Style

Kessi L., Lebourgeois F., Garcia C. and Duong J. (2015). AColDPS - Robust and Unsupervised Automatic Color Document Processing System. In Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2015) ISBN 978-989-758-089-5, pages 174-185. DOI: 10.5220/0005315801740185


in Bibtex Style

@conference{visapp15,
author={Louisa Kessi and Frank Lebourgeois and Christophe Garcia and Jean Duong},
title={AColDPS - Robust and Unsupervised Automatic Color Document Processing System},
booktitle={Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2015)},
year={2015},
pages={174-185},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005315801740185},
isbn={978-989-758-089-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2015)
TI - AColDPS - Robust and Unsupervised Automatic Color Document Processing System
SN - 978-989-758-089-5
AU - Kessi L.
AU - Lebourgeois F.
AU - Garcia C.
AU - Duong J.
PY - 2015
SP - 174
EP - 185
DO - 10.5220/0005315801740185