AColDSS: Robust Unsupervised Automatic Color Segmentation System for Noisy Heterogeneous Document Images

Christophe Garcia, Frank Lebourgeois, Louisa Kessi

Abstract

We present the first fully automatic color analysis system suited for noisy heterogeneous documents. We developed a robust color segmentation system adapted to business documents and old handwritten documents with significant color complexity and dithered backgrounds. It is the first fully data-driven, pixel-based approach that needs no a priori information, training or manual assistance. The system performs several operations to automatically segment color images, separate text from noise and graphics, and provide information about text color. The contribution of our work is four-fold. Firstly, it does not require any connected component analysis and simplifies the extraction of the layout and the recognition step undertaken by the OCR. Secondly, it uses color morphology to simultaneously segment both text and inverted text with conditional color dilation and erosion, even where the two overlap. Thirdly, our system efficiently removes noise and speckles from dithered backgrounds and automatically suppresses graphical elements using geodesic measurements. Fourthly, we develop a method that splits overlapped characters and separates characters from graphics if they have different colors. The proposed Automatic Color Document Processing System achieved 99% correctly segmented documents and has the potential to be adapted to different document images. The system outperformed the classical approach that uses binarization of the grayscale image.
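The conditional dilation mentioned in the abstract can be illustrated with a minimal binary sketch. This is not the authors' color morphology: it assumes a plain 0/1 image, a 3x3 structuring element, and hypothetical function names (`dilate`, `conditional_dilation`). It only shows the general idea of growing a seed (marker) while never leaving a constraining mask, which is also the basis of geodesic operations.

```python
def dilate(img):
    """3x3 binary dilation (8-connectivity) on a list-of-lists image."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # A pixel is set if any pixel in its 3x3 neighborhood is set.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx]:
                        out[y][x] = 1
    return out


def conditional_dilation(marker, mask):
    """Dilate the marker repeatedly, intersecting with the mask each
    time, until nothing changes (illustrative assumption: binary data)."""
    current = [[m & k for m, k in zip(mr, kr)] for mr, kr in zip(marker, mask)]
    while True:
        grown = dilate(current)
        nxt = [[g & k for g, k in zip(gr, kr)] for gr, kr in zip(grown, mask)]
        if nxt == current:
            return current
        current = nxt


# Hypothetical toy data: two separate regions in the mask, one seed pixel.
mask = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
marker = [
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
result = conditional_dilation(marker, mask)
```

In this toy case the seed fills only the left region of the mask; the right-hand column is never reached because it is not connected to the seed inside the mask.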



Paper Citation


in Harvard Style

Garcia C., Lebourgeois F. and Kessi L. (2015). AColDSS: Robust Unsupervised Automatic Color Segmentation System for Noisy Heterogeneous Document Images. In European Project Space on Computer Vision, Graphics, Optics and Photonics - EPS Berlin, ISBN 978-989-758-156-4, pages 50-63. DOI: 10.5220/0006164200500063


in Bibtex Style

@conference{epsberlin15,
author={Christophe Garcia and Frank Lebourgeois and Louisa Kessi},
title={AColDSS: Robust Unsupervised Automatic Color Segmentation System for Noisy Heterogeneous Document Images},
booktitle={European Project Space on Computer Vision, Graphics, Optics and Photonics - EPS Berlin},
year={2015},
pages={50-63},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006164200500063},
isbn={978-989-758-156-4},
}


in EndNote Style

TY - CONF
JO - European Project Space on Computer Vision, Graphics, Optics and Photonics - EPS Berlin
TI - AColDSS: Robust Unsupervised Automatic Color Segmentation System for Noisy Heterogeneous Document Images
SN - 978-989-758-156-4
AU - Garcia C.
AU - Lebourgeois F.
AU - Kessi L.
PY - 2015
SP - 50
EP - 63
DO - 10.5220/0006164200500063