Authors:
Louisa Kessi; Frank Lebourgeois; Christophe Garcia and Jean Duong
Affiliation:
Université de Lyon and LIRIS, France
Keyword(s):
Document Image Analysis, Color Processing, Business Document, Mathematical Morphology, Color Morphology.
Related Ontology Subjects/Areas/Topics:
Applications and Services; Color and Texture Analyses; Computer Vision, Visualization and Computer Graphics; Document Imaging in Business; Image and Video Analysis; Image Formation and Preprocessing; Image Formation, Acquisition Devices and Sensors
Abstract:
This paper presents the first fully automatic color analysis system suited for business documents. Our pixel-based
approach uses mainly color morphology and does not require any training, manual assistance, prior
knowledge or model. We developed a robust color segmentation system adapted for invoices and forms
with significant color complexity and dithered backgrounds. The system performs several operations: it
automatically segments color images, separates text from noise and graphics, and reports the color of the
text. The contribution of our work is threefold. Firstly, we use color morphology to
simultaneously segment both text and inverted text. Our system processes inverted and non-inverted text
automatically using conditional color dilation and erosion, even in cases where there are overlaps between
the two. Secondly, we extract geodesic measures using morphological convolution in order to
separate text, noise and graphical elements. Thirdly, we develop a method to disconnect characters touching
or overlapping graphical elements. Our system can separate characters that touch straight lines, split
overlapped characters with different colors and separate characters from graphics if they have different
colors. A color analysis stage automatically calculates the number of character colors. The proposed system
is generic enough to process a wide range of images of digitized business documents from different origins.
It outperforms the classical approach that uses binarization of greyscale images.
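
The abstract does not spell out the conditional color dilation it relies on. As a rough illustration only, the sketch below shows the general idea of conditional (geodesic) dilation on a single-channel image: a marker is dilated, then clipped by a mask, and the step is repeated until nothing changes. The single-channel simplification, the 3x3 neighbourhood, and the function names `dilate`, `conditional_dilate` and `reconstruct` are assumptions made for this example; they are not the authors' color operators.

```python
def dilate(img):
    """3x3 grayscale dilation: each pixel takes the max of its neighbourhood."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = max(
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            )
    return out


def conditional_dilate(marker, mask):
    """One conditional step: dilate the marker, then clip it by the mask."""
    d = dilate(marker)
    return [[min(dv, mv) for dv, mv in zip(drow, mrow)]
            for drow, mrow in zip(d, mask)]


def reconstruct(marker, mask):
    """Repeat conditional dilation until the result is stable."""
    # Start from the marker clipped by the mask so it never exceeds it.
    current = [[min(a, b) for a, b in zip(arow, brow)]
               for arow, brow in zip(marker, mask)]
    while True:
        nxt = conditional_dilate(current, mask)
        if nxt == current:
            return current
        current = nxt
```

Given a mask containing several connected regions and a marker that touches only one of them, `reconstruct` grows the marker to fill that region and leaves the others empty, which is the property that makes conditional operators useful for keeping selected components while discarding the rest.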