between each foreground superpixel and the centroid
of the cluster to which it belongs is computed. Then,
the foreground superpixels are sorted in descending
order of the computed Euclidean distance, so that the
first processed foreground superpixel is the one with
the highest Euclidean distance value. The higher the
computed Euclidean distance, the higher the probability
that the foreground superpixel is improperly labeled,
since it is far from the centroid of its cluster. Thus,
the foreground superpixels with high Euclidean distance
values are the first to be processed by a multi-scale
majority voting technique. Applying the majority voting
technique at multiple scales removes small isolated
groups of superpixels. Indeed, a local decision on the
label of each selected foreground superpixel is taken
according to the majority of the superpixel labels and
pixel labels belonging to it, computed at the same four
pre-defined sliding window sizes as in the Gabor feature
extraction step. Then, if the processed foreground
superpixel receives a new label, the pixels belonging to
it are assigned the same new label. Afterwards, the next
processed foreground superpixel is the one with the next
smaller Euclidean distance value.
The labels of the foreground superpixels and of the
pixels belonging to them are updated after each run of
the multi-scale majority voting technique on a foreground
superpixel, to ensure a relevant refinement of the pixel
labeling results. Once the first post-processing step of
the proposed algorithm, “Post-processing 1”, has been
performed, a post-processed 1 pixel-labeled document
image is obtained (cf. Figure 3(g)).
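
The processing order and the label decision described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names (`processing_order`, `majority_label`), the toy feature vectors, and the way the labels observed at each scale are passed in as lists are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def processing_order(features, labels):
    """Return superpixel indices sorted by decreasing distance to their cluster centroid."""
    # Group the feature vectors by their current cluster label.
    groups = defaultdict(list)
    for vec, lab in zip(features, labels):
        groups[lab].append(vec)
    # Centroid of each cluster = component-wise mean of its members.
    centroids = {
        lab: [sum(col) / len(vecs) for col in zip(*vecs)]
        for lab, vecs in groups.items()
    }
    # Euclidean distance of each superpixel to the centroid of its own cluster.
    dist = [math.dist(vec, centroids[lab]) for vec, lab in zip(features, labels)]
    # The farthest superpixels (most likely mislabeled) are processed first.
    return sorted(range(len(features)), key=lambda i: dist[i], reverse=True)


def majority_label(labels_per_scale):
    """Pool the labels observed at every scale and return the most frequent one."""
    votes = Counter()
    for scale_labels in labels_per_scale:
        votes.update(scale_labels)
    return votes.most_common(1)[0][0]
```

In a full pipeline, `labels_per_scale` would hold the labels collected in each of the four sliding windows around the processed superpixel, and the label map would be updated before moving on to the next superpixel in the order.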
2.6 Post-processing 2
As shown in the overview of the proposed algorithm
(cf. Figure 1), our goal is to find homogeneous regions
defined by common characteristics or similar texture
features as easily, quickly, and automatically as
possible. Once the first post-processing step,
“Post-processing 1”, has been performed, the output is
a post-processed 1 pixel-labeled document image.
Nevertheless, at this stage we still need to identify
groups of pixels sharing common characteristics or
similar textural properties in order to extract
homogeneous regions (i.e. to partition text into columns,
paragraphs, lines or words, and to identify the graphical
regions). Therefore, the second post-processing step,
“Post-processing 2”, aims to automatically fill the space
within each component in order to determine the largest
CCs representing similar content regions. This is done by
replacing sequences of background pixels with foreground
ones and afterwards grouping pixels which share common
characteristics or similar textural properties in the
post-processed 1 pixel-labeled document image
(cf. Section 2.5, Figure 3(g)).
First, a binarization step is performed on the enhanced
document image (cf. Section 2.2, Figure 3(d)) using a
standard parameter-free binarization method, Otsu’s
method (Otsu, 1979), to obtain a binarized enhanced
document image (cf. Figure 3(e)) from which the CCs are
subsequently retrieved. Then, the majority voting
technique is applied to each CC extracted from the
binarized enhanced document image by determining the
majority pixel label over the region covered by that CC
in the post-processed 1 pixel-labeled document image
(cf. Section 2.5, Figure 3(g)). In this way, the CCs
extracted from the binarized enhanced document image are
labeled according to the post-processed 1 pixel-labeled
document image. The image resulting from labeling the
extracted CCs is illustrated in Figure 3(h).
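
The binarization and CC labeling steps can be illustrated with the following sketch in plain Python. It is not the authors' code: the function names, the use of 4-connectivity, the convention that dark pixels (gray level at or below the threshold) are foreground, and the representation of images as nested lists are assumptions chosen to keep the example self-contained.

```python
from collections import Counter, deque


def otsu_threshold(gray):
    """Otsu's threshold for a 2-D list of 8-bit gray levels (maximizes between-class variance)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]                  # number of pixels with gray level <= t
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def connected_components(binary):
    """Yield each 4-connected foreground component as a list of (row, col) pixels."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if not binary[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, comp = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            yield comp


def label_components(gray, pixel_labels):
    """Give every CC of the binarized image the majority label found under it."""
    t = otsu_threshold(gray)
    binary = [[v <= t for v in row] for row in gray]  # assumption: dark ink is foreground
    out = [[None] * len(row) for row in gray]
    for comp in connected_components(binary):
        winner = Counter(pixel_labels[y][x] for y, x in comp).most_common(1)[0][0]
        for y, x in comp:
            out[y][x] = winner
    return out
```

Here `pixel_labels` plays the role of the post-processed 1 pixel-labeled document image, and background pixels of the result are left as `None`.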
Once the CCs extracted from the binarized enhanced
document image are labeled, a color layer separation
task is performed to split the CCs according to their
labels. Thus, a document image containing only
single-color CCs is generated for each color layer. For
instance, in the example illustrated in Figure 3, two
colors represent the graphical (blue) and textual (green)
CCs, shown in Figures 3(i) and 3(m), respectively. The
color layer separation task ensures that the extracted
CCs are segmented according to their label (i.e. content
type). Separating the extracted CCs according to their
label overcomes the issues caused by the complex, dense,
and overlapping document layouts of HDIs.
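
As a rough illustration of the color layer separation, the sketch below splits a CC-label map into one binary layer per label. The function name `separate_layers` and the use of `None` for background pixels are assumptions made for this example only.

```python
def separate_layers(cc_labels):
    """Split a CC-label map into one binary layer per label (None marks background)."""
    layers = {}
    for y, row in enumerate(cc_labels):
        for x, lab in enumerate(row):
            if lab is None:
                continue
            if lab not in layers:
                # Create an all-background layer the first time a label is seen.
                layers[lab] = [[False] * len(r) for r in cc_labels]
            layers[lab][y][x] = True
    return layers
```

Each returned layer then contains only the CCs of a single content type and can be processed independently of the others.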
The identification of homogeneous regions is based
on finding the largest CCs. Replacing sequences of
background pixels with foreground ones, and afterwards
grouping pixels which share common characteristics or
similar textural properties in a pixel-labeled document
image, makes the extraction of homogeneous regions more
accurate and relevant. Indeed, the idea is to
automatically fill the space within each component in
order to partition text into columns, paragraphs, lines
or words on the one hand, and to identify the graphical
regions on the other hand.
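
The gap-filling idea can be illustrated on a single row of a binary layer. In this sketch the run-length limit `max_gap`, the function name, and the choice to leave background runs that touch the image border untouched are hypothetical; they are not taken from the paper.

```python
def fill_short_gaps(row, max_gap):
    """Turn background runs of length <= max_gap into foreground when flanked by foreground."""
    out = list(row)
    last_fg = None                      # index of the most recent foreground pixel
    for i, is_fg in enumerate(row):
        if not is_fg:
            continue
        # Fill the background run between two foreground pixels if it is short enough.
        if last_fg is not None and 0 < i - last_fg - 1 <= max_gap:
            for j in range(last_fg + 1, i):
                out[j] = True
        last_fg = i
    return out
```

Applying the same function to every row gives a horizontal pass, and applying it to every column gives a vertical pass.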
Therefore, an adaptive RLSA is proposed in this work,
which is a modified version of the state-of-the-art
RLSA (Wahl et al., 1982). The RLSA examines the
spaces between black pixels in order to link neighboring
black areas by applying the run-length smearing
both horizontally and vertically. It operates by replac-
ing a horizontal (vertical, respectively) sequence of
background pixels with foreground ones if the num-
VISAPP 2015 - International Conference on Computer Vision Theory and Applications