This work presents a method for image segmentation
and classification that is suitable for use in mobile
applications. The input images are photographs of
magazine pages taken with a smartphone camera. The
method has to locate text and image regions in the
image, extract them, and provide them for further
processing, for example by an OCR application. To be
applicable on mobile devices, the procedure must
satisfy the following requirements:
i) High speed
ii) Low demands on resources
iii) Easy portability across platforms
iv) Low implementation complexity, so that the method
remains practical to apply.
The resulting prototype should be portable to various
mobile platforms (e.g. iOS, Android) with little
effort. The prototype is therefore implemented in
Python using the OpenCV library. The remainder of this
paper first presents existing methods for image
segmentation and discusses their performance and
portability to mobile devices. Then, a procedure
suitable for use on mobile devices is explained
together with its theoretical basis. Finally, the
results of the prototype implementation are presented,
analyzed, and discussed, and an outlook on future work
is given.
2 RELATED WORK
The literature offers a variety of approaches to the
segmentation of documents. They can be divided into
three categories: bottom-up, top-down, and hybrid
methods (Lin et al., 2006). See (Ettl, 2012) for more
details.
Yuan and Tan (Yuan and Tan, 2000) segment the source
image into text and non-text areas and let the OCR
engine carry out a refined analysis. For this purpose
they require certain properties of text fields: words
in a text field are assumed to have similar height,
alignment, and spacing. Unfortunately, they do not
provide runtime data; the only statement is that the
algorithm is "relatively fast".
Mollah et al. (Mollah et al., 2010) describe a method
for text segmentation on business cards that is
specifically designed for use on mobile devices. First,
the background is removed. Then, text components are
detected using connected components in the isolated
information areas. This procedure takes between
0.06 seconds (0.3 megapixels) and 0.6 seconds
(3 megapixels) on a dual-core 1.73 GHz processor with
1 GB RAM.
Lin, Tapamo, and Ndovie (Lin et al., 2006) present a
method that segments a document using a gray scale
matrix for the detection of textures. Regions are
classified into text, image, and empty areas. The
running time of this process on a Pentium 4 is about
3 seconds for an image with a resolution of 1449x2021
pixels.
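To put the reported running times side by side, they can be normalized to seconds per megapixel. The sketch below performs only this arithmetic on the figures quoted above. The hardware differs between the two reports, so the result is a rough orientation and not a benchmark; the function and dictionary names are illustrative.

```python
# Reported figures from the text: (megapixels, seconds).
reported = {
    "Mollah et al. (small)": (0.3, 0.06),
    "Mollah et al. (large)": (3.0, 0.6),
    "Lin et al.": (1449 * 2021 / 1e6, 3.0),
}


def seconds_per_megapixel(megapixels, seconds):
    """Normalize a running time by the image size."""
    return seconds / megapixels


for name, (mp, s) in reported.items():
    print(f"{name}: {mp:.2f} MP, {seconds_per_megapixel(mp, s):.2f} s/MP")
```

Both figures from Mollah et al. correspond to about 0.2 seconds per megapixel, whereas the figure from Lin et al. corresponds to roughly 1 second per megapixel on different hardware.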
3 IMAGE SEGMENTATION ON MOBILE DEVICES
In this work, several algorithms that are frequently
used in image segmentation are combined and tested.
The aim is to develop a two-stage process:
In the first step, the image is pre-processed and all
foreground objects are separated from the background
(object extraction). Foreground objects can be text,
images, or drawings. The algorithms are applied to the
entire original image.
In the second step, the detected objects are classified
into text and non-text objects (object classification).
The algorithms are applied only to certain areas of the
source image, the regions of interest (ROI). This has
the advantage that individual objects can be examined
with different methods as needed. In addition, the
running time is reduced when small areas of the image
are processed instead of the whole image.
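The saving from restricting the second step to regions of interest can be illustrated with a small sketch. It assumes, hypothetically, that the first step returns bounding boxes as (x, y, width, height) tuples. The mask and the boxes are made up, and `crop`, `classify_rois`, and `ink_ratio` are illustrative names, not part of the prototype.

```python
def crop(image, box):
    """Return the sub-image covered by box = (x, y, width, height)."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def classify_rois(image, boxes, classifier):
    """Apply a per-region classifier to each ROI instead of the whole image."""
    return [(box, classifier(crop(image, box))) for box in boxes]


def ink_ratio(region):
    """Placeholder classifier (assumption): fraction of foreground pixels."""
    pixels = [p for row in region for p in row]
    return sum(1 for p in pixels if p == 255) / len(pixels)


# Made-up binary mask (6 rows, 8 columns) with two foreground objects.
mask = [
    [0,   0,   0, 0, 0, 0,   0, 0],
    [0, 255, 255, 0, 0, 0,   0, 0],
    [0, 255, 255, 0, 0, 0, 255, 0],
    [0,   0,   0, 0, 0, 0, 255, 0],
    [0,   0,   0, 0, 0, 0, 255, 0],
    [0,   0,   0, 0, 0, 0,   0, 0],
]
boxes = [(1, 1, 2, 2), (5, 2, 2, 3)]

results = classify_rois(mask, boxes, ink_ratio)
roi_pixels = sum(w * h for _, _, w, h in boxes)
total_pixels = len(mask) * len(mask[0])
print(results)
print(f"pixels touched: {roi_pixels} of {total_pixels}")
```

Because each region is handled separately, a different classifier could be passed for each box, which corresponds to examining individual objects with different methods.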
From the available approaches, a set of algorithms
appropriate for rapid segmentation (Burger and Burge,
2009) was selected. An important selection criterion
was performance. Histogram analysis was set aside in
favor of a heuristic recognition of text lines, because
its running time is worse and it has further weaknesses
in distinguishing text from drawings. The Canny edge
detector was rejected because it did not add value to
the Connected Components Analysis (CCA). The edge
detector produces a binary edge image, whereas the CCA
supplies object contours that are stored in a list or a
tree structure and can easily be processed further.
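The difference in output form can be made concrete with a small connected components sketch. It is a generic breadth-first labeling with 4-connectivity on a made-up mask, not the prototype's code. As a simplification it returns one bounding box per object instead of full contours. The function name and the (x, y, width, height) box format are assumptions. In OpenCV, comparable results would typically come from `cv2.findContours`.

```python
from collections import deque


def connected_components(mask):
    """Find 4-connected foreground regions (value 255) and return
    one bounding box (x, y, width, height) per region."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if mask[y0][x0] != 255 or seen[y0][x0]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            min_x = max_x = x0
            min_y = max_y = y0
            while queue:
                x, y = queue.popleft()
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] == 255 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes


# Made-up mask with two separate objects.
mask = [
    [0, 255, 255, 0,   0],
    [0, 255,   0, 0, 255],
    [0,   0,   0, 0, 255],
]
print(connected_components(mask))
```

The result is an ordinary list with one entry per object, so later steps can iterate over objects directly. A binary edge image would first have to be grouped into objects before it could be used this way.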
The original image is first smoothed by a Gaussian
filter. On the one hand, this reduces noise by
filtering out small noise pixels. On the other hand,
object contours are highlighted. Depending on the
degree of blurring, individual words blur into lines
of text, lines of text into blocks, and smaller objects
are grouped into larger areas. This allows text lines,
blocks of text, photos, and graphics to be delimited
easily from the background of the picture. This effect
is enhanced by applying a Gaussian pyramid, which also
increases the processing speed by reducing the image
resolution. In the next step, the image is binarized by
an adaptive thresholding method, which creates a binary
mask. Background pixels are set to 0 and foreground
(object) pixels to 255.
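The pre-processing chain described above (smoothing, one pyramid level, adaptive binarization) can be sketched as follows. The sketch is written in plain Python so that it is self-contained; with OpenCV, the corresponding calls would typically be `cv2.GaussianBlur`, `cv2.pyrDown`, and `cv2.adaptiveThreshold`. The 3x3 binomial kernel, the window radius, and the offset are assumptions chosen for illustration, and the sketch assumes dark objects on a light background.

```python
def blur3(image):
    """Smooth with a 3x3 binomial kernel (a small Gaussian approximation),
    replicating border pixels."""
    h, w = len(image), len(image[0])
    weights = (1, 2, 1)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += weights[dy + 1] * weights[dx + 1] * image[yy][xx]
            out[y][x] = acc / 16.0
    return out


def pyramid_down(image):
    """One pyramid level: smooth, then keep every second row and column."""
    smoothed = blur3(image)
    return [row[::2] for row in smoothed[::2]]


def adaptive_threshold(image, radius=1, offset=10):
    """Set a pixel to 255 (foreground) if it is darker than the mean of its
    local window minus offset, otherwise to 0 (background)."""
    h, w = len(image), len(image[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(y - radius, 0), min(y + radius + 1, h))
            xs = range(max(x - radius, 0), min(x + radius + 1, w))
            window = [image[yy][xx] for yy in ys for xx in xs]
            local_mean = sum(window) / len(window)
            if image[y][x] < local_mean - offset:
                mask[y][x] = 255
    return mask


# Made-up 8x8 gray image: light background (200) with a dark 4x4 block (40).
image = [[200] * 8 for _ in range(8)]
for y in range(2, 6):
    for x in range(2, 6):
        image[y][x] = 40

small = pyramid_down(image)        # 4x4 image, a quarter of the pixels
mask = adaptive_threshold(small)   # binary mask with values 0 and 255
for row in mask:
    print(row)
```

After the pyramid step, the thresholding touches only a quarter of the original pixels, which illustrates the speed gain mentioned above. Note that a local-mean threshold with a small window only marks pixels that are darker than their surroundings, so the window size has to be chosen relative to the object size.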
On the mask Connected Components Analysis