IMAGE SEGMENTATION FOR OBJECT DETECTION
ON A DEEPLY EMBEDDED MINIATURE ROBOT
Alexander Jungmann¹, Thomas Schierbaum² and Bernd Kleinjohann¹
¹ Cooperative Computing & Communication Laboratory, University of Paderborn, Paderborn, Germany
² Product Engineering, Heinz Nixdorf Institute, University of Paderborn, Paderborn, Germany
Keywords:
Image Segmentation, Run-length Encoding, Moments, Robotics.
Abstract:
In this paper, an image segmentation approach for object detection on the miniature robot BeBot - a deeply embedded system - is presented. The approach was developed to enable the robot to detect and identify objects in its environment by means of its camera while respecting the robot's limited resources. The fundamental algorithm is based on the region growing and region merging concept and identifies homogeneous regions consisting of adjacent pixels with similar color. By internally representing a contiguous block of pixels in terms of run-lengths, the computational effort of both the region growing and the region merging operation is minimized. Finally, for subsequent object detection processes, a region is efficiently translated into a statistical feature representation based on discretized moments.
1 INTRODUCTION
Embedded systems are usually very restricted with respect to their memory and computational power. The miniature robot BeBot (see Figure 1), which combines an ARM Cortex-A8 600 MHz processor, 256 MB main memory and a small camera in a 9x9 cm chassis (Herbrechtsmeier et al., 2009), is a mobile representative of an embedded system with such restrictions. In addition, it is able to explore its surroundings by means of its differential chain drive. However, to be able to act autonomously, the robot has to perceive its environment by means of its camera. For this purpose, an efficient image segmentation approach that takes the mentioned restrictions into account was developed and is presented within the scope of this paper. Object detection mechanisms themselves, however, are not part of this particular work.
Figure 1: Miniature robot BeBot, (a) with and (b) without light guide, enabling the robot to express its internal state.
This paper is organized as follows. The fundamental principles of the realized segmentation process are described in Section 2. Section 3 deals with the external feature representation for subsequent object detection processes. Results of the segmentation algorithm running on a BeBot are presented in Section 4. The paper finally concludes with Section 5.
2 SEGMENTATION APPROACH
The basic idea of image segmentation is the identification of contiguous blocks of pixels that are homogeneous with respect to a pre-defined criterion. By doing so, the pixel-based visual information is abstracted in order to get a reduced data representation, which is more convenient on the one hand and less computationally expensive on the other hand. Regarding the computational power of the BeBot, object detection based on raw pixel data is not feasible at all.
2.1 Color as Criterion of Homogeneity
In the context of this paper, the criterion for constructing homogeneous regions is based on color information. Since the camera of the BeBot delivers YUV images, a simple but very efficient heuristic $H_{yuv}$, which is based on the Manhattan distance, is applied in order to decide
whether two color tuples $(y_1, u_1, v_1)$ and $(y_2, u_2, v_2)$ reside in a pre-defined neighborhood within the three-dimensional YUV color space:
$$
H_{yuv} = H((y_1, u_1, v_1), (y_2, u_2, v_2)) =
\begin{cases}
1 & \text{if } |y_1 - y_2| \le c_1 \ \wedge\ |u_1 - u_2| + |v_1 - v_2| \le c_2,\\
0 & \text{else.}
\end{cases}
\qquad (1)
$$
By separating the luma value $Y$ from the chrominance values $U$ and $V$, the different components can be independently weighted by means of the two parameters $c_1$ and $c_2$.
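To illustrate how cheap this check is, the following C++ sketch shows one possible implementation of $H_{yuv}$; it assumes 8-bit YUV components, and the type and function names are illustrative rather than taken from the BeBot code.

#include <cstdint>
#include <cstdlib>

// One YUV color tuple with 8-bit components (assumption for this sketch).
struct Yuv {
    std::uint8_t y, u, v;
};

// Heuristic H_yuv of Equation (1): the luma difference is checked against c1,
// the Manhattan distance of the chrominance components against c2.
bool similarColor(const Yuv& a, const Yuv& b, int c1, int c2) {
    const bool lumaOk   = std::abs(a.y - b.y) <= c1;
    const bool chromaOk = std::abs(a.u - b.u) + std::abs(a.v - b.v) <= c2;
    return lumaOk && chromaOk;
}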
2.2 Internal Data Representation
For the internal representation of the regions during the segmentation process, the Run-Length Encoding concept is incorporated: sequences of adjacent pixels are compactly encoded as so-called run-lengths (or runs), where a single run is defined in terms of three integer values:
$$
run_i = \big\langle (x_i, y_i),\, l_i \big\rangle, \qquad (2)
$$
with $(x_i, y_i)$ being the coordinates of the starting pixel, whereas $l_i$ represents the number of adjacent pixels within the same row and therefore denotes the length of a single run. Furthermore, an entire region $R$ may consist of a sequence of $n$ adjacent runs:
$$
R = \big\langle \langle (x_1, y_1), l_1 \rangle, \langle (x_2, y_2), l_2 \rangle, \ldots, \langle (x_n, y_n), l_n \rangle \big\rangle. \qquad (3)
$$
The computational effort for both the region growing and the region merging operation (cf. Section 2.3) is minimal. While adding a pixel to a region is nothing but incrementing the length $l_i$ of the associated run $run_i$, merging of two regions is realized by simply appending the sequence of runs of the first region to the sequence of runs of the second region.
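A minimal C++ sketch of this internal representation is given below; the data layout is only an assumption for illustration purposes, and the average color maintained for each region during segmentation is omitted.

#include <vector>

// A run: starting pixel (x, y) plus the number of adjacent pixels in that row,
// as defined in Equation (2).
struct Run {
    int x;
    int y;
    int length;
};

// A region is a sequence of adjacent runs (Equation (3)); the average color
// maintained during segmentation is omitted in this sketch.
struct Region {
    std::vector<Run> runs;
};

// Region growing: adding a pixel to a region is a single increment of the
// length of the associated run.
inline void addPixel(Run& run) {
    ++run.length;
}

// Region merging: the runs of one region are simply appended to the runs of
// the other region; the emptied region can then be discarded.
inline void mergeRegions(Region& keep, Region& discard) {
    keep.runs.insert(keep.runs.end(), discard.runs.begin(), discard.runs.end());
    discard.runs.clear();
}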
2.3 Basic Algorithm
The basic segmentation process is depicted in Algorithm 1. The main loop (lines 3-25) iterates row by row over the whole image, starting at the topmost row. The inner iteration loop (lines 4-24) processes each pixel within a row once, starting at the leftmost pixel position. After identifying the left adjacent run $run_{left}$ as well as its associated region $R_{left}$, heuristic $H_{yuv}$ (1) is applied in order to check if the colors of the current pixel and $R_{left}$ are similar (line 7). If so, the region growing step takes place by adding the current pixel
Algorithm 1: Image Segmentation Algorithm.
 1: image = latest camera image
 2: regions = {}                      // set of identified regions
 3: for all row ∈ image do
 4:   for all pixel ∈ row do
 5:     run_left = left adjacent run
 6:     R_left = region(run_left)
 7:     if H(yuv(pixel), yuv(R_left)) then
 8:       // region growing
 9:       add(run_left, pixel)
10:       continue with next pixel
11:     else
12:       regions_top = top adjacent regions
13:       for all R_top ∈ regions_top do
14:         if H(yuv(R_left), yuv(R_top)) then
15:           // region merging
16:           merge(R_left, R_top)
17:           remove(regions, R_top)
18:         end if
19:       end for
20:       run_new = new run(pixel)
21:       R_new = new region(run_new)
22:       add(regions, R_new)
23:     end if
24:   end for
25: end for
26: return regions
pixel to the left adjacent run $run_{left}$ (the length of $run_{left}$ is incremented by 1) and updating the associated region $R_{left}$ with respect to its average color. Afterwards, the algorithm continues with the next pixel of the row (line 10).
If heuristic $H_{yuv}$ fails, the left adjacent run $run_{left}$ is considered to be completed. The algorithm proceeds with its region merging mechanism. For this purpose, all regions bordering run $run_{left}$ at the top are identified. Subsequently, another iteration loop tries to identify every region $R_{top}$ within $regions_{top}$ that can be merged with the region $R_{left}$ associated with $run_{left}$ (lines 13-19) by again applying heuristic $H_{yuv}$. If $H_{yuv}$ succeeds for two regions $R_{top}$ and $R_{left}$, the regions are merged by appending all runs of $R_{top}$ to $R_{left}$. Furthermore, region $R_{left}$ is updated with respect to its average color, whereas region $R_{top}$ is completely discarded by removing it from the set of heretofore identified regions.
Independent of the region merging step, a new run $run_{new}$ with length 1 and the current pixel as its starting position, as well as a new region $R_{new}$ with $run_{new}$ as its first run, are allocated. Finally, $R_{new}$ is added to the set of heretofore identified regions.
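To relate Algorithm 1 to the run-length representation of Section 2.2, the following C++ sketch covers only the in-row part of the scan (lines 4-10 of Algorithm 1): runs are grown as long as the color similarity holds, and new runs are started otherwise. The cross-row region merging of lines 12-19 and the average color bookkeeping are left out, the comparison is made against the color of the left run instead of the region's average color, and all names are illustrative assumptions.

#include <cstdint>
#include <cstdlib>
#include <vector>

struct Yuv { std::uint8_t y, u, v; };
struct Run { int x, y, length; Yuv color; };

// Heuristic H_yuv (Equation (1)), as in the sketch of Section 2.1.
static bool similarColor(const Yuv& a, const Yuv& b, int c1, int c2) {
    return std::abs(a.y - b.y) <= c1 &&
           std::abs(a.u - b.u) + std::abs(a.v - b.v) <= c2;
}

// Scan one image row: while the current pixel is similar to the run to its
// left, that run is grown by one pixel; otherwise a new run of length 1 starts.
std::vector<Run> scanRow(const std::vector<Yuv>& row, int rowIndex,
                         int c1, int c2) {
    std::vector<Run> runs;
    for (int x = 0; x < static_cast<int>(row.size()); ++x) {
        if (!runs.empty() && similarColor(row[x], runs.back().color, c1, c2)) {
            ++runs.back().length;                     // region growing
        } else {
            runs.push_back({x, rowIndex, 1, row[x]}); // new run
        }
    }
    return runs;
}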
3 FEATURE DESCRIPTION
By interpreting a region's associated pixels as a two-dimensional Gaussian distribution in the image plane, a region can be implicitly described by means of statistical parameters, namely the two mean values $m_x$ and $m_y$, the two variances $\sigma_x^2$ and $\sigma_y^2$, and the covariance $\sigma_{xy}$. A generalization of these specific parameters are the statistical moments, which can be directly applied in our context in discretized form (Hu, 1962): the two mean values correspond to the two moments of first order ($m_{10}$ and $m_{01}$), whereas the two variances and the covariance correspond to the centralized (or central) moments of second order ($\mu_{20}$, $\mu_{02}$ and $\mu_{11}$). In addition, the mass of a region is equivalent to the moment of zeroth order ($m_{00}$).
Since the central moments can be directly derived from the geometric moments up to second order (Hu, 1962), the basic feature descriptor for representing an extracted region is given by the following set $M$ of moments:
$$
M = \{ m_{pq} \mid p + q \le 2 \}
  = \{ m_{00}, m_{10}, m_{01}, m_{11}, m_{20}, m_{02} \} \qquad (4)
$$
For efficiently computing the required moments, the runs of a region can be directly used by applying the Delta ($\delta$) Method (Zakaria et al., 1987). In this context, $S1_i$ and $S2_i$ are defined as follows:
$$
S1_i = \sum_{k=0}^{\delta_i - 1} k = \frac{\delta_i^2 - \delta_i}{2}, \qquad
S2_i = \sum_{k=0}^{\delta_i - 1} k^2 = \frac{\delta_i^3}{3} - \frac{\delta_i^2}{2} + \frac{\delta_i}{6}, \qquad (5)
$$
with $\delta_i$ corresponding to the length $l_i$ of a single run $run_i$. The required geometric moments $m_{00_i}$, $m_{10_i}$, $m_{01_i}$, $m_{11_i}$, $m_{20_i}$ and $m_{02_i}$ of $run_i$ can then be computed in the following way:
$$
\begin{aligned}
m_{00_i} &= \delta_i, & m_{01_i} &= \delta_i \cdot y_i, & m_{02_i} &= \delta_i \cdot y_i^2,\\
m_{10_i} &= \delta_i \cdot x_i + S1_i, & m_{11_i} &= y_i \cdot [\delta_i \cdot x_i + S1_i] = y_i \cdot m_{10_i}, & m_{20_i} &= \delta_i \cdot x_i^2 + 2 \cdot S1_i \cdot x_i + S2_i \qquad (6)
\end{aligned}
$$
Finally, the moments of an entire region $R$ correspond to the sums of the particular moments of all $n$ associated runs:
$$
M = \Big\{ m_{pq} \;\Big|\; m_{pq} = \sum_{i=1}^{n} m_{pq_i},\ p + q \le 2 \Big\} \qquad (7)
$$
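As an illustration of how Equations (5) to (7) can be evaluated directly on the run-length representation, the following C++ sketch accumulates the moments of a region from its runs; the Run layout follows the sketch in Section 2.2 and the remaining names are assumptions.

#include <vector>

struct Run { int x, y, length; };

// Geometric moments up to second order of one region (Equation (4)).
struct Moments {
    double m00 = 0, m10 = 0, m01 = 0, m11 = 0, m20 = 0, m02 = 0;
};

// Delta method (Zakaria et al., 1987): per-run moments according to
// Equations (5) and (6), summed over all runs of the region (Equation (7)).
Moments regionMoments(const std::vector<Run>& runs) {
    Moments m;
    for (const Run& r : runs) {
        const double d  = r.length;                          // delta_i
        const double s1 = (d * d - d) / 2.0;                 // S1_i
        const double s2 = d * d * d / 3.0 - d * d / 2.0 + d / 6.0; // S2_i
        const double x = r.x, y = r.y;
        m.m00 += d;
        m.m01 += d * y;
        m.m02 += d * y * y;
        m.m10 += d * x + s1;
        m.m11 += y * (d * x + s1);
        m.m20 += d * x * x + 2.0 * s1 * x + s2;
    }
    return m;
}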
For representing a feature in a more explicit manner, additional geometric attributes can be derived from $M$ and the associated central moments (Teague, 1980; Prokop and Reeves, 1992). In this context, the coordinates $(x, y)$ of the center of mass of an extracted feature in the image plane are defined by the moments of zeroth and first order:
$$
x = \frac{m_{10}}{m_{00}} \quad \text{and} \quad y = \frac{m_{01}}{m_{00}} \qquad (8)
$$
Furthermore, due to the statistical interpretation in terms of a Gaussian distribution, a feature is equivalent to an elliptical disk with constant intensity, having definite size, orientation and eccentricity, and being centered at the origin of the image plane. The lengths of its major and minor axes as well as its angle of inclination can be computed with the use of the associated central moments (Teague, 1980). By combining both the center of mass and the elliptical disk, a feature can be geometrically described in terms of an ellipse that is located at the center of mass of the respective feature (cf. Figure 3).
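A possible C++ sketch for deriving the center of mass and the equivalent ellipse from the moment set $M$ is shown below. The ellipse formulas follow the common moment-based formulation (Teague, 1980); since the exact normalization used on the BeBot is not given in this paper, the code is only an assumption.

#include <cmath>

struct Moments { double m00, m10, m01, m11, m20, m02; };

struct EquivalentEllipse {
    double cx, cy;       // center of mass (Equation (8))
    double major, minor; // half-axis lengths
    double angle;        // inclination angle in radians
};

// Derive center of mass and equivalent ellipse from the moments up to
// second order via the normalized central moments.
EquivalentEllipse toEllipse(const Moments& m) {
    EquivalentEllipse e;
    e.cx = m.m10 / m.m00;
    e.cy = m.m01 / m.m00;
    const double u20 = m.m20 / m.m00 - e.cx * e.cx; // variance in x
    const double u02 = m.m02 / m.m00 - e.cy * e.cy; // variance in y
    const double u11 = m.m11 / m.m00 - e.cx * e.cy; // covariance
    const double d   = std::sqrt((u20 - u02) * (u20 - u02) + 4.0 * u11 * u11);
    e.major = std::sqrt(2.0 * (u20 + u02 + d));
    e.minor = std::sqrt(2.0 * (u20 + u02 - d));
    e.angle = 0.5 * std::atan2(2.0 * u11, u20 - u02);
    return e;
}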
4 RESULTS
The image segmentation approach was implemented in C/C++ and successfully applied to the BeBot. While capturing YUV images of size 320x240 pixels, the algorithm enables us to process at least 10 entire images per second. Regarding the load factor of the BeBot during our experiments, the overall performance of the segmentation algorithm heavily depends on the number of different regions that are constructed, which in turn depends on the values of the calibration parameters in combination with the homogeneity of the original image.
Figure 2 shows different segmentation results with respect to different values of the calibration parameters $c_1$ and $c_2$ of heuristic $H_{yuv}$. Whereas the goal of the algorithm was to clearly extract the colored objects (four balls and one rectangular block), we constructed the background (the horizon) to be as inhomogeneous as possible (see Figure 2(a)). From Figure 2(c) to Figure 2(d), only parameter $c_1$ for modifying the influence of the luma component $Y$ was changed. By increasing $c_1$, the segmented image becomes more and more homogeneous in comparison to Figure 2(b). The same holds true when the second parameter $c_2$ of heuristic $H_{yuv}$ is exclusively modified (Figure 2(e) to Figure 2(f)). However, when comparing Figure 2(c) and Figure 2(f), changing the allowed range of luma similarity ($c_1$) seems to have a greater influence than changing the allowed range of chrominance similarity ($c_2$).
Figure 2: Results of the segmentation algorithm running on the BeBot with different parameters $c_1$ and $c_2$: (a) original, (b) $c_1 = 10$, $c_2 = 10$, (c) $c_1 = 10$, $c_2 = 20$, (d) $c_1 = 10$, $c_2 = 30$, (e) $c_1 = 20$, $c_2 = 10$, (f) $c_1 = 30$, $c_2 = 10$, (g) $c_1 = 30$, $c_2 = 20$, (h) $c_1 = 50$, $c_2 = 50$. In addition, the minimal length of a run was limited to 5 pixels.
Keeping that in mind, Figure 2(g) shows a very good overall result with respect to the homogeneity of the segmented image. In this context, the value of parameter $c_1$ was set to 30, whereas the value of parameter $c_2$ was set to a slightly lower value of 20.
We tried to relax the restriction of color similarity even further by simultaneously increasing both parameters $c_1$ and $c_2$. At some point, the algorithm begins to merge color values that obviously are not similar at all, whereas regions that are of similar color but differ with respect to their intensity (brightness) are not merged (see Figure 2(h)). This issue can be traced back to the characteristics of the YUV color space.
Last but not least, Figure 3 exemplarily depicts the external region representation in terms of equivalent ellipses, which are located at the centers of mass of the extracted features.

Figure 3: Feature representation in terms of the associated centers of mass and equivalent ellipses; (a) original image, (b) segmented image.
5 CONCLUSIONS
In this paper, an efficient color-based image segmentation approach for the deeply embedded miniature robot BeBot is presented. In order to minimize the computational effort as well as the memory consumption during the segmentation process, regions are compactly represented in terms of runs while they are constructed. Furthermore, in order to provide a convenient data representation for subsequent object detection processes, the constructed regions are interpreted as two-dimensional Gaussian distributions in the image plane. Hence, they are efficiently translated into a statistical feature description in terms of discretized moments. Finally, even though the chosen heuristic for deciding whether two color values are similar or not is very simple, it produces sufficiently good results with respect to the separation of objects in a realistic environment.
REFERENCES
Herbrechtsmeier, S., Witkowski, U., and Rückert, U. (2009). BeBot: A modular mobile miniature robot platform supporting hardware reconfiguration and multi-standard communication. In Progress in Robotics, volume 44, pages 346–356. Springer Berlin Heidelberg.

Hu, M.-K. (1962). Visual pattern recognition by moment invariants. Information Theory, IEEE Transactions on, 8(2):179–187.

Prokop, R. J. and Reeves, A. P. (1992). A survey of moment-based techniques for unoccluded object representation and recognition. CVGIP: Graph. Models Image Process., 54(5):438–460.

Teague, M. R. (1980). Image analysis via the general theory of moments. Journal of the Optical Society of America, 70:920–930.

Zakaria, M. F., Vroomen, L. J., Zsombor-Murray, P. J. A., and van Kessel, J. M. H. M. (1987). Fast algorithm for the computation of moment invariants. Pattern Recogn., 20(6):639–643.