platform (Tarokh 2003). The purpose of the present
paper is to enable robust person following in rough
terrain. In this work we employ color and shape for
person identification and pan/tilt camera control for
robust person tracking.
2 TRAINING AND DETECTION
The first task in person following is the detection
and segmentation of the person from the scene.
This task consists of two subtasks, namely, training
a detection system and recognition of the person as
he/she moves in the environment. Both these
subtasks employ color and shape characteristics.
In our system, the person appears in front of the
camera at the start of a tour, and images of the
person are captured automatically when the person
takes several poses, i.e. back to camera, and side
view. The system is then trained to recognize the
shape and color of the person’s upper body. We use
the H (hue or color), S (saturation or color depth),
B (brightness or lightness) color model, as HSB is
based on a direct interpretation of colors and, for
this application, provides a better characterization
than other color models such as RGB. The
averages of the H, S and B components for the poses
are recorded, which provide the nominal values
H_nom, S_nom and B_nom. However, since these
values change during the motion, we allow
deviations ΔH, ΔS and ΔB from the nominal
values, which are found experimentally. Thus,
during person following, if an object in the
image has color components within the reference
ranges H_ref = H_nom ± ΔH, S_ref = S_nom ± ΔS
and B_ref = B_nom ± ΔB, then the object is a
candidate for the person’s image, and its shape
measures are checked.
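The color candidate test above can be sketched as follows; the nominal values and allowed deviations below are hypothetical placeholders for the trained and experimentally determined quantities described in the text.

```python
# Sketch of the color candidate test. The nominal values and the
# allowed deviations are HYPOTHETICAL placeholders; in the actual
# system they come from training and experiment, respectively.
H_NOM, S_NOM, B_NOM = 0.35, 0.60, 0.55   # H_nom, S_nom, B_nom (assumed)
DH, DS, DB = 0.05, 0.15, 0.20            # ΔH, ΔS, ΔB (assumed)

def in_reference_range(h, s, b):
    """True if an HSB triple lies inside the reference ranges
    H_ref = H_nom ± ΔH, S_ref = S_nom ± ΔS, B_ref = B_nom ± ΔB."""
    return (abs(h - H_NOM) <= DH and
            abs(s - S_NOM) <= DS and
            abs(b - B_NOM) <= DB)
```

An object passes the color test only if all three components fall inside their ranges; only such candidates proceed to the shape check.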
We train the shape identification system with
the above-mentioned poses. Shape measures must be
independent of the mass (area) of the person’s image
since the mass changes with the distance of the robot
to the person. The three measures that satisfy this
requirement are compactness C, circularity Q and
eccentricity E. Equations for computing these shape
measures are given in (Tarokh 2003), where the
normalized values of the three measures are between
0 and 1. During the training, each of these
measures is evaluated for the person in each of the
above two poses (k = 1, 2), and their values
C_{k,ref}, Q_{k,ref} and E_{k,ref} are stored for the person
following phase. This completes the training of the
system, which takes a few seconds on a standard PC,
and can be considered as an off-line phase.
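The color part of this off-line training phase might be sketched as below, assuming the pose images have already been segmented to the person's upper body and converted to HSB; the function name and data layout are our own illustration, not the paper's implementation.

```python
import numpy as np

def train_color_model(pose_images):
    """Average the H, S and B components over the person's upper-body
    pixels in the training poses (back view and side view) to obtain
    the nominal values H_nom, S_nom and B_nom.

    `pose_images` is assumed to be a list of (N, 3) float arrays of
    HSB pixels, one array per pose, already cropped to the body.
    The deviations ΔH, ΔS and ΔB are found experimentally in the
    paper and are not computed here."""
    pixels = np.concatenate(pose_images, axis=0)  # pool all pose pixels
    h_nom, s_nom, b_nom = pixels.mean(axis=0)     # per-channel averages
    return h_nom, s_nom, b_nom
```

The shape side of training evaluates C, Q and E for each pose with the formulas of (Tarokh 2003) and stores them analogously.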
During person following, the camera takes
images of the scene and the system performs several
operations to segment the person from other objects.
The first operation is to scan every pixel and mark
it as belonging to the person’s image, e.g. set it
to white, if all three of its color components are
within the reference color ranges H_ref, S_ref and
B_ref. This process of checking all pixels is time-
consuming, and therefore we speed it up by
considering two observations. First, since the
person’s image occupies a large portion of the
image, it will be sufficient to check pixels on every
other row and every other column for color
verification. This way only a quarter of the pixels
are checked and marked white if they satisfy the
color range. The skipped pixels will be marked
white if the checked pixels around them have been
marked white. The second observation is that there
is a maximum distance that the person can move
between two consecutive frames. As a result, the
person’s pixels in the current frame must all lie
within a circle centered at the centroid (to be defined
shortly) of the previous frame. These two
observations limit the number of pixels to be
checked and speed up the marking of the pixels that
belong to the person’s image.
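A sketch of this subsampled marking pass, combining both observations, could look like the following; the fill rule for skipped pixels is approximated by a one-step dilation, and all names and signatures are illustrative assumptions.

```python
import numpy as np

def mark_person_pixels(hsb, in_range, prev_centroid, radius):
    """Mark candidate person pixels using the two speed-ups described
    above: only every other row and column is checked, and only
    pixels inside a circle of `radius` around the previous frame's
    centroid are considered. `hsb` is an (H, W, 3) array and
    `in_range` maps an (h, s, b) triple to bool (illustrative API)."""
    rows, cols, _ = hsb.shape
    mask = np.zeros((rows, cols), dtype=bool)
    cy, cx = prev_centroid
    for y in range(0, rows, 2):              # every other row
        for x in range(0, cols, 2):          # every other column
            if (y - cy) ** 2 + (x - cx) ** 2 > radius ** 2:
                continue                     # outside the motion circle
            if in_range(*hsb[y, x]):
                mask[y, x] = True
    # Fill the skipped pixels whose checked neighbours were marked;
    # a one-step dilation towards higher rows/columns approximates
    # the neighbour rule stated in the text.
    filled = mask.copy()
    filled[1:, :] = filled[1:, :] | filled[:-1, :]
    filled[:, 1:] = filled[:, 1:] | filled[:, :-1]
    return filled
```

Only a quarter of the in-circle pixels are actually tested against the color ranges; the rest are inferred from their neighbours.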
The final operation is to perform a standard
region growing on the marked pixels so that
connected regions can be formed. Regions smaller in
area than a specified value are considered noise and
are removed. The shape measure values C_i, Q_i
and E_i for the remaining regions are computed,
where i = 0, 1, 2, …, m-1 denotes the region number.
Rather than checking each shape parameter with its
corresponding reference value, we define a single
measure for the closeness of the detected region to
the reference region, i.e. the person’s image during
the training. A possible closeness function is given in
Tarokh (2003).
The closeness function produces 1 if all shape
measures of the region are the same as the reference
value, and approaches zero if the region shape
measures are completely different. It is noted that for
each detected region, two closeness values are found,
i.e. one for each pose. The region that has the
largest closeness value is selected, and if this
value is close to 1, the selected region is assumed to
represent the person. If all the regions have small
closeness values, then none is chosen and another image
is taken and analyzed.
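Since the actual closeness function is given in Tarokh (2003) and not reproduced here, the sketch below substitutes a simple placeholder that satisfies the stated properties (1 for an exact match of the normalized measures, approaching 0 when they are completely different); the acceptance threshold is likewise a hypothetical value.

```python
def closeness(shape, ref):
    """PLACEHOLDER closeness function (the actual one is in Tarokh
    2003): returns 1 when the region's (C, Q, E) equal the reference
    values and approaches 0 as they become completely different.
    All measures are normalized to [0, 1], so the result is too."""
    return 1.0 - sum(abs(a - b) for a, b in zip(shape, ref)) / 3.0

def select_person_region(regions, refs, threshold=0.9):
    """Pick the region with the largest closeness over the two
    trained poses (k = 1, 2); return None when no region is close
    enough to 1, so that another image is taken and analyzed.
    `regions` holds (C_i, Q_i, E_i) per region, `refs` the stored
    per-pose references; `threshold` is a hypothetical level."""
    best_i, best_f = None, 0.0
    for i, shape in enumerate(regions):
        f = max(closeness(shape, ref) for ref in refs)  # one value per pose
        if f > best_f:
            best_i, best_f = i, f
    return best_i if best_f >= threshold else None
```

Taking the maximum over the two pose references lets either the back or the side view match, mirroring the two closeness values per region described above.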
The above method of distinguishing the region
corresponding to the person from other detected
regions in the image is simple and yet quite
effective. There are several reasons for this
effectiveness. One is that the robot is controlled
reasonably close to the person being followed and in
ICINCO 2005 - ROBOTICS AND AUTOMATION