Figure 2: System structure. (Block diagram: images from the left and right cameras are processed into a disparity image and a depth map; feature extraction feeds an obstacle classifier and a direction classifier, whose outputs drive a state machine controlling the robot drive.)
10 cm. Both cameras are equipped with a wide-angle lens with a focal length of 4.2 mm.
Figure 2 shows the structure of the system setup. Two cameras are used for image capture. We scale the images down to 320 × 240 pixels and use only a single color channel. A pair of images is grabbed approximately every 240 ms, so about four frames per second are processed. To extract depth information, a disparity image is computed by block matching. To reduce noise artifacts, smoothing filters such as mean or median filters are applied to the original images. Additionally, we apply the census transform to obtain images suitable for the subsequent block matching. Using a lookup table derived from the camera geometry, we calculate a depth map from the disparity image. These results serve as the basis for feature extraction: either the complete disparity image or depth map can be used, or only the distance of the nearest obstacle in each column of the depth map. Alternatively, height information about objects in the area can be included.
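The depth-extraction steps above (census transform, block matching, disparity-to-depth conversion) can be sketched as follows. This is a minimal NumPy sketch, not the system's actual implementation; the 3×3 census window, block size, disparity range, and the closed-form pinhole relation standing in for the paper's geometry-derived lookup table are all illustrative assumptions:

```python
import numpy as np

def census_transform(img):
    """3x3 census transform: an 8-bit code per pixel encoding which
    neighbors are darker than the center (robust to brightness offsets)."""
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.uint8)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]
    center = img[1:h-1, 1:w-1]
    for bit, (dy, dx) in enumerate(shifts):
        neigh = img[1+dy:h-1+dy, 1+dx:w-1+dx]
        out[1:h-1, 1:w-1] |= (neigh < center).astype(np.uint8) << bit
    return out

def hamming(a, b):
    """Per-element Hamming distance between two uint8 census codes."""
    return np.unpackbits(np.bitwise_xor(a, b)[..., None], axis=-1).sum(axis=-1)

def block_match(cen_l, cen_r, max_disp=16, block=5):
    """Brute-force block matching on census codes: for each left pixel,
    pick the disparity minimizing the summed Hamming cost."""
    h, w = cen_l.shape
    half = block // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch_l = cen_l[y-half:y+half+1, x-half:x+half+1]
            best, best_d = None, 0
            for d in range(max_disp):
                patch_r = cen_r[y-half:y+half+1, x-d-half:x-d+half+1]
                cost = hamming(patch_l, patch_r).sum()
                if best is None or cost < best:
                    best, best_d = cost, d
            disp[y, x] = best_d
    return disp

def depth_from_disparity(disp, f_px, baseline_m):
    """Z = f * B / d (equivalent pinhole relation to a geometry lookup)."""
    return np.where(disp > 0, f_px * baseline_m / np.maximum(disp, 1), np.inf)
```

With the paper's 10 cm baseline, a disparity of, say, 30 pixels at an assumed focal length of 300 pixels would map to a depth of 1 m via `depth_from_disparity`.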
Based on the classification results, the robot's movements are controlled. We use two state machines, and accordingly two classifiers are integrated. The first recognizes obstacles that have to be avoided; if an obstacle is detected, the second classifier is queried for a preferred evasive direction. This direction is retained until movement straight ahead is possible again.
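The control logic just described can be sketched as a small state machine. The interface below is hypothetical: `obstacle_clf` and `direction_clf` stand in for the two trained classifiers, and the feature and direction representations are illustrative:

```python
from enum import Enum, auto

class Mode(Enum):
    FORWARD = auto()
    EVADE = auto()

class DriveStateMachine:
    """Sketch of the described control flow: classifier 1 detects an
    obstacle, classifier 2 is queried once for an evasive direction,
    and that direction is retained until the path ahead is clear."""

    def __init__(self, obstacle_clf, direction_clf):
        self.obstacle_clf = obstacle_clf
        self.direction_clf = direction_clf
        self.mode = Mode.FORWARD
        self.evade_dir = None

    def step(self, features):
        if self.obstacle_clf(features):
            if self.mode is Mode.FORWARD:
                # Query the direction classifier only on entering EVADE...
                self.evade_dir = self.direction_clf(features)
                self.mode = Mode.EVADE
            return self.evade_dir  # ...then retain the chosen direction
        self.mode = Mode.FORWARD   # path clear: resume driving forward
        self.evade_dir = None
        return "forward"
```

Querying the direction classifier only on the transition into the evade state is what makes the evasive direction "sticky" across consecutive obstacle frames.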
1.2 State of the Art
Most obstacle-avoiding robots use laser range finders, ultrasonic sensors, or infrared sensors, but there are also realizations using cameras. In particular, two-camera systems often serve as the basis for obstacle detection.
In this vein, Kumano, Ohya and Yuta (Masako Kumano and Yuta, 2000) follow a very simple approach. Images of the environment are obtained from two cameras directed at a fixed angle towards the ground. Starting from the geometrical setup between the cameras and the floor, corresponding points in both images are determined; intensity differences between corresponding points are taken to indicate an obstacle in the way. Owing to the camera arrangement, the scan lines correspond to different distances. By evaluating merely three scan lines, representing distances of 40 cm, 65 cm and 100 cm, a fast computing time (35 ms) can be guaranteed.
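The scan-line comparison in this approach can be sketched as below. This is a hedged illustration, not the authors' code: the `correspondence` mapping is assumed to be precomputed from the camera-to-floor geometry, and the intensity threshold is illustrative:

```python
import numpy as np

def obstacle_on_scanline(left_row, right_row, correspondence, thresh=30):
    """Ground-plane check in the style of Kumano et al.: under the
    flat-floor assumption, correspondence[i] gives the right-image
    column matching left column i, so a large intensity difference
    at any matched pair suggests something sticking up off the floor."""
    diff = np.abs(left_row.astype(int) - right_row[correspondence].astype(int))
    return bool((diff > thresh).any())
```

Because only a handful of precomputed rows are compared (three in the cited work), the per-frame cost is a few vectorized subtractions, which is consistent with the reported 35 ms computing time.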
Sabe, Fukuchi, Gutmann, Ohashi, Kawamoto and Yoshigahara (Kohtaro Sabe et al., 2004) also use two CCD cameras, integrated into the humanoid Sony QRIO robot. First, landmark points of the environment are searched for in both images. These points appear at different image coordinates depending on the camera views and the distance. From these disparities, 3D coordinates can be computed. On this basis the robot detects possible obstacles at floor level, which are used for path planning.
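Computing 3D coordinates from such disparities follows the standard pinhole triangulation. The sketch below uses illustrative intrinsics, not QRIO's actual calibration:

```python
import numpy as np

def points_from_disparity(us, vs, ds, f_px, baseline_m, cx, cy):
    """Triangulate pixel coordinates (us, vs) with disparities ds into
    camera-frame 3D points:
        Z = f * B / d,  X = (u - cx) * Z / f,  Y = (v - cy) * Z / f."""
    ds = np.asarray(ds, dtype=float)
    Z = f_px * baseline_m / ds
    X = (np.asarray(us) - cx) * Z / f_px
    Y = (np.asarray(vs) - cy) * Z / f_px
    return np.stack([X, Y, Z], axis=-1)
```

Points whose Y coordinate lies near the known floor height can then be discarded as ground, leaving candidate obstacles for path planning.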
The electric wheelchair Victoria (Libuda and Kraiss, 2004) computes a depth map of the observed scene with the aid of a camera pair. In addition, corners and edges are determined in several detection stages and grouped into regions that correspond to certain objects in the room. Using fixed relations between these landmarks, a model of the surroundings with the corresponding distance information can be produced. The walls can then be searched for regions such as doors, and the floor for obstacles.
There are a number of further approaches using two-camera systems. For example, the NEUROS project at the Ruhr University of Bochum, with its visually controlled service robot Arnold, uses such a setup (Ruhr-Universität, 1997). Arnold extracts edges from both images and calculates their distances from the shift of these landmarks. Hence, apart from obstacle detection, map generation of the environment is possible. In unknown areas the robot turns by 360 degrees to gather the necessary data. To evade people, optical flow is also used.
Another example is the Ratler (robotic all-terrain lunar exploration rover) (Reid Simmons and Whelan, 1996). It moves in an unknown area using a height map of the environment, which is generated by extracting 3D points from a stereo camera system.
Daimler Chrysler has developed a system for the detection of pedestrians (D.M. Gavrila and Munder, 2004). First, a disparity image indicates potential areas; edge extraction in these regions then enables a comparison with sample shapes of persons.
ICINCO 2005 - ROBOTICS AND AUTOMATION