Figure 1: Three classes of correspondences w.r.t. the PIDM. Points with no blue line belong to static on ground or infinite; points with long blue lines belong to static above ground or dynamic.
with a 1-point RANSAC (Fischler and Bolles, 1981).
The scale ρ in (Scaramuzza et al., 2009b) is given by a velocity sensor. In (Scaramuzza et al., 2009a), a method for scale estimation is proposed, but it is only applicable when circular motion is detected (Ψ > 0), which leads to a sparse distribution of working points. Here, we describe a method for estimating Ψ and ρ in every phase of the robot's motion. By explicitly solving the optical flow equations of image keypoints for a locally planar, circular motion and extracting the dominant terms, we derive a measurement model for image correspondences, which we call the linearized planar measurement model (LPMM). The structure of the scene is approximated by an inverse depth model; here, we assume a planar inverse depth model (PIDM).
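One way to make the planarity assumption concrete (in our own notation; the paper's exact parametrization may differ): for a scene point on the ground plane $\mathbf{n}^{T}\mathbf{X} = d$ observed at normalized image coordinates $(x, y)$, we have $\mathbf{X} = Z\,(x, y, 1)^{T}$, so the inverse depth is an affine function of the image position,

\[ \rho \;=\; \frac{1}{Z} \;=\; \frac{n_x x + n_y y + n_z}{d}, \]

with points at infinity corresponding to $\rho \to 0$.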
Combining the LPMM and the PIDM, we can derive 1-point relations for image correspondences. By massively parallel computing on a GPU, image correspondences are classified w.r.t. the PIDM into three classes: static on ground or infinite, static above ground, and dynamic. This vectorized classification can be carried out independently of all other keypoints by exploiting the 1-point relations, for thousands of correspondences at once. Deviations from the depth model can be recognized and corrected by motion stereo on the integrated motion signal. Therefore, the 3D locations of individual image keypoints are not estimated within an EKF; this is the main difference to the approach of (Civera et al., 2009). Our EKF layout is much leaner because it only contains the motion parameters and their temporal derivatives. Only selected and categorized correspondences update the EKF, which therefore converges faster. See Figure 1 for an impression of this classification task.
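To illustrate the vectorized classification step, the following NumPy sketch labels all correspondences at once. It is not the authors' GPU implementation: the residual test against the PIDM-predicted flow, the epipolar decomposition, and the thresholds tau_ground and tau_epi are assumptions made for this example.

```python
import numpy as np

def classify_pidm(flow_meas, flow_pidm, epi_dir,
                  tau_ground=0.5, tau_epi=0.5):
    """Label correspondences w.r.t. a planar inverse depth model (PIDM).

    flow_meas : (N, 2) measured image flow per keypoint
    flow_pidm : (N, 2) flow predicted by the PIDM (point on the ground
                plane or at infinity) under the current motion hypothesis
    epi_dir   : (N, 2) unit epipolar direction per keypoint
    Returns (N,) labels: 0 = static on ground / infinite,
                         1 = static above ground, 2 = dynamic.
    """
    residual = flow_meas - flow_pidm
    res_norm = np.linalg.norm(residual, axis=1)

    # A static point above the ground plane violates the PIDM prediction
    # only along the epipolar line; a dynamic point generally violates it
    # in an arbitrary direction.
    along = np.abs(np.sum(residual * epi_dir, axis=1))
    across = np.sqrt(np.maximum(res_norm**2 - along**2, 0.0))

    labels = np.full(res_norm.shape, 2, dtype=int)  # default: dynamic
    labels[across < tau_epi] = 1                    # static above ground
    labels[res_norm < tau_ground] = 0               # consistent with PIDM
    return labels
```

Because each label depends only on that correspondence's own residual, the loop body maps directly onto one GPU thread per correspondence.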
Using a generalized disparity equation splits the motion into a rotational component (the infinite homography $H_\infty$) and a translational component (Hartley and Zisserman, 2006).
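In the notation of (Hartley and Zisserman, 2006), a standard form of this decomposition, with $Z$ the depth of the point, $\mathbf{e}'$ the epipole in the second view, and $K$, $K'$ the camera calibration matrices, is

\[ \mathbf{x}' \;\simeq\; H_\infty\,\mathbf{x} \;+\; \frac{1}{Z}\,\mathbf{e}', \qquad H_\infty = K' R\, K^{-1}, \quad \mathbf{e}' = K'\,\mathbf{t}. \]

The first term is pure rotation (exact for points at infinity, $1/Z \to 0$); the second is the parallax contribution of the translation, scaled by the generalized disparity $1/Z$.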
A similar approach is proposed in (Tardif et al., 2008), but using omnidirectional cameras. Our paper only covers perspective monocular or stereo cameras heading in the direction of the ego motion. Besides giving accurate results, (Tardif et al., 2008) also gives a good overview of recent visual odometry, visual SLAM and SfM techniques. It turns out that pitch (Θ) and roll (Φ) perturbations heavily affect the quality of the (ρ, Ψ) estimation. In the same way we extract (ρ, Ψ), we derive equations for (Θ, Φ) to reduce this effect.
For correspondence calculation we use a recently proposed efficient and parallelized method for extracting and matching keypoints (Schweitzer and Wuensche, 2009), which we extend to long-term correspondences. This allows us to match up to 2000 keypoints in a stereo stream (2 × 752 × 480), covering both monocular streams plus stereo matching, in about 5 ms on an NVIDIA Tesla GPU. As a result, three closed lists of long-term correspondences reside in GPU memory, which are then used for the vectorized PIDM classification mentioned above.
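As a host-side mock-up of how these lists could be organized (the names and the exact layout are our assumptions; in the paper the lists live in GPU memory), one plausible reading is two sets of temporal tracks, one per camera, plus the left-right stereo matches:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Track:
    """One long-term correspondence: a keypoint tracked over many frames."""
    keypoint_id: int
    positions: List[Tuple[float, float]] = field(default_factory=list)

@dataclass
class CorrespondenceLists:
    """Three closed lists of long-term correspondences (host-side sketch):
    temporal tracks in the left and right stream plus stereo matches."""
    left_tracks: List[Track]
    right_tracks: List[Track]
    stereo_matches: List[Tuple[int, int]]  # (left id, right id) pairs
```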
This paper is organized as follows. In section two we describe the GPU computation of long-term correspondences and the list generation. Section three introduces the PIDM/LPMM and derives the 1-point relations for the motion parameters from them. Their use for classification and EKF estimation is explained, and the results are evaluated against IMU and DGPS ground truth in section four. We conclude with section five.
2 GPU IMAGE CORRESPONDENCES
(Schweitzer and Wuensche, 2009) proposed a novel method for the efficient extraction and matching of corner-based keypoints. It is based on a dense computation of three normalized Haar wavelet responses $(I_x, I_y, I_{xy})_t$ per pixel at scale $t$, the so-called SidCell image (Scale Invariant Descriptive Cell), shown in Figure 2. Haar wavelets can be computed very efficiently by using integral images (Viola and Jones, 2001) and are also employed by SURF correspondences (Bay et al., 2006).
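Since the responses are plain box-filter differences, each can be evaluated in constant time per pixel from an integral image. The sketch below is a minimal reference implementation of this idea; the exact wavelet layout and normalization of the SidCell are assumptions for illustration, not the published definition.

```python
import numpy as np

def integral_image(img):
    # ii[y, x] = sum of img[:y, :x]; one extra row/column of zeros
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    # Sum of img[y0:y1, x0:x1] in O(1) via the integral image
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

def haar_responses(img, t):
    """Dense Haar responses (Ix, Iy, Ixy) over 2t x 2t cells at scale t.
    Area normalization is an assumption for this sketch."""
    ii = integral_image(img.astype(np.float64))
    h, w = img.shape
    Ix, Iy, Ixy = (np.zeros((h, w)) for _ in range(3))
    norm = (2 * t) ** 2
    for y in range(t, h - t):
        for x in range(t, w - t):
            # Ix: right half minus left half of the cell
            Ix[y, x] = (box_sum(ii, y - t, x, y + t, x + t)
                        - box_sum(ii, y - t, x - t, y + t, x)) / norm
            # Iy: lower half minus upper half
            Iy[y, x] = (box_sum(ii, y, x - t, y + t, x + t)
                        - box_sum(ii, y - t, x - t, y, x + t)) / norm
            # Ixy: diagonal quadrants minus anti-diagonal quadrants
            Ixy[y, x] = (box_sum(ii, y - t, x - t, y, x)
                         + box_sum(ii, y, x, y + t, x + t)
                         - box_sum(ii, y - t, x, y, x + t)
                         - box_sum(ii, y, x - t, y + t, x)) / norm
    return Ix, Iy, Ixy
```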
From the SidCell image, keypoints and descriptors are derived.

Figure 2: SidCell components $I_x$, $I_y$, $I_{xy}$ over a cell of size $2t$.

Keypoints are extracted by a non-maximum suppression on the absolute $|I_{xy}|$ component of the SidCell image. The sensitivity of the keypoint extraction is adjusted by a noise threshold on $|I_{xy}|$ in the range $[0, 1]$. (Schweitzer and