REAL-TIME VISUAL ODOMETRY FOR GROUND MOVING
ROBOTS USING GPUS
Michael Schweitzer, Alois Unterholzner and Hans-Joachim Wuensche
Institute for Autonomous Systems Technology, Universität der Bundeswehr München, Germany
Keywords:
Robot Vision, GPGPU, Structure from motion.
Abstract:
This paper introduces a novel visual odometry framework for ground moving robots. Recent work showed
that assuming non-holonomic motion simplifies the ego motion estimation task to one yaw and one scale
parameter. Furthermore, a very efficient way of computing frame-to-frame image correspondences for such
robots was presented by skipping rotational invariance and optimizing keypoint extraction and matching for
massive parallelism on a GPU. Here, we combine both contributions into a closed framework. Long term
correspondences are preserved, classified and stabilized by motion prediction, building up and maintaining a trusted
map of depth-registered keypoints. We also allow for other ground moving objects. From this map, the ego motion
is inferred, extended by constrained rotational perturbations in pitch and roll.
A persistent focus is on keeping all algorithms suitable for parallelization and thus achieving up to one hundred
frames per second. Experiments are carried out to compare against ground truth given by DGPS and IMU
data.
1 INTRODUCTION
Reconstructing the motion from the 2D image stream
of a moving camera is an important task in com-
puter vision. It is commonly known as visual odome-
try and often concerned with real-time robot vision.
The idea is to compute image correspondences of
consecutive image frames connected to static points
in 3D space and infer the camera motion between
both frames. Early work needed at least eight points
to solve the equations given by epipolar constraints
(Longuet-Higgins, 1981; Hartley, 1994). (Nister
et al., 2004) proposed a more efficient way by us-
ing only five points. Recent work by (Scaramuzza
et al., 2009b; Scaramuzza et al., 2009a) reduced this
to a one-point method by constraining the cam-
era motion. Once the motion is known, the depth
(structure) of the correspondences can be determined,
commonly known as Structure from Motion (SfM).
By keeping this structure and using longterm corre-
spondences, (Civera et al., 2009) introduced a novel
SfM method within the extended Kalman filter (EKF)
framework. The state vector of the proposed camera-
centered EKF contains the location in the world frame
plus a map of 3D-registered image features in the
camera frame. Measuring around 20 features leads
to a state vector size around 300. Real-time capa-
bility at 30 frames per second comes from a 1-point
RANSAC algorithm that rejects outliers before updating the fil-
ter. In contrast to (Scaramuzza et al., 2009b), the additional
information needed to achieve 1-point relations comes from
the probability distribution provided by the EKF in-
stead of from constraining the robot's motion. With a larger
number of tracked features (around one hundred), the
frame rate drops to 1 Hz, which is no longer real time.
However, this contribution shows good accuracy in its
results, similar to non-linear optimization techniques
like bundle adjustment (Triggs et al., 2000).
Within EKF frameworks, it turns out that the in-
verse depth parameterization, briefly IDP, presented
in (Civera et al., 2008) is a good choice. Initializing
unregistered 3D points with an IDP and estimating
the inverse depth instead of Euclidean coordinates re-
duces linearization errors within the EKF framework.
Thereby the EKF converges faster and is more stable.
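For reference, the IDP of (Civera et al., 2008) encodes each feature by the camera position at its first observation, the azimuth and elevation of the viewing ray, and the inverse depth along that ray (restated here in our own notation):

\[
\mathbf{y}_i = (x_i,\, y_i,\, z_i,\, \theta_i,\, \phi_i,\, \rho_i)^\top,
\qquad
\mathbf{X}_i = (x_i,\, y_i,\, z_i)^\top + \frac{1}{\rho_i}\, \mathbf{m}(\theta_i, \phi_i),
\]

so that a point at infinity simply corresponds to ρ_i = 0, which is what keeps the linearization well behaved.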
In this paper, we deal with ground moving robots
described in (Scaramuzza et al., 2009b).
The robot is assumed to move in an ideal ground
plane with a locally circular motion. Only two pa-
rameters are needed for describing this motion model:
the yaw angle Ψ of the circle segment and the scale ρ
which denotes the chord. (Scaramuzza et al., 2009b)
shows that Ψ is computable from only one correspon-
dence in a monocular stream. Outliers are removed
Figure 1: Three classes of correspondences w.r.t. the PIDM.
Points with no blue line belong to static on ground or in-
finite, points with long blue lines belong to static above
ground or dynamic.
with a 1-point RANSAC (Fischler and Bolles, 1981).
The scale ρ in (Scaramuzza et al., 2009b) is given
by a velocity sensor. Then, in (Scaramuzza et al.,
2009a), a method for scale estimation is proposed.
But this is only possible, when a circular motion is
detected (Ψ > 0) and leads to a sparse distribution of
working points. Here, we describe a method for es-
timating Ψ and ρ in every phase of the robot’s mo-
tion. By explicitly solving optical flow equations for
image keypoints for a locally planar, circular motion
and extracting the main effecting terms, we derive a
measurement model for image correspondences. We
call this measurement model linearized planar mea-
surement model, LPMM. The structure of the scene is
approximated by an IDP model. Here, we assume a
planar inverse depth model, PIDM.
Combining LPMM and PIDM, we can derive 1-
point relations for image correspondences. By mas-
sive parallel computing on a GPU, image correspon-
dences are classified w.r.t. the PIDM into three
classes: static on ground or infinite, static above
ground and dynamic. This vectorized classification
can be done independently of other keypoints, exploit-
ing the 1-point relations for thousands of correspon-
dences. Deviations from the depth model can be rec-
ognized and corrected by motion stereo using the inte-
grated motion signal. Therefore, the 3D locations of individual
image keypoints are not estimated within an EKF.
Here we see the main difference to the approach
of (Civera et al., 2009). Our EKF layout is much
leaner because it only contains the motion parame-
ters and their temporal derivatives. Only selected and
categorized correspondences update the EKF which
therefore converges faster. See Figure 1 to get an idea
of this classification task.
Using a generalized disparity equation splits the
motion into a rotational (infinite homography H_∞)
and a translational component (Hartley and Zisserman,
2006). A similar approach is proposed in (Tardif
et al., 2008) but using omnidirectional cameras. Our
paper only covers perspective monocular or stereo
cameras heading into the direction of the ego motion.
Besides giving accurate results, (Tardif et al., 2008)
also gives a good overview of recent visual odome-
try, visual SLAM and SfM techniques. It turns out
that pitch (Θ) and roll (Φ) perturbations heavily af-
fect the quality of the (ρ, Ψ) estimation. In the same
way we extract (ρ,Ψ), we derive equations for (Θ,Φ)
to reduce this effect.
For correspondence calculation we use a recently
proposed efficient and parallelized method for ex-
tracting and matching keypoints (Schweitzer and
Wuensche, 2009). We enhance this approach to long
term correspondences. This allows matching up
to 2000 keypoints in a stereo stream (2 × 752 × 480)
for both monocular streams plus stereo matching in
about 5 ms on an NVidia Tesla GPU. As a result, three
closed lists of long term correspondences reside in
GPU memory which are then used for the vectorized
PIDM-classification mentioned above.
This paper is organized as follows. In section two
we describe the GPU computation of long term cor-
respondences and list generation. Section three intro-
duces the PIDM/LPMM and derives the 1-point re-
lations for the motion parameters from them. The
use for classification and EKF estimation is explained,
and results are evaluated against IMU and DGPS
ground truth in section four. We conclude with sec-
tion five.
2 GPU IMAGE
CORRESPONDENCES
(Schweitzer and Wuensche, 2009) proposed a novel
method for an efficient extraction and matching of
corner-based keypoints. It is based on a dense com-
putation of three normalized Haar wavelet responses
(I_x, I_y, I_xy)_t per pixel at scale t, the so-called SidCell
image (Scale Invariant Descriptive Cell), shown in
Figure 2. Haar wavelets can be computed very ef-
ficiently by using integral images (Viola and Jones,
2001) and are also employed by SURF correspon-
dences (Bay et al., 2006). From the SidCell-image,
keypoints and descriptors are derived. Keypoints are
extracted by a non-maximum suppression on the ab-
solute |I_xy| component of the SidCell image. The sen-
sitivity of the keypoint extraction is adjusted by a noise
threshold on |I_xy| in [0,1]. (Schweitzer and
Wuensche, 2009) define two strength tests to remove
weak keypoints. For correspondence calculation of
consecutive frames, two steps are carried out. First, a
prematching w.r.t. the sign signature of the keypoints
(8 classes, different colors in Figure 3) is done. Sec-
ond, only if the first step matches, the descriptors are
compared. This avoids heavy-weight descriptor dis-
tance calculations where they are not promising and
increases matching efficiency considerably. The descriptors them-
selves are built from distributed and weighted SidCells
around the keypoint, while the complexity of the dis-
tribution is user-defined (see Figure 3).

Figure 2: SidCell components (I_x, I_y, I_xy) at scale 2t.
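To make the two-step test concrete, the following is a minimal CUDA sketch of how such a prematch-then-descriptor comparison could look; the struct layout, descriptor length and names are our own illustrative assumptions, not the implementation of (Schweitzer and Wuensche, 2009).

struct SidKeypoint {
    float ix, iy, ixy;        // normalized Haar responses at the keypoint
    float desc[16];           // SidCell descriptor (length is user-defined)
};

// 3-bit sign signature over (I_x, I_y, I_xy): the 8 classes used for prematching.
__host__ __device__ inline unsigned int signSignature(const SidKeypoint& k)
{
    return (k.ix  > 0.f ? 1u : 0u)
         | (k.iy  > 0.f ? 2u : 0u)
         | (k.ixy > 0.f ? 4u : 0u);
}

// Descriptor distance is only evaluated when the cheap sign test passes.
__host__ __device__ inline float matchCost(const SidKeypoint& a, const SidKeypoint& b)
{
    if (signSignature(a) != signSignature(b))
        return 1e30f;                       // prematch failed: reject without descriptor test
    float d = 0.f;                          // sum of squared descriptor differences
    for (int i = 0; i < 16; ++i) {
        float e = a.desc[i] - b.desc[i];
        d += e * e;
    }
    return d;
}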
Figure 3: SidCell matching with |I_xy| > 0.2. The descriptor is built from a Gaussian-weighted 4- or 8-neighborhood of SidCells of user-defined order (0, 1, 2).
2.1 List Processing on GPUs
Collecting keypoints into a closed list is mostly done
by an intermediate CPU step. This requires addi-
tional data transfer between GPU and CPU and re-
duces the speed gain. Using modern atomic GPU
functions like atomicInc() or atomicAdd(), those
disturbing CPU steps are avoidable. List data struc-
ture operations like merge, split or concatenate
all need an addElem operator. A parallelized version
of addElem needs synchronization. Figure 4 shows
how addElemSync is implemented. The vectorized
versions of merge, split, concatenate are essen-
tial for this work. Assuming that the keypoints can
be processed independently of each other, full paral-
lelism is given. Atomic GPU functions are used in the
same way for building histograms needed later in this
paper.
Figure 4: Implementation of addElemSync on a GPU. The
first element of the list is its size. In this example, three
threads want to add a new element to an empty list simulta-
neously. Each thread gets its number of the list cell where
it can put its data by applying an atomicInc() on the size
element.
Figure 5: SidCell long term correspondences.
2.2 Long Term Correspondences
Long term correspondences (lt-correspondences) are
held in a list. To build this list, initially all frame
to frame correspondences (f2f-correspondences) are
added. Subsequent cycles search this list for whether two
f2f-correspondences can be concatenated. If an lt-
correspondence does not have a successor, it is re-
moved from the list. F2f-correspondences which do
not have a predecessor are added as new elements. Fig-
ure 5 shows the result. Selected f2f-correspondences
are the measurements for the EKF of the motion pa-
rameters. Lt-correspondences support the selection
and adjust deviations to the assumed inverse depth
model. This is explained in detail in the next section.
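A hedged CUDA sketch of how one cycle of this list update could be vectorized; the data layout (one struct per lt-correspondence and a lookup table f2fOfPrev from previous-frame keypoint index to f2f-correspondence index, or -1 if none) is our own illustrative assumption, not the paper's data structure.

struct LtCorr { int firstKp; int lastKp; int length; };

// One thread per lt-correspondence: either extend it by the f2f-correspondence
// that starts at its last keypoint, or drop it if there is no successor.
__global__ void extendLtCorrespondences(const LtCorr* ltIn, int nLt,
                                        const int* f2fOfPrev,   // prev-frame keypoint -> f2f index or -1
                                        const int* f2fCurrKp,   // f2f index -> current-frame keypoint
                                        LtCorr* ltOut, unsigned int* ltOutSize)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nLt) return;

    LtCorr lt = ltIn[i];
    int f2f = f2fOfPrev[lt.lastKp];
    if (f2f < 0) return;                       // no successor: lt-correspondence is removed

    lt.lastKp = f2fCurrKp[f2f];                // concatenate the new f2f-correspondence
    lt.length += 1;
    unsigned int dst = atomicInc(ltOutSize, 0xFFFFFFFFu);   // addElemSync pattern of Section 2.1
    ltOut[dst] = lt;
}

A second pass over the f2f-correspondences, using the same addElemSync pattern, would append those without a predecessor as new lt-correspondences.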
3 OUR APPROACH
3.1 Planar Inverse Depth Model
In case of a known rigid motion [R|t] between two
cameras c_0, c_1, the image point m_1 in c_1 can be com-
puted from m_0 in c_0:

\tilde{m}_1 = A\,R\,A^{-1} \cdot \tilde{m}_0 + X_0^{-1}\,A\,t \qquad (1)
A is the camera calibration matrix and assumed to
be the same for both cameras. It is obvious that this
equation also needs the inverse depth X_0^{-1} of the ho-
mogeneous image point \tilde{m}_0. Additionally, it is not lin-
ear in case of X_0^{-1} ≠ 0 or t ≠ 0. We discuss this equa-
tion later in this paper; for now it is important how to
get X_0^{-1}. The approach here is to assume a structural
model from which the inverse depth can be computed
for every pixel. In case of ground moving robots we
assume a ground plane model. Then, the idea is to
separate correspondences into those which fit this
model and those which do not. For those which
fit, the inverse depth is known and the motion can be
observed. For those which do not fit, the inverse depth
can be computed from the motion. The importance of the
PIDM fit is high at the beginning but should decrease
over time, when more and more new depth-registered
keypoints are available in a trusted map. When the
system fails, a re-initialization by the PIDM can be done.
To express a 3D point M_0 = (X_0, Y_0, 0)^T ∈ E(Z_c, Θ_c)
in camera coordinates (see Figure 6), the homogeneous
transformation R_Y(Θ_c) · T(0, 0, Z_c) is applied.
Figure 6: Depth X_0 of a 3D point M_0 which lies on the plane E(Z_c, Θ_c), given in camera coordinates.
The corresponding image point m_0 is then given by

\tilde{m}_0 = R_Y(\Theta_c)\,T(0,0,Z_c)\,P_c(k) \cdot \tilde{M}_0
 = \begin{pmatrix} k\,Y_0 \\ k\,(X_0\,\Theta_c - Z_c) \\ Z_c\,\Theta_c + X_0 \end{pmatrix}
 = \begin{pmatrix} \tilde{y}_0 \\ \tilde{z}_0 \\ \tilde{w}_0 \end{pmatrix}
As one can see, every image line z = \tilde{\lambda} \cdot \tilde{z} corresponds
to a constant depth, with \tilde{\lambda} = 1/\tilde{w}. Now we can derive
the PIDM:

X^{-1}(z) = \frac{z - z_\infty}{k\,Z_c}, \qquad
\mathrm{PIDM}(z) = \begin{cases} X^{-1}(z) & \text{if } X^{-1}(z) > 0 \\ 0 & \text{else} \end{cases} \qquad (2)

where z_\infty = k\,\Theta_c is the image line at infinity (horizon).
Note that the PIDM is a linear mapping without sin-
gularity at infinite points.
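As a small illustration, the PIDM of equation (2) reduces to a one-line lookup per image row; the following CUDA/C++ sketch uses our own variable names (k focal length in pixels, Zc camera height above ground, thetaC camera pitch) and assumes the sign convention of our reading of (2).

__host__ __device__ inline float pidmInverseDepth(float z, float k, float Zc, float thetaC)
{
    float zInf = k * thetaC;                  // image line at infinity (horizon), z_inf = k * Theta_c
    float invDepth = (z - zInf) / (k * Zc);   // linear mapping of equation (2)
    return invDepth > 0.0f ? invDepth : 0.0f; // rows at or above the horizon map to infinite depth
}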
3.2 Non-holonomic Robot Motion
Recent work (Scaramuzza et al., 2009b) introduced
a simple circular motion model with two parameters
(ρ, Ψ) for wheeled ground moving robots (see Figure
7). The translational component t of the rigid camera
motion [R|t] is then given by:

t = \rho \cdot \begin{pmatrix} \cos(\Psi/2) \\ \sin(\Psi/2) \\ 0 \end{pmatrix} \qquad (5)

Figure 7: Local circular motion model by (Scaramuzza et al., 2009b).
In addition to this model, ours is augmented by
pitch (Θ) and roll (Φ). The motion in (Θ, Φ) is uncon-
strained and modeled by assuming constant veloci-
ties.
3.3 Linearized Planar Measurement
Model LPMM
Using the disparity equation (1) and our motion
model (5), it is possible to derive the displaced homo-
geneous image point \tilde{m}_1 as given in (3). The mea-
surement is the displacement ∂m_1 of the correspon-
dence of the image point m_0. Therefore, the par-
tial derivatives ∂m_1/∂(Ψ, Θ, Φ, ρ) in Euclidean coor-
dinates must be determined. Using (3) and the trans-
formation from homogeneous into Euclidean coordi-
nates, m_1 = \tilde{\lambda} \cdot (\tilde{y}_1, \tilde{z}_1)^T, the derivatives ∂m_1/∂(...)
follow from the product rule (6) and yield the set of
equations (4):
\tilde{m}_1 = A\,R_\Phi R_\Theta R_\Psi\,A^{-1} \cdot \tilde{m}_0 + X_0^{-1}\,A\,\rho \begin{pmatrix} \cos(\Psi/2) \\ \sin(\Psi/2) \\ 0 \end{pmatrix}
 = \begin{pmatrix}
     k\,(c_1 c_2 c_3 - c_3 s_1) + y_0\,(c_1 c_3 + s_1 s_2 s_3) + z_0\,c_2 s_3 + \tfrac{1}{2} X_0^{-1} k \Psi \rho \\
     k\,(s_1 s_3 + c_1 s_2 c_3) - y_0\,(c_1 s_3 - s_1 s_2 c_3) + z_0\,c_2 c_3 \\
     \tfrac{1}{k}\,y_0\,s_1 c_2 - \tfrac{1}{k}\,z_0\,s_2 + c_1 c_2 + X_0^{-1} \rho
   \end{pmatrix}
 = \begin{pmatrix} \tilde{y}_1 \\ \tilde{z}_1 \\ 1/\tilde{\lambda} \end{pmatrix}

with s_1 = \sin\Psi,\; c_1 = \cos\Psi,\; s_2 = \sin\Theta,\; c_2 = \cos\Theta,\; s_3 = \sin\Phi,\; c_3 = \cos\Phi \qquad (3)
\frac{\partial m_1}{\partial(\Psi,\Theta,\Phi)}
 \approx \tilde{\lambda}
 \begin{pmatrix}
   (\tfrac{1}{2} X_0^{-1}\rho - 1)\,k - \Psi y_0 & \Phi k & \Theta k - \Phi y_0 + z_0 \\
   \Phi k + \Theta y_0 & k + \Psi y_0 - \Theta z_0 & \Psi k - y_0 - \Phi z_0
 \end{pmatrix}
 + \frac{\partial \tilde{\lambda}}{\partial(\Psi,\Theta,\Phi)} \begin{pmatrix} \tilde{y}_1 \\ \tilde{z}_1 \end{pmatrix}

\frac{\partial m_1}{\partial \rho}
 = \frac{\partial \tilde{\lambda}}{\partial \rho} \begin{pmatrix} \tilde{y}_1 \\ \tilde{z}_1 \end{pmatrix}
 + \tilde{\lambda}\,\frac{\partial}{\partial \rho} \begin{pmatrix} \tilde{y}_1 \\ \tilde{z}_1 \end{pmatrix}
 \approx \tilde{\lambda} \begin{pmatrix} \tfrac{1}{2} X_0^{-1} \Psi k \\ 0 \end{pmatrix}
 - X_0^{-1} \begin{pmatrix} (1 - \Psi + \tfrac{1}{2} X_0^{-1}\rho)\,k + y_0 + \Theta z_0 \\ \Theta k - \Phi y_0 + z_0 \end{pmatrix}

with \tilde{\lambda} \approx \lambda = \frac{k}{(1 + X_0^{-1}\rho)\,k + \Psi y_0 - \Theta z_0} \qquad (4)
\frac{\partial m_1}{\partial(\ldots)}
 = \tilde{\lambda}\,\frac{\partial (\tilde{y}_1, \tilde{z}_1)^\top}{\partial(\ldots)}
 + (\tilde{y}_1, \tilde{z}_1)^\top\,\frac{\partial \tilde{\lambda}}{\partial(\ldots)} \qquad (6)
This term gets very complex. We assume that the im-
age displacements are comparatively small, which al-
lows for a small angle approximation: c_i becomes 1 and
s_i becomes Ψ, Θ, Φ, respectively. Also, we assume that products of
two or more angles become zero. Note that this approx-
imation is only valid after the derivatives have been calcu-
lated.
It turns out that the terms for ∂\tilde{\lambda}/∂(Ψ, Θ, Φ) can also
be approximated to zero in case of X_0^{-1} → 0. Further-
more, the approximation ∂\tilde{\lambda}/∂ρ ≈ −X_0^{-1} is valid. The
result is given by the set of equations in (4). This set
is what we call the linearized planar measurement model,
briefly LPMM.
3.4 Discussing the LPMM
Looking more closely at the LPMM, one can draw valu-
able conclusions. One important term is the prod-
uct of the motion scale ρ and the inverse depth X_0^{-1} of
a keypoint. If the robot is not moving and therefore
ρ = 0, X_0^{-1} also vanishes from the equation. No struc-
ture is observable in this case, which is consistent with
3D reconstruction theory. The product also van-
ishes for far away (or infinite) 3D points. From those
measurements no information about ρ is observable.
But it is worth finding those far away points, because
the rotational component (or the infinite homography
H_∞) of the camera motion can be measured almost
directly. A further consequence can be seen if ev-
ery angle product is set to zero. This rough estimate
still contains valuable information because the dom-
inating terms are clearly visible. Another important
consequence is that the individual parameters of the
LPMM become independent of each other. Further-
more, λ becomes (1 + X_0^{-1}ρ)^{-1}, and for far away points
it becomes 1. One of the most important properties of
the LPMM is the fact that the z-component of ∂m_1/∂ρ
is completely independent of the yaw parameter Ψ. Ex-
ploiting these consequences allows a vectorized clas-
sification of image correspondences which we discuss later.
4 EGO MOTION ESTIMATION
In order to estimate the robot's motion we want to ex-
tract four signals, one translational scale (ρ) and three
rotational angles (Ψ, Θ, Φ). Each image frame, com-
ing with the measured f2f-correspondences, updates an
EKF designed to output those signals by using the
LPMM for Jacobian linearization. As discussed above,
measurements have to be selected w.r.t. their capabil-
ity to observe a certain signal. Mainly this depends
on their depth configuration and leads to an initializa-
tion problem. For our robot we assume the PIDM as a
starting point. Note that one can apply any (inverse)
depth model, e.g. a corridor model for indoor robots
or depth from stereo. Needless to say, the assumed
inverse depth model has to be observable to a certain
degree, otherwise it would fail. The aim is to adapt
to the real scene by depth-registering deviations from the
model and softening the model assumption.
4.1 Scale Estimation
Using the PIDM (2) within the Ψ-independent z-
component of the LPMM scale derivative (4) leads to the following
1-point equation:

\partial\rho = \frac{k\,Z_c}{(z_0 + \Theta k - \Phi y_0)(z_0 - z_\infty)}\, \partial z \qquad (7)
Once again, z_0 is the z-coordinate of the extracted im-
age keypoint m_0 and ∂z is the z-displacement mea-
surement of the correspondence to the previous frame.
(7) gets singular for z_0 → z_∞ and, on the other side, is well
conditioned for 3D points in the near field (X_0^{-1} ≫ 0).
Figure 8 shows the keypoint correspondence classi-
fication and scale estimation without using any sig-
nal other than the image stream. Figure 9 shows
the EKF scale output, while Figure 10 uses an INS
pitch signal for correction. Setting the roll compo-
nent in equation (7) to zero, ∂ρ becomes completely inde-
pendent of the y-coordinate in the image. One could
say this denotes a 1/2-point relation, but in fact it is a
PIDM special case.

Figure 8: Classification of image correspondences for scale estimation (in the frame shown, the INS measures ρ̇ = 9.91 m/s). First, a histogram is built by a vectorized 1-point scale computation for every correspondence independently on the GPU. By histogram voting, the most likely scale value is chosen. The dashed blue line indicates the ground truth scale value measured by the INS. Green correspondences match the histogram vote, red ones do not. Histogram vote and EKF prediction are combined for a stable inlier classification. See also the image coordinate frame and the line at infinity (black).
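A hedged CUDA sketch of this vectorized voting step; the bin layout, array names and thresholds are our own illustrative choices, and the per-correspondence value follows our reading of equation (7).

__global__ void voteScale(const float* y0, const float* z0, const float* dz, int n,
                          float k, float Zc, float theta, float phi,
                          float binWidth, int nBins, unsigned int* hist)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float zInf  = k * theta;                                  // horizon line z_inf = k * Theta
    float denom = (z0[i] + theta * k - phi * y0[i]) * (z0[i] - zInf);
    if (fabsf(denom) < 1e-6f) return;                         // near the singularity z0 -> z_inf: skip

    float rho = k * Zc * dz[i] / denom;                       // 1-point scale from equation (7)
    int bin = (int)(rho / binWidth) + nBins / 2;              // signed value mapped to a histogram bin
    if (bin >= 0 && bin < nBins)
        atomicAdd(&hist[bin], 1u);                            // vectorized histogram voting
}

The host would then take the argmax of hist as the vote, combine it with the EKF prediction and mark correspondences close to the voted value as inliers, as described in the caption of Figure 8.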
4.2 Rotation Estimation
At first glance, the rotation estimation can easily be
done by taking f2f-correspondences of far away key-
points. The structure term ρX_0^{-1} of the optical flow
measurement ∂m goes to zero for those points and
therefore the LPMM (4) simplifies considerably. Fig-
ure 11 shows the EKF rotational signals for yaw Ψ
and pitch Θ using the same method as for scale es-
timation, with the difference of taking far away key-
points instead of near ones. Figures 12 and 13 show
the driven tracks obtained by integrating the estimated
motion signals.

Figure 9: Ego motion estimation (EME, blue) scale signal compared to the ground truth INS signal (green). EME extracts a very similar signal with two major perturbations. Between frames 700-1000, a strong pitch perturbation occurs, see the green INS pitch signal below. Between frames 1000-1200 the scene is quite keypoint-less. The constant acceleration system model of the EKF stabilizes the signal for a certain time.

Figure 10: Scale signal with pitch compensation by ground truth INS data. The peaks between frames 700-1000 are flattened.

Figure 11: Rotation estimation (EME, blue) of yaw, pitch and roll velocities by far away keypoints compared to the ground truth INS signals (green). The selection of the keypoints is done w.r.t. the PIDM.

Figure 12: Ego motion estimation (EME, blue) track by using the pure monocular vision scale and yaw signal of Figure 10 and Figure 11.

Figure 13: Ego motion estimation (EME, blue) track by using the estimated yaw signal and the locally circular motion model. The scale signal is given by INS ground truth. Red shows the differential GPS track, green shows the motion model using ground truth INS data. The right image shows the tracks referenced in a satellite image by GoogleEarth.
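As with the scale voting, the selection of far away correspondences can be vectorized; the following is a hedged CUDA sketch under our own naming, where a correspondence counts as far away when its PIDM inverse depth is at most a small threshold, so that the structure term ρX_0^{-1} in (4) effectively vanishes.

__global__ void selectFarKeypoints(const float* z0, int n,
                                   float k, float Zc, float thetaC, float maxInvDepth,
                                   unsigned int* farList)   // element 0 holds the list size
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float zInf = k * thetaC;                                    // horizon line
    float invDepth = fmaxf((z0[i] - zInf) / (k * Zc), 0.0f);    // PIDM of equation (2)
    if (invDepth <= maxInvDepth) {
        unsigned int idx = atomicInc(&farList[0], 0xFFFFFFFFu); // addElemSync pattern of Section 2.1
        farList[idx + 1] = (unsigned int)i;                     // store the correspondence index
    }
}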
5 CONCLUSIONS
In this paper we introduced a visual odometry method
using the EKF framework and GPU-computed im-
age correspondences. We described the measurement
model LPMM of the EKF in detail and derived 1-
point relations for a vectorized inlier selection on
the GPU. We achieve an overall computation time of
less than 10 ms: 4 ms for GPU correspondence calcu-
lation and selection and 5 ms for the EKF update on a
2GHz single core CPU. Future work will extend this
approach to a full structure from motion application.
ACKNOWLEDGEMENTS
The authors gratefully acknowledge support of this
work by the Deutsche Forschungsgemeinschaft (Ger-
man Research Foundation) within the Transregional
Collaborative Research Center 28 Cognitive Automo-
biles.
REFERENCES
Bay, H., Tuytelaars, T., and Gool, L. V. (2006). SURF:
Speeded up Robust Features. In Proc. European Conf.
on Computer Vision.
Civera, J., Davison, A., and Montiel, J. (2008). Inverse
Depth Parametrization for Monocular SLAM. IEEE
Transactions on Robotics, 24(5):932–945.
Civera, J., Grasa, O. G., Davison, A. J., and Montiel, J.
M. M. (2009). 1-Point RANSAC for EKF-Based
Structure from Motion. In Proc. IEEE/RSJ Int’l Conf.
on Intelligent Robots and Systems.
Fischler, M. A. and Bolles, R. C. (1981). Random Sample
Consensus: A Paradigm for Model Fitting with Ap-
plications to Image Analysis and Automated Cartog-
raphy. Comm. of the ACM, 24:381–395.
Hartley, R. (1994). Euclidean reconstruction from uncali-
brated views. In Proc. IEEE Conf. on Computer Vision
and Pattern Recognition, pages 908–912.
Hartley, R. and Zisserman, A. (2006). Multiple View Geom-
etry. Cambridge University Press.
Longuet-Higgins, H. C. (1981). A Computer Algorithm for
Reconstructing a Scene From Two Projections. Na-
ture, 293(1):133–135.
Nister, D., Naroditsky, O., and Bergen, J. (2004). Visual
Odometry. In Proc. IEEE Conf. on Computer Vision
and Pattern Recognition, volume 1, pages I–652–I–
659 Vol.1.
Scaramuzza, D., Fraundorfer, F., Pollefeys, M., and Sieg-
wart, R. (2009a). Absolute Scale in Structure from
Motion from a Single Vehicle Mounted Camera by
Exploiting Nonholonomic Constraints. In Proc. IEEE
Int’l Conf. on Computer Vision.
Scaramuzza, D., Fraundorfer, F., and Siegwart, R. (2009b).
Real-Time Monocular Visual Odometry for On-Road
Vehicles with 1-Point RANSAC. In Proc. IEEE Int’l
Conf. on Robotics and Automation.
Schweitzer, M. and Wuensche, H.-J. (2009). Efficient Key-
point Matching for Robot Vision using GPUs. In Proc.
12th IEEE Int’l Conf. on Computer Vision, 5th IEEE
Workshop on Embedded Computer Vision.
Tardif, J., Pavlidis, Y., and Daniilidis, K. (2008). Monoc-
ular Visual Odometry in Urban Environments Using
an Omnidirectional Camera. In Proc. IEEE/RSJ Int’l
Conf. on Intelligent Robots and Systems, pages 2531–
2538.
Triggs, B., McLauchlan, P., Hartley, R., and Fitzgibbon, A.
(2000). Bundle Adjustment - A Modern Synthesis.
In Vision Algorithms: Theory and Practice, LNCS,
pages 298–375. Springer Verlag.
Viola, P. and Jones, M. (2001). Rapid object detection using
a boosted cascade of simple features. In Proc. IEEE
Conf. on Computer Vision and Pattern Recognition.