REAL-TIME VISUAL ODOMETRY FOR GROUND MOVING ROBOTS USING GPUS

Michael Schweitzer, Alois Unterholzner and Hans-Joachim Wuensche
Institute for Autonomous Systems Technology, Universität der Bundeswehr München, Germany

Keywords: Robot Vision, GPGPU, Structure from Motion.

Abstract: This paper introduces a novel visual odometry framework for ground moving robots. Recent work showed that assuming non-holonomic motion simplifies the ego-motion estimation task to one yaw and one scale parameter. Furthermore, a very efficient way of computing frame-to-frame image correspondences for such robots was presented, which skips rotational invariance and optimizes keypoint extraction and matching for massive parallelism on a GPU. Here, we combine both contributions into a closed framework. Long-term correspondences are preserved, classified and stabilized by motion prediction, building up and maintaining a trusted map of depth-registered keypoints. We also allow for other ground moving objects. From this map, the ego motion is inferred, extended by constrained rotational perturbations in pitch and roll. A persistent focus is on keeping the algorithms suitable for parallelization, thus achieving up to one hundred frames per second. Experiments are carried out to compare against ground truth given by DGPS and IMU data.

1 INTRODUCTION

Reconstructing the motion of a moving camera from its 2D image stream is an important task in computer vision. It is commonly known as visual odometry and is often concerned with real-time robot vision. The idea is to compute image correspondences between consecutive frames that are connected to static points in 3D space and to infer the camera motion between both frames. Early work needed at least eight points to solve the equations given by the epipolar constraints (Longuet-Higgins, 1981; Hartley, 1994). (Nister et al., 2004) proposed a more efficient way using only five points. Recent work by (Scaramuzza et al., 2009b; Scaramuzza et al., 2009a) reduced this to a one-point method by constraining the camera motion. Once the motion is known, the depth (structure) of the correspondences can be determined, commonly known as Structure from Motion (SfM). By keeping this structure and using long-term correspondences, (Civera et al., 2009) introduced a novel SfM method within the extended Kalman filter (EKF) framework. The state vector of the proposed camera-centered EKF contains the location in the world frame plus a map of 3D-registered image features in the camera frame. Measuring around 20 features leads to a state vector size of around 300. Real-time capability at 30 frames per second comes from a 1-point RANSAC algorithm that rejects outliers before updating the filter. In contrast to (Scaramuzza et al., 2009b), the additional information needed to achieve 1-point relations comes from the probability distribution provided by the EKF instead of from constraining the robot's motion. With a larger number of tracked features (around one hundred), the frame rate drops to 1 Hz and falls short of real time. However, this contribution shows good accuracy in its results, similar to non-linear optimization techniques like bundle adjustment (Triggs et al., 2000).

When using EKF frameworks, it turns out that the inverse depth parameterization (IDP), presented in (Civera et al., 2008), is a good choice. Initializing unregistered 3D points with an IDP and estimating the inverse depth instead of Euclidean coordinates reduces linearization errors within the EKF framework. Thereby the EKF converges faster and is more stable.

In this paper, we deal with ground moving robots as described in (Scaramuzza et al., 2009b). The robot is assumed to move on an ideal ground plane with a locally circular motion. Only two parameters are needed to describe this motion model: the yaw angle Ψ of the circle segment and the scale ρ, which denotes the chord. (Scaramuzza et al., 2009b) shows that Ψ is computable from only one correspondence in a monocular stream. Outliers are removed with a 1-point RANSAC (Fischler and Bolles, 1981).

Figure 1: Three classes of correspondences w.r.t. the PIDM. Points with no blue line belong to static on ground or infinite; points with long blue lines belong to static above ground or dynamic.

The scale ρ in (Scaramuzza et al., 2009b) is given by a velocity sensor. In (Scaramuzza et al., 2009a), a method for scale estimation is proposed, but it is only applicable when a circular motion is detected (Ψ > 0), which leads to a sparse distribution of working points. Here, we describe a method for estimating Ψ and ρ in every phase of the robot's motion. By explicitly solving the optical flow equations of image keypoints for a locally planar, circular motion and extracting the dominant terms, we derive a measurement model for image correspondences. We call this the linearized planar measurement model (LPMM). The structure of the scene is approximated by an IDP model; here, we assume a planar inverse depth model (PIDM).

Combining the LPMM and the PIDM, we can derive 1-point relations for image correspondences. By massively parallel computing on a GPU, image correspondences are classified w.r.t. the PIDM into three classes: static on ground or infinite, static above ground, and dynamic. This vectorized classification can be done independently of other keypoints, exploiting the 1-point relations for thousands of correspondences. Deviations from the depth model can be recognized and corrected by motion stereo on the integrated motion signal. Therefore, the 3D locations of individual image keypoints are not estimated within an EKF. Here we see the main difference to the approach of (Civera et al., 2009): our EKF layout is much leaner because it only contains the motion parameters and their temporal derivatives. Only selected and categorized correspondences update the EKF, which therefore converges faster. See Figure 1 for an illustration of this classification task.

Using a generalized disparity equation splits the motion into a rotational component (the infinite homography $H_\infty$) and a translational component (Hartley and Zisserman, 2006). A similar approach is proposed in (Tardif et al., 2008), but using omnidirectional cameras; our paper only covers perspective monocular or stereo cameras heading into the direction of the ego motion. Besides giving accurate results, (Tardif et al., 2008) also gives a good overview of recent visual odometry, visual SLAM and SfM techniques. It turns out that pitch (Θ) and roll (Φ) perturbations heavily affect the quality of the (ρ, Ψ) estimation. In the same way we extract (ρ, Ψ), we derive equations for (Θ, Φ) to reduce this effect.

For correspondence calculation we use a recently proposed efficient and parallelized method for extracting and matching keypoints (Schweitzer and Wuensche, 2009). We extend this approach to long-term correspondences. This allows up to 2000 keypoints to be matched in a stereo stream (2 × 752 × 480), covering both monocular streams plus stereo matching, in about 5 ms on an NVidia Tesla GPU. As a result, three closed lists of long-term correspondences reside in GPU memory, which are then used for the vectorized PIDM classification mentioned above.

This paper is organized as follows. In section two we describe the GPU computation of long-term correspondences and the list generation. Section three introduces the PIDM/LPMM and derives from them the 1-point relations for the motion parameters; their use for classification and EKF estimation is explained. Results are evaluated against IMU and DGPS ground truth in section four. We conclude with section five.

2 GPU IMAGE CORRESPONDENCES

(Schweitzer and Wuensche, 2009) proposed a novel method for the efficient extraction and matching of corner-based keypoints. It is based on a dense computation of three normalized Haar wavelet responses $(I_x, I_y, I_{xy})_t$ per pixel at scale $t$, the so-called SidCell image (Scale Invariant Descriptive Cell), shown in Figure 2. Haar wavelets can be computed very efficiently using integral images (Viola and Jones, 2001) and are also employed by SURF correspondences (Bay et al., 2006). From the SidCell image, keypoints and descriptors are derived.

Figure 2: SidCell components $I_x$, $I_y$, $I_{xy}$ over a window of size $2t$.

Keypoints are extracted by non-maximum suppression on the absolute $|I_{xy}|$ component of the SidCell image. The sensitivity of the keypoint extraction is adjusted by a noise threshold on $|I_{xy}|$ in $[0,1]$. (Schweitzer and Wuensche, 2009) define two strength tests to remove weak keypoints. For the correspondence calculation between consecutive frames, two steps are carried out. First, a prematching w.r.t. the sign signature of the keypoints (8 classes, shown as different colors in Figure 3) is done. Second, only if the first step matches are the descriptors compared. This avoids heavy-weight descriptor distance calculations where they are not promising and greatly increases matching efficiency; a sketch of this two-stage test is given after Figure 3. The descriptors themselves are built from distributed and weighted SidCells around the keypoint, while the complexity of the distribution is user-defined (see Figure 3).

Figure 3: SidCell matching with $|I_{xy}| > 0.2$. Descriptors are built from Gaussian-weighted SidCells in a 4- or 8-neighborhood of order 0, 1 or 2.

2.1 List Processing on GPUs

Collecting keypoints into a closed list is mostly done by an intermediate CPU step. This requires additional data transfer between GPU and CPU and reduces the speed gain. Using modern atomic GPU functions like atomicInc() or atomicAdd(), these disruptive CPU steps can be avoided. List data structure operations like merge, split or concatenate all need an addElem operator, and a parallelized version of addElem needs synchronization. Figure 4 shows how addElemSync is implemented (a code sketch is given below the figure). The vectorized versions of merge, split and concatenate are essential for this work. Assuming that the keypoints can be processed independently of each other, full parallelism is given. Atomic GPU functions are used in the same way for building the histograms needed later in this paper.

Figure 4: Implementation of addElemSync on a GPU. The first element of the list is its size. In this example, three threads want to add a new element to an empty list simultaneously. Each thread obtains the index of the list cell where it can put its data by applying an atomicInc() on the size element.
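To make the pattern concrete, here is a minimal CUDA sketch of such an addElemSync. It deviates slightly from Figure 4 in that the size is kept in a separate counter rather than as the list's first element; the float2 payload and all names are illustrative.

```cuda
#include <cuda_runtime.h>

// addElemSync: each thread claims a unique slot via atomicInc and writes
// its element there, with no intermediate CPU step.
__device__ unsigned int addElemSync(float2 *list, unsigned int *size,
                                    unsigned int capacity, float2 elem)
{
    // atomicInc returns the previous counter value (wrapping to 0 at
    // 'capacity'), so simultaneous threads get distinct slots.
    unsigned int slot = atomicInc(size, capacity);
    list[slot] = elem;
    return slot;
}

// Example use: compact all valid keypoints into a closed list in parallel.
__global__ void collectKeypoints(const float2 *keypoints, const int *isValid,
                                 int n, float2 *list, unsigned int *size,
                                 unsigned int capacity)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && isValid[i])
        addElemSync(list, size, capacity, keypoints[i]);
}
```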

Figure 5: SidCell long-term correspondences.

2.2 Long Term Correspondences

Long-term correspondences (lt-correspondences) are held in a list. To build this list, initially all frame-to-frame correspondences (f2f-correspondences) are added. Subsequent cycles search this list for f2f-correspondences that can be concatenated. If an lt-correspondence does not have a successor, it is removed from the list. F2f-correspondences which do not have a predecessor are added as new elements. Figure 5 shows the result. Selected f2f-correspondences are the measurements for the EKF of the motion parameters; lt-correspondences support this selection and adjust deviations from the assumed inverse depth model. This is explained in detail in the next section; a sketch of one update cycle follows below.
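A minimal CUDA sketch of one such update cycle, under the assumption (beyond what the paper specifies) of a lookup table mapping each keypoint id of the previous frame to the lt-correspondence ending there; the Track layout and all names are illustrative.

```cuda
struct Track { int length; int lastKeypoint; };  // one lt-correspondence

__global__ void extendTracks(const int2 *f2f, int nF2f,   // (prevId, currId)
                             const int *trackEndingAt,    // prevId -> track, -1 if none
                             Track *tracks,
                             int *survivors, unsigned int *nSurvivors,
                             unsigned int capacity)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nF2f) return;
    int t = trackEndingAt[f2f[i].x];
    if (t >= 0) {
        // Predecessor found: concatenate the f2f-correspondence to the track.
        tracks[t].length++;
        tracks[t].lastKeypoint = f2f[i].y;
        survivors[atomicInc(nSurvivors, capacity)] = t;
    }
    // f2f-correspondences without a predecessor start new tracks in a
    // second pass (omitted); tracks absent from 'survivors' have no
    // successor and are removed, as described above.
}
```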


3 OUR APPROACH

3.1 Planar Inverse Depth Model

In case of a known rigid motion $[R|t]$ between two cameras $c_0 \rightarrow c_1$, the image point $m_1$ in $c_1$ can be computed from $m_0$ in $c_0$:

$$\tilde{m}_1 = A R A^{-1} \cdot \tilde{m}_0 + X_0^{-1} A t \qquad (1)$$

$A$ is the camera calibration matrix and is assumed to be the same for both cameras. It is obvious that this equation also needs the inverse depth $X_0^{-1}$ of the homogeneous image point $\tilde{m}_0$. Additionally, it is not linear in case of $X_0^{-1} \neq 0$ or $t \neq 0$. We discuss this equation later in this paper; for now the question is how to get $X_0^{-1}$. The approach here is to impose a structural model from which the inverse depth can be computed for every pixel. In the case of ground moving robots we impose a ground plane model. Then, the idea is to separate the correspondences into those which fit this model and those which do not. For those which fit, the inverse depth is known and the motion can be observed; for those which do not fit, the inverse depth can be computed from the motion. The importance of the PIDM fit is high at the beginning but should decrease over time, as more and more new depth-registered keypoints become available in a trusted map. When the system fails, a re-initialization by the PIDM can be done. To express a 3D point $M_0 = (X_0, Y_0, 0)^\top \in E(Z_c, \Theta_c)$ in camera coordinates (see Figure 6), the homogeneous transformation $R_Y(\Theta_c) \cdot T(0,0,Z_c)$ is applied.

Figure 6: Depth $X_0$ of a 3D point $M_0$ which lies on the plane $E(Z_c, \Theta_c)$, given in camera coordinates.

The corresponding image point $m_0$ is then given by

$$\tilde{m}_0 = R_Y(\Theta_c)\, T(0,0,Z_c)\, P_c(k) \cdot \tilde{M}_0 = \begin{pmatrix} k Y_0 \\ k (X_0 \Theta_c - Z_c) \\ Z_c \Theta_c + X_0 \end{pmatrix} = \begin{pmatrix} \tilde{y}_0 \\ \tilde{z}_0 \\ \tilde{w}_0 \end{pmatrix}$$

As one can see, every image line $z = \tilde{\lambda} \cdot \tilde{z}$ corresponds to a constant depth, with $\tilde{\lambda} = 1/\tilde{w}$. Now we can derive the PIDM:

$$X^{-1}(z) = \frac{1}{k Z_c}\,(z_\infty - z), \qquad \mathrm{PIDM}(z) = \begin{cases} X^{-1}(z) & \text{if } X^{-1}(z) > 0 \\ 0 & \text{else} \end{cases} \qquad (2)$$

where $z_\infty = k \Theta_c$ is the image line at infinity (horizon). Note that the PIDM is a linear mapping without singularity at infinite points.
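As a small illustration, the PIDM lookup of Eq. (2) reduces to a few operations per pixel. A minimal CUDA sketch (names and types are ours, not from the paper):

```cuda
// PIDM of Eq. (2): inverse depth as a linear function of the image line z.
__device__ float pidm(float z, float k, float Zc, float thetaC)
{
    float zInf = k * thetaC;                   // horizon line z_inf = k*Theta_c
    float invDepth = (zInf - z) / (k * Zc);    // linear in z, no singularity
    return invDepth > 0.0f ? invDepth : 0.0f;  // 0 for infinite points
}
```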

3.2 Non-holonomic Robot Motion

Recent work (Scaramuzza et al., 2009b) introduced a simple circular motion model with two parameters (ρ, Ψ) for wheeled ground moving robots (see Figure 7).

Figure 7: Local circular motion model by (Scaramuzza et al., 2009b), with yaw Ψ and chord ρ.

The translational component $t$ of the rigid camera motion $[R|t]$ is then given by:

$$t = \rho \cdot \begin{pmatrix} \cos(\Psi/2) \\ \sin(\Psi/2) \\ 0 \end{pmatrix} \qquad (5)$$

In addition to this model, our model is augmented by pitch (Θ) and roll (Φ). The motion (Θ, Φ) is unconstrained and modeled by assuming constant velocities.

3.3 Linearized Planar Measurement Model (LPMM)

Using the disparity equation (1) and our motion model (5), it is possible to derive the displaced homogeneous image point $\tilde{m}_1$ as given in (3). The measurement is the displacement $\partial m_1$ of the correspondence of the image point $m_0$. Therefore, the partial derivatives $\partial m_1 / \partial(\Psi, \Theta, \Phi, \rho)$ in Euclidean coordinates must be determined. Using (3) and the transformation from homogeneous into Euclidean coordinates, $m_1 = \tilde{\lambda} \cdot (\tilde{y}_1, \tilde{z}_1)^\top$, the derivatives $\partial m_1 / \partial(\ldots)$ yield:


$$\tilde{m}_1 = A R_\Phi R_\Theta R_\Psi A^{-1} \cdot \tilde{m}_0 + X_0^{-1} A \rho \begin{pmatrix} \cos(\Psi/2) \\ \sin(\Psi/2) \\ 0 \end{pmatrix} = \begin{pmatrix} k(c_1 c_2 c_3 - c_3 s_1) + y_0 (c_1 c_3 + s_1 s_2 s_3) + z_0 c_2 s_3 + \frac{1}{2} X_0^{-1} k \Psi \rho \\ k(s_1 s_3 + c_1 s_2 c_3) - y_0 (c_1 s_3 - s_1 s_2 c_3) + z_0 c_2 c_3 \\ \frac{1}{k} y_0 s_1 c_2 - \frac{1}{k} z_0 s_2 + c_1 c_2 + X_0^{-1} \rho \end{pmatrix} = \begin{pmatrix} \tilde{y}_1 \\ \tilde{z}_1 \\ 1/\tilde{\lambda} \end{pmatrix}$$

with $s_1 = \sin\Psi$, $c_1 = \cos\Psi$, $s_2 = \sin\Theta$, $c_2 = \cos\Theta$, $s_3 = \sin\Phi$, $c_3 = \cos\Phi$ $\qquad (3)$

$$\frac{\partial m_1}{\partial(\Psi,\Theta,\Phi)} = \tilde{\lambda} \begin{pmatrix} \tilde{y}_1 \\ \tilde{z}_1 \end{pmatrix} \cdot \left( \frac{\partial}{\partial\Psi}\;\; \frac{\partial}{\partial\Theta}\;\; \frac{\partial}{\partial\Phi} \right) \approx \lambda \begin{pmatrix} (\frac{1}{2} X_0^{-1}\rho - 1)k - \Psi y_0 & \Phi k & \Theta k - \Phi y_0 + z_0 \\ \Phi k + \Theta y_0 & k + \Psi y_0 - \Theta z_0 & \Psi k - y_0 - \Phi z_0 \end{pmatrix} + \frac{\partial \tilde{\lambda}}{\partial(\Psi,\Theta,\Phi)} \begin{pmatrix} \tilde{y}_1 \\ \tilde{z}_1 \end{pmatrix}$$

$$\frac{\partial m_1}{\partial \rho} = \tilde{\lambda}\,\frac{\partial}{\partial\rho} \begin{pmatrix} \tilde{y}_1 \\ \tilde{z}_1 \end{pmatrix} + \frac{\partial \tilde{\lambda}}{\partial\rho} \begin{pmatrix} \tilde{y}_1 \\ \tilde{z}_1 \end{pmatrix} \approx \lambda \begin{pmatrix} \frac{1}{2} X_0^{-1} \Psi k \\ 0 \end{pmatrix} - X_0^{-1} \begin{pmatrix} (1 - \Psi + \frac{1}{2} X_0^{-1}\rho)k + y_0 + \Theta z_0 \\ \Theta k - \Phi y_0 + z_0 \end{pmatrix}$$

with $\tilde{\lambda} \approx \lambda = \dfrac{k}{(1 + X_0^{-1}\rho)k + \Psi y_0 - \Theta z_0}$ $\qquad (4)$

$$\frac{\partial m_1}{\partial(\ldots)} = \tilde{\lambda}\, \frac{\partial(\tilde{y}_1, \tilde{z}_1)^\top}{\partial(\ldots)} + (\tilde{y}_1, \tilde{z}_1)^\top\, \frac{\partial \tilde{\lambda}}{\partial(\ldots)} \qquad (6)$$

This term gets very complex. We assume that the image displacements are comparatively small, which allows for a small-angle approximation: $c_i$ goes to 1 and $s_i$ goes to the respective angle $\{\Psi, \Theta, \Phi\}$. We also assume that products of two or more angles go to zero. Note that this approximation is only valid after the derivatives have been calculated. It turns out that the terms for $\partial\tilde{\lambda}/\partial(\Psi,\Theta,\Phi)$ can also be approximated by zero in case of $X_0^{-1} \rightarrow 0$. Furthermore, the approximation $\partial\tilde{\lambda}/\partial\rho \approx -X_0^{-1}$ is valid. The result is given by the set of equations in (4). This set is what we call the linearized planar measurement model, LPMM for short.

3.4 Discussing the LPMM

Looking closely at the LPMM, one can draw valuable conclusions. One important term is the product of the motion scale $\rho$ and the inverse depth $X_0^{-1}$ of a keypoint. If the robot is not moving, and therefore $\rho = 0$, $X_0^{-1}$ also vanishes from the equation. No structure is observable in this case, which is consistent with 3D reconstruction theory. The product also vanishes for far away (or infinite) 3D points; from those measurements, no information about $\rho$ is observable. But it is worth finding those far away points, because from them the rotational component (the infinite homography $H_\infty$) of the camera motion can be measured almost directly. A further consequence can be seen if every angle product is set to zero: this rough estimate still contains valuable information because the dominating terms are clearly visible. Another important consequence is that the individual parameters of the LPMM then become independent of each other. Furthermore, $\lambda$ goes to $(1 + X_0^{-1}\rho)^{-1}$, and for far away points it goes to 1. One of the most important properties of the LPMM is the fact that the z-component of $\partial m_1 / \partial \rho$ is completely independent of the yaw parameter $\Psi$. Exploiting these consequences allows the vectorized classification of image correspondences that we discuss below.
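Concretely, dropping every angle product in (4) leaves the following sketch (our notation, following the approximation above):

```latex
% LPMM of Eq. (4) with every angle product set to zero (sketch).
\frac{\partial m_1}{\partial(\Psi,\Theta,\Phi)} \approx
\lambda \begin{pmatrix}
(\tfrac{1}{2} X_0^{-1}\rho - 1)k & 0 & z_0 \\
0 & k & -y_0
\end{pmatrix},
\qquad
\lambda \approx \frac{1}{1 + X_0^{-1}\rho}
```

In this form the decoupling is visible: $\Psi$ only moves a point horizontally, $\Theta$ only vertically, and $\Phi$ produces the in-plane rotation $(z_0, -y_0)$.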

4 EGO MOTION ESTIMATION

In order to estimate the robot's motion we want to extract four signals, one translational scale (ρ) and three rotational angles (Ψ, Θ, Φ). Each image frame, coming with the measured f2f-correspondences, updates an EKF designed to output those signals, using the LPMM for Jacobian linearization. As discussed above, measurements have to be selected w.r.t. their capability to observe a certain signal. Mainly this depends on their depth configuration and leads to an initialization problem. For our robot we assume the PIDM as a starting point. Note that one can apply any (inverse) depth model, e.g. a corridor model for indoor robots or depth from stereo. Needless to say, the assumed inverse depth model has to be observable to a certain degree, otherwise the approach would fail. The aim is to adapt to the real scene by depth-registering variations from the model, thereby softening the model assumption.
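As a sketch in our notation (the paper only states that the state holds the motion parameters and their temporal derivatives), the lean EKF layout could look as follows; the caption of Figure 9 suggests that the scale additionally carries a constant-acceleration model, i.e. a $\ddot\rho$ component:

```latex
% Hedged sketch of the lean EKF layout (our notation, not from the paper).
x = (\rho, \dot\rho, \Psi, \dot\Psi, \Theta, \dot\Theta, \Phi, \dot\Phi)^\top,
\qquad
\begin{pmatrix} p \\ \dot p \end{pmatrix}_{k+1}
= \begin{pmatrix} 1 & \Delta t \\ 0 & 1 \end{pmatrix}
  \begin{pmatrix} p \\ \dot p \end{pmatrix}_k + w_k
\quad \text{for each } p \in \{\rho, \Psi, \Theta, \Phi\}
```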

4.1 Scale Estimation

Using the PIDM (2) within the Ψ-independent z-component of the LPMM scale derivative (4) leads to the following 1-point equation:

$$\partial\rho = \frac{k Z_c}{(z_0 + \Theta k - \Phi y_0)(z_0 - z_\infty)}\, \partial z \qquad (7)$$

Once again, $z_0$ is the z-coordinate of the extracted image keypoint $m_0$, and $\partial z$ is the z-displacement measurement of the correspondence to the previous frame. (7) becomes singular for $z_0 \rightarrow z_\infty$ and, on the other hand, well conditioned for 3D points in the near field ($X_0^{-1} \gg 0$). Figure 8 shows the keypoint correspondence classification and scale estimation without using any signal other than the image stream. Figure 9 shows the EKF scale output, while Figure 10 uses an INS pitch signal for correction.

pitch signal for correction. Setting the roll compo-

INS

˙

ρ = 9,91

m

s

m

s

n

0

20

y

z

∞

z

Figure 8: Classiﬁcation of image correspondences for scale

estimation. First, a histogram is built by a vectorized 1-

point scale computation for every correspondence indepen-

dently on the GPU. By histogram voting, the most likely

scale value is choosen. The dashed blue line indicates the

ground truth scale value measured by the INS. Green cor-

respondences match to the histogram vote, red ones don’t.

Histogram vote and EKF prediction are combined for a sta-

ble inlier classiﬁcation. See also the image coordinate frame

and the line at inﬁnity (black).

Setting the roll component in equation (7) to zero, $\partial\rho$ becomes completely independent of the y-coordinate in the image. One could say this denotes a $\frac{1}{2}$-point relation, but in fact it is a PIDM special case.
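Because Eq. (7) relates a single correspondence to one scale hypothesis, the histogram voting of Figure 8 vectorizes naturally. A minimal CUDA sketch, where the bin range, the horizon guard band and all names are our assumptions:

```cuda
// One scale vote per correspondence, following the 1-point relation (7).
__global__ void voteScale(const float2 *m0,   // keypoints (y0, z0)
                          const float  *dz,   // z-displacement measurements
                          int n, float k, float Zc, float zInf,
                          float theta, float phi,
                          float rhoMin, float rhoMax,
                          unsigned int *hist, int nBins)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float y0 = m0[i].x, z0 = m0[i].y;
    if (fabsf(z0 - zInf) < 1.0f) return;   // Eq. (7) is singular at the horizon
    float rho = k * Zc / ((z0 + theta * k - phi * y0) * (z0 - zInf)) * dz[i];
    int bin = (int)((rho - rhoMin) / (rhoMax - rhoMin) * nBins);
    if (bin >= 0 && bin < nBins)
        atomicAdd(&hist[bin], 1u);         // histogram via atomics (Sec. 2.1)
}
```

Afterwards, the host (or a small reduction kernel) picks the histogram maximum as the vote and combines it with the EKF prediction for inlier classification, as in Figure 8.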

4.2 Rotation Estimation

At first glance, the rotation estimation can easily be done by taking f2f-correspondences of far away keypoints.

Figure 9: Ego motion estimation (EME, blue) scale signal compared to the ground truth INS signal (green); top: scale rate $\dot\rho$ [m/s], bottom: INS pitch rate $\dot\Theta$ [rad/s], over frames 0-1800. EME extracts a very similar signal with two major perturbations. Between frames 700-1000, a strong pitch perturbation occurs; see the green INS pitch signal below. Between frames 1000-1200 the scene is almost keypoint-less. The constant acceleration system model of the EKF stabilizes the signal for a certain time.

Figure 10: Scale signal $\dot\rho$ [m/s] (INS vs. EME) with pitch compensation by ground truth INS data. The peaks between frames 700-1000 are flattened.

Figure 11: Rotation estimation (EME, blue) from far away keypoints compared to the ground truth INS signal (green): yaw rate $\dot\Psi$ [rad/s], pitch rate $\dot\Theta$ [rad/s] and roll rate $\dot\Phi$ [rad/s], over frames 0-1800. The selection of the keypoints is done w.r.t. the PIDM.

The structure term $\rho X_0^{-1}$ of the optical flow measurement $\partial m$ goes to zero for those points, and therefore the LPMM (4) simplifies considerably.

Figure 13: Ego motion estimation (EME, blue) track using the estimated yaw signal and the locally circular motion model (X/Y in meters). The scale signal is given by INS ground truth. Red shows the differential GPS track; green shows the motion model using ground truth INS data. The right image shows the tracks referenced in a satellite image by GoogleEarth.

Figure 12: Ego motion estimation (EME, blue) track using the pure monocular vision scale and yaw signals of Figure 10 and Figure 11 (X/Y in meters), compared to the DGPS and INS tracks.

Figure 11 shows the EKF rotational signals for yaw Ψ and pitch Θ, using the same method as for the scale estimation but taking far away keypoints instead of near ones. Figures 12 and 13 show the driven tracks obtained by integrating the estimated motion signals; a sketch of the far-point selection follows below.
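Selecting these far away measurements is again a vectorized list operation. A minimal CUDA sketch under the PIDM, where every keypoint at or above the horizon band gets zero inverse depth; the band width and all names are our choices:

```cuda
// Select correspondences whose structure term rho*X0^{-1} vanishes under
// the PIDM, i.e. keypoints at or above the horizon line z_inf (Eq. (2)).
__global__ void selectFarPoints(const float2 *m0, int n,
                                float zInf, float band,
                                int *farIdx, unsigned int *nFar,
                                unsigned int capacity)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (m0[i].y >= zInf - band)                    // PIDM(z) = 0 here
        farIdx[atomicInc(nFar, capacity)] = i;     // closed list, Sec. 2.1
}
```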

5 CONCLUSIONS

In this paper we introduced a visual odometry method using the EKF framework and GPU-computed image correspondences. We described the measurement model of the EKF, the LPMM, in detail and derived 1-point relations for a vectorized inlier selection on the GPU. We achieve an overall computation time of less than 10 ms: 4 ms for the GPU correspondence calculation and selection, and 5 ms for the EKF update on a 2 GHz single-core CPU. Future work will extend this approach to a full structure from motion application.

ACKNOWLEDGEMENTS

The authors gratefully acknowledge support of this work by the Deutsche Forschungsgemeinschaft (German Research Foundation) within the Transregional Collaborative Research Center 28 Cognitive Automobiles.

REFERENCES

Bay, H., Tuytelaars, T., and Gool, L. V. (2006). SURF: Speeded Up Robust Features. In Proc. European Conf. on Computer Vision.

Civera, J., Davison, A., and Montiel, J. (2008). Inverse Depth Parametrization for Monocular SLAM. IEEE Transactions on Robotics, 24(5):932-945.

Civera, J., Grasa, O. G., Davison, A. J., and Montiel, J. M. M. (2009). 1-Point RANSAC for EKF-Based Structure from Motion. In Proc. IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems.

Fischler, M. A. and Bolles, R. C. (1981). Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Comm. of the ACM, 24:381-395.

Hartley, R. (1994). Euclidean Reconstruction from Uncalibrated Views. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pages 908-912.

Hartley, R. and Zisserman, A. (2006). Multiple View Geometry. Cambridge University Press.

Longuet-Higgins, H. C. (1981). A Computer Algorithm for Reconstructing a Scene from Two Projections. Nature, 293(1):133-135.

Nister, D., Naroditsky, O., and Bergen, J. (2004). Visual Odometry. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, volume 1, pages I-652-I-659.

Scaramuzza, D., Fraundorfer, F., Pollefeys, M., and Siegwart, R. (2009a). Absolute Scale in Structure from Motion from a Single Vehicle Mounted Camera by Exploiting Nonholonomic Constraints. In Proc. IEEE Int'l Conf. on Computer Vision.

Scaramuzza, D., Fraundorfer, F., and Siegwart, R. (2009b). Real-Time Monocular Visual Odometry for On-Road Vehicles with 1-Point RANSAC. In Proc. IEEE Int'l Conf. on Robotics and Automation.

Schweitzer, M. and Wuensche, H.-J. (2009). Efficient Keypoint Matching for Robot Vision using GPUs. In Proc. 12th IEEE Int'l Conf. on Computer Vision, 5th IEEE Workshop on Embedded Computer Vision.

Tardif, J., Pavlidis, Y., and Daniilidis, K. (2008). Monocular Visual Odometry in Urban Environments Using an Omnidirectional Camera. In Proc. IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems, pages 2531-2538.

Triggs, B., McLauchlan, P., Hartley, R., and Fitzgibbon, A. (2000). Bundle Adjustment - A Modern Synthesis. In Vision Algorithms: Theory and Practice, LNCS, pages 298-375. Springer Verlag.
