Monoscopic Automotive Ego-motion Estimation and Cyclist Tracking
Johannes Gräter and Martin Lauer
Institute of Measurement and Control, Karlsruhe Institute of Technology, Karlsruhe, Germany
1 RESEARCH PROBLEM
In traffic accidents, cyclists count among the vulnerable road users who suffer severe injuries and fatalities. A particularly dangerous type of collision involves trucks that turn right without recognizing cyclists riding alongside them in the blind spot. Even though this is not the most frequent type of accident, the chance of survival for a cyclist involved is low.
One of the main causes of this collision is the truck driver's limited field of vision. Cyclists in the surroundings are hard to perceive due to their small size, and it is difficult to predict their behavior. The velocity of a cyclist is usually comparable to a slowly moving car, and cyclists must share the road with other traffic participants, which makes them easily occluded by other vehicles. This reduces the truck driver's reaction time once they are noticed, and it explains why the most severe accidents involving trucks and cyclists often happen when a truck turns right at an intersection.
To address this problem, we aim at developing a driving assistance system for trucks that avoids such accidents with cyclists. The main task is to detect cyclists with a state-of-the-art hardware setup consisting of a single-row Light Detection And Ranging (LIDAR) sensor in combination with a camera. Based on the detection, the movement of the cyclist is estimated and its behavior is predicted so that the risk of an accident can be assessed. An intelligent warning strategy alerts the truck driver in dangerous situations.
The aim of this work is to estimate the metric trajectory of the ego-vehicle and the cyclist using a single camera and a low-cost single-row laser scanner. For estimating the ego-motion without scale, we complement existing methods to satisfy the requirements of our application. The main challenge of monoscopic Visual Odometry is the unobservability of the scale of the scene. The focus of this project lies on estimating the scale from scene-implicit information combined with a priori knowledge about the scenery. In addition, external sensors are applied to obtain more data about the scale, and we look for an optimal configuration for estimating a scaled trajectory of a car in various environments.
2 STATE OF THE ART
Monoscopic Visual Odometry and monoscopic Si-
multaneous Localization and Mapping (SLAM) are
well known subjects in the image processing community. Although they have been active research topics for approximately two decades, they are still contemporary issues. Scaramuzza et al. give an overview of Visual Odometry algorithms (Scaramuzza and Fraundorfer, 2011; Fraundorfer and Scaramuzza, 2012). Especially the rise of virtual reality as well as the development of small, cheap Unmanned Aerial Vehicles (UAVs) equipped with a front camera has increased their popularity. A current trend revives an old principle and estimates the ego-motion from sparse optical flow by minimizing the photometric error. Forster et al. (Forster et al., 2014) present an algorithm for use on a UAV in which the aforementioned method is employed. A particularly interesting part of this work is the feature alignment, which is used to reduce scale drift. The work of Engel et al. (Engel et al., 2014) is more oriented towards virtual reality, focusing not only on the ego-pose estimation but also on a highly accurate reconstruction of the environment. However, both are optimized for their respective tasks. In particular, the need to observe landmarks over a long image sequence and the reliance on loop closure make them inadequate for use on fast moving vehicles such as automobiles.
Therefore, in this work the "StereoScan" approach of Geiger et al. is used (Geiger et al., 2011). It is a feature-based framework developed for use on automotive vehicles which relies on the eight-point algorithm (Hartley, 1997). The key development is the type of features used. Two non-standard feature detectors detect corners and blobs. In combination with a feature descriptor that is fast to compute and to match, a large number of features can be used for
ego-motion estimation in real time.
While the previously mentioned approaches compute the unscaled ego-motion accurately, the scale is only roughly estimated and then held fixed; the focus therefore lies on reducing the scale drift. For example, the popular Parallel Tracking and Mapping framework (Klein and Murray, 2007) lets the user move the camera about 10 cm to the right during the initialization and fixes the scale afterwards. Geiger's approach uses another principle. Given the height over ground and the inclination of the camera to the ground plane, it estimates the scale from frame to frame by reconstructing the ground, which also accounts for the scale drift. We want to follow this approach while relaxing the constraint of a fixed inclination angle of the camera.
3 OUTLINE OF OBJECTIVES
3.1 Main Objective and Challenge
We want to investigate the possibility of estimating the scale of a monocular trajectory by means of scene understanding and, if necessary, with the aid of external sensors like a single-row LIDAR system. The aim is to precisely estimate the driven metric trajectory. Not only the estimation of the scale itself poses a challenge, but also its drift, since the trajectory is a concatenation of relative motions that are subject to uncertainty. Algorithms such as "Fast Semi-Direct Monocular Visual Odometry" (Forster et al., 2014) and "Large-Scale Direct Monocular SLAM" (Engel et al., 2014) use sophisticated techniques to reduce the drift as far as possible, so that the scale can be fixed once and needs no modification afterwards. We want to approach the problem from another perspective: if it were possible to calculate the scale from frame to frame, the scale drift would be eliminated.
3.2 Why Monocular Vision?
The advantages of using a monocular system are man-
ifold. Firstly, it is a very inexpensive sensor setup - the
camera and its optics are low-cost in comparison with
multilayer laser scanners. Furthermore, a monocular
camera setup is a lot more robust than for example a
stereoscopic one, since the latter requires an accurate
calibration which might be lost even due to small me-
chanical shocks.
Moreover, the main problem of monocular systems, i.e. the unobservability of the scale, is at the same time a big advantage. In image space there is no difference between small motion in a dense environment, such as the image of an endoscopic system in a vein, and a UAV that observes the earth from a large distance at high velocity. The critical parameter is the ratio between the mean scene depth and the velocity of the camera. Our research field, the automotive application, is particularly challenging because this ratio can be very high.
On the other hand, the application on cars has the advantage that it is possible to make assumptions about the environment. In general, the height of the camera over ground is constant due to the planar movement of the vehicle. This allows us to estimate the scale of the trajectory by modeling the ground plane and comparing the image-space height over ground with the real-world height. However, this is only possible in areas with clearly identifiable streets where the ground plane is dominant in the image. In sceneries with dense traffic we have to rely on other assumptions. Humans can deduce their movement with only one eye using their knowledge about the real size of objects in the world. Analogously, we could detect objects like cars, cyclists or road markings in the image and deduce a prior for the scale estimation from them.
Another research direction is the use of external sensors, for example a LIDAR system, which can provide depth information about the scene; even global localization methods like Global Navigation Satellite Systems (GNSS) could serve as a source of scale information. Once the motion and the scale of the scene are deduced, the scene can be reconstructed by classical methods of Structure from Motion (Hartley and Zisserman, 2010, p. 312), which allows further applications in scene understanding. Within this project, the trajectory of the ego-motion will be combined with a cyclist detection and tracking to predict collisions.
4 STAGE OF THE RESEARCH
4.1 Scale Estimation
In a first step we focus on the estimation of the scale only. As a baseline we choose an existing, very efficient algorithm for the unscaled ego-motion estimation, namely "StereoScan" (Geiger et al., 2011). This is a first attempt to get a grip on the ego-motion estimation; more sophisticated algorithms remain to be evaluated. We split the scale estimation into estimation from a priori knowledge about scene-inherent features and scale estimation with sensors other than cameras,
i.e. external sensors.
4.1.1 Scale Estimation by Inherent Scene Information
The basic idea of these scaling methods is to make use of information that we can extract from the scene and about which we have prior knowledge. Our first attempt is to extract the ground plane coordinates of the scene. Knowing the coordinates of the plane in the unscaled space and the metric height over ground of the camera, we can compute the scale. In sceneries with a clearly identifiable ground plane this already yields good results. In comparison with the original algorithm (Geiger et al., 2011), which assumes a constant elevation angle of the camera to the ground plane, we succeeded in reducing the error of the scale. In the future, other features could provide further valuable information about the scale. Candidates are other traffic participants, static objects like traffic lights or posts, as well as road markings whose metric dimensions we know. However, object detection comes at a higher computational cost, so the features have to be chosen carefully.
4.1.2 Scale Estimation by External Scene Information
In environments with a highly occluded ground plane, for example due to dense traffic, the ground plane estimation does not yield acceptable results. In this case we resort to external sensors, for now the depth information from a laser scanner. In order to know the positions of the laser beams in the camera image we need to calibrate the laser scanner with respect to the camera. This is a challenging task since the laser scanner offers only little precision at small distances. Knowing the image positions of the laser beams, we can extract depth information for these points and include it in the scale estimation. This is work in progress. First ideas are a comparison of reconstructed 3d points with measured laser points in their proximity, or the direct reconstruction of the laser beam hit point using the epipolar geometry of subsequent images.
4.2 Cyclist Tracking
In order to predict collisions between a moving ve-
hicle and a cyclist, both of their trajectories must be
known. Being able to deduce the ego-motion as de-
scribed in section 4.1 we still need to estimate the cy-
clist's trajectory. First, the cyclist is detected in the image by a new methodology established by Tian and Lauer (Tian and Lauer, 2014). Thus we know the position of the cyclist in the image and can reconstruct the line of sight on which it is positioned. However, the scale is still unknown and we need a metric measurement of the cyclist's depth. Here, the LIDAR, which is already used for solving the scale ambiguity of the ego-motion, provides depth data of the cyclist.
A problem arising from the use of different sensors is their asynchronous measurement times, i.e. images are obtained at a different frequency than laser scans. Moreover, we cannot guarantee that the cyclist is hit by the laser scanner in each scan, since the gaps between the laser beams might be too big. To solve this in an
elegant way we implemented two methods - one using
an Unscented Kalman Filter and another one using a
Least Squares approximation with latent variables.
5 METHODOLOGY
In this section we present the methods used for the
already established parts of our framework. Those
are the scale estimation by ground plane tracking, the
laser scanner to camera calibration as well as two
methods used for cyclist tracking.
5.1 Scale Estimation by Ground Plane Tracking
The scale is an unobservable parameter for a monoscopic camera. One way to solve this problem is the use of a priori information about scene-inherent structures such as the ground plane. In a first step, feature points of two consecutive frames are extracted and matched (Geiger et al., 2011). Then, the corresponding motion is estimated using an eight-point algorithm and a RANSAC-based outlier rejection scheme (Hartley and Zisserman, 2010, p. 88, p. 262). Given the unscaled ego-motion and the two dimensional feature matches, we can reconstruct points in 3d space by classical Structure from Motion algorithms (Hartley and Zisserman, 2010, p. 312). The result is a 3d point cloud of the scene which has yet to be multiplied by the scaling factor. Since the metric height over ground of the camera is known in this application, our goal is to obtain the scale from the ground plane parameters. An overview of the algorithm is given in figure 1.
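To make the last step concrete, the following Python snippet is a minimal sketch, not our implementation, of how the scale can be obtained once an unscaled 3d point cloud and the metric camera height are available: a RANSAC step followed by a least-squares refit estimates the ground plane, and the scale is the ratio of metric to unscaled camera height (the orientation tracking of figure 1 is omitted; all names and thresholds are hypothetical).

```python
import numpy as np

def scale_from_ground_plane(points_3d, camera_height_m, n_iter=200, inlier_thresh=0.02):
    """Estimate the metric scale from an unscaled 3d point cloud.

    points_3d       : (N, 3) reconstructed points in the unscaled camera frame
    camera_height_m : known metric height of the camera over ground
    inlier_thresh   : RANSAC threshold in unscaled units (in the full pipeline
                      the estimated scale is fed back to set this threshold)
    """
    rng = np.random.default_rng(0)
    best_inliers = None
    for _ in range(n_iter):
        # fit a candidate plane to three random points
        sample = points_3d[rng.choice(len(points_3d), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        if np.linalg.norm(normal) < 1e-9:
            continue
        normal /= np.linalg.norm(normal)
        d = -normal @ sample[0]
        inliers = np.abs(points_3d @ normal + d) < inlier_thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # least-squares refit on the inliers: plane through the centroid,
    # normal given by the smallest singular vector
    pts = points_3d[best_inliers]
    centroid = pts.mean(axis=0)
    normal = np.linalg.svd(pts - centroid)[2][-1]
    d_unscaled = abs(normal @ centroid)        # unscaled camera-to-plane distance
    return camera_height_m / d_unscaled
```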
5.2 Scale Estimation by a LIDAR Sensor
Figure 1: Flowchart of the plane estimation. Estimation algorithms are boxed by a rectangle, estimated variables are shown in ellipses. The tracking block is shown in a box with rounded corners. To estimate the orientation of the plane, a RANSAC-based outlier rejection is applied, followed by a Least Squares optimization on the inlying points. The orientation is tracked by a Kalman Filter. The tracked orientation is used to fit a plane to the point cloud with another outlier rejection, in order to find the distance of the plane to the origin. From there the scale can be calculated, knowing the height over ground of the camera. The scale has to be fed back to the outlier rejection in order to determine the inlier threshold. Decoupling the plane orientation estimation from the plane distance estimation is advantageous, since sceneries with a short occlusion of the ground plane can be bridged by the tracking.
If no dominant ground plane is visible in the scene, external sensor information can be used to estimate the scale. Since we want to focus on user-oriented applications, we use a low-cost single-row laser scanner.
In order to calculate the scale we need to cor-
relate a metric measurement with an unscaled point
from the reconstruction of the scene. A basic idea
is the selective reconstruction of the point that cor-
responds to the hit point of the laser p in the image
I. This point is known by the laser to camera cal-
ibration, see section 5.3. Knowing the fundamental
matrix F of two consecutive frames I′ and I, obtained by the eight-point algorithm, we can calculate the epipolar line l′ = F p in the first image I′ corresponding to p. The corresponding point p′ in I′ can then be found economically by sampling key points along this epipolar line and matching descriptors extracted from their surroundings against the surrounding of p. As descriptors, simple block matching or BRIEF (Calonder et al., 2010) are considered and have yet to be evaluated. Knowing p and p′ as well as F, and thereby the rotation and translation between I and I′, we can reconstruct the 3d point corresponding to p and extract the scale from the metric measurement of the laser scanner.
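As a rough illustration of this search, the sketch below samples candidate pixels along the epipolar line l′ = F p in the first image and selects the best match by simple SSD block matching, the simplest of the descriptors mentioned above; the subsequent triangulation of the 3d point is omitted and all names are hypothetical.

```python
import numpy as np

def match_along_epipolar_line(img0, img1, p, F, patch=7, n_samples=200):
    """Find the point p' in img0 corresponding to p in img1 by searching along
    the epipolar line l' = F p.

    img0, img1 : grey-value images (H, W)
    p          : integer pixel coordinates (u, v) of the laser hit point in img1
    """
    h, w = img0.shape
    r = patch // 2
    a, b, c = F @ np.array([p[0], p[1], 1.0])   # epipolar line a*u + b*v + c = 0
    if abs(b) < 1e-9:
        return None                              # (near-)vertical line not handled in this sketch
    ref = img1[p[1] - r:p[1] + r + 1, p[0] - r:p[0] + r + 1].astype(float)
    best, best_cost = None, np.inf
    for u in np.linspace(r, w - 1 - r, n_samples):
        v = -(a * u + c) / b                     # point on the epipolar line
        ui, vi = int(round(u)), int(round(v))
        if not (r <= vi < h - r):
            continue
        cand = img0[vi - r:vi + r + 1, ui - r:ui + r + 1].astype(float)
        cost = np.sum((cand - ref) ** 2)         # sum of squared differences (block matching)
        if cost < best_cost:
            best, best_cost = (ui, vi), cost
    return best
```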
5.3 Laser to Camera Calibration
In order to know the image positions of the points where the laser beams hit an object, the difference in pose P between the camera and the laser scanner has to be known. In our case, we use a low-cost laser scanner. The standard deviation of its range measurements is very high, approximately 0.1 m. Consequently, target based calibration methods have a low chance of success, which is why we rely on a dedicated vision based method.
Our cameras do not have infrared filters, so the laser-object intersection points can be recognized in the image, see figure 2. To calibrate our device we fix the position of the camera-laser unit and put a planar object in front of it. Over time we move that object slowly, varying the distance between the object and the sensor device from 0.2 m to 4.0 m. Our goal is to extract the measured laser points from the image and to minimize the reprojection error between the reprojected 3d laser points measured by the scanner and the observed points in the camera image, as shown in equation 1.
$$\operatorname*{argmin}_{P=(\alpha,\beta,\gamma,t_x,t_y,t_z)} \sum_i \left\| \begin{pmatrix} u - u_{proj}(P) \\ v - v_{proj}(P) \end{pmatrix} \right\|_2^2 \qquad (1)$$
$\alpha, \beta, \gamma$ indicate the rotation angles and $t_x, t_y, t_z$ the translation vector. $u_{proj}, v_{proj}$ denote the back-projected 3d laser scanner points and $u, v$ are the corresponding points in the image.
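A possible realization of this minimization is sketched below, assuming SciPy's nonlinear least-squares solver and an Euler-angle parametrization of the rotation; the function and variable names are hypothetical and the snippet is an illustration of equation 1 rather than our implementation.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def calibrate_laser_to_camera(pts_laser, pts_img, K):
    """Estimate P = (alpha, beta, gamma, tx, ty, tz) by minimizing the
    reprojection error of equation 1.

    pts_laser : (N, 3) laser hit points in laser-scanner coordinates [m]
    pts_img   : (N, 2) extracted image points (u, v) [px]
    K         : (3, 3) camera intrinsic matrix
    """
    def residuals(params):
        angles, t = params[:3], params[3:]
        R = Rotation.from_euler('xyz', angles).as_matrix()
        pts_cam = pts_laser @ R.T + t           # laser points in camera coordinates
        proj = pts_cam @ K.T                    # pinhole projection
        uv = proj[:, :2] / proj[:, 2:3]
        return (uv - pts_img).ravel()           # stacked (u - u_proj, v - v_proj)

    res = least_squares(residuals, np.zeros(6)) # start from coinciding poses
    return res.x
```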
Figure 2: Points of the laser scanner seen in the camera
image in a darkroom. On top we see the scene with lights
on, in the lower image the light is turned off.
Figure 3: Overlapped images of laser scan measurements at
different distances. In red we see the manually marked lines
used for measurement extraction.
The measurement point extraction poses a challenge. Especially at large distances the measured points in the image can have a very low brightness, whereas at small distances the points blend into each other. To overcome this we overlap the camera images from different points in time, which results in a fan-like image. This fan represents the epipolar lines of the laser scanner in the image. Next we manually mark these lines and extract the pixels with maximum brightness in their proximity. The overlapped image is shown in figure 3.
After convergence of the nonlinear minimization
problem posed in equation 1 we obtain the pose of
the laser scanner relative to the camera. A qualitative
evaluation of the calibration is given in figure 4.
5.4 Cyclist Motion Estimation
As described in section 4.2 we need to estimate the
trajectory of a cyclist detected in the image in order
to predict the collision between the automobile and
the cyclist. The cyclist’s position in image coordi-
nates can be estimated by novel algorithms developed
by Tian and Lauer (Tian and Lauer, 2014). Our goal
is to estimate the cyclist’s position in 3d coordinates.
In a first approach we model the cyclist by its position in 3d space, $x, y, z$, and by its velocity $v_x, v_y, v_z$. We assume that its acceleration is small and model it by the uncertainties of the velocity. Furthermore, the cyclist moves on a plane that is parallel to the plane spanned by $z$ and $x$. Therefore we choose $v_y$ to be zero and $y$ constant with very small uncertainty. Thus our problem is nearly two dimensional and more robust than a fully three dimensional formulation, while small changes in the ground plane inclination can still be modeled.
Using a pinhole camera model, the image position of a 3d point $(x_l, y_l, z_l)^T$ in the camera coordinate system is given by
$$\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \mathrm{Intrin} \cdot \begin{pmatrix} x_l / z_l \\ y_l / z_l \\ 1 \end{pmatrix} \qquad (2)$$
$\mathrm{Intrin}$ is the intrinsic matrix of the pinhole camera. Therefore an image point $p$ given by $u$ and $v$ maps onto a ray in 3d space.
By measuring the distance of the cyclist with a range based method we can thus estimate the distance from the cyclist to the camera and determine its position. Knowing the global camera pose from the ego-motion estimation, it is possible to express the cyclist's position $(x, y, z)^T$ in global coordinates.
$$\begin{pmatrix} x_l \\ y_l \\ z_l \end{pmatrix} = T(x) = \begin{pmatrix} R(\alpha, \beta, \gamma) & \begin{matrix} t_x \\ t_y \\ t_z \end{matrix} \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} \qquad (3)$$
Hereby $R$ denotes the rotation matrix and $(t_x, t_y, t_z)^T$ the translation vector of the camera relative to the origin.
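For illustration, a small sketch (hypothetical names) combines equation 2, equation 3 and a range measurement: the image point is back-projected to a viewing ray, scaled with the measured range, and equation 3 is inverted to obtain global coordinates. The assumption here is that the range measures the Euclidean distance along the ray.

```python
import numpy as np

def cyclist_position_global(u, v, r, K, R, t):
    """Cyclist position in global coordinates from the image position (u, v),
    a range measurement r [m], the intrinsic matrix K and the camera pose
    (R, t) of equation 3, which maps global to local camera coordinates."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # inverse of eq. 2: viewing ray
    w = ray / np.linalg.norm(ray)                    # unit viewing direction
    p_local = r * w                                  # point on the ray at the measured range
    return R.T @ (p_local - t)                       # invert eq. 3
```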
However, the raw position estimate of the detected cyclist does not meet our precision requirements. Consequently, a method is needed to estimate the position of the cyclist more accurately. Moreover, the LIDAR measurements are not synchronized with the image capture. As a result, the algorithm needs flexibility regarding the incorporation of the range measurements. We propose two suitable methods: the first one yields good results if the frequency of the range measurements is high, the second one is more suitable for low range measurement frequencies.
5.4.1 Cyclist Tracking by an Unscented Kalman Filter
The first method is tracking by an Unscented Kalman
Filter (UKF), a well established method for non-
linear tracking introduced by Wan (Wan and Van
Der Merwe, 2000). The state transition function $x_{i+1} = f(x_i)$ between two states $x_i$ and $x_{i+1}$ is here defined as
$$x_{i+1} = f(x_i) = \begin{pmatrix} x + t \cdot v_x \\ y + t \cdot v_y \\ z + t \cdot v_z \\ v_x \\ v_y \\ v_z \end{pmatrix} \qquad (4)$$
Figure 4: Evaluation of the laser scanner to camera calibration. Measured points in the image are shown in blue, 3d laser scanner points reprojected using the calibrated pose are shown in green. At distances up to 1.0 m the measurement points are very fuzzy. This is caused partly by the noisy depth measurement of the laser scanner and partly by a time jitter due to the asynchronism of the laser scanning and the image capture. However, from distances of 1.5 m onward, the green and the blue curves fit well, and therefore the calibrated pose between the laser scanner and the camera is sufficiently accurate.
The state transition is linear in $x$ but depends on the time increment $t$, which changes from estimation to estimation. The measurement function $p_{i,Meas} = h(x_i)$ for the position of the cyclist in the image is a nonlinear function. In addition, we denote the measurement function for the depth by $d_{i,Meas} = g(x_i)$. First, the global position estimate $x_i, y_i, z_i$ at time $i$ is transformed to local camera coordinates by equation 3. Since the local coordinates are chosen such that the z-axis corresponds directly to the depth in the image, we formulate
$$d_{i,Meas} = g(x_i) = \begin{pmatrix} 0 & 0 & 1 \end{pmatrix} \cdot T(x_i) \qquad (5)$$
Then, we project the local position estimate $x_{i,l}, y_{i,l}, z_{i,l}$ to the image with the pinhole model of equation 2.
A critical point when using Kalman Filters is the determination of the process noise covariance and the measurement error covariance. We assume a standard deviation of the cyclist's acceleration in the motion plane of $\sigma_{acc} = 1.0\,\mathrm{m/s^2}$. From this follow the standard deviations for the velocities and positions
$$\sigma_v = t\,\sigma_{acc} \qquad (6)$$
$$\sigma = \tfrac{1}{2}\, t^2\, \sigma_{acc} \qquad (7)$$
Note that the process noise covariance is dependent
on the time difference between two measurements.
The process noise covariance Q is hence defined as
$$\begin{pmatrix}
\sigma^2 & 0 & 0 & \sigma\sigma_v & 0 & 0 \\
0 & 0.01\,\sigma^2 & 0 & 0 & 0.01\,\sigma\sigma_v & 0 \\
0 & 0 & \sigma^2 & 0 & 0 & \sigma\sigma_v \\
\sigma\sigma_v & 0 & 0 & \sigma_v^2 & 0 & 0 \\
0 & 0.01\,\sigma\sigma_v & 0 & 0 & 0.01\,\sigma_v^2 & 0 \\
0 & 0 & \sigma\sigma_v & 0 & 0 & \sigma_v^2
\end{pmatrix} \qquad (8)$$
The scalar factors in front of the entries for the position and the velocity in y-direction model the quasi planar motion of the cyclist. In order to make the tracking more robust, the height of the cyclist can additionally be detected and included in the tracking as a 7th estimation parameter. This gives an indication of the distance of the cyclist to the camera, but it can only be used to stabilize the depth tracking, since the true metric height of cyclists varies. The measurement noise covariance for both h(x) and g(x) is assumed to be uncorrelated.
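As a sketch, the state transition of equation 4, the measurement functions derived from equations 2, 3 and 5, and the process noise covariance of equation 8 could be written as follows; this is an illustration with hypothetical names that could be plugged into a generic UKF implementation, not our actual code.

```python
import numpy as np

def fx(x, dt):
    """State transition of eq. 4: constant velocity, state = (x, y, z, vx, vy, vz)."""
    return np.concatenate([x[:3] + dt * x[3:], x[3:]])

def hx_image(x, K, R, t):
    """Image measurement h(x): transform to camera coordinates (eq. 3) and
    project with the pinhole model (eq. 2)."""
    p_local = R @ x[:3] + t
    uvw = K @ (p_local / p_local[2])
    return uvw[:2]

def gx_depth(x, R, t):
    """Depth measurement g(x) of eq. 5: the local z coordinate."""
    return np.array([(R @ x[:3] + t)[2]])

def process_noise(dt, sigma_acc=1.0, planar_factor=0.01):
    """Process noise covariance Q of eq. 8 with sigma = 0.5*dt^2*sigma_acc and
    sigma_v = dt*sigma_acc; the y components are damped by planar_factor to
    model the quasi planar motion."""
    s, sv = 0.5 * dt ** 2 * sigma_acc, dt * sigma_acc
    Q = np.zeros((6, 6))
    for axis, f in zip(range(3), (1.0, planar_factor, 1.0)):
        Q[axis, axis] = f * s ** 2
        Q[axis + 3, axis + 3] = f * sv ** 2
        Q[axis, axis + 3] = Q[axis + 3, axis] = f * s * sv
    return Q
```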
5.4.2 Cyclist Tracking by a Least Squares Approximation
In this second method we approximate the cyclist's state over a sliding window of $n$ camera frames at points in time $t_i$. In this window we assume a constant velocity of the cyclist. Therefore the assumptions here are stronger compared to the UKF in section 5.4.1, where we allow a small variation in each
frame. To calculate the position $x = (x, y, z)^T$ and velocity $v = (v_x, v_y, v_z)^T$, we minimize the quadratic error
$$\operatorname*{argmin}_{x,v} \sum_{i=1}^{n} \left\| x + t_i\, v - p_i \right\|_2^2 \qquad (9)$$
$p_i$ is the measured cyclist position in 3d world coordinates at $t_i$. Zeroing the first derivative with respect to $x$ and $v$ gives the classic Linear Least Squares equation
$$\begin{pmatrix} n \cdot I & \sum_i t_i\, I \\ \sum_i t_i\, I & \sum_i t_i^2\, I \end{pmatrix} \cdot \begin{pmatrix} x \\ v \end{pmatrix} = \begin{pmatrix} \sum_i p_i \\ \sum_i t_i\, p_i \end{pmatrix}, \quad i = 1 \dots n \qquad (10)$$
$I$ denotes the identity of $\mathbb{R}^3$. The cyclist position $p_i$ is measured from the vehicle at position $P_i$ by a range $r_i$ from the laser scanner and a direction $w_i$ from the camera, from which follows
$$p_i = P_i + r_i\, w_i, \qquad \|w_i\|_2 = 1 \qquad (11)$$
Hereby we face the problem that we cannot observe a range measurement for every $t_i$. Thus we have to include an estimate for the missing range measurements. This is done by projecting the estimated local cyclist position $x + t_i\, v - P_i$ at each time instant $t_i$ onto the direction $w_i$. With $w_i^T x_i\, w_i = w_i w_i^T x_i = WW\, x_i$ we conclude the estimation of the cyclist position and velocity as shown in algorithm 1.
The strong assumption that the velocity is constant within a sequence of frames renders this method less flexible than the UKF, for which small variations are allowed in each frame. Therefore the UKF is more appropriate if many range measurements are available. However, if only few range measurements are made, i.e. at a frequency of 1 Hz or less, this method is more robust. A qualitative comparison of both algorithms is given in figure 5. A quantitative evaluation is work in progress.
6 EXPECTED OUTCOME
The declared goal of this PhD thesis is to bring monoscopic Visual Odometry to broad applicability. From our point of view, the key to that is the solution of the scale ambiguity. Therefore the expected result of this PhD project is a monoscopic Visual Odometry framework with very small drift, achieved by simple additional sensors and the correct estimation of the metric scale. This will make it possible to build full-fledged Visual Odometry systems from a monocular camera, which is a favorable platform on account of its ease of use and low price as well as its broad applicability. Utilizing this potential, a variety of applications can be put into practice, which could comprise:
Algorithm 1: Algorithm for estimating the cyclist position by a Linear Least Squares method with latent variables.

Ensure: t {vector of time increments, size n}
Ensure: w {vector of directions camera to cyclist, size n}
Ensure: P {vector of camera positions in global coordinates, size n}
Ensure: r {vector of distances camera to cyclist, fewer than n valid values}
  M ← zero matrix of R^(6x6)
  C ← (0 0 0 0 0 0)^T
  I ← identity of R^(3x3)
  for i ← 1 ... n do
    M ← M + [ I, t(i)·I ; t(i)·I, t(i)²·I ]
    C ← C + [ P(i) ; t(i)·P(i) ]
    if observed then
      C ← C + [ r(i)·w(i) ; t(i)·r(i)·w(i) ]
    else
      WW ← w(i)·w(i)^T
      M ← M − [ WW, t(i)·WW ; t(i)·WW, t(i)²·WW ]
      C ← C − [ WW·P(i) ; t(i)·WW·P(i) ]
    end if
  end for
  XV ← M⁻¹·C {6d target state}
- The trajectory estimation of an endoscopic system, from which objects in the scenery, such as medical anomalies, can be observed and reconstructed. Since inside the human body the camera would be placed in a dense environment, a single-laser-beam range finder could suffice for determining the scale.
- Establishing a high precision and low-cost navigation system by fusing monoscopic Visual Odometry with GNSS data, which would be applicable in autonomous driving, general robotics and enhanced reality.
A very important application, a lifesaving cyclist detection and tracking system, is already on the verge of realization. We will establish and test this system within the next year as part of the ABALID project of the Federal Ministry of Education and Research of Germany (ABALID, 2014).
Figure 5: Comparison of the cyclist tracking with the Unscented Kalman Filter (top) and the Linear Least Squares method with latent variables (bottom) at a range measurement frequency of 1 Hz and a camera measurement frequency of 10 Hz. The Linear Least Squares method tends to produce smoother results. However, at z = 130 the Linear Least Squares method fails because of an inaccurate camera measurement, whereas the UKF can continue. For high range measurement frequencies the tracking with the UKF seems more appropriate; for lower frequencies the Linear Least Squares method is the method of choice.
REFERENCES
ABALID (2014). Homepage of the abalid project.
http://abalid.de/. Accessed: 2014-12-10.
Calonder, M., Lepetit, V., Strecha, C., and Fua, P. (2010).
Brief: Binary robust independent elementary fea-
tures. In Computer Vision–ECCV 2010, pages 778–
792. Springer.
Engel, J., Schöps, T., and Cremers, D. (2014). Lsd-slam: Large-scale direct monocular slam. In Computer Vision–ECCV 2014, pages 834–849. Springer.
Forster, C., Pizzoli, M., and Scaramuzza, D. (2014). Svo:
Fast semi-direct monocular visual odometry. In Proc.
IEEE Intl. Conf. on Robotics and Automation.
Fraundorfer, F. and Scaramuzza, D. (2012). Visual odom-
etry: Part ii: Matching, robustness, optimization,
and applications. Robotics & Automation Magazine,
IEEE, 19(2):78–90.
Geiger, A., Ziegler, J., and Stiller, C. (2011). Stereoscan:
Dense 3d reconstruction in real-time. In IEEE Intelli-
gent Vehicles Symposium, Baden-Baden, Germany.
Hartley, R. and Zisserman, A. (2010). Multiple View Geom-
etry. Cambridge University Press, 7th edition.
Hartley, R. I. (1997). In defense of the eight-point algo-
rithm. Pattern Analysis and Machine Intelligence,
IEEE Transactions on, 19(6):580–593.
Klein, G. and Murray, D. (2007). Parallel tracking and map-
ping for small ar workspaces. In Mixed and Aug-
mented Reality, 2007. ISMAR 2007. 6th IEEE and
ACM International Symposium on, pages 225–234.
IEEE.
Scaramuzza, D. and Fraundorfer, F. (2011). Visual odom-
etry [tutorial]. Robotics & Automation Magazine,
IEEE, 18(4):80–92.
Tian, W. and Lauer, M. (2014). Fast and robust cyclist de-
tection for monocular camera systems.
Wan, E. A. and Van Der Merwe, R. (2000). The unscented
kalman filter for nonlinear estimation. In Adaptive
Systems for Signal Processing, Communications, and
Control Symposium 2000. AS-SPCC. The IEEE 2000,
pages 153–158. IEEE.