ego-motion estimation in real time.
While the previously mentioned approaches compute the
unscaled ego-motion accurately, the scale is only roughly
estimated and then held fixed. Their focus therefore lies
on reducing the scale drift. For example
the popular Parallel Tracking and Mapping (Klein and
Murray, 2007) framework lets the user move the cam-
era about 10 cm to the right during the initialization
and fixes the scale afterwards. Geiger's approach uses
a different principle. Given the height of the camera
above the ground and its inclination to the ground plane,
it estimates the scale from frame to frame by
reconstructing the ground surface, so that the scale drift
is included in the estimate. We want to follow this
approach while relaxing the constraint of a fixed
inclination angle of the camera.
3 OUTLINE OF OBJECTIVES
3.1 Main Objective and Challenge
We want to investigate whether the scale of a monocular
trajectory can be estimated by means of scene
understanding and, if necessary, with the aid of external
sensors such as a single-row LIDAR system. The aim is to
estimate the driven metric trajectory precisely. Not only
the estimation of the scale itself poses a challenge, but
also its drift, which matters because the trajectory is a
concatenation of relative motions, each of which is
subject to uncertainty. Many approaches, such as "Fast
Semi-Direct Monocular Visual Odometry" (Forster et al.,
2014) and "Large-Scale Direct Monocular SLAM" (Engel
et al., 2014), use sophisticated techniques to reduce the
drift as far as possible, so that the scale can be fixed
once and needs no modification afterwards. We want to
approach the problem from another perspective: if it were
possible to calculate the scale from frame to frame, the
scale drift would be eliminated.
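The difference between the two strategies can be illustrated with a small numerical sketch. It is not taken from any of the cited systems: the drift model (the unscaled reconstruction shrinks by 1 % per frame), the step length and all names are assumptions chosen purely for illustration, and the per-frame scale is assumed to be estimated without noise.

```python
def integrate(step_lengths, scales):
    """Sum the unscaled per-frame step lengths after applying a metric scale to each."""
    return sum(s * d for s, d in zip(scales, step_lengths))

# Hypothetical data: the vehicle moves exactly 1 m per frame, but the
# unscaled reconstruction shrinks by 1 % per frame (assumed drift model).
n_frames = 100
true_step = 1.0
unscaled_steps = [true_step * 0.99 ** k for k in range(n_frames)]

# Strategy A: scale fixed once at initialization (correct for frame 0 only).
fixed_scales = [true_step / unscaled_steps[0]] * n_frames

# Strategy B: scale re-estimated for every frame (idealized, noise-free).
per_frame_scales = [true_step / d for d in unscaled_steps]

length_fixed = integrate(unscaled_steps, fixed_scales)
length_per_frame = integrate(unscaled_steps, per_frame_scales)

print(f"true path length:      {n_frames * true_step:.1f} m")
print(f"fixed scale:           {length_fixed:.1f} m")
print(f"frame-to-frame scale:  {length_per_frame:.1f} m")
```

Under these assumptions the fixed scale underestimates the driven distance considerably, while the frame-to-frame scale recovers it. With a noisy per-frame estimate the second strategy would trade drift for jitter, which this sketch does not model.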
3.2 Why Monocular Vision?
The advantages of using a monocular system are manifold.
Firstly, it is a very inexpensive sensor setup: the camera
and its optics are low-cost in comparison with multilayer
laser scanners. Furthermore, a monocular camera setup is
considerably more robust than, for example, a stereoscopic
one, since the latter requires an accurate calibration,
which may be lost even through small mechanical shocks.
Moreover, the main problem of monocular systems, i.e.
the unobservability of the scale, is at the same time a
big advantage. In image space there is no difference
between a small motion in a dense environment, for
example the image of an endoscopic system in a vein, and
a UAV that observes the earth from a large distance at
high velocity. The critical parameter is the ratio between
the mean scene depth and the velocity of the camera. The
focus of our research field, automotive application, is
particularly challenging in this respect because this
ratio can be very high.
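The scale ambiguity described above can be checked with a minimal sketch of a pinhole camera. The camera model (no rotation, viewing direction along +z, unit focal length), the point coordinates and the function names are assumptions made for illustration only. Scaling the scene and the camera translation by the same factor leaves all image coordinates unchanged.

```python
def project(point, camera_position, focal_length=1.0):
    """Pinhole projection for a camera without rotation, looking along +z."""
    x = point[0] - camera_position[0]
    y = point[1] - camera_position[1]
    z = point[2] - camera_position[2]
    return (focal_length * x / z, focal_length * y / z)

def observe(points, motion, scale):
    """Image coordinates before and after the motion, with scene and motion scaled alike."""
    scaled_points = [tuple(scale * c for c in p) for p in points]
    scaled_motion = tuple(scale * c for c in motion)
    first = [project(p, (0.0, 0.0, 0.0)) for p in scaled_points]
    second = [project(p, scaled_motion) for p in scaled_points]
    return first + second

# Hypothetical scene and camera translation in arbitrary units.
points = [(0.5, -0.2, 4.0), (-1.0, 0.3, 6.0), (0.2, 0.1, 2.5)]
motion = (0.1, 0.0, 0.5)

endoscope = observe(points, motion, 0.001)   # millimetre-sized scene
uav = observe(points, motion, 1000.0)        # kilometre-sized scene

max_difference = max(
    abs(a - b)
    for img_a, img_b in zip(endoscope, uav)
    for a, b in zip(img_a, img_b)
)
print(max_difference)  # differs from zero only by floating-point rounding
```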
On the other hand, the application in cars has the
advantage that assumptions about the environment can be
made. In general, the height of the camera above the
ground is constant because the vehicle moves on a plane.
This makes it possible to estimate the scale of the
trajectory by modeling the ground plane and comparing the
camera height in the unscaled reconstruction with the
real-world height. However, this is only possible in areas
with clearly identifiable streets, where the ground plane
is dominant in the image.
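A minimal sketch of this idea follows. It assumes that a set of unscaled 3D points on the road surface is already available in camera coordinates (y pointing downwards) and that the mounting height of the camera is known; the plane parameterization, the sample points and all function names are illustrative assumptions, not the method of any cited work. The plane is fitted by ordinary least squares, so the camera inclination does not have to be fixed in advance.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_ground_plane(points):
    """Least-squares fit of y = a*x + b*z + c to points (x, y, z)."""
    rows = [(x, z, 1.0) for x, _, z in points]
    ys = [y for _, y, _ in points]
    # Normal equations (A^T A) p = A^T y for the unknowns p = (a, b, c).
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    d = det3(ata)
    params = []
    for k in range(3):
        # Cramer's rule: replace column k by the right-hand side.
        mk = [[aty[i] if j == k else ata[i][j] for j in range(3)] for i in range(3)]
        params.append(det3(mk) / d)
    return tuple(params)

def camera_height(plane):
    """Distance of the camera origin from the plane a*x - y + b*z + c = 0."""
    a, b, c = plane
    return abs(c) / math.sqrt(a * a + 1.0 + b * b)

def metric_scale(ground_points, real_height_m):
    """Ratio of the real camera height to the camera height in unscaled units."""
    return real_height_m / camera_height(fit_ground_plane(ground_points))

# Hypothetical unscaled road points: a plane slightly pitched relative to the camera.
ground_points = [(x, 0.05 * z + 0.5, z)
                 for x, z in [(-1.0, 2.0), (1.0, 2.0), (-1.0, 5.0), (1.0, 5.0), (0.0, 8.0)]]
scale = metric_scale(ground_points, real_height_m=1.65)  # 1.65 m is an assumed mounting height
print(scale)
```

Multiplying the unscaled translation of the current frame by `scale` would give its metric length. With real data the road points contain outliers, so a robust fit (for example RANSAC) would be needed in place of plain least squares.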
In scenes with dense traffic we have to rely on other
assumptions. Humans can deduce their own movement with
only one eye by using their knowledge of the real size of
objects in the world. Analogously, we could detect objects
such as cars, cyclists or road markings in the image and
deduce a prior for the scale estimation from them.
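How a known object size could yield such a prior can be sketched with the pinhole relation Z = f * H / h, where H is the real object height, h its height in the image and f the focal length in pixels. The numbers below (focal length, car height, bounding-box height, unscaled depth) and the function names are hypothetical and serve only as an illustration.

```python
def depth_from_known_height(focal_length_px, real_height_m, image_height_px):
    """Metric depth of an object of known real height under the pinhole model."""
    return focal_length_px * real_height_m / image_height_px

def scale_prior(metric_depth_m, unscaled_depth):
    """Scale factor that maps the unscaled reconstruction to metric units."""
    return metric_depth_m / unscaled_depth

# Hypothetical detection: a car assumed to be 1.5 m high appears 60 px high.
metric_depth = depth_from_known_height(focal_length_px=720.0,
                                       real_height_m=1.5,
                                       image_height_px=60.0)

# Depth of the same car in the unscaled monocular reconstruction (assumed value).
prior = scale_prior(metric_depth, unscaled_depth=4.5)
print(metric_depth, prior)  # 18.0 4.0
```

Because real object sizes vary (cars differ in height, cyclists in posture), such a value is better treated as a prior with a wide uncertainty than as a measurement.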
Another research direction is the use of external
sensors. A LIDAR system, for example, could provide depth
information about the scene, and even global localization
methods such as the Global Navigation Satellite System
(GNSS) could serve as a source of scale information. Once
the motion and the scale of the scene have been deduced,
the scene can be reconstructed by classical methods of
Structure from Motion (Hartley and Zisserman, 2010,
p. 312), which allows further applications in scene
understanding. In this project, the ego-motion trajectory
will be combined with cyclist detection and tracking to
predict collisions.
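For the LIDAR option mentioned above, one conceivable way to obtain the scale is to compare metric LIDAR depths with the unscaled depths of the same scene points. The sketch below assumes that this association between LIDAR returns and reconstructed points has already been established; the data and the function name are hypothetical. The median is used so that a single wrong association does not dominate the estimate.

```python
from statistics import median

def scale_from_lidar(lidar_depths_m, unscaled_depths):
    """Median ratio of metric LIDAR depth to unscaled depth over matched points."""
    ratios = [m / u for m, u in zip(lidar_depths_m, unscaled_depths) if u > 0]
    return median(ratios)

# Hypothetical matched depths; the last pair is a wrong association (outlier).
lidar_depths_m = [10.0, 20.0, 30.0, 8.0]
unscaled_depths = [2.0, 4.0, 6.0, 4.0]

lidar_scale = scale_from_lidar(lidar_depths_m, unscaled_depths)
print(lidar_scale)  # 5.0
```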
4 STAGE OF THE RESEARCH
4.1 Scale Estimation
In a first step we want to focus on the estimation of
the scale only. We therefore choose an existing, very
efficient algorithm for the unscaled ego-motion estimation
as a baseline, namely "Stereo Scan" (Geiger et al., 2011).
This should be regarded as a first attempt to get a grip
on the ego-motion estimation; more sophisticated
algorithms remain to be evaluated. We can split the scale
estimation into estimation from a priori knowledge about
scene-inherent features and scale estimation with sensors
other than cameras,
VISIGRAPP2015-DoctoralConsortium