To summarize, our main contributions are:
• A direct stereo VO method that uses lines and runs
in real time.
• A novel, efficient, IMU-aided line detection
algorithm that detects vertical lines.
• A fast line matching technique using a lightweight
line descriptor.
2 RELATED WORK
A widely used approach to estimating the pose of a
camera from image data alone is to use feature
points in the images. Feature points are detected and
matched between subsequent images, and the matches
are used to estimate the pose of the camera. A popular
monocular point-based SLAM approach is PTAM (Klein and
Murray, 2007). It runs in real time and is specifically
designed to track a hand-held camera in a small
augmented reality workspace. However, there also exist
adaptations tailored to large-scale environments (Mei
et al., 2010; Weiss et al., 2013). As PTAM needs to
detect and match feature points, the environment has to
contain sufficient texture suited to the feature point
detection algorithm. Additionally, as it is a monocular
approach, it has difficulties handling pure rotations
and cannot estimate the correct scale of the scene. A
feature-based approach that uses a stereo camera
as input, and therefore does not have these problems, is
Libviso (Geiger et al., 2011). However, as it also uses
explicitly detected feature points, the texture needs to
be adequate for the detection algorithm. Additionally,
as it is a VO method rather than a SLAM method like
PTAM, it does not compute a global map of the
environment but only the relative pose from
frame to frame.
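The scale ambiguity of a monocular setup, and the way a known stereo baseline removes it, can be illustrated with a small numerical sketch. It is not taken from any of the cited systems: the pinhole model, the focal length, the baseline, and all names (project, monocular_ambiguous, and so on) are assumptions chosen for illustration.

```python
import numpy as np

def project(points, R, t, f=500.0, c=(320.0, 240.0)):
    """Pinhole projection of Nx3 world points into a camera with pose (R, t)."""
    cam = points @ R.T + t                      # world -> camera coordinates
    u = f * cam[:, 0] / cam[:, 2] + c[0]
    v = f * cam[:, 1] / cam[:, 2] + c[1]
    return np.stack([u, v], axis=1)

rng = np.random.default_rng(0)
# Hypothetical scene: 20 points in front of the first camera.
points = rng.uniform([-1, -1, 4], [1, 1, 8], size=(20, 3))

# Second view: small rotation about the y-axis plus a translation.
a = np.deg2rad(5.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([0.2, 0.0, 0.05])
I3, zero = np.eye(3), np.zeros(3)

# Monocular case: scaling the scene and the translation by the same factor
# leaves the pixels in both views unchanged, so the scale is unobservable.
s = 3.0
monocular_ambiguous = (
    np.allclose(project(points, I3, zero), project(s * points, I3, zero))
    and np.allclose(project(points, R, t), project(s * points, R, s * t))
)

# Stereo case: a known baseline b fixes the scale via depth = f * b / disparity.
f, b = 500.0, 0.12                              # assumed focal length [px], baseline [m]
left = project(points, I3, zero, f=f)
right = project(points, I3, np.array([-b, 0.0, 0.0]), f=f)
disparity = left[:, 0] - right[:, 0]
depth = f * b / disparity

print(monocular_ambiguous, np.allclose(depth, points[:, 2]))
```

In this sketch the two monocular views cannot distinguish the original scene from the scaled one, whereas the rectified stereo pair recovers the metric depth of every point directly from its disparity.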
Instead of points, lines can also be detected
and matched in order to compute the pose of a
camera. In (Elqursh and Elgammal, 2011), Elqursh and
Elgammal propose to estimate the relative pose of two
cameras by using three lines in a special primitive
configuration. This has the advantage that no texture
at all needs to be present; only a special line
configuration has to exist. However, as explicit line
detection is relatively slow, this approach cannot be
used on a low-end onboard computer.
Recently, direct pose estimation algorithms have
become popular. In contrast to feature point-based
approaches, direct approaches do not explicitly detect
and match features and then compute the pose from the
feature matches. Instead, they compute the new pose by
minimizing the photometric error over the whole image
or over large parts of it. For example, DTAM
(Newcombe et al., 2011) tracks the pose of a monocular
camera given a dense model of the scene by minimizing
the photometric error of the current image with respect
to the whole model. As this is a computationally
expensive task, a high-performance GPU is necessary
to compute the pose in real time. Similarly,
LSD-SLAM (Engel et al., 2014) computes the pose by
minimizing the photometric error. However, to reduce
computational cost, it uses only those areas of the
image where the image gradient is sufficiently high.
Therefore, it runs in real time even on hand-held
devices such as smartphones. An extension of LSD-SLAM
to a stereo setup has also been proposed recently (Engel
et al., 2015). In (Forster et al., 2014), an approach
is proposed that combines explicit feature point
matching with direct alignment and is therefore called
semi-direct visual odometry (SVO). It detects feature
points at keyframes and computes the poses of
images between keyframes by minimizing the photometric
error of patches around the feature points. It runs
very fast even on onboard computers and is
specifically designed for the downward-looking camera
of a micro aerial vehicle. In summary, direct methods
perform direct image alignment: they estimate the pose
of a new frame with respect to previously computed
3D information by optimizing the pose parameters
directly so as to minimize a photometric error. This is
in contrast to feature-based methods, where features
are detected and matched to compute the fundamental
matrix, and optimization is only optionally used to
refine the computed poses by minimizing the
reprojection error.
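The two error terms contrasted above can be written down compactly. The notation is an assumption introduced here for illustration and is not taken from the cited works: ξ denotes the pose parameters, T(ξ) the corresponding rigid-body transform, π the camera projection, π⁻¹ the back-projection of a pixel p with depth d_p, Ω the set of reference pixels used, and (u_i, X_i) a matched pair of image feature and 3D point. A plain squared error is used for simplicity; practical systems often substitute a robust norm.

```latex
% Direct methods: photometric error over a pixel set \Omega of the reference image
E_{\mathrm{photo}}(\xi) = \sum_{\mathbf{p} \in \Omega}
  \Bigl( I_{\mathrm{cur}}\bigl( \pi\bigl( T(\xi)\, \pi^{-1}(\mathbf{p}, d_{\mathbf{p}}) \bigr) \bigr)
       - I_{\mathrm{ref}}(\mathbf{p}) \Bigr)^{2}

% Feature-based methods: reprojection error over matched features
E_{\mathrm{reproj}}(\xi) = \sum_{i}
  \bigl\| \mathbf{u}_i - \pi\bigl( T(\xi)\, \mathbf{X}_i \bigr) \bigr\|^{2}
```

The first term compares intensities and needs no correspondences, only a choice of Ω (the whole image, high-gradient pixels, or patches). The second compares pixel positions and therefore presupposes detected and matched features.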
Our method is a direct method, but utilizes a stereo
camera rig instead of a monocular camera. In contrast
to all previous related work, we detect lines and
estimate the pose by minimizing the photometric error
of patches around the lines. Using our line detector,
it is possible to detect lines very fast in constrained
images. As most of the structure in man-made
environments consists of lines, using patches around
lines is usually sufficient to directly align
consecutive images.
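The idea of aligning images using only patches around lines can be sketched on a toy problem. The following is a deliberately simplified illustration and not the method of this paper: a single horizontal image shift stands in for the 6-DoF pose, the vertical line positions are given instead of detected, and a brute-force search replaces iterative optimization. All function and variable names are hypothetical.

```python
import numpy as np

def line_patch_mask(width, line_columns, half_width=2):
    """Boolean column mask selecting narrow patches around vertical lines."""
    mask = np.zeros(width, dtype=bool)
    for c in line_columns:
        mask[max(0, c - half_width):c + half_width + 1] = True
    return mask

def photometric_error(ref, cur, shift, mask):
    """Mean squared intensity difference between the masked columns of ref
    and the columns of cur they map to under a horizontal shift."""
    cols = np.flatnonzero(mask)
    target = cols + shift
    valid = (target >= 0) & (target < cur.shape[1])
    diff = cur[:, target[valid]] - ref[:, cols[valid]]
    return np.mean(diff ** 2)

def align(ref, cur, mask, max_shift=10):
    """Brute-force search for the shift with the smallest photometric error."""
    shifts = range(-max_shift, max_shift + 1)
    errors = [photometric_error(ref, cur, s, mask) for s in shifts]
    return shifts[int(np.argmin(errors))]

# Synthetic scene: piecewise-constant intensities with three vertical edges.
rng = np.random.default_rng(1)
height, width = 40, 120
line_columns = [30, 62, 95]                     # assumed known line positions
row = np.full(width, 0.2)
row[30:] = 0.8
row[62:] = 0.4
row[95:] = 0.9
ref = np.tile(row, (height, 1))

true_shift = 4
cur = np.roll(ref, true_shift, axis=1) + rng.normal(0.0, 0.01, ref.shape)

mask = line_patch_mask(width, line_columns)
estimated_shift = align(ref, cur, mask)
print(estimated_shift, mask.mean())
```

Only the columns inside the mask enter the error, which is a small fraction of the image in this example. The textureless regions between the edges would contribute nothing to the alignment anyway, which mirrors the argument that patches around lines carry most of the usable information in man-made scenes.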
3 DIRECT VISUAL ODOMETRY BASED ON VERTICAL LINES
We explicitly exploit the structures contained
in man-made environments to introduce a
novel visual odometry approach.
Instead of directly aligning the whole image or
explicitly detecting and matching feature points, we
estimate the camera pose by aligning only the patches
around detected lines, minimizing the photometric
error. In case no lines can be detected, feature points