covered trajectory. This way the pose estimation
along with its consistency to the estimated covari-
ance can be verified. As will be shown, the proposed
method is able to reliably estimate both entities con-
sistently even for large datasets.
Following this introduction previous work in the
field visual SLAM is presented and some prerequi-
sites are given. After the description of the used mod-
els an comparison with state of the art methods is
done and some experiments are discussed.
2 PREVIOUS WORK
Various kinds of estimators have been designed to
solve the task of SLAM. These range from direct esti-
mation of quadrifocal tensors (Hildebrandt and Kirch-
ner, 2010) and particle filters (
˙
Imre and Berger, 2009)
to extended Kalman Filters (Schleicher et al., 2007).
Nevertheless the most commonly used estimators are
based on the famous KF. This is due to the fact that
SLAM usually requires real time performance and the
simple integration of multiple sensors, allowing for
compensating the errors induced by the different sen-
sors types. The ability to not only estimate the system
state but also information on the estimations reliabil-
ity is another reason for its popularity.
As stated the KF is error-prone when non linear
models are used. Because of this the use of more ad-
vanced versions of the KF have been studied. The
advantages and drawbacks of these filters have been
analyzed in (Lefebvre et al., 2004). Their applica-
tions to SLAM include beside others the unscented
Kalman Filter (S
¨
underhauf et al., 2007) as well as it-
erated sigma point Kalman Filters (Song et al., 2011).
Such approaches try to overcome the drawbacks of
non linear models by using estimators more robust in
such cases. Thus the non linearities themselves are
not eliminated but more complex algorithms are used.
This results in increased computational costs and thus
less applicability to real time demands.
Apart from the used estimators the solutions to the
SLAM problem differ in the representation of land-
marks. For visual SLAM the so called Inverse Depth
(ID) representation is used mostly, e.g. see (Civera
et al., 2008),(S
¨
underhauf et al., 2007). There a land-
mark is represented by the position of the camera
from where the point was observed first, the direction
to the 3D point and its inverse distance to the cam-
era center. A drawback of this parametrization is, that
is uses a 6 dimensional representation. Since a land-
mark (point in 3D-space) only has 3 degrees of free-
dom (DOF), it is over parametrized. Thus, ambigui-
ties in the estimation can arise, since points initialized
from the same camera end up with different camera
positions. (Civera et al., 2008) have shown, that in ID
the error propagation for depth is linear under certain
assumptions. Nevertheless the models include inverse
tangent and normalization functions resulting in non
linearities not analyzed by Civera et al..
Since the ID parametrization for landmarks suf-
fers from inconsistent covariance estimation for short
distances, (Paz et al., 2008) discuss a combination
of representations. For their stereo SLAM system
they partition landmarks in far away and nearby
landmarks. For far points ID is used, and the 3D
Euclidean Space (ES) for nearby points. This im-
proves the estimation of landmarks and the system
pose, but as can be seen in their evaluation, the
parametrization still has problems covering the true
error distribution for reconstructed 3D points.
Due to the above mentioned benefits provided by
stereo camera systems, many stereo SLAM systems
have been developed, see (Paz et al., 2008), (Hilde-
brandt and Kirchner, 2010), (Schleicher et al., 2007).
The latter propose a system solely based on wide an-
gle stereo cameras and a two stage algorithm. They
create small local maps and detect close loops us-
ing SIFT-fingerprints. In (Hildebrandt and Kirchner,
2010) a system is discussed, that uses an IMU aided
estimation of the quadrifocal tensor.
In (Sol
´
a et al., 2007) the authors claim, that us-
ing two independent cameras outperforms stereo rigs.
In fact, they propose a combination of monocular and
stereo vision resulting in improved estimation. Since
this is a combination of both models, this approach
can be used to fuse most monocular and stereo mod-
els. In (Herath et al., 2006) a representation similar to
the one presented here, has been proposed for obser-
vation models. In contrast to our work they do not use
this for modeling landmarks in the SLAM system. It
was shown, that such observation models undergo a
Gaussian noise assumption.
Our Contribution. The main contribution of this
paper is to introduce a novel parametrization for land-
marks optimized for stereo SLAM algorithms using
image Points and Disparities (PD). This parametriza-
tion allows a minimal representation of a 3D land-
mark, that is, 3 parameters are used for the 3 DOF.
On the one hand this parametrization outperforms the
one in Euclidean 3-space by matching the Gaussian
noise distribution much better. On the other hand
compared to inverse depth it reduces the number of
variables to be estimated from 6 to 3 per landmark.
Moreover the resulting observation model is kept sim-
ple and the landmarks can be initialized accurately in
position and uncertainty.
A NOVEL STATE PARAMETRIZATION FOR STEREO-SLAM
145