the memory space taken up by the database can be
drastically reduced.
Finally, combining these two algorithms may be
time-consuming. Real-time processing is maintained
through parallel computing and dedicated thread
management.
Roadmap. Details on the LUKE and LKE algorithms
are given in §2 and §3, respectively. In §4, we
describe how these algorithms are combined.
Experimental results on real data are reported in §5.
Finally, we give our conclusions and discuss further
work in §6.
2 LOCALIZATION IN AN
UNKNOWN ENVIRONMENT
LUKE algorithms are used when no a priori knowledge
of the observed environment is available. The
environment and the trajectory of the moving camera
are simultaneously reconstructed from a video. The
main drawbacks of LUKE algorithms are the unavoidable
drift due to the accumulation of errors and the scale
ambiguity.
Various approaches have been proposed for real-time
localization in an unknown environment. They can
be classified into two major types: local bundle
adjustment (Mouragnon et al., 2006; Nister et al., 2004)
and Kalman filter (Davison, 2003) based algorithms.
We chose the former approach since it has been
shown to be more accurate (Klein and Murray, 2007).
We briefly describe the solution proposed in
(Mouragnon et al., 2006), which is used in our
experiments. A triplet of images is first selected to
set up the world coordinate frame and the initial
geometry.
After this initialization, robust pose estimation is
carried out for each frame of the video using point
detection and matching. Note that in our experiments,
we used the Harris corner detector (Harris and
Stephens, 1988) and SURF descriptors (Bay et al.,
2008). A crucial point described in (Mouragnon et al.,
2006) is that 3D points are not reconstructed for all
the frames.
Specific frames are selected as key-frames and are
used for triangulation. A key-frame is chosen when the
motion is large enough to accurately compute the
3D positions of matched points, but not so large that
matching fails. The system operates in an incremental
way: when a new key-frame and 3D points are added, it
performs a local bundle adjustment in which the
cameras associated with the latest key-frames¹ and the
3D points they observed are updated. This algorithm is
summarized in figure 1 (Thread # 1).

¹ The three latest key-frames are updated in
(Mouragnon et al., 2006).
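
The key-frame criterion described above can be illustrated with a minimal sketch. It assumes, as a simplification that is not stated in the text, that "motion not too large to keep matching" is measured by the number of points still matched with the last key-frame. The names `select_keyframes`, `match_count`, `min_matches` and `synthetic_match_count` are hypothetical, and the matching model is a toy one.

```python
from typing import Callable, List


def select_keyframes(match_count: Callable[[int, int], int],
                     n_frames: int,
                     min_matches: int) -> List[int]:
    """Return indices of frames kept as key-frames (illustrative heuristic).

    match_count(i, j) is assumed to return the number of interest points
    matched between frames i and j. Frame 0 is taken as the first key-frame.
    """
    keyframes = [0]
    for frame in range(1, n_frames):
        if match_count(keyframes[-1], frame) >= min_matches:
            continue  # still well matched: keep moving away from the key-frame
        # Matching has degraded: the previous frame was the farthest one that
        # still matched the last key-frame, so it becomes the new key-frame.
        candidate = frame - 1
        # If the frame right after a key-frame already fails, fall back to
        # the current frame so that the selection does not stall.
        keyframes.append(candidate if candidate > keyframes[-1] else frame)
    return keyframes


def synthetic_match_count(i: int, j: int) -> int:
    """Toy model: 100 matches for identical frames, 20 fewer per frame of gap."""
    return max(0, 100 - 20 * abs(i - j))


if __name__ == "__main__":
    print(select_keyframes(synthetic_match_count, n_frames=8, min_matches=50))
```

In this toy model a key-frame is created every second frame. A real system would obtain the match counts from the detector and descriptor matching stage, and would trigger triangulation and the local bundle adjustment each time a key-frame is appended.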
3 LOCALIZATION IN A KNOWN
ENVIRONMENT
LKE algorithms are used when a priori knowledge of
the observed environment is available. We concentrate
on algorithms which use a 3D point cloud as a model. It
is built through a learning stage that associates
accurate 3D positions with images covering the
considered environment. Each image is also summarized
as a set of interest points with their descriptors
(about 100-500 points with their SURF descriptors for
each image) and their corresponding 3D points in the
scene. All this information is saved in a database.
The online localization process consists in comparing
the observed image of the scene with all images of the
database using their descriptors. The most similar
image, i.e. the one with the highest correlation
score, should correspond to the currently viewed
scene. The camera pose is then computed using the 3D
points observed in this image.
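
The exhaustive retrieval step described above can be sketched as follows. The text does not define the correlation score, so this sketch substitutes a simple stand-in: the number of query descriptors whose nearest database descriptor lies within an L1 distance threshold. The names `similarity_score`, `most_similar_image` and `max_distance`, as well as the two-dimensional toy descriptors, are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence

Descriptor = Sequence[float]


def l1_distance(a: Descriptor, b: Descriptor) -> float:
    """Sum of absolute coordinate differences between two descriptors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def similarity_score(query: List[Descriptor],
                     image: List[Descriptor],
                     max_distance: float) -> int:
    """Count query descriptors that have a close match in the image."""
    score = 0
    for q in query:
        if image and min(l1_distance(q, d) for d in image) <= max_distance:
            score += 1
    return score


def most_similar_image(query: List[Descriptor],
                       database: Dict[str, List[Descriptor]],
                       max_distance: float) -> str:
    """Compare the query with every database image and keep the best score."""
    return max(database,
               key=lambda name: similarity_score(query, database[name],
                                                 max_distance))


if __name__ == "__main__":
    database = {
        "corridor": [[0.0, 0.0], [1.0, 1.0]],
        "office": [[5.0, 5.0], [6.0, 6.0]],
    }
    query = [[5.1, 5.0], [0.0, 0.1], [6.0, 5.9]]
    print(most_similar_image(query, database, max_distance=0.5))
```

Every query descriptor is compared with every descriptor of every image, so the cost grows linearly with the size of the database.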
As the covered environment grows, it becomes
impossible to systematically compare the query image
with all images of the database. Therefore, a
vocabulary tree structure is used to speed up this
retrieval step. This structure has proved to be very
efficient even for very large databases (more than
100,000 images) (Arnold et al., 2009; Nister and
Stewenius, 2006; Schindler et al., 2007). It is a
hierarchical tree (with branching factor k and l
levels) that stores descriptors by similarity, so that
a search over the whole database requires only k × l
descriptor comparisons (done with the L1-distance).
This permits a quick comparison between a descriptor
from the query image and the whole set of descriptors
of the database. In detail, for each query image, we
extract about 400 interest points with their SURF
descriptors. Our vocabulary tree has 6 levels and a
branching factor of 10. Hence, for each descriptor of
the query image, only 60 comparisons are computed.
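
The comparison count can be checked with a minimal sketch of a greedy descent through such a tree. This is not the authors' implementation: the `Node` class, `descend` and `build_toy_tree` are hypothetical names, the toy tree splits a one-dimensional interval evenly instead of being learned from real descriptors, and `levels` counts the levels below the root.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Node:
    center: Sequence[float]
    children: List["Node"] = field(default_factory=list)


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences between two descriptors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def descend(root: Node, descriptor: Sequence[float]) -> Tuple[List[int], int]:
    """Follow the closest child at each level.

    Returns the path of child indices and the number of L1 comparisons made.
    """
    path, comparisons = [], 0
    node = root
    while node.children:
        distances = [l1_distance(descriptor, c.center) for c in node.children]
        comparisons += len(distances)  # k comparisons at this level
        best = distances.index(min(distances))
        path.append(best)
        node = node.children[best]
    return path, comparisons


def build_toy_tree(k: int, levels: int, lo: float = 0.0, hi: float = 1.0) -> Node:
    """Toy 1-D tree: each node covers [lo, hi) and splits it into k parts."""
    node = Node(center=[(lo + hi) / 2])
    if levels > 0:
        step = (hi - lo) / k
        node.children = [
            build_toy_tree(k, levels - 1, lo + i * step, lo + (i + 1) * step)
            for i in range(k)
        ]
    return node


if __name__ == "__main__":
    path, comparisons = descend(build_toy_tree(k=3, levels=2), [0.7])
    print(path, comparisons)  # 3 children compared at each of 2 levels
```

With k = 3 and 2 levels the descent makes 6 comparisons; with the values reported above (k = 10, l = 6) the same loop would make 10 × 6 = 60, even though such a tree has 10^6 leaves. Note that a greedy descent of this kind returns an approximate nearest leaf, since branches not chosen at a level are never revisited.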
4 COMBINING LOCALIZATION
IN KNOWN AND UNKNOWN
ENVIRONMENTS
4.1 Overview
In this section, we describe how the algorithms
presented above are combined. The idea is to