between the nodes. In (Gaspar et al., 2000), a vision-based navigation system is presented that uses an omnidirectional camera and a topological map as the representation of structured indoor office environments. (Frizera et al., 1998) describe a similar system developed using a single camera.
In (Cummins and Newman, 2009), a SLAM sys-
tem called FAB-MAP is presented. The description of
the scenes is based on landmark extraction. Specifi-
cally, they use SURF features, and their experimental
dataset is a very large-scale collection of outdoor omnidirectional images. Our aim is to develop a similar
system but using a wide-angle camera (which is more
economical than catadioptric or spherical vision cam-
era systems). Another difference is the kind of infor-
mation we use to describe the images, since we use
global-appearance descriptors.
The first step in our work consists in building a map of the environment. We use a graph representation in which each node is composed of 8 wide-angle images that cover the complete field of view from one position in the environment to be mapped, and the edges represent the connectivity between the nodes.
In order to estimate the topological relationships
between nodes, we use the information extracted from
a set of images captured along some routes which pass
through the previously captured nodes. As a contribution of this work, we apply a multi-scale analysis of the route and node images in order to increase the similarity between them as we move away from a node. From this analysis, we obtain both an increase in the rate of correct matches between route images and the map database, and a measurement of the relative position of the compared scenes.
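The details of the multi-scale analysis are developed later in the paper; as a purely illustrative sketch (the function names, the fixed scale set, and the use of raw pixels in place of the image descriptor are our simplifying assumptions, not the paper's procedure), one can simulate moving toward a node by centrally cropping and re-expanding its image, and take the best-matching scale as a proxy for the relative position of the two viewpoints:

```python
import numpy as np

def center_crop_scale(img, s):
    """Crop the central fraction 1/s of a grayscale image and resize it
    back to the original size (nearest-neighbour), roughly simulating a
    forward displacement toward the scene."""
    h, w = img.shape
    ch, cw = int(h / s), int(w / s)
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    crop = img[y0:y0 + ch, x0:x0 + cw]
    yi = (np.arange(h) * ch / h).astype(int)   # nearest-neighbour row map
    xi = (np.arange(w) * cw / w).astype(int)   # nearest-neighbour column map
    return crop[np.ix_(yi, xi)]

def best_scale(route_img, node_img, scales=(1.0, 1.2, 1.4, 1.6)):
    """Compare a route image against zoomed versions of a node image and
    return the scale with the smallest distance; the winning scale acts
    as a proxy for how far the route viewpoint is from the node."""
    dists = [np.linalg.norm(route_img - center_crop_scale(node_img, s))
             for s in scales]
    return scales[int(np.argmin(dists))]
```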
Once the map is built, as a second step we have designed a path estimation algorithm that also takes advantage of the scale analysis to extrapolate the position of the route scenes not only at the nodes but also at intermediate points. The algorithm, which is also a contribution of this work, introduces a weight function in order to improve the localization precision.
The remainder of the paper is structured as fol-
lows. Section 2 introduces the features of the
dataset used in the experimental part, and the global-
appearance descriptor selected in order to represent
the scenes. Section 3 presents the algorithm devel-
oped to build the topological map. In Section 4 we explain the system that builds the representation of the route paths and present the experimental results. Finally, in Section 5 we summarize the main conclusions of this work.
TERMINOLOGY: We use the term node to refer
to a collection of eight images captured from the same
position on the ground plane every 45°, covering the
complete field of view around that position. We de-
note the collection of images of the nodes as map’s
images or database’s images. The graph that rep-
resents the topological layout of the nodes is named
map. The process of finding the topological connec-
tion between nodes and their relative position is called map building. We call the relative position between
nodes topological distance. When we write image
distance we refer to the Euclidean distance between
the descriptors of two images. The topological dis-
tance between two images is denoted as l, and the
topological distance between nodes as c.
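As a minimal illustration of this terminology (a sketch with hypothetical names, not code from the paper), a node can be represented as a container of eight descriptors, and the image distance as the Euclidean distance between two descriptors:

```python
import numpy as np

class Node:
    """A node: eight image descriptors captured every 45 degrees from
    one ground-plane position, covering the full field of view."""
    def __init__(self, descriptors):
        assert len(descriptors) == 8  # complete 360-degree coverage
        self.descriptors = [np.asarray(d, dtype=float) for d in descriptors]

def image_distance(d1, d2):
    """Image distance: Euclidean distance between two image descriptors."""
    return float(np.linalg.norm(np.asarray(d1, dtype=float) -
                                np.asarray(d2, dtype=float)))
```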
2 DATA SET AND DESCRIPTOR
FEATURES
In this section, we present the features of the image data set and the global-appearance technique we use to create a descriptor of the scenes.
The images are captured using a fisheye lens camera; we choose this kind of lens due to its wide angle of view. Specifically, the model used is the GoPro Hero2 (Woodman Labs, 2013), whose angle of view is 127°. Due to the fisheye lens, the scenes
present a distortion that makes it impossible to obtain useful information from the images using global-appearance descriptors, since these descriptors are based on the spatial distribution and disposition of the elements in the scene, and the distortion makes the elements appear altered. For that reason, we use the Matlab toolbox OCamCalib to calibrate the camera and compute the undistorted scenes from the original images (Scaramuzza et al., 2006). In the remainder of the paper, the term image refers to the undistorted transform of the original scenes.
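The paper performs this step with the OCamCalib toolbox in Matlab; as a rough, self-contained illustration of the same idea (a toy equidistant fisheye model, r_d = f·θ, with an invented focal length, not the calibrated OCamCalib model), the undistortion amounts to an inverse remapping from perspective pixel coordinates back into the fisheye image:

```python
import numpy as np

def undistort_equidistant(img, f):
    """Remap a fisheye image (equidistant model, r_d = f * theta) onto a
    perspective image (r_u = f * tan(theta)) by inverse nearest-neighbour
    sampling. 'f' is the focal length in pixels. This is a toy stand-in
    for OCamCalib's calibrated undistortion."""
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - cx, ys - cy
    r_u = np.hypot(dx, dy)                 # radius in the undistorted image
    theta = np.arctan2(r_u, f)             # viewing angle of each pixel
    r_d = f * theta                        # radius in the fisheye image
    # scale factor from undistorted radius to fisheye radius (1 at centre)
    scale = np.divide(r_d, r_u, out=np.ones_like(r_u), where=r_u > 0)
    src_x = np.round(cx + dx * scale).astype(int)
    src_y = np.round(cy + dy * scale).astype(int)
    out = np.zeros_like(img)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out
```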
Since the aim of this work is to solve the problem
of place recognition using the global appearance of images, it is necessary to use descriptors that concentrate the visual information of the image as a whole; robustness against illumination changes and the ability to cope with small changes in the orientation of the scenes are also desirable. Some works, such as (Paya et al., 2009), have compared the performance of several global-appearance descriptors. Taking them into account, we have decided to choose the Gist-Gabor descriptor (Torralba, 2003), (Oliva and Torralba, 2001), as it presents good performance in image retrieval when working with real indoor images.
It also shows a reasonable computational cost: with an image size of 64×32 pixels, the algorithm takes 0.0442 seconds to compute the descriptor using Matlab R2009b running on a 2.8 GHz Quad-Core Intel
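To make the flavour of such a descriptor concrete, the following is a simplified Gist-style sketch (filter sizes, orientations, and the 4×4 averaging grid are our illustrative choices, not the parameters of the cited works): the image is filtered with a small bank of Gabor filters and each response is averaged over a grid of blocks, yielding one global vector per image.

```python
import numpy as np

def gabor_kernel(size, theta, wavelength, sigma):
    """Real-valued Gabor filter at orientation theta (toy parameters)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return (np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) *
            np.cos(2 * np.pi * xr / wavelength))

def gist_descriptor(img, orientations=4, blocks=4):
    """Gist-style global descriptor: filter the grayscale image with a
    bank of Gabor filters and average each response magnitude over a
    blocks x blocks grid. Returns a vector of length
    orientations * blocks**2."""
    h, w = img.shape
    feats = []
    for k in range(orientations):
        kern = gabor_kernel(9, k * np.pi / orientations, 8.0, 3.0)
        # circular 2-D convolution via FFT (adequate for this sketch)
        resp = np.abs(np.fft.ifft2(np.fft.fft2(img) *
                                   np.fft.fft2(kern, s=img.shape)).real)
        for by in range(blocks):
            for bx in range(blocks):
                block = resp[by * h // blocks:(by + 1) * h // blocks,
                             bx * w // blocks:(bx + 1) * w // blocks]
                feats.append(block.mean())
    return np.array(feats)
```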
ICINCO 2013 - 10th International Conference on Informatics in Control, Automation and Robotics