topology and geometry, humans show a capability for recalling memorised scenes that helps them navigate. This implies that humans possess a sort of visual memory that can help them locate themselves in a large environment. There is also experimental evidence that very simple animals, such as bees and ants, use visual memory to move in very large environments [3]. From these considerations a new approach to the navigation and localization problem has developed, namely image-based navigation.
The robotic agent is provided with a set of views of the environment taken at various locations. These locations are called reference locations, because the robot will refer to them to locate itself in the environment; the corresponding images are called reference images. When the robot moves in the environment, it compares the current view with the reference images stored in its visual memory. Once the robot finds which of the reference images is most similar to the current view, it can infer its position in the environment. If the reference positions are organised in a metric map, an approximate geometrical localization can also be derived. With this technique, the problem of finding the position of the robot in the environment is reduced to the problem of finding the best match for the current image among the reference images.
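As a concrete illustration (ours, not part of the original formulation), the matching step amounts to a nearest-neighbour search over the stored views. The following Python sketch assumes each view has already been reduced to a numeric descriptor; the names localize, ref_descs and ref_positions are hypothetical.

    import numpy as np

    def localize(current_desc, ref_descs, ref_positions):
        # Compare the descriptor of the current view against every
        # stored reference descriptor and return the position of the
        # closest one (nearest neighbour in descriptor space).
        dists = [np.linalg.norm(current_desc - d) for d in ref_descs]
        best = int(np.argmin(dists))
        return ref_positions[best], dists[best]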
The problem now is how to store and compare the reference images, which for a wide environment can be very numerous. In order to store and match a large number of images efficiently, it was shown in [9] that an omnidirectional view can be transformed into a compact representation by expanding it into its Fourier series. The agent memorises each view by storing only the Fourier coefficients of the low-frequency components. This drastically reduces the amount of memory required to store a view at a reference location, and matching the current view against the visual memory becomes computationally inexpensive.
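A minimal sketch of such a signature, under our own illustrative assumptions, follows: the omnidirectional view is taken to be already unwrapped into a panoramic greyscale image, each row is expanded with a 1-D FFT along the azimuthal axis, and only the magnitudes of its first k components are kept (k = 15 is an arbitrary choice here; [9] specifies the exact coefficients retained).

    import numpy as np

    def fourier_signature(panorama, k=15):
        # 1-D FFT of every image row along the horizontal (azimuthal)
        # axis; keep only the magnitudes of the first k frequency
        # components. A rotation of the robot about its vertical axis
        # circularly shifts each row, which leaves these magnitudes
        # unchanged.
        F = np.fft.fft(panorama.astype(float), axis=1)
        return np.abs(F[:, :k])

    def signature_distance(sig_a, sig_b):
        # L2 distance between two signatures, used to rank the
        # reference images against the current view.
        return np.linalg.norm(sig_a - sig_b)

Storing only k coefficients per row, instead of the full row, is what cuts the memory needed at each reference location.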
We show that a further reduction in memory requirements and computation can be achieved by using log-polar images, obtained with a retina-like sensor, without any loss in the discriminatory power of the method.
2 Materials
2.1 Omnidirectional Retinal Sensor
The retina-like sensor used in this work is the Giotto camera developed by Lira-Lab at the University of Genova [11, 12] and by the Unitek Consortium [4]. It is built in 0.35 µm CMOS technology, with the photosensitive elements arranged in a log-polar
geometry. A constant number of elements is placed on concentric rings, so that the size
of these elements necessarily decreases from the periphery toward the center. This kind of geometric arrangement has a singularity at the origin, where the element size would shrink to zero. Since this size is bounded from below by the fabrication technology, there is a limiting ring inside which no further size reduction is possible while keeping a constant number of sensitive elements per ring. Hence, the area inside this limiting ring does not follow a log-polar geometry in the arrangement of its elements, but is nevertheless designed to preserve the polar structure of the sensor while tessellating the area with pixels of the same size. This internal region will be called the fovea of the sensor, by analogy with the fovea of the animal retina, whereas the region with a constant number of pixels per ring will be called the periphery.
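For readers without access to such a sensor, the peripheral layout can be approximated in software by resampling an ordinary image onto a log-polar grid. The sketch below uses nearest-neighbour sampling and our own parameter names; the ring and pixel counts default to those of the Giotto periphery given below.

    import numpy as np

    def logpolar_sample(img, cx, cy, n_rings=110, n_angles=252, r_min=4.0):
        # Resample a greyscale image onto n_rings concentric rings of
        # n_angles pixels each, with ring radii growing geometrically
        # from r_min to the largest radius that fits in the image, so
        # that receptive-field size scales with eccentricity.
        r_max = min(cx, cy, img.shape[1] - 1 - cx, img.shape[0] - 1 - cy)
        growth = (r_max / r_min) ** (1.0 / (n_rings - 1))
        radii = r_min * growth ** np.arange(n_rings)
        angles = 2 * np.pi * np.arange(n_angles) / n_angles
        xs = np.rint(cx + radii[:, None] * np.cos(angles)[None, :]).astype(int)
        ys = np.rint(cy + radii[:, None] * np.sin(angles)[None, :]).astype(int)
        return img[ys, xs]  # log-polar image of shape (n_rings, n_angles)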
The periphery is composed of N_per = 110 rings with M = 252 pixels each, and the