overall algorithm. The strong size reduction and low-pass filtering of the images lead to perceptual aliasing. However, this is rather an advantage for semantic place recognition, because the resulting images retain only the characteristics of the scene that matter most for its recognition. Concerning the sensitivity to illumination, our results are similar to those reported in (Ullah et al., 2008).
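For illustration, this tiny-image preprocessing can be sketched as follows (in Python): low-pass filtering followed by an aggressive size reduction. The grayscale conversion, the 32x32 target size, and the blur radius below are illustrative assumptions, not the exact parameters of our experiments.

    from PIL import Image, ImageFilter
    import numpy as np

    def to_tiny_image(path, size=32, blur_radius=2.0):
        # Load the image (grayscale here for simplicity).
        img = Image.open(path).convert("L")
        # Low-pass filtering, then a strong size reduction.
        img = img.filter(ImageFilter.GaussianBlur(blur_radius))
        img = img.resize((size, size), Image.BILINEAR)
        # Flatten to a vector in [0, 1], suitable as input to an RBM/DBN.
        return np.asarray(img, dtype=np.float32).ravel() / 255.0

The resulting vector discards fine detail and keeps only the coarse structure of the scene, which is the property exploited above.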
Several directions could be explored in future studies to improve these results. A final fine-tuning step using back-propagation could be introduced instead of relying on the raw features (i.e., without fine-tuning). However, using the raw features keeps the algorithm fully incremental and avoids adaptation to a specific domain. Moreover, the strict separation between the construction of the feature space and the classification makes it possible to address other classification problems sharing the same feature space. The independence of the feature-space construction has another advantage: in the context of autonomous robotics, it can be seen as a developmental maturation acquired on-line by the robot, only once, during an exploration phase of its environment. Temporal integration also deserves to be explored in future studies. A last point concerns the sparsity of the obtained code: if we assume that a sparse feature space increases the linear separability of the representation, studying the different factors acting on sparsity could further improve the classification score; one way to quantify this sparsity is sketched below.
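As an example of how sparsity could be monitored, the sketch below computes Hoyer's sparseness measure (the normalized ratio of L1 to L2 norms, ranging from 0 for a uniform vector to 1 for a one-hot vector) over a matrix of hidden activations. The activation matrix is a random placeholder, not data from our experiments.

    import numpy as np

    def sparseness(v, eps=1e-12):
        # Hoyer's measure: 0 for a uniform vector, 1 for a one-hot vector.
        n = v.size
        ratio = np.abs(v).sum() / (np.sqrt((v ** 2).sum()) + eps)
        return (np.sqrt(n) - ratio) / (np.sqrt(n) - 1.0)

    # Placeholder activations (examples x hidden units); in practice these
    # would be the top-layer activations of the trained DBN.
    codes = np.random.rand(1000, 256)
    print("mean sparseness:", np.mean([sparseness(c) for c in codes]))

Tracking such a measure while varying the training factors (e.g., the number of hidden units or the training regime) would indicate whether sparser codes indeed yield better linear separability.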
In summary, the present approach obtains scores comparable to those of approaches based on hand-engineered signatures (such as Gist or SIFT descriptors) combined with more sophisticated classification techniques such as SVMs. As emphasized in (Hinton et al., 2011), this illustrates that the features extracted by a DBN are more promising for image classification than hand-engineered ones.
REFERENCES
Bell, A. J. and Sejnowski, T. J. (1997). Edges are the ’independent components’ of natural scenes. Vision Research, 37(23):3327–3338.
Dubois, M., Guillaume, H., Frenoux, E., and Tarroux, P. (2011). Visual place recognition using Bayesian filtering with Markov chains. In ESANN 2011, Bruges, Belgium.
Field, D. (1994). What is the goal of sensory coding? Neural Computation, 6:559–601.
Guillaume, H., Dubois, M., Frenoux, E., and Tarroux, P. (2011). Temporal bag-of-words: a generative model for visual place recognition using temporal integration. In VISAPP, pages 286–295, Vilamoura, Algarve, Portugal. SciTePress.
Hinton, G. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14:1771–1800.
Hinton, G. (2010). A practical guide to training restricted Boltzmann machines - version 1. Technical report, Department of Computer Science, University of Toronto, Toronto, Canada.
Hinton, G., Krizhevsky, A., and Wang, S. (2011). Transforming auto-encoders. In Artificial Neural Networks and Machine Learning - ICANN 2011.
Hinton, G., Osindero, S., and Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527–1554.
Krizhevsky, A. (2009). Learning multiple layers of features from tiny images. Master's thesis, Department of Computer Science, University of Toronto, Toronto, Canada.
Krizhevsky, A. (2010). Convolutional deep belief networks on CIFAR-10. Technical report, University of Toronto, Toronto, Canada.
Oliva, A. and Torralba, A. (2006). Building the gist of a scene: the role of global image features in recognition. Progress in Brain Research, 155:23–36.
Olshausen, B. and Field, D. (2004). Sparse coding of sensory inputs. Current Opinion in Neurobiology, 14:481–487.
Pronobis, A. and Caputo, B. (2007). Confidence-based cue integration for visual place recognition. In IROS 2007.
Thrun, S., Burgard, W., and Fox, D. (2005). Probabilistic Robotics (Intelligent Robotics and Autonomous Agents). MIT Press, Cambridge, MA, 1st edition.
Smolensky, P. (1986). Information processing in dynamical systems: foundations of harmony theory. In Rumelhart, D. and McClelland, J., editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, volume 1: Foundations. MIT Press, Cambridge, MA.
Torralba, A., Fergus, R., and Weiss, Y. (2008). Small codes and large image databases for recognition. In IEEE Conference on Computer Vision and Pattern Recognition - CVPR 08, Anchorage, AK.
Torralba, A., Murphy, K., Freeman, W., and Rubin, M. (2003). Context-based vision system for place and object recognition. Technical Report AI MEMO 2003-005, MIT, Cambridge, MA.
Ullah, M. M., Pronobis, A., Caputo, B., Jensfelt, P., and Christensen, H. (2008). Towards robust place recognition for robot localization. In IEEE International Conference on Robotics and Automation (ICRA 2008), Pasadena, CA.
Ullah, M. M., Pronobis, A., Caputo, B., Luo, J., and Jensfelt, P. (2007). The COLD database. Technical report, CAS - Centre for Autonomous Systems, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm.
Wu, J. and Rehg, J. M. (2011). CENTRIST: a visual descriptor for scene categorization. IEEE Trans. Pattern Anal. Mach. Intell., 33(8):1489–1501.