TEMPORAL BAG-OF-WORDS - A Generative Model for Visual Place Recognition using Temporal Integration

Hervè Guillaume; Mathieu Dubois; Emmanuelle Frenoux; Philippe Tarroux

doi:10.5220/0003353202860295

2011

Abstract

This paper presents an original approach to visual place recognition and categorization. The simple idea behind our model is that, for a mobile robot, using the previous frames, and not only the current one, can ease recognition. We present an algorithm for integrating the answers from different images. To this end, scenes are encoded by a global signature (the context of a scene) and then classified in an unsupervised way with a Self-Organizing Map. The resulting prototypes form a visual dictionary that roughly describes the environment. A place can then be learnt and represented by the frequency of the prototypes. This approach is a variant of the Bag-of-Words approaches used in scene classification, with the major difference that the “words” are not taken from the same image but from temporally ordered images. Temporal integration allows us to use Bag-of-Words together with a global characterization of scenes. We evaluate our system on the COLD database, performing a place recognition task and a place categorization task. Despite its simplicity, and thanks to the temporal integration of visual cues, our system achieves state-of-the-art performance.
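
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several things in it are assumptions: the prototypes are given by hand instead of being trained with a Self-Organizing Map, signatures are toy 2-D vectors, and frames are combined by a smoothed naive-Bayes product of per-frame word likelihoods. The abstract does not specify the integration rule. The function names (`nearest_prototype`, `learn_place_models`, `recognize`) and the smoothing parameter `alpha` are hypothetical.

```python
import math
from collections import Counter


def nearest_prototype(signature, prototypes):
    """Index of the prototype closest to the signature (squared Euclidean distance)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(prototypes)), key=lambda k: sqdist(signature, prototypes[k]))


def learn_place_models(training_frames, prototypes, alpha=1.0):
    """Represent each place by the smoothed frequency of the prototypes ("words").

    training_frames maps a place label to the list of signatures seen there.
    alpha is an assumed Laplace smoothing constant that avoids zero probabilities.
    """
    models = {}
    for place, signatures in training_frames.items():
        counts = Counter(nearest_prototype(s, prototypes) for s in signatures)
        total = len(signatures) + alpha * len(prototypes)
        models[place] = [(counts[k] + alpha) / total for k in range(len(prototypes))]
    return models


def recognize(recent_signatures, prototypes, models):
    """Integrate over a window of frames by summing per-frame log-likelihoods."""
    words = [nearest_prototype(s, prototypes) for s in recent_signatures]
    scores = {
        place: sum(math.log(probs[w]) for w in words)
        for place, probs in models.items()
    }
    return max(scores, key=scores.get)


# Toy data: three hand-picked prototypes stand in for a trained SOM.
prototypes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
training = {
    "corridor": [(0.1, 0.0), (0.0, 0.1), (0.9, 0.1), (0.1, -0.1)],
    "office": [(0.1, 0.9), (0.0, 1.1), (1.0, 0.1), (-0.1, 1.0)],
}
models = learn_place_models(training, prototypes)

# The first frame maps to a word that is equally likely in both places;
# the two following frames resolve the ambiguity.
window = [(0.9, 0.0), (0.05, 0.95), (0.1, 1.0)]
print(recognize(window, prototypes, models))
```

In this toy setup the first frame alone cannot separate the two places, because its word has the same frequency in both models. Summing the log-likelihoods over the three-frame window is what lets the second place win, which is the intuition behind using temporally ordered images as the "bag".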

Paper Citation

in Harvard Style

Guillaume H., Dubois M., Frenoux E. and Tarroux P. (2011). TEMPORAL BAG-OF-WORDS - A Generative Model for Visual Place Recognition using Temporal Integration. In Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2011), ISBN 978-989-8425-47-8, pages 286-295. DOI: 10.5220/0003353202860295

in Bibtex Style

@conference{visapp11,
author={Hervè Guillaume and Mathieu Dubois and Emmanuelle Frenoux and Philippe Tarroux},
title={TEMPORAL BAG-OF-WORDS - A Generative Model for Visual Place Recognition using Temporal Integration},
booktitle={Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2011)},
year={2011},
pages={286-295},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003353202860295},
isbn={978-989-8425-47-8},
}

in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2011)
TI - TEMPORAL BAG-OF-WORDS - A Generative Model for Visual Place Recognition using Temporal Integration
SN - 978-989-8425-47-8
AU - Guillaume H.
AU - Dubois M.
AU - Frenoux E.
AU - Tarroux P.
PY - 2011
SP - 286
EP - 295
DO - 10.5220/0003353202860295