On Reducing the Number of Visual Words in the Bag-of-Features Representation

Giuseppe Amato; Fabrizio Falchi; Claudio Gennaro

doi:10.5220/0004290506570662

2013

Abstract

A new class of applications based on visual search engines is emerging, especially on smartphones, which have evolved into powerful tools for processing images and videos. State-of-the-art algorithms for large-scale visual content recognition and content-based similarity search use the “Bag of Features” (BoF) or “Bag of Words” (BoW) approach. The idea, borrowed from text retrieval, enables the use of inverted files. A well-known issue with this approach is that the query images, as well as the stored data, are described with thousands of words. This poses obvious efficiency problems when inverted files are used for image matching. In this paper, we propose and compare various techniques for reducing the number of words describing an image in order to improve efficiency, and we study the effects of this reduction on effectiveness in landmark recognition and retrieval scenarios. We show that substantial performance improvements are achievable while preserving the advantages of the BoF-based approach.
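
To make the efficiency argument concrete, the following is a minimal sketch of how shortening each image's word list shrinks the posting lists of an inverted file. It is illustrative only and is not taken from the paper: the tf*idf selection criterion, the unnormalized dot-product score, the toy corpus, and every name in it (`reduce_words`, `build_inverted_file`, `search`) are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def reduce_words(word_ids, idf, k):
    """Keep the k visual words of one image with the highest tf*idf weight.

    Assumption: tf*idf is used here only as one plausible ranking criterion.
    """
    tf = Counter(word_ids)
    weights = {w: tf[w] * idf.get(w, 0.0) for w in tf}
    # Sort by descending weight; break ties by word id for determinism.
    ranked = sorted(weights, key=lambda w: (-weights[w], w))
    return {w: weights[w] for w in ranked[:k]}


def build_inverted_file(images):
    """Map each visual word to a posting list of (image_id, weight)."""
    index = defaultdict(list)
    for image_id, words in images.items():
        for w, weight in words.items():
            index[w].append((image_id, weight))
    return index


def search(index, query_words):
    """Score images by an unnormalized dot product over shared words."""
    scores = defaultdict(float)
    for w, query_weight in query_words.items():
        for image_id, doc_weight in index.get(w, ()):
            scores[image_id] += query_weight * doc_weight
    return sorted(scores.items(), key=lambda s: (-s[1], s[0]))


# Hypothetical toy corpus: image id -> list of quantized visual word ids.
corpus = {
    "img_a": [1, 1, 2, 3, 3, 3, 4],
    "img_b": [2, 2, 5, 6, 6],
    "img_c": [1, 3, 3, 7, 7, 7, 7],
}

# Document frequency and idf computed over the toy corpus.
n_images = len(corpus)
df = Counter(w for words in corpus.values() for w in set(words))
idf = {w: math.log(n_images / df[w]) for w in df}

# Reduce every stored image and the query to at most k = 2 words.
reduced = {image_id: reduce_words(ws, idf, k=2) for image_id, ws in corpus.items()}
index = build_inverted_file(reduced)
query = reduce_words([3, 3, 7, 7, 1], idf, k=2)
results = search(index, query)
```

With `k=2`, each image contributes at most two postings and the query touches at most two posting lists, so the work per query is bounded by `k` rather than by the thousands of words a full BoF description would contain. How much effectiveness is lost for a given `k` is exactly the trade-off the paper studies.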

Paper Citation

in Harvard Style

Amato G., Falchi F. and Gennaro C. (2013). On Reducing the Number of Visual Words in the Bag-of-Features Representation. In Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013) ISBN 978-989-8565-47-1, pages 657-662. DOI: 10.5220/0004290506570662

in Bibtex Style

@conference{visapp13,
author={Giuseppe Amato and Fabrizio Falchi and Claudio Gennaro},
title={On Reducing the Number of Visual Words in the Bag-of-Features Representation},
booktitle={Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013)},
year={2013},
pages={657-662},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004290506570662},
isbn={978-989-8565-47-1},
}

in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013)
TI - On Reducing the Number of Visual Words in the Bag-of-Features Representation
SN - 978-989-8565-47-1
AU - Amato G.
AU - Falchi F.
AU - Gennaro C.
PY - 2013
SP - 657
EP - 662
DO - 10.5220/0004290506570662