Mean BoF per Quadrant - Simple and Effective Way to Embed Spatial Information in Bag of Features

Joan Sosa-Garcia; Francesca Odone

doi:10.5220/0005281002970304

2015

Abstract

This paper proposes a new approach for embedding spatial information into a Bag of Features (BoF) image descriptor, primarily meant for image retrieval. The method is conceptually related to Spatial Pyramids, but instead of computing region-based BoF over fixed, arbitrarily chosen sub-regions, it relies on an adaptive procedure based on multiple partitionings of the image into four quadrants (the NE, NW, SE, and SW regions of the image). To obtain a compact and efficient description, all BoF related to the same quadrant are averaged, yielding four descriptors that capture the dominant structures of the main areas of the image; these are then concatenated. The computational cost of the method is the same as that of BoF and the descriptor size is comparable to that of BoF, yet the amount of spatial information retained is considerable, as shown by the experimental analysis carried out on benchmarks.
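The averaging-and-concatenation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the multiple partitions are chosen, so the sketch assumes the caller supplies a set of partition centres (the hypothetical `pivots` argument), that each keypoint has already been quantized to a visual word, and that image coordinates have y growing downward. The function name `mean_bof_per_quadrant` and all parameter names are illustrative only.

```python
import numpy as np

def mean_bof_per_quadrant(positions, words, vocab_size, pivots):
    """Sketch of a mean-BoF-per-quadrant descriptor.

    positions  : (N, 2) array of keypoint (x, y) image coordinates.
    words      : (N,) array of visual-word indices in [0, vocab_size).
    vocab_size : number of visual words in the codebook.
    pivots     : (P, 2) array of partition centres (an assumption of this
                 sketch; the abstract does not specify how they are chosen).
    Returns a vector of length 4 * vocab_size, ordered NE, NW, SE, SW.
    """
    positions = np.asarray(positions, dtype=float)
    words = np.asarray(words, dtype=int)
    pivots = np.asarray(pivots, dtype=float)

    sums = np.zeros((4, vocab_size))
    for px, py in pivots:
        east = positions[:, 0] >= px
        north = positions[:, 1] < py  # image y axis assumed to grow downward
        # Quadrant index per keypoint: 0 = NE, 1 = NW, 2 = SE, 3 = SW.
        quadrant = np.where(north, 0, 2) + np.where(east, 0, 1)
        for q in range(4):
            # Plain BoF histogram of the words falling in quadrant q.
            sums[q] += np.bincount(words[quadrant == q], minlength=vocab_size)

    # Average the histograms of the same quadrant over all partitions,
    # then concatenate the four mean histograms.
    means = sums / max(len(pivots), 1)
    return means.reshape(-1)
```

Under these assumptions the descriptor has four times the length of a plain BoF histogram regardless of how many partitions are used, since the per-partition histograms are averaged rather than stacked.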


Paper Citation

in Harvard Style

Sosa-Garcia J. and Odone F. (2015). Mean BoF per Quadrant - Simple and Effective Way to Embed Spatial Information in Bag of Features. In Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2015) ISBN 978-989-758-090-1, pages 297-304. DOI: 10.5220/0005281002970304

in Bibtex Style

@conference{visapp15,
author={Joan Sosa-Garcia and Francesca Odone},
title={Mean BoF per Quadrant - Simple and Effective Way to Embed Spatial Information in Bag of Features},
booktitle={Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2015)},
year={2015},
pages={297-304},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005281002970304},
isbn={978-989-758-090-1},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2015)
TI - Mean BoF per Quadrant - Simple and Effective Way to Embed Spatial Information in Bag of Features
SN - 978-989-758-090-1
AU - Sosa-Garcia J.
AU - Odone F.
PY - 2015
SP - 297
EP - 304
DO - 10.5220/0005281002970304