Approximate Image Matching using Strings of Bag-of-Visual Words Representation

Hong-Thinh Nguyen, Cécile Barat, Christophe Ducottet

Abstract

The Spatial Pyramid Matching approach has become very popular for modeling images as sets of local bags-of-words. Images are then compared region by region with an intersection kernel. Despite its success, this model has some limitations: the grid partitioning is predefined and identical for all images, and the matching is sensitive to intra- and inter-class variations. In this paper, we propose a novel approach based on approximate string matching to overcome these limitations and improve the results. First, we introduce a new image representation as strings of ordered bags-of-words. Second, we present a new edit distance specifically adapted to strings of histograms in the context of image comparison. This distance identifies local alignments between subregions and allows sequences of similar subregions to be removed in order to better match two images. Experiments on the 15 Scenes and Caltech 101 datasets show that the proposed approach outperforms the classical spatial pyramid representation and most competing classification methods presented in recent years.
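
To make the idea of an edit distance over strings of histograms concrete, the following is a minimal sketch and not the distance proposed in the paper. It assumes each image has already been cut into an ordered sequence of subregions, each described by an L1-normalized bag-of-words histogram, and it applies the classical Wagner-Fischer recurrence with a substitution cost of one minus the histogram intersection. The function names, the `indel_cost` parameter, and the toy three-word vocabulary are all illustrative assumptions; the handling of sequences of similar subregions mentioned in the abstract is not modeled here.

```python
import numpy as np


def histogram_distance(h1, h2):
    """Return 1 - histogram intersection, in [0, 1] for L1-normalized inputs."""
    return 1.0 - float(np.minimum(h1, h2).sum())


def string_edit_distance(a, b, indel_cost=1.0):
    """Classical edit distance between two sequences of histograms.

    a, b       : lists of 1-D arrays, one L1-normalized histogram per subregion
    indel_cost : assumed fixed cost of inserting or deleting one subregion
    """
    n, m = len(a), len(b)
    d = np.zeros((n + 1, m + 1))
    # Aligning against an empty string costs one insertion/deletion per symbol.
    d[:, 0] = np.arange(n + 1) * indel_cost
    d[0, :] = np.arange(m + 1) * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i, j] = min(
                d[i - 1, j] + indel_cost,  # delete a[i-1]
                d[i, j - 1] + indel_cost,  # insert b[j-1]
                d[i - 1, j - 1] + histogram_distance(a[i - 1], b[j - 1]),  # substitute
            )
    return float(d[n, m])


# Toy vocabulary of three visual words; each strip is dominated by one word.
sky = np.array([1.0, 0.0, 0.0])
sea = np.array([0.0, 1.0, 0.0])
sand = np.array([0.0, 0.0, 1.0])

image_a = [sky, sea, sand]
image_b = [sky, sky, sea, sand]  # same layout with one extra sky strip

dist_same = string_edit_distance(image_a, image_a)
dist_shifted = string_edit_distance(image_a, image_b)
```

In this toy case the two images differ only by one extra subregion, so the alignment absorbs it with a single insertion, whereas a fixed grid would compare misaligned regions.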



Paper Citation


in Harvard Style

Nguyen H., Barat C. and Ducottet C. (2014). Approximate Image Matching using Strings of Bag-of-Visual Words Representation. In Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014), ISBN 978-989-758-004-8, pages 345-353. DOI: 10.5220/0004676803450353


in Bibtex Style

@conference{visapp14,
author={Hong-Thinh Nguyen and Cécile Barat and Christophe Ducottet},
title={Approximate Image Matching using Strings of Bag-of-Visual Words Representation},
booktitle={Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014)},
year={2014},
pages={345-353},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004676803450353},
isbn={978-989-758-004-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014)
TI - Approximate Image Matching using Strings of Bag-of-Visual Words Representation
SN - 978-989-758-004-8
AU - Nguyen H.
AU - Barat C.
AU - Ducottet C.
PY - 2014
SP - 345
EP - 353
DO - 10.5220/0004676803450353