Weakly Supervised Object Localization with Large Fisher Vectors

Josip Krapac, Siniša Šegvić


We propose a novel method for learning object localization models in a weakly supervised manner, using images annotated with object class labels but not with object locations. Given an image, the learned model predicts both whether the object class is present and the bounding box that localizes it. The main ingredients of our method are a large Fisher vector representation and a sparse classification model that enables efficient evaluation of patch scores. The method reliably detects very small objects with some intra-class variation in reasonable time. We validate the method experimentally on a public dataset and report localization performance comparable to that of strongly supervised approaches.
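The abstract does not spell out how patch scores are computed or turned into a bounding box, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' pipeline. It assumes each image patch already has a sum-poolable encoding (no nonlinear normalization), that the classifier is linear with mostly zero weights, and that a box is scored by summing the scores of the patches it covers. The function names `patch_scores` and `best_box` and the exhaustive search are hypothetical choices made for illustration.

```python
import numpy as np


def patch_scores(patch_encodings, w):
    """Score every patch with a linear model, touching only non-zero weights.

    patch_encodings: array of shape (H, W, D), one D-dim encoding per grid cell
    w:               weight vector of shape (D,), assumed sparse
    """
    active = np.flatnonzero(w)  # dimensions kept by the sparse model
    return patch_encodings[..., active] @ w[active]


def best_box(scores):
    """Return ((top, left, bottom, right), value) for the box with the
    highest summed patch score. Bottom and right are exclusive.

    Exhaustive search over all boxes; an integral image makes each
    box sum a constant-time lookup.
    """
    H, W = scores.shape
    ii = np.zeros((H + 1, W + 1))
    ii[1:, 1:] = scores.cumsum(axis=0).cumsum(axis=1)

    best, best_val = None, -np.inf
    for t in range(H):
        for b in range(t + 1, H + 1):
            for l in range(W):
                for r in range(l + 1, W + 1):
                    val = ii[b, r] - ii[t, r] - ii[b, l] + ii[t, l]
                    if val > best_val:
                        best, best_val = (t, l, b, r), val
    return best, best_val


# Toy data (illustrative only): a 6x8 grid of 10-dim encodings.
enc = np.zeros((6, 8, 10))
enc[..., 0] = -1.0          # background evidence in every patch
enc[2:4, 3:6, 3] = 1.0      # object evidence in a small region

w = np.zeros(10)
w[0], w[3] = 0.5, 2.0       # only two active dimensions

scores = patch_scores(enc, w)
box, value = best_box(scores)
```

Because the model is linear and the encoding is assumed additive over patches, the score of a box equals the weight vector applied to the sum of the encodings inside it, which is why per-patch scores are sufficient. Sparsity matters for cost: only the encoding dimensions with non-zero weight need to be computed at all.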



Paper Citation

in Harvard Style

Krapac J. and Šegvić S. (2015). Weakly Supervised Object Localization with Large Fisher Vectors. In Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2015), ISBN 978-989-758-090-1, pages 44-53. DOI: 10.5220/0005294900440053
