Dictionary based Pooling for Object Categorization

Sean Ryan Fanello, Nicoletta Noceti, Giorgio Metta, Francesca Odone


It is well known that image representations learned through ad-hoc dictionaries improve overall results in object categorization problems. In the widely accepted coding-pooling visual recognition pipeline, these representations are usually tied to the coding stage alone. In this paper we show how to exploit ad-hoc representations in both the coding and the pooling phases. We learn a dictionary for each object class and then use local descriptors encoded with the learned atoms to guide the pooling operator. We extensively evaluate the proposed approach on both single-instance object recognition and object categorization problems. From an application standpoint, we consider a classical image retrieval scenario on the Caltech-101 dataset, as well as a typical robot vision task with data acquired by the iCub humanoid robot.
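The idea of letting per-class dictionaries guide pooling can be illustrated with a minimal sketch. The abstract does not specify the coding scheme or the pooling rule, so everything below is an assumption made for illustration: the function names are hypothetical, absolute correlation stands in for a real sparse coder, and each descriptor is routed to the class whose dictionary holds its strongest atom before max pooling within that group. It is not the authors' implementation.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def encode(descriptor, atoms):
    """Assumed coding step: absolute correlation with every atom.

    A real system would use a sparse coder here; this is only a stand-in.
    """
    return [abs(dot(descriptor, atom)) for atom in atoms]


def dictionary_guided_max_pool(descriptors, class_dicts):
    """Pool codes in groups chosen by the per-class dictionaries.

    class_dicts is a list with one entry per class, each a list of atoms.
    Assumed rule: a descriptor joins the group of the class that owns its
    most responsive atom, and codes are max pooled within each group.
    Returns the concatenation of the per-class pooled vectors.
    """
    atoms = [atom for d in class_dicts for atom in d]
    owner = [c for c, d in enumerate(class_dicts) for _ in d]
    pooled = [[0.0] * len(atoms) for _ in class_dicts]
    for x in descriptors:
        code = encode(x, atoms)
        best = max(range(len(atoms)), key=code.__getitem__)
        group = owner[best]
        pooled[group] = [max(p, c) for p, c in zip(pooled[group], code)]
    return [value for row in pooled for value in row]


if __name__ == "__main__":
    # Two toy classes with one 2-D atom each, and two local descriptors.
    dicts = [[[1.0, 0.0]], [[0.0, 1.0]]]
    feats = [[0.9, 0.1], [0.2, 0.8]]
    print(dictionary_guided_max_pool(feats, dicts))
```

The resulting vector has one pooled block per class, so its length is the number of classes times the total number of atoms; a linear classifier could then be trained on it.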


Paper Citation

in Harvard Style

Fanello S., Noceti N., Metta G. and Odone F. (2014). Dictionary based Pooling for Object Categorization. In Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014), ISBN 978-989-758-004-8, pages 269-274. DOI: 10.5220/0004654602690274

in Bibtex Style

author={Sean Ryan Fanello and Nicoletta Noceti and Giorgio Metta and Francesca Odone},
title={Dictionary based Pooling for Object Categorization},
booktitle={Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014)},

in EndNote Style

JO - Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014)
TI - Dictionary based Pooling for Object Categorization
SN - 978-989-758-004-8
AU - Fanello S.
AU - Noceti N.
AU - Metta G.
AU - Odone F.
PY - 2014
SP - 269
EP - 274
DO - 10.5220/0004654602690274