Efficient Bag of Scenes Analysis for Image Categorization

Sébastien Paris, Xanadu Halkias, Hervé Glotin


In this paper, we address the general problem of image/object categorization with a novel approach referred to as Bag-of-Scenes (BoS). Our approach is efficient for low-semantic applications such as texture classification as well as for higher-semantic tasks such as natural scene recognition or fine-grained visual categorization (FGVC). It is based on the widely used combination of (i) sparse coding (Sc), (ii) max-pooling, and (iii) Spatial Pyramid Matching (SPM), applied to histograms of multi-scale Local Binary/Ternary Patterns (LBP/LTP) and their improved variants. This approach can be considered a two-layer hierarchical architecture: the first layer encodes the local spatial patch structure via histograms of LBP/LTP, while the second encodes the relationships between pre-analyzed LBP/LTP scenes/objects. Our method outperforms SIFT-based approaches that use Sc techniques and can be trained efficiently with a simple linear SVM.
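The two-layer pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes plain 8-neighbour LBP (no LTP, no multi-scale or improved variants), a random rather than learned dictionary, and ISTA as a stand-in lasso solver. All function names (`lbp_codes`, `patch_histograms`, `sparse_codes`, `spm_max_pool`, `bos_like_feature`) and parameter values are hypothetical.

```python
import numpy as np

def lbp_codes(img):
    """8-neighbour LBP code for every interior pixel of a 2-D grayscale array."""
    h, w = img.shape
    c = img[1:-1, 1:-1]
    # Neighbour offsets, clockwise from top-left (bit order is an assumption).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(c.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (nb >= c).astype(np.uint8) << bit
    return codes

def patch_histograms(codes, patch=8):
    """First layer: normalised 256-bin LBP histogram per non-overlapping patch."""
    gh, gw = codes.shape[0] // patch, codes.shape[1] // patch
    hists = np.zeros((gh, gw, 256))
    for i in range(gh):
        for j in range(gw):
            block = codes[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
            hists[i, j] = np.bincount(block.ravel(), minlength=256) / block.size
    return hists

def sparse_codes(X, D, lam=0.1, n_iter=50):
    """Second layer: ISTA for min_a 0.5*||x - a D||^2 + lam*||a||_1, per row of X."""
    L = np.linalg.norm(D @ D.T, 2)  # Lipschitz constant of the gradient
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        Z = A - ((A @ D - X) @ D.T) / L
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)  # soft threshold
    return A

def spm_max_pool(code_grid, levels=(1, 2, 4)):
    """Max-pool absolute codes over an l x l grid per pyramid level, then concatenate."""
    gh, gw, K = code_grid.shape
    pooled = []
    for l in levels:
        ys = np.linspace(0, gh, l + 1).astype(int)
        xs = np.linspace(0, gw, l + 1).astype(int)
        for i in range(l):
            for j in range(l):
                cell = code_grid[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].reshape(-1, K)
                pooled.append(np.abs(cell).max(axis=0))
    return np.concatenate(pooled)

def bos_like_feature(img, D, patch=8):
    """LBP histograms -> sparse codes -> spatial-pyramid max-pooling."""
    hists = patch_histograms(lbp_codes(img), patch)
    gh, gw, _ = hists.shape
    A = sparse_codes(hists.reshape(-1, 256), D)
    return spm_max_pool(A.reshape(gh, gw, -1))

# Toy usage: random image and random unit-norm dictionary (both assumptions).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(66, 66))
D = rng.random((16, 256))
D /= np.linalg.norm(D, axis=1, keepdims=True)
feature = bos_like_feature(img, D)
print(feature.shape)  # (336,) = 16 atoms x 21 pyramid cells
```

The resulting fixed-length vector is the kind of input a linear SVM would be trained on; the classifier itself is left out of the sketch.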



Paper Citation

in Harvard Style

Paris S., Halkias X. and Glotin H. (2013). Efficient Bag of Scenes Analysis for Image Categorization. In Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-8565-41-9, pages 335-344. DOI: 10.5220/0004198303350344

in Bibtex Style

author={Sébastien Paris and Xanadu Halkias and Hervé Glotin},
title={Efficient Bag of Scenes Analysis for Image Categorization},
booktitle={Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},

in EndNote Style

JO - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Efficient Bag of Scenes Analysis for Image Categorization
SN - 978-989-8565-41-9
AU - Paris S.
AU - Halkias X.
AU - Glotin H.
PY - 2013
SP - 335
EP - 344
DO - 10.5220/0004198303350344