Efficient Bag of Scenes Analysis for Image Categorization
Sébastien Paris, Xanadu Halkias, Hervé Glotin
2013
Abstract
In this paper, we address the general problem of image/object categorization with a novel approach referred to as Bag-of-Scenes (BoS). Our approach is efficient for low-semantic applications such as texture classification as well as for higher-semantic tasks such as natural scene recognition or fine-grained visual categorization (FGVC). It is based on the widely used combination of i) Sparse coding (Sc), ii) Max-pooling and iii) Spatial Pyramid Matching (SPM) techniques applied to histograms of multi-scale Local Binary/Ternary Patterns (LBP/LTP) and their improved variants. This approach can be considered as a two-layer hierarchical architecture: the first layer encodes the local spatial patch structure via histograms of LBP/LTP, while the second encodes the relationships between pre-analyzed LBP/LTP-scenes/objects. Our method outperforms SIFT-based approaches using Sc techniques and can be trained efficiently with a simple linear SVM.
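The two-layer idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: a single-scale, basic 8-neighbour LBP stands in for the multi-scale LBP/LTP variants; a fixed random dictionary replaces a learned one; a soft-thresholded projection is swapped in for true sparse coding; and the linear SVM stage is omitted. All function names and parameters (`lbp_codes`, `bos_descriptor`, `patch`, `stride`, `lam`) are hypothetical.

```python
import random


def lbp_codes(img):
    """Basic 8-neighbour LBP code for each interior pixel of a 2-D grey image."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for y in range(1, len(img) - 1):
        row = []
        for x in range(1, len(img[0]) - 1):
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the centre.
                if img[y + dy][x + dx] >= img[y][x]:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def patch_histogram(codes, y0, x0, size):
    """Layer 1: L1-normalised 256-bin histogram of LBP codes in one square patch."""
    hist = [0.0] * 256
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            hist[codes[y][x]] += 1.0
    return [v / (size * size) for v in hist]


def soft_threshold_code(hist, dictionary, lam):
    """Assumed stand-in for sparse coding: soft-thresholded projections onto atoms."""
    out = []
    for atom in dictionary:
        p = sum(a * b for a, b in zip(atom, hist))
        mag = max(abs(p) - lam, 0.0)
        out.append(mag if p >= 0 else -mag)
    return out


def spm_max_pool(coded, height, width, levels=(1, 2)):
    """Layer 2: max-pool absolute code values over each spatial pyramid cell."""
    k = len(coded[0][2])
    pooled = []
    for n in levels:
        cells = [[0.0] * k for _ in range(n * n)]
        for cy, cx, code in coded:
            r = min(int(cy * n / height), n - 1)
            c = min(int(cx * n / width), n - 1)
            cell = cells[r * n + c]
            for j, v in enumerate(code):
                cell[j] = max(cell[j], abs(v))
        for cell in cells:
            pooled.extend(cell)
    return pooled


def bos_descriptor(img, dictionary, patch=4, stride=4, lam=0.05):
    """Image -> LBP patch histograms -> codes -> pyramid max-pooled vector."""
    codes = lbp_codes(img)
    h, w = len(codes), len(codes[0])
    coded = []
    for y0 in range(0, h - patch + 1, stride):
        for x0 in range(0, w - patch + 1, stride):
            hist = patch_histogram(codes, y0, x0, patch)
            centre = (y0 + patch / 2, x0 + patch / 2)
            coded.append((centre[0], centre[1],
                          soft_threshold_code(hist, dictionary, lam)))
    return spm_max_pool(coded, h, w)


# Toy usage: random 18x18 image and a random 8-atom dictionary (both made up).
random.seed(0)
image = [[random.randint(0, 255) for _ in range(18)] for _ in range(18)]
atoms = [[random.random() for _ in range(256)] for _ in range(8)]
desc = bos_descriptor(image, atoms)
print(len(desc))  # 8 atoms x (1 + 4) pyramid cells
```

The resulting fixed-length vector is the kind of input a linear classifier could be trained on. In a faithful pipeline the dictionary would be learned from training histograms and the codes obtained by solving an L1-regularised least-squares problem.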
Paper Citation
in Harvard Style
Paris S., Halkias X. and Glotin H. (2013). Efficient Bag of Scenes Analysis for Image Categorization. In Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-8565-41-9, pages 335-344. DOI: 10.5220/0004198303350344
in Bibtex Style
@conference{icpram13,
author={Sébastien Paris and Xanadu Halkias and Hervé Glotin},
title={Efficient Bag of Scenes Analysis for Image Categorization},
booktitle={Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2013},
pages={335-344},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004198303350344},
isbn={978-989-8565-41-9},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Efficient Bag of Scenes Analysis for Image Categorization
SN - 978-989-8565-41-9
AU - Paris S.
AU - Halkias X.
AU - Glotin H.
PY - 2013
SP - 335
EP - 344
DO - 10.5220/0004198303350344