Are Large Scale Training Images or Discriminative Features Important for Codebook Construction?

Veerapathirapillai Vinoharan, Amirthalingam Ramanan

Abstract

Advances in machine learning and image feature representations have led to great progress in pattern recognition approaches, which can now recognise up to 1000 visual object categories. The human brain, however, solves this problem effortlessly: it can recognise roughly 10000 to 100000 objects from only a small number of examples. In recent years the bag-of-features approach has proven to yield state-of-the-art performance in large-scale evaluations. In such systems a visual codebook plays a crucial role. To construct a codebook, researchers typically draw on a large-scale training image set, but this raises the issue of scalability: a large volume of training data becomes difficult to process, while the high-dimensional image representation can make many machine learning algorithms inefficient or even cause them to break down. In this work we investigate whether the dominant bag-of-features approach to object recognition continues to improve significantly as the training image set grows. We validated a one-pass clustering algorithm for constructing visual codebooks for object classification tasks on the PASCAL VOC Challenge image set. Our test results show that adding more training images does not contribute significantly to classification performance, but it does increase the overall model complexity in terms of storage requirements and computational time. This study therefore suggests an alternative view to the community working on patch-based object recognition: retain more discriminative descriptors rather than following the big-data hypothesis.
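To make the pipeline concrete, the following is a minimal sketch of a one-pass (resource-allocating) codebook and a bag-of-features encoding, in the spirit of the abstract. It is an illustration, not the authors' exact algorithm: the `radius` threshold, the running-mean codeword update, and the function names are assumptions for this sketch.

```python
import numpy as np

def one_pass_codebook(descriptors, radius=0.8):
    """Build a codebook in a single pass over local descriptors.

    Each descriptor is compared against its nearest existing codeword;
    if the distance exceeds `radius`, the descriptor seeds a new
    codeword, otherwise the matched codeword is refined as a running mean.
    """
    codewords = [descriptors[0].astype(float)]
    counts = [1]
    for d in descriptors[1:]:
        dists = [np.linalg.norm(d - c) for c in codewords]
        j = int(np.argmin(dists))
        if dists[j] > radius:
            codewords.append(d.astype(float))  # allocate a new codeword
            counts.append(1)
        else:
            counts[j] += 1                     # running-mean update
            codewords[j] += (d - codewords[j]) / counts[j]
    return np.vstack(codewords)

def bof_histogram(descriptors, codebook):
    """Encode one image as a normalised bag-of-features histogram."""
    hist = np.zeros(len(codebook))
    for d in descriptors:
        hist[np.argmin(np.linalg.norm(codebook - d, axis=1))] += 1
    return hist / max(hist.sum(), 1)
```

Unlike k-means, which revisits the full descriptor set over many iterations, this scheme touches each descriptor once, so the codebook cost grows linearly with the training set rather than with the number of iterations as well.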

References

  1. Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3):273-297.
  2. Csurka, G., Dance, C., Fan, L., Willamowski, J., and Bray, C. (2004). Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision, volume 1, pages 1-2.
  3. Everingham, M., Eslami, S. M. A., Gool, L. V., Williams, C. K. I., Winn, J., and Zisserman, A. (2010). The PASCAL Visual Object Classes VOC Challenge. International Journal of Computer Vision (IJCV), 88(2):303-338.
  4. Karmakar, P., Teng, S. W., Lu, G., and Zhang, D. (2015). Rotation invariant spatial pyramid matching for image classification. In Proceedings of the International Conference on Digital Image Computing: Techniques and Applications (DICTA), pages 653-660.
  5. Kim, S. (2011). Robust object categorization and segmentation motivated by visual contexts in the human visual system. EURASIP Journal on Advances in Signal Processing.
  6. Kirishanthy, T. and Ramanan, A. (2015). Creating compact and discriminative visual vocabularies using visual bits. In International Conference on Digital Image Computing: Techniques and Applications (DICTA), pages 258-263.
  7. Li, T., Mei, T., and Kweon, I. S. (2008). Learning optimal compact codebook for efficient object categorization. In IEEE Workshop on Applications of Computer Vision, pages 1-6.
  8. Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110.
  9. Ramanan, A. and Niranjan, M. (2010). A one-pass resource-allocating codebook for patch-based visual object recognition. In IEEE International Workshop on Machine Learning for Signal Processing, pages 35-40.
  10. Ramanan, A. and Niranjan, M. (2011). A review of codebook models in patch-based visual object recognition. Journal of Signal Processing Systems, Springer, 68(3):333-352.
  11. Ullman, S., Vidal-Naquet, M., and Sali, E. (2002). Visual features of intermediate complexity and their use in classification. Nature Neuroscience, 5(7):682-687.
  12. Winn, J., Criminisi, A., and Minka, T. (2005). Object categorization by learned universal visual dictionary. In IEEE International Conference on Computer Vision, volume 2, pages 1800-1807.
  13. Yang, L., Jin, R., Sukthankar, R., and Jurie, F. (2008). Unifying discriminative visual codebook generation with classifier training for object category recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), pages 1-8.
  14. Zhu, X., Vondrick, C., Ramanan, D., and Fowlkes, C. (2012). Do we need more training data or better models for object detection? In British Machine Vision Conference (BMVC).


Paper Citation


in Harvard Style

Vinoharan, V. and Ramanan, A. (2016). Are Large Scale Training Images or Discriminative Features Important for Codebook Construction?. In Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-173-1, pages 193-198. DOI: 10.5220/0005676201930198


in Bibtex Style

@conference{icpram16,
author={Veerapathirapillai Vinoharan and Amirthalingam Ramanan},
title={Are Large Scale Training Images or Discriminative Features Important for Codebook Construction?},
booktitle={Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM,},
year={2016},
pages={193-198},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005676201930198},
isbn={978-989-758-173-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM,
TI - Are Large Scale Training Images or Discriminative Features Important for Codebook Construction?
SN - 978-989-758-173-1
AU - Vinoharan V.
AU - Ramanan A.
PY - 2016
SP - 193
EP - 198
DO - 10.5220/0005676201930198