Object Detection Oriented Feature Pooling for Video Semantic Indexing

Kazuya Ueki, Tetsunori Kobayashi

2017

Abstract

We propose a new feature extraction method for video semantic indexing. Conventional methods extract features densely and uniformly across an entire image, whereas the proposed method uses an object detector to extract features from image windows with high objectness. Because this feature extraction focuses on "objects," we can eliminate unnecessary background information while keeping useful information such as the position, size, and aspect ratio of an object. Since these object detection oriented features are complementary to features extracted from entire images, the performance of video semantic indexing can be further improved. Experimental comparisons on the large-scale video dataset of the TRECVID benchmark demonstrated that the proposed method substantially improves the performance of video semantic indexing.
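The pooling idea sketched in the abstract can be illustrated in a few lines. The snippet below is a minimal sketch, not the authors' exact pipeline: it assumes a detector has already produced candidate windows with objectness scores and that each window carries an appearance feature vector (e.g., from a CNN). The function name, the max-pooling choice, and the geometry encoding are illustrative assumptions.

```python
import numpy as np

def pool_object_features(boxes, scores, feats, img_w, img_h, top_k=3):
    """Keep the top-k windows by objectness and build one pooled vector.

    boxes  : (N, 4) array of [x1, y1, x2, y2] window coordinates (hypothetical input)
    scores : (N,) objectness scores from a detector
    feats  : (N, D) appearance features, one row per window
    Returns a 1-D vector: appearance features max-pooled over the top-k
    windows, concatenated with the normalized geometry (position, size,
    aspect ratio) of the best window, so spatial cues are not discarded.
    """
    order = np.argsort(scores)[::-1][:top_k]   # highest objectness first
    pooled = feats[order].max(axis=0)          # max-pool appearance features
    x1, y1, x2, y2 = boxes[order[0]]           # geometry of the best window
    w, h = x2 - x1, y2 - y1
    geom = np.array([
        (x1 + x2) / (2 * img_w),               # normalized center x
        (y1 + y2) / (2 * img_h),               # normalized center y
        w / img_w,                             # normalized width
        h / img_h,                             # normalized height
        w / h,                                 # aspect ratio
    ])
    return np.concatenate([pooled, geom])
```

In practice such a vector would be concatenated with a whole-image feature before classification, reflecting the complementarity the abstract describes.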

References

  1. Ayache, S. and Quénot, G. (2008). Video corpus annotation using active learning. In 30th European Conference on Information Retrieval (ECIR '08), pages 187-198.
  2. Blanc-Talon, J., Philips, W., Popescu, D. C., Scheunders, P., and Zemcík, P. (2012). Advanced concepts for intelligent vision systems. In Proceedings of 14th International Conference, ACIVS 2012.
  3. Csurka, G., Bray, C., Dance, C., and Fan, L. (2004). Visual categorization with bags of keypoints. In Proceedings of ECCV Workshop on Statistical Learning in Computer Vision, pages 1-22.
  4. Dalal, N. and Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 886-893.
  5. Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., and Zisserman, A. (2010). The pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2):303-338.
  6. Lowe, D. G. (1999). Object recognition from local scale-invariant features. In Proceedings of IEEE International Conference on Computer Vision, pages 1150-1157.
  7. Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110.
  8. Mikolajczyk, K. and Schmid, C. (2004). Scale & affine invariant interest point detectors. International Journal of Computer Vision, 60(1):63-86.
  9. Inoue, N., Dang, T. H., R. Y., and Shinoda, K. (2015). TokyoTech at TRECVID 2015. In TRECVID 2015.
  10. Ojala, T., Pietikäinen, M., and Harwood, D. (1994). Performance evaluation of texture measures with classification based on Kullback discrimination of distributions. In Proceedings of the IAPR International Conference, volume 1, pages 582-585.
  11. Over, P., Awad, G., Michel, M., Fiscus, J., Kraaij, W., Smeaton, A. F., Quénot, G., and Ordelman, R. (2015). TRECVID 2015 - An overview of the goals, tasks, data, evaluation mechanisms and metrics. In Proceedings of TRECVID 2015. NIST, USA.
  12. Ren, S., He, K., Girshick, R. B., and Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. CoRR, abs/1506.01497.
  13. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., and Fei-Fei, L. (2015). ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211-252.
  14. Sánchez, J., Perronnin, F., Mensink, T., and Verbeek, J. (2013). Image classification with the Fisher vector: Theory and practice. International Journal of Computer Vision, 105(3):222-245.
  15. Lazebnik, S., Schmid, C., and Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of CVPR 2006, pages 2169-2178.
  16. Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556.
  17. Smeaton, A. F., Over, P., and Kraaij, W. (2006). Evaluation campaigns and TRECVid. In MIR '06: Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval, pages 321-330, New York, NY, USA. ACM Press.
  18. Snoek, C. G. M., Cappallo, S., van Gemert, J., Habibian, A., Mensink, T., Mettes, P., Tao, R., Koelma, D. C., and Smeulders, A. W. M. (2015). Qualcomm Research and University of Amsterdam at TRECVID 2015: Recognizing Concepts, Objects, and Events in Video. In TRECVID 2015.
  19. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S. E., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2014). Going deeper with convolutions. CoRR, abs/1409.4842.
  20. Ueki, K. and Kobayashi, T. (2015). Waseda at TRECVID 2015: Semantic Indexing. In TRECVID 2015.
  21. Varma, M. and Ray, D. (2007). Learning the discriminative power-invariance trade-off. In Proceedings of the IEEE International Conference on Computer Vision, Rio de Janeiro, Brazil.
  22. Zeiler, M. D. and Fergus, R. (2013). Visualizing and understanding convolutional networks. CoRR, abs/1311.2901.


Paper Citation


in Harvard Style

Ueki K. and Kobayashi T. (2017). Object Detection Oriented Feature Pooling for Video Semantic Indexing. In Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, (VISIGRAPP 2017) ISBN 978-989-758-226-4, pages 44-51. DOI: 10.5220/0006099600440051


in Bibtex Style

@conference{visapp17,
author={Kazuya Ueki and Tetsunori Kobayashi},
title={Object Detection Oriented Feature Pooling for Video Semantic Indexing},
booktitle={Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, (VISIGRAPP 2017)},
year={2017},
pages={44-51},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006099600440051},
isbn={978-989-758-226-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, (VISIGRAPP 2017)
TI - Object Detection Oriented Feature Pooling for Video Semantic Indexing
SN - 978-989-758-226-4
AU - Ueki K.
AU - Kobayashi T.
PY - 2017
SP - 44
EP - 51
DO - 10.5220/0006099600440051