Action Recognition using the Rf Transform on Optical Flow Images

Josep Maria Carmona, Joan Climent

2017

Abstract

The objective of this paper is the automatic recognition of human actions in video sequences. The use of spatio-temporal features for action recognition has become very popular in recent literature. Instead of extracting the spatio-temporal features from the raw video sequence, some authors propose to project the sequence to a single template first. As a contribution, we propose the use of several variants of the R transform for projecting the image sequences to templates. The R transform projects the whole sequence to a single image, retaining information concerning movement direction and magnitude. Spatio-temporal features are extracted from the template, combined using a bag-of-words paradigm, and finally fed to an SVM for action classification. The method presented is shown to improve on the state-of-the-art results on the standard Weizmann action dataset.
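For orientation, the R transform is commonly defined in the shape-descriptor literature as the integral over rho of the squared Radon transform, R(theta) = ∫ T(rho, theta)^2 d rho. The sketch below computes a discrete version of that classical definition for a single 2D image. It is not the authors' variant for optical flow images, which the abstract does not specify. The function name `r_transform`, the nearest-bin discretization of rho, and the default of 180 angles are assumptions made for illustration.

```python
import math

def r_transform(image, n_angles=180):
    """Discrete R transform of a 2D image given as a list of rows.

    For each angle theta in [0, pi), pixel values are accumulated into
    bins of rho = x*cos(theta) + y*sin(theta) (a nearest-bin
    approximation of the Radon transform), and the squared bin sums
    are added up to give R(theta).
    """
    h = len(image)
    w = len(image[0])
    # Measure coordinates from the image centre.
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    # rho lies in [-r_max, r_max]; use one bin per unit of distance.
    r_max = int(math.ceil(math.hypot(cx, cy)))
    n_bins = 2 * r_max + 1
    result = []
    for k in range(n_angles):
        theta = math.pi * k / n_angles
        c, s = math.cos(theta), math.sin(theta)
        bins = [0.0] * n_bins
        for y in range(h):
            for x in range(w):
                v = image[y][x]
                if v:
                    rho = (x - cx) * c + (y - cy) * s
                    bins[int(round(rho)) + r_max] += v
        result.append(sum(b * b for b in bins))
    return result

# A horizontal bar: projections along the bar pile up in one bin,
# projections across it spread over five bins.
bar = [[0] * 5 for _ in range(5)]
bar[2] = [1] * 5
profile = r_transform(bar)
```

In this toy case `profile[0]` (theta = 0) is much smaller than `profile[90]` (theta = pi/2), which shows how the one-dimensional profile encodes orientation. The profile is often divided by its maximum before being used as a descriptor.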



Paper Citation


in Harvard Style

Carmona J. and Climent J. (2017). Action Recognition using the Rf Transform on Optical Flow Images. In Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2017) ISBN 978-989-758-225-7, pages 266-271. DOI: 10.5220/0006218002660271


in Bibtex Style

@conference{visapp17,
author={Josep Maria Carmona and Joan Climent},
title={Action Recognition using the Rf Transform on Optical Flow Images},
booktitle={Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2017)},
year={2017},
pages={266-271},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006218002660271},
isbn={978-989-758-225-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2017)
TI - Action Recognition using the Rf Transform on Optical Flow Images
SN - 978-989-758-225-7
AU - Carmona J.
AU - Climent J.
PY - 2017
SP - 266
EP - 271
DO - 10.5220/0006218002660271