A FAST VOTING-BASED TECHNIQUE FOR HUMAN ACTION RECOGNITION IN VIDEO SEQUENCES
Duc-Hieu Tran, Wooi-Boon Goh
2012
Abstract
Human action recognition has been an active research area in recent years. However, building a robust human action recognition system remains a challenging task due to the large variations within action classes, varying human appearance, illumination changes, camera motion, occlusions and background clutter. Most previous work focuses on improving recognition rates. This paper describes a computationally fast voting-based approach to human action recognition, in which the action in a video sequence is recognized based on the support of its local spatio-temporal features. The proposed technique requires no parameter tuning and produces recognition rates comparable to those reported in the recent literature. Moreover, the technique can localize a single human action in the video sequence with little additional computation. Recognition results on the KTH and Weizmann action datasets are presented.
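To make the voting idea concrete, below is a minimal sketch (not the authors' exact method) of how local spatio-temporal features could vote for an action class: each descriptor extracted from a test clip is matched to its nearest labelled training descriptor and casts one vote for that descriptor's class, and the clip is assigned the class that collects the most votes. The sketch assumes descriptor extraction has already been done (e.g. by a space-time interest point or cuboid detector); the helper name `vote_for_action` and the synthetic 100-D descriptors are purely illustrative.

```python
import numpy as np

def vote_for_action(test_descriptors, train_descriptors, train_labels, n_classes):
    """Let each local spatio-temporal descriptor of a test clip vote for the
    action class of its nearest labelled training descriptor; the clip is
    assigned the class with the most votes."""
    votes = np.zeros(n_classes, dtype=int)
    for d in test_descriptors:
        # Brute-force Euclidean distance to every training descriptor.
        dists = np.linalg.norm(train_descriptors - d, axis=1)
        votes[train_labels[np.argmin(dists)]] += 1
    return votes, int(np.argmax(votes))

# Toy usage with synthetic 100-D descriptors and 3 made-up action classes;
# in practice the descriptors would come from a space-time feature detector.
rng = np.random.default_rng(0)
train_descriptors = rng.normal(size=(300, 100))
train_labels = rng.integers(0, 3, size=300)    # class label of each training descriptor
test_descriptors = rng.normal(size=(40, 100))  # descriptors from one unseen clip
votes, predicted_class = vote_for_action(test_descriptors, train_descriptors,
                                         train_labels, n_classes=3)
print("votes per class:", votes, "-> predicted class:", predicted_class)
```

In a real system the brute-force distance scan would typically be replaced by an approximate nearest-neighbour index (e.g. a KD-tree) to keep the voting step fast, in line with the paper's emphasis on computational speed.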
Paper Citation
in Harvard Style
Tran D. and Goh W. (2012). A FAST VOTING-BASED TECHNIQUE FOR HUMAN ACTION RECOGNITION IN VIDEO SEQUENCES. In Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2012) ISBN 978-989-8565-03-7, pages 613-619. DOI: 10.5220/0003850606130619
in Bibtex Style
@conference{visapp12,
  author={Duc-Hieu Tran and Wooi-Boon Goh},
  title={A FAST VOTING-BASED TECHNIQUE FOR HUMAN ACTION RECOGNITION IN VIDEO SEQUENCES},
  booktitle={Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2012)},
  year={2012},
  pages={613-619},
  publisher={SciTePress},
  organization={INSTICC},
  doi={10.5220/0003850606130619},
  isbn={978-989-8565-03-7},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2012)
TI - A FAST VOTING-BASED TECHNIQUE FOR HUMAN ACTION RECOGNITION IN VIDEO SEQUENCES
SN - 978-989-8565-03-7
AU - Tran D.
AU - Goh W.
PY - 2012
SP - 613
EP - 619
DO - 10.5220/0003850606130619