Action Categorization based on Arm Pose Modeling

Chongguo Li, Nelson H. C. Yung

2014

Abstract

This paper proposes a novel method to categorize human action based on arm pose modeling. Traditionally, human action categorization relies heavily on features extracted from video or images. In this research, we exploit the relationship between action categorization and arm pose modeling, which can be visualized as a graphical model. Given visual observations, both states are estimated by maximum a posteriori (MAP) inference: arm poses are first estimated under an action category hypothesis by dynamic programming, and the hypothesis is then validated by a soft-max model based on the estimated arm poses. The prior distribution for each action is estimated in advance by a semi-parametric estimator, and pixel-based dense features including LBP, SIFT, colour-SIFT, and texton are combined by the joint AdaBoost algorithm to enhance the likelihood computation. The proposed method has been evaluated on walking, waving and jogging videos from the HumanEva-I dataset. It achieves better arm pose modeling performance than the mixtures-of-parts method and an action categorization success rate of 96.69%.
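
As a rough illustration of the two-stage inference summarized above, the Python sketch below scores each action hypothesis via the posterior of its MAP arm pose and then selects a category with a soft-max over the hypotheses. All function names, inputs and weights here are hypothetical placeholders; the paper's actual pose estimator is a dynamic-programming optimization over an arm graphical model with learned priors and dense-feature likelihoods, and its soft-max validation uses learned parameters.

import numpy as np

ACTIONS = ["walking", "waving", "jogging"]   # categories evaluated on HumanEva-I

def pose_map_estimate(frame_features, action):
    """Hypothetical stand-in for the dynamic-programming step: return the MAP
    arm pose and its log posterior score under the prior learned for `action`.
    The paper maximizes likelihood(features | pose) * prior(pose | action)
    over an arm model; placeholders are returned here."""
    pose = np.zeros(4)                       # e.g. shoulder/elbow joint angles
    log_score = float(np.random.randn())     # placeholder posterior score
    return pose, log_score

def categorize(frame_features, weights):
    """Two-stage MAP inference: estimate an arm pose per action hypothesis,
    then validate the hypotheses with a soft-max over pose-based scores."""
    logits = []
    for a in ACTIONS:
        _, log_score = pose_map_estimate(frame_features, a)
        logits.append(weights[a] * log_score)    # weights: hypothetical per-action soft-max parameters
    logits = np.array(logits)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                         # soft-max over action hypotheses
    return ACTIONS[int(np.argmax(probs))], probs

if __name__ == "__main__":
    feats = np.random.rand(128)              # placeholder dense features (e.g. LBP/SIFT/texton responses)
    w = {a: 1.0 for a in ACTIONS}            # hypothetical soft-max weights
    print(categorize(feats, w))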

References

  1. Blank, M., Gorelick, L., Shechtman, E., Irani, M., and Basri, R. (2005). Actions as space-time shapes. In ICCV, volume 2, pages 1395-1402. IEEE.
  2. Conaire, C. O., O'Connor, N. E., and Smeaton, A. F. (2007). Detector adaptation by maximising agreement between independent data sources. In CVPR, pages 1-6. IEEE.
  3. Davison, A. C. and Smith, R. L. (1990). Models for exceedances over high thresholds. Journal of the Royal Statistical Society. Series B (Methodological), pages 393-442.
  4. Dollár, P., Rabaud, V., Cottrell, G., and Belongie, S. (2005). Behavior recognition via sparse spatio-temporal features. In VS-PETS, pages 65-72. IEEE.
  5. Duan, K., Keerthi, S. S., Chu, W., Shevade, S. K., and Poo, A. N. (2003). Multi-category classification by softmax combination of binary classifiers. In Multiple Classifier Systems, pages 125-134. Springer.
  6. Elgammal, A., Shet, V., Yacoob, Y., and Davis, L. S. (2003). Learning dynamics for exemplar-based gesture recognition. In CVPR, volume 1, pages I-571. IEEE.
  7. Fathi, A. and Mori, G. (2008). Action recognition by learning mid-level motion features. In CVPR, pages 1-8. IEEE.
  8. Felzenszwalb, P. F. and Zabih, R. (2011). Dynamic programming and graph algorithms in computer vision. PAMI, 33(4):721-740.
  9. Ferrari, V., Marin-Jimenez, M., and Zisserman, A. (2008). Progressive search space reduction for human pose estimation. In CVPR, pages 1-8. IEEE.
  10. Gong, W. et al. (2013). 3D Motion Data aided Human Action Recognition and Pose Estimation. PhD thesis, Universitat Autònoma de Barcelona.
  11. Laptev, I. (2005). On space-time interest points. IJCV, 64(2-3):107-123.
  12. Li, C. and Yung, N. (2012). Arm pose modeling for visual surveillance. In IPCV, pages 340-347.
  13. Martin, D. R., Fowlkes, C. C., and Malik, J. (2004). Learning to detect natural image boundaries using local brightness, color, and texture cues. PAMI, 26(5):530-549.
  14. Moeslund, T. B., Hilton, A., Krüger, V., and Sigal, L. (2011). Visual analysis of humans: looking at people. Springer.
  15. Natarajan, P. and Nevatia, R. (2012). Hierarchical multichannel hidden semi Markov graphical models for activity recognition. CVIU.
  16. Niebles, J. C., Wang, H., and Fei-Fei, L. (2008). Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words. IJCV, 79(3):299-318.
  17. Rodriguez, M., Ahmed, J., and Shah, M. (2008). Action MACH: a spatio-temporal maximum average correlation height filter for action recognition. In CVPR, pages 1-8.
  18. Sadanand, S. and Corso, J. J. (2012). Action bank: A high-level representation of activity in video. In CVPR, pages 1234-1241. IEEE.
  19. Scarrott, C. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT-Statistical Journal, 10(1):33-60.
  20. Schuldt, C., Laptev, I., and Caputo, B. (2004). Recognizing human actions: a local SVM approach. In ICPR, volume 3, pages 32-36. IEEE.
  21. Sigal, L. and Black, M. J. (2006). HumanEva: Synchronized video and motion capture dataset for evaluation of articulated human motion. Brown University TR, 120.
  22. Torralba, A., Murphy, K. P., and Freeman, W. T. (2004). Sharing features: efficient boosting procedures for multiclass object detection. In CVPR, volume 2, pages II-762. IEEE.
  23. Wang, L. and Yung, N. H. (2010). Extraction of moving objects from their background based on multiple adaptive thresholds and boundary evaluation. ITS, 11(1):40-51.
  24. Xu, R., Agarwal, P., Kumar, S., Krovi, V. N., and Corso, J. J. (2012). Combining skeletal pose with local motion for human activity recognition. In Articulated Motion and Deformable Objects, pages 114-123. Springer.
  25. Yamato, J., Ohya, J., and Ishii, K. (1992). Recognizing human action in time-sequential images using hidden Markov model. In CVPR, pages 379-385. IEEE.
  26. Yang, Y. and Ramanan, D. (2011). Articulated pose estimation with flexible mixtures-of-parts. In CVPR, pages 1385-1392. IEEE.
  27. Yao, A., Gall, J., Fanelli, G., and Van Gool, L. (2011). Does human action recognition benefit from pose estimation? In BMVC, pages 67.1-67.11.


Paper Citation


in Harvard Style

Li C. and Yung N. (2014). Action Categorization based on Arm Pose Modeling. In Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014) ISBN 978-989-758-004-8, pages 39-47. DOI: 10.5220/0004671500390047


in Bibtex Style

@conference{visapp14,
author={Chongguo Li and Nelson H. C. Yung},
title={Action Categorization based on Arm Pose Modeling},
booktitle={Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014)},
year={2014},
pages={39-47},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004671500390047},
isbn={978-989-758-004-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014)
TI - Action Categorization based on Arm Pose Modeling
SN - 978-989-758-004-8
AU - Li C.
AU - Yung N.
PY - 2014
SP - 39
EP - 47
DO - 10.5220/0004671500390047