Skeleton Point Trajectories for Human Daily Activity Recognition

Adrien Chan-Hon-Tong, Nicolas Ballas, Bertrand Delezoide, Catherine Achard, Laurent Lucat, Patrick Sayd, Françoise Prêteux

2013

Abstract

Automatic human action annotation is a challenging problem that overlaps with many computer vision fields such as video surveillance, human-computer interaction and video mining. In this work, we propose a skeleton-based algorithm to classify segmented human-action sequences. Our contribution is twofold. First, we introduce and evaluate different trajectory descriptors on skeleton datasets. Six short-term trajectory features based on position, speed or acceleration are presented first. The last descriptor is the most original, since it extends the well-known bag-of-words approach to a bag-of-gestures one applied to the 3D positions of articulations. All these descriptors are evaluated on two public databases with state-of-the-art machine learning algorithms. The second contribution is to measure the influence of missing data on skeleton-based algorithms. Indeed, skeleton extraction algorithms commonly fail on real sequences with side or back views and very complex postures. On such real data, we therefore compare recognition methods based on images with those based on skeletons containing many missing data.
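The two ideas in the abstract can be sketched in a few lines: short-term trajectory descriptors obtained by finite differences of 3D joint positions (position, speed, acceleration), and a bag-of-gestures histogram obtained by quantizing per-frame features against a codebook of "gesture words". This is a minimal illustration under stated assumptions, not the authors' implementation; the function names, the finite-difference scheme and the codebook construction are all hypothetical.

```python
import numpy as np

def trajectory_descriptors(joints, dt=1.0 / 30):
    """Short-term trajectory features from 3D joint positions.

    joints: array of shape (T, J, 3) -- T frames, J skeleton joints, xyz.
    Returns per-frame position, speed and acceleration features,
    each flattened to shape (T, J*3); derivatives use central
    finite differences (np.gradient) along the time axis.
    """
    pos = joints.reshape(len(joints), -1)   # (T, J*3) positions
    vel = np.gradient(pos, dt, axis=0)      # finite-difference speed
    acc = np.gradient(vel, dt, axis=0)      # finite-difference acceleration
    return pos, vel, acc

def bag_of_gestures(features, codebook):
    """Quantize per-frame features against a codebook of k 'gesture
    words' (nearest codeword in Euclidean distance) and return a
    normalized occurrence histogram, analogous to bag-of-words."""
    # squared distance from every frame to every codeword: (T, k)
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

In practice the codebook would be learned (e.g. by k-means on training frames) and the resulting histograms fed to a classifier such as an SVM; here any (k, J*3) array serves as a stand-in codebook.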



Paper Citation


in Harvard Style

Chan-Hon-Tong, A., Ballas, N., Delezoide, B., Achard, C., Lucat, L., Sayd, P. and Prêteux, F. (2013). Skeleton Point Trajectories for Human Daily Activity Recognition. In Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013), ISBN 978-989-8565-47-1, pages 520-529. DOI: 10.5220/0004202805200529


in Bibtex Style

@conference{visapp13,
author={Adrien Chan-Hon-Tong and Nicolas Ballas and Bertrand Delezoide and Catherine Achard and Laurent Lucat and Patrick Sayd and Françoise Prêteux},
title={Skeleton Point Trajectories for Human Daily Activity Recognition},
booktitle={Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013)},
year={2013},
pages={520-529},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004202805200529},
isbn={978-989-8565-47-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013)
TI - Skeleton Point Trajectories for Human Daily Activity Recognition
SN - 978-989-8565-47-1
AU - Chan-Hon-Tong A.
AU - Ballas N.
AU - Delezoide B.
AU - Achard C.
AU - Lucat L.
AU - Sayd P.
AU - Prêteux F.
PY - 2013
SP - 520
EP - 529
DO - 10.5220/0004202805200529