Learning Weighted Joint-based Features for Action Recognition using Depth Camera

Guang Chen, Daniel Clarke, Alois Knoll

Abstract

Human action recognition based on joints is a challenging task. The 3D positions of the tracked joints are very noisy when occlusions occur, which increases the intra-class variation in the actions. In this paper, we propose a novel approach to recognizing human actions with weighted joint-based features. Previous work has focused on hand-tuned joint-based features, which are difficult and time-consuming to extend to other modalities. In contrast, we compute the joint-based features using an unsupervised learning approach. To capture the intra-class variance, a multiple kernel learning approach is employed to learn the skeleton structure that combines these joint-based features. We evaluate our algorithm on the Microsoft Research Action3D (MSRAction3D) dataset. Experimental evaluation shows that the proposed approach outperforms state-of-the-art action recognition algorithms on depth videos.
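The kernel-combination step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes per-joint RBF kernels and uses uniform placeholder weights where multiple kernel learning would supply learned, non-negative mixing weights.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise squared Euclidean distances between rows of X and Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def combined_kernel(features, weights, gamma=1.0):
    """Weighted sum of per-joint base kernels.

    features: list of (n_samples, dim) arrays, one per joint
    weights:  non-negative mixing weights (learned in MKL;
              uniform placeholders here)
    """
    n = features[0].shape[0]
    K = np.zeros((n, n))
    for w, F in zip(weights, features):
        K += w * rbf_kernel(F, F, gamma)
    return K

# Hypothetical data: 3 joints, 6 action samples, 4-dim features per joint.
rng = np.random.default_rng(0)
feats = [rng.normal(size=(6, 4)) for _ in range(3)]
w = np.ones(3) / 3  # placeholder weights summing to 1
K = combined_kernel(feats, w)
```

Because each base kernel is a valid (symmetric, positive semi-definite) kernel and the weights are non-negative, the combined `K` remains a valid kernel that can be handed to any kernel classifier; MKL jointly optimizes the weights with the classifier objective.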



Paper Citation


in Harvard Style

Chen G., Clarke D. and Knoll A. (2014). Learning Weighted Joint-based Features for Action Recognition using Depth Camera. In Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014) ISBN 978-989-758-004-8, pages 549-556. DOI: 10.5220/0004735705490556


in Bibtex Style

@conference{visapp14,
author={Guang Chen and Daniel Clarke and Alois Knoll},
title={Learning Weighted Joint-based Features for Action Recognition using Depth Camera},
booktitle={Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014)},
year={2014},
pages={549-556},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004735705490556},
isbn={978-989-758-004-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014)
TI - Learning Weighted Joint-based Features for Action Recognition using Depth Camera
SN - 978-989-758-004-8
AU - Chen G.
AU - Clarke D.
AU - Knoll A.
PY - 2014
SP - 549
EP - 556
DO - 10.5220/0004735705490556