Learning Weighted Joint-based Features for Action Recognition using Depth Camera
Guang Chen, Daniel Clarke, Alois Knoll
2014
Abstract
Human action recognition based on joints is a challenging task. The 3D positions of the tracked joints are very noisy when occlusions occur, which increases the intra-class variation in the actions. In this paper, we propose a novel approach to recognizing human actions with weighted joint-based features. Previous work has focused on hand-tuned joint-based features, which are difficult and time-consuming to extend to other modalities. In contrast, we compute the joint-based features using an unsupervised learning approach. To capture the intra-class variance, a multiple kernel learning approach is employed to learn the skeleton structure that combines these joint-based features. We evaluate our algorithm on the action recognition task using the Microsoft Research Action3D (MSRAction3D) dataset. Experimental evaluation shows that the proposed approach outperforms state-of-the-art action recognition algorithms on depth videos.
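The abstract describes a pipeline of unsupervised joint-based feature learning followed by multiple kernel learning (MKL) over the skeleton structure. The Python snippet below is only an illustrative sketch of the final combination step: several per-joint-group kernels are mixed with weights and passed to a precomputed-kernel SVM. The feature matrices, group count, kernel bandwidth, and weights are hypothetical placeholders, and fixed weights stand in for the MKL optimization (e.g. Vishwanathan et al., 2010) used in the paper.

# Illustrative sketch only: weighted combination of per-joint-group kernels
# with a precomputed-kernel SVM. Data, shapes, and weights are placeholders;
# the paper learns the weights via multiple kernel learning.
import numpy as np
from sklearn.svm import SVC

def rbf_kernel(A, B, gamma=0.1):
    # RBF kernel matrix between row-wise feature sets A and B.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
n_train, n_test, n_groups, dim = 100, 20, 3, 32   # hypothetical sizes

# One feature matrix per joint group (e.g. arms, legs, torso) -- placeholders
# for features learned from the tracked joints.
train_feats = [rng.normal(size=(n_train, dim)) for _ in range(n_groups)]
test_feats = [rng.normal(size=(n_test, dim)) for _ in range(n_groups)]
y_train = rng.integers(0, 4, size=n_train)        # hypothetical action labels

# Weights over joint groups; in the paper these are learned by MKL.
weights = np.array([0.5, 0.3, 0.2])

# Weighted sum of kernels: train x train for fitting, test x train for prediction.
K_train = sum(w * rbf_kernel(X, X) for w, X in zip(weights, train_feats))
K_test = sum(w * rbf_kernel(Xt, X) for w, Xt, X in zip(weights, test_feats, train_feats))

clf = SVC(kernel='precomputed').fit(K_train, y_train)
pred = clf.predict(K_test)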
References
- Bengio, Y., Lamblin, P., Popovici, D., and Larochelle, H. (2007). Greedy layer-wise training of deep networks. In Advances in Neural Information Processing Systems, pages 153-160.
- Cheng, Z., Qin, L., Ye, Y., Huang, Q., and Tian, Q. (2012). Human daily action analysis with multi-view and color-depth data. In Proceedings of the 12th European Conference on Computer Vision - Volume 2, ECCV'12, pages 52-61, Berlin, Heidelberg. Springer-Verlag.
- Dalal, N. and Triggs, B. (2005). Histograms of oriented gradients for human detection. In CVPR, pages 886-893.
- Han, L., Wu, X., Liang, W., Hou, G., and Jia, Y. (2010). Discriminative human action recognition in the learned hierarchical manifold space. Image Vision Comput., 28(5):836-849.
- Hinton, G. E., Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527-1554.
- Hyvärinen, A., Hurri, J., and Hoyer, P. O. (2009). Natural Image Statistics: A Probabilistic Approach to Early Computational Vision. Springer Publishing Company, Incorporated, 1st edition.
- Laptev, I. (2005). On space-time interest points. Int. J. Comput. Vision, 64(2-3):107-123.
- Laptev, I., Marszalek, M., Schmid, C., and Rozenfeld, B. (2008). Learning realistic human actions from movies. In Conference on Computer Vision & Pattern Recognition.
- Lazebnik, S., Schmid, C., and Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, volume 2, pages 2169-2178.
- Le, Q., Zou, W., Yeung, S., and Ng, A. (2011). Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 3361-3368.
- Li, W., Zhang, Z., and Liu, Z. (2008). Expandable data-driven graphical modeling of human actions based on salient postures. IEEE Trans. Cir. and Sys. for Video Technol., 18(11):1499-1510.
- Li, W., Zhang, Z., and Liu, Z. (2010). Action recognition based on a bag of 3d points. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference on.
- Lv, F. and Nevatia, R. (2006). Recognition and segmentation of 3-d human action using hmm and multi-class adaboost. In Leonardis, A., Bischof, H., and Pinz, A., editors, Computer Vision - ECCV 2006, volume 3954 of Lecture Notes in Computer Science, pages 359-372. Springer Berlin Heidelberg.
- Müller, M. and Röder, T. (2006). Motion templates for automatic classification and retrieval of motion capture data. In Proceedings of the 2006 ACM SIGGRAPH/Eurographics symposium on Computer animation, SCA '06, pages 137-146, Aire-la-Ville, Switzerland. Eurographics Association.
- Oreifej, O. and Liu, Z. (2013). Hon4d: Histogram of oriented 4d normals for activity recognition from depth sequences. In Computer Vision and Pattern Recognition (CVPR).
- Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., and Blake, A. (2011). Real-time human pose recognition in parts from single depth images. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR '11, pages 1297-1304, Washington, DC, USA. IEEE Computer Society.
- Vishwanathan, S. V. N., Sun, Z., Theera-Ampornpunt, N., and Varma, M. (2010). Multiple kernel learning and the SMO algorithm. In Advances in Neural Information Processing Systems.
- Wang, J., Liu, Z., Chorowski, J., Chen, Z., and Wu, Y. (2012a). Robust 3d action recognition with random occupancy patterns. In Proceedings of the 12th European conference on Computer Vision - Volume Part II, ECCV'12, pages 872-885, Berlin, Heidelberg. Springer-Verlag.
- Wang, J., Liu, Z., Wu, Y., and Yuan, J. (2012b). Mining actionlet ensemble for action recognition with depth cameras. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 1290-1297.
- Xia, L. and Aggarwal, J. (2013). Spatio-temporal depth cuboid similarity feature for activity recognition using depth camera. In Computer Vision and Pattern Recognition (CVPR).
- Xia, L., Chen, C.-C., and Aggarwal, J. (2012). View invariant human action recognition using histograms of 3d joints. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE Computer Society Conference on, pages 20-27.
- Yang, X. and Tian, Y. (2012). Eigenjoints-based action recognition using naïve-bayes-nearest-neighbor. In CVPR Workshops, pages 14-19. IEEE.
- Zhang, H. and Parker, L. (2011). 4-dimensional local spatio-temporal features for human activity recognition. In Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on, pages 2044-2049.
- Zhao, Y., Liu, Z., Yang, L., and Cheng, H. (2012). Combing rgb and depth map features for human activity recognition. In Signal Information Processing Association Annual Summit and Conference (APSIPA ASC), 2012 Asia-Pacific, pages 1-4.
Paper Citation
in Harvard Style
Chen G., Clarke D. and Knoll A. (2014). Learning Weighted Joint-based Features for Action Recognition using Depth Camera. In Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014) ISBN 978-989-758-004-8, pages 549-556. DOI: 10.5220/0004735705490556
in Bibtex Style
@conference{visapp14,
author={Guang Chen and Daniel Clarke and Alois Knoll},
title={Learning Weighted Joint-based Features for Action Recognition using Depth Camera},
booktitle={Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014)},
year={2014},
pages={549-556},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004735705490556},
isbn={978-989-758-004-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014)
TI - Learning Weighted Joint-based Features for Action Recognition using Depth Camera
SN - 978-989-758-004-8
AU - Chen G.
AU - Clarke D.
AU - Knoll A.
PY - 2014
SP - 549
EP - 556
DO - 10.5220/0004735705490556