Discriminative Sequence Back-constrained GP-LVM for MOCAP based Action Recognition

Valsamis Ntouskos, Panagiotis Papadakis, Fiora Pirri

2013

Abstract

In this paper we address the problem of human action recognition in Motion Capture sequences. We introduce a method based on Gaussian Process Latent Variable Models and Alignment Kernels. We build a new discriminative latent variable model with back-constraints induced by the similarity of the original sequences. We compare the proposed method with a standard sequence classification method based on Dynamic Time Warping and with the recently introduced V-GPDS model, which is able to model high-dimensional dynamical systems. The proposed method achieves high performance even on datasets that have not been manually preprocessed, and it allows fast inference by exploiting the back-constraints.



Paper Citation


in Harvard Style

Ntouskos V., Papadakis P. and Pirri F. (2013). Discriminative Sequence Back-constrained GP-LVM for MOCAP based Action Recognition. In Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-8565-41-9, pages 87-96. DOI: 10.5220/0004268600870096


in Bibtex Style

@conference{icpram13,
author={Valsamis Ntouskos and Panagiotis Papadakis and Fiora Pirri},
title={Discriminative Sequence Back-constrained GP-LVM for MOCAP based Action Recognition},
booktitle={Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2013},
pages={87-96},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004268600870096},
isbn={978-989-8565-41-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Discriminative Sequence Back-constrained GP-LVM for MOCAP based Action Recognition
SN - 978-989-8565-41-9
AU - Ntouskos V.
AU - Papadakis P.
AU - Pirri F.
PY - 2013
SP - 87
EP - 96
DO - 10.5220/0004268600870096