REAL-TIME POSE ESTIMATION USING TREE STRUCTURES BUILT FROM SKELETONISED VOLUME SEQUENCES

Rune Havnung Bakken, Adrian Hilton

Abstract

Pose estimation in the context of human motion analysis is the process of approximating the body configuration in each frame of a motion sequence. We propose a novel pose estimation method based on constructing tree structures from skeletonised visual hulls reconstructed from multi-view video. The pose is estimated independently in each frame, so the method can recover from errors in previous frames, a known weakness of tracking-based approaches. Publicly available datasets were used to evaluate the method. On real data, the method runs at 15–64 fps, depending on the resolution of the volume. On synthetic data, the positions of the extremities were determined with a mean error of 47–53 mm, depending on the resolution.
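
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of turning a skeletonised volume into a tree, not the authors' algorithm. It assumes the skeleton is available as a set of integer voxel coordinates, that voxels are 26-connected, and that a root voxel is already known; the names `build_tree` and `find_extremities` are hypothetical.

```python
from collections import deque
from itertools import product

# Offsets of the 26-connected neighbourhood (assumed connectivity).
OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def build_tree(skeleton_voxels, root):
    """Breadth-first traversal of the skeleton voxels starting at root.

    Returns a dict mapping each reachable voxel to its parent voxel
    (the root maps to None), which encodes a tree.
    """
    voxels = set(skeleton_voxels)
    parent = {root: None}
    queue = deque([root])
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in OFFSETS:
            neighbour = (x + dx, y + dy, z + dz)
            if neighbour in voxels and neighbour not in parent:
                parent[neighbour] = (x, y, z)
                queue.append(neighbour)
    return parent


def find_extremities(parent):
    """Leaf nodes of the tree: voxels that are nobody's parent."""
    internal = {p for p in parent.values() if p is not None}
    return sorted(v for v in parent if v not in internal)


# Toy skeleton: a short vertical "spine" with two diagonal "arms".
toy_skeleton = [
    (0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 3),
    (1, 0, 4), (2, 0, 5),
    (-1, 0, 4), (-2, 0, 5),
]
tree = build_tree(toy_skeleton, root=(0, 0, 0))
print(find_extremities(tree))
```

In this toy case the leaves of the tree are the two arm tips. A real skeleton would also need spurious branches pruned and the leaves labelled as specific body parts, which this sketch does not attempt.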


Paper Citation


in Harvard Style

Havnung Bakken R. and Hilton A. (2012). REAL-TIME POSE ESTIMATION USING TREE STRUCTURES BUILT FROM SKELETONISED VOLUME SEQUENCES. In Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2012) ISBN 978-989-8565-04-4, pages 181-190. DOI: 10.5220/0003858501810190


in Bibtex Style

@conference{visapp12,
author={Rune Havnung Bakken and Adrian Hilton},
title={REAL-TIME POSE ESTIMATION USING TREE STRUCTURES BUILT FROM SKELETONISED VOLUME SEQUENCES},
booktitle={Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2012)},
year={2012},
pages={181-190},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003858501810190},
isbn={978-989-8565-04-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2012)
TI - REAL-TIME POSE ESTIMATION USING TREE STRUCTURES BUILT FROM SKELETONISED VOLUME SEQUENCES
SN - 978-989-8565-04-4
AU - Havnung Bakken R.
AU - Hilton A.
PY - 2012
SP - 181
EP - 190
DO - 10.5220/0003858501810190