STATIC POSE ESTIMATION FROM DEPTH IMAGES USING RANDOM REGRESSION FORESTS AND HOUGH VOTING

Brian Holt, Richard Bowden

Abstract

Robust and fast algorithms for estimating the pose of a human from an image would have a far-reaching impact on many fields in and outside of computer vision. We address the problem using depth data that can be captured inexpensively with consumer depth cameras such as the Kinect sensor. To achieve robustness and speed on a small training dataset, we formulate the pose estimation task within a regression and Hough voting framework. Our approach uses random regression forests to predict joint locations from each pixel and accumulates these predictions with Hough voting. The Hough accumulator images are treated as likelihood distributions whose maxima correspond to joint location hypotheses. We demonstrate our approach and compare it to the state of the art on a publicly available dataset.
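
The voting stage described in the abstract can be illustrated with a minimal sketch. It covers only the accumulation and maximum-finding steps and assumes that per-pixel offsets to a joint have already been predicted by some regressor. The function names, the 2D image-plane accumulator, and the synthetic data are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def accumulate_votes(pixels, offsets, shape, weights=None):
    """Cast one vote per pixel at (pixel + predicted offset) into a 2D accumulator.

    pixels, offsets: float arrays of shape (N, 2) holding (row, col) values.
    shape: (rows, cols) of the accumulator image.
    """
    acc = np.zeros(shape, dtype=np.float64)
    votes = np.rint(pixels + offsets).astype(int)
    if weights is None:
        weights = np.ones(len(votes))
    # Discard votes that land outside the accumulator.
    inside = ((votes[:, 0] >= 0) & (votes[:, 0] < shape[0]) &
              (votes[:, 1] >= 0) & (votes[:, 1] < shape[1]))
    # np.add.at sums correctly even when several votes hit the same cell.
    np.add.at(acc, (votes[inside, 0], votes[inside, 1]), weights[inside])
    return acc

def joint_hypothesis(acc):
    """Return the (row, col) of the accumulator maximum as plain ints."""
    r, c = np.unravel_index(np.argmax(acc), acc.shape)
    return int(r), int(c)

# Synthetic example (hypothetical data): 500 pixels vote for a joint at (20, 30);
# the first 100 predictions are replaced by random outliers.
rng = np.random.default_rng(0)
pixels = rng.integers(0, 64, size=(500, 2)).astype(float)
joint = np.array([20.0, 30.0])
offsets = joint - pixels
offsets[:100] = rng.uniform(-64, 64, size=(100, 2))

acc = accumulate_votes(pixels, offsets, (64, 64))
print(joint_hypothesis(acc))
```

Because the consistent votes pile up in one cell while the outliers scatter across the image, the maximum of the accumulator recovers the joint location even though a fifth of the predictions are wrong. In practice the accumulator would typically be smoothed before taking maxima; that step is omitted here.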



Paper Citation


in Harvard Style

Holt B. and Bowden R. (2012). STATIC POSE ESTIMATION FROM DEPTH IMAGES USING RANDOM REGRESSION FORESTS AND HOUGH VOTING. In Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2012), ISBN 978-989-8565-03-7, pages 557-564. DOI: 10.5220/0003868005570564


in Bibtex Style

@conference{visapp12,
author={Brian Holt and Richard Bowden},
title={STATIC POSE ESTIMATION FROM DEPTH IMAGES USING RANDOM REGRESSION FORESTS AND HOUGH VOTING},
booktitle={Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2012)},
year={2012},
pages={557-564},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003868005570564},
isbn={978-989-8565-03-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2012)
TI - STATIC POSE ESTIMATION FROM DEPTH IMAGES USING RANDOM REGRESSION FORESTS AND HOUGH VOTING
SN - 978-989-8565-03-7
AU - Holt B.
AU - Bowden R.
PY - 2012
SP - 557
EP - 564
DO - 10.5220/0003868005570564