non-linear regression. We show how Random Regression
Forests are trained and then applied to test images
with Hough voting to accurately predict joint locations.
We demonstrate our approach and compare it to the
state of the art on a publicly available dataset. Even
though our system is implemented in an unoptimised
high-level language, it runs in seconds per frame on a
single core. As future work, we plan to combine these
results with the temporal constraints of a tracking
framework for increased accuracy and temporal
coherence. Finally, we would like to apply these
results to other areas of cognitive vision such as HCI
and gesture recognition.
ACKNOWLEDGEMENTS
This work was supported by the EC project
FP7-ICT-23113 Dicta-Sign and the EPSRC project
EP/I011811/1. Thanks to Eng-Jon Ong and Helen
Cooper for their insights and stimulating discussions.
STATIC POSE ESTIMATION FROM DEPTH IMAGES USING RANDOM REGRESSION FORESTS AND HOUGH
VOTING