AUTOMATIC INITIALIZATION FOR BODY TRACKING - Using Appearance to Learn a Model for Tracking Human Upper Body Motions

Joachim Schmidt, Modesto Castrillón-Santana

2008

Abstract

Social robots must be able to communicate with a human interaction partner and to recognize that partner's intentions. Because humans commonly use gestures when interacting, gesture recognition is a necessary skill for a social robot. As a common intermediate step, the pose of an individual is tracked over time using a body model. Acquiring a suitable body model, i.e. self-starting the tracker, is however a complex and challenging task. This paper presents an approach that facilitates the acquisition of the body model during interaction. A robust face detection algorithm enables automatic, markerless acquisition of a 3D body model using a monocular color camera. For the given human-robot interaction scenario, a prototype has been developed for a single-user configuration. It provides automatic initialization and failure recovery of a 3D body tracker based on head and hand detection information, and delivers promising results.


Paper Citation


in Harvard Style

Schmidt J. and Castrillón-Santana M. (2008). AUTOMATIC INITIALIZATION FOR BODY TRACKING - Using Appearance to Learn a Model for Tracking Human Upper Body Motions. In Proceedings of the Third International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2008), ISBN 978-989-8111-21-0, pages 535-542. DOI: 10.5220/0001071005350542


in Bibtex Style

@conference{visapp08,
author={Joachim Schmidt and Modesto Castrillón-Santana},
title={AUTOMATIC INITIALIZATION FOR BODY TRACKING - Using Appearance to Learn a Model for Tracking Human Upper Body Motions},
booktitle={Proceedings of the Third International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2008)},
year={2008},
pages={535-542},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001071005350542},
isbn={978-989-8111-21-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Third International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2008)
TI - AUTOMATIC INITIALIZATION FOR BODY TRACKING - Using Appearance to Learn a Model for Tracking Human Upper Body Motions
SN - 978-989-8111-21-0
AU - Schmidt J.
AU - Castrillón-Santana M.
PY - 2008
SP - 535
EP - 542
DO - 10.5220/0001071005350542