AUTOMATIC INITIALIZATION FOR BODY TRACKING - Using Appearance to Learn a Model for Tracking Human Upper Body Motions
Joachim Schmidt, Modesto Castrillón-Santana
2008
Abstract
Social robots must be able to communicate with a human interaction partner and to recognize that partner's intentions. Humans commonly use gestures when interacting, so gesture recognition is a necessary skill for a social robot. As a common intermediate step, the pose of an individual is tracked over time using a body model. Acquiring a suitable body model, i.e. self-starting the tracker, is however a complex and challenging task. This paper presents an approach that facilitates the acquisition of the body model during interaction. A robust face detection algorithm makes automatic, markerless acquisition of a 3D body model possible using a monocular color camera. For the given human-robot interaction scenario, a prototype has been developed for a single-user configuration. It provides automatic initialization and failure recovery of a 3D body tracker based on head and hand detection information, and delivers promising results.
Paper Citation
in Harvard Style
Schmidt J. and Castrillón-Santana M. (2008). AUTOMATIC INITIALIZATION FOR BODY TRACKING - Using Appearance to Learn a Model for Tracking Human Upper Body Motions. In Proceedings of the Third International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2008), ISBN 978-989-8111-21-0, pages 535-542. DOI: 10.5220/0001071005350542
in Bibtex Style
@conference{visapp08,
author={Joachim Schmidt and Modesto Castrillón-Santana},
title={AUTOMATIC INITIALIZATION FOR BODY TRACKING - Using Appearance to Learn a Model for Tracking Human Upper Body Motions},
booktitle={Proceedings of the Third International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2008)},
year={2008},
pages={535-542},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001071005350542},
isbn={978-989-8111-21-0},
}
in EndNote Style
TY - CONF
JO - Proceedings of the Third International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2008)
TI - AUTOMATIC INITIALIZATION FOR BODY TRACKING - Using Appearance to Learn a Model for Tracking Human Upper Body Motions
SN - 978-989-8111-21-0
AU - Schmidt J.
AU - Castrillón-Santana M.
PY - 2008
SP - 535
EP - 542
DO - 10.5220/0001071005350542