Vision-based Hand Pose Estimation - A Mixed Bottom-up and Top-down Approach

Davide Periquito, Jacinto C. Nascimento, Alexandre Bernardino, João Sequeira

2013

Abstract

Tracking the position and orientation of a human hand in image sequences is nowadays possible with local search methods, provided that a good initialization is available and that the hand pose and appearance change little from frame to frame. However, if the target moves too quickly or leaves the field of view, the tracker must be reinitialized. Fully automatic initialization is a very challenging problem due to multiple factors, including the difficulty of identifying landmarks on individual fingers and of reconstructing the hand pose from their positions. In this paper, we propose an appearance-based approach to generate hand posture candidates from a single image. The method matches hand silhouettes against a previously trained database, thereby circumventing the need for explicit geometric pose reconstruction. A dense sampling of the hand appearance space is obtained in a simulation environment, and the corresponding silhouettes are stored in a database. At run time, acquired silhouettes are efficiently matched against the database using a mixture of bottom-up and top-down processes. We assess the performance of our approach in a series of simulations, evaluating the influence of the bottom-up and top-down processes on estimation error and computation time, and show promising results on real sequences.
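The abstract describes the method only at this high level. As a rough, hypothetical illustration of the silhouette-retrieval idea, the Python sketch below scores a query binary silhouette mask against a database of simulated silhouettes using a symmetric-difference distance, and can optionally restrict the search to a subset of database entries as a crude stand-in for a top-down prior that narrows the candidate set. The function names, the distance measure, and the database layout are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def silhouette_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric-difference distance between two binary silhouette masks
    of equal shape: 0.0 for identical masks, 1.0 for disjoint masks.
    (Assumed metric for illustration; the paper may use a different one.)"""
    a = a.astype(bool)
    b = b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_xor(a, b).sum()) / float(union)

def retrieve_candidates(query, database, top_k=5, prior_indices=None):
    """Return indices of the top_k database entries whose silhouettes are
    closest to the query mask.

    database      : list of (pose_params, silhouette_mask) pairs built
                    offline from a simulation environment (assumed layout).
    prior_indices : optional iterable of database indices to restrict the
                    search to (a simplified 'top-down' constraint); when
                    None, the whole database is scanned ('bottom-up' only).
    """
    indices = prior_indices if prior_indices is not None else range(len(database))
    scored = [(silhouette_distance(query, database[i][1]), i) for i in indices]
    scored.sort(key=lambda t: t[0])
    return [i for _, i in scored[:top_k]]
```

Used as, e.g., `retrieve_candidates(observed_mask, db, top_k=10)`, the sketch returns the stored pose hypotheses whose simulated silhouettes best overlap the observed one; restricting `prior_indices` to poses near a previous estimate mimics how a top-down process could cut the search cost.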



Paper Citation


in Harvard Style

Periquito D., Nascimento J., Bernardino A. and Sequeira J. (2013). Vision-based Hand Pose Estimation - A Mixed Bottom-up and Top-down Approach. In Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013) ISBN 978-989-8565-47-1, pages 566-573. DOI: 10.5220/0004295805660573


in Bibtex Style

@conference{visapp13,
author={Davide Periquito and Jacinto C. Nascimento and Alexandre Bernardino and João Sequeira},
title={Vision-based Hand Pose Estimation - A Mixed Bottom-up and Top-down Approach},
booktitle={Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013)},
year={2013},
pages={566-573},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004295805660573},
isbn={978-989-8565-47-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013)
TI - Vision-based Hand Pose Estimation - A Mixed Bottom-up and Top-down Approach
SN - 978-989-8565-47-1
AU - Periquito D.
AU - Nascimento J.
AU - Bernardino A.
AU - Sequeira J.
PY - 2013
SP - 566
EP - 573
DO - 10.5220/0004295805660573