AN ACTION-TUNED NEURAL NETWORK ARCHITECTURE FOR HAND POSE ESTIMATION

Giovanni Tessitore, Francesco Donnarumma, Roberto Prevete

2010

Abstract

There is growing interest in developing computational models of grasping-action recognition, motivated by a wide range of applications in robotics, neuroscience, HCI, motion capture, and other research areas. In many of these cases a vision-based approach appears to be the most promising: in HCI and robotic applications, for example, it often allows for simpler and more natural interaction. However, vision-based grasping-action recognition is a challenging problem, because the large number of hand self-occlusions makes the mapping from hand visual appearance to hand pose an ill-posed inverse problem. The approach proposed here builds on the work of Santello and co-workers, which demonstrates a reduction in hand-pose variability within a given class of grasping actions. The proposed neural network architecture introduces a specialized module for each class of grasping actions and each viewpoint, allowing for more robust hand pose estimation. A quantitative analysis of the proposed architecture, obtained on a synthetic data set, is presented and discussed as a basis for further work.
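The core idea of the abstract — route a visual-feature vector to a pose-estimation module specialized for one grasp class, exploiting the reduced within-class hand-pose variability reported by Santello and co-workers — can be illustrated with a minimal sketch. All names, dimensions, and the linear form of each module below are illustrative assumptions, not the paper's actual architecture; a real system would use trained networks for both the gating step and the per-class regressors.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CLASSES = 3   # hypothetical number of grasp-action classes
FEAT_DIM = 16   # hypothetical visual-feature dimensionality (e.g. HOG-like)
POSE_DIM = 20   # hypothetical number of hand joint angles

# One specialized regressor ("module") per grasp class: within a class,
# hand-pose variability is reduced, so a simple per-class mapping from
# appearance features to joint angles can suffice.  Here each module is
# just a random linear map, standing in for a trained network.
modules = [rng.normal(size=(POSE_DIM, FEAT_DIM)) for _ in range(N_CLASSES)]

def classify_action(features):
    """Stand-in gating step: pick which grasp-class module to use.
    A real system would use a trained action classifier here."""
    return int(np.argmax(features[:N_CLASSES]))

def estimate_pose(features):
    """Route the feature vector to its class-specific module."""
    k = classify_action(features)
    return modules[k] @ features

x = rng.normal(size=FEAT_DIM)   # synthetic feature vector
pose = estimate_pose(x)
print(pose.shape)               # (20,)
```

The benefit of this mixture-of-specialists layout is that each module only has to invert the appearance-to-pose mapping within one class of actions and one viewpoint, where the mapping is far better conditioned than over all hand configurations at once.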

References

  1. Aleotti, J. and Caselli, S. (2006). Grasp recognition in virtual reality for robot pregrasp planning by demonstration. In ICRA 2006, pages 2801-2806.
  2. Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Oxford University Press.
  3. Chang, L. Y., Pollard, N., Mitchell, T., and Xing, E. P. (2007). Feature selection for grasp recognition from optical markers. In IROS 2007, pages 2944-2950.
  4. Dalal, N. and Triggs, B. (2005). Histograms of oriented gradients for human detection. In CVPR'05 - Volume 1, pages 886-893, Washington, DC, USA. IEEE Computer Society.
  5. Erol, A., Bebis, G., Nicolescu, M., Boyle, R. D., and Twombly, X. (2007). Vision-based hand pose estimation: A review. Computer Vision and Image Understanding, 108(1-2):52-73.
  6. Friston, K. (2005). A theory of cortical responses. Philos Trans R Soc Lond B Biol Sci, 360(1456):815-836.
  7. Ju, Z., Liu, H., Zhu, X., and Xiong, Y. (2008). Dynamic grasp recognition using time clustering, Gaussian mixture models and hidden Markov models. In ICIRA '08, pages 669-678, Berlin, Heidelberg. Springer-Verlag.
  8. Bernardin, K., Ogawara, K., Ikeuchi, K., and Dillmann, R. (2003). A hidden Markov model based sensor fusion approach for recognizing continuous human grasping sequences. In Third IEEE Int. Conf. on Humanoid Robots.
  9. Kilner, J. M., Friston, K. J., and Frith, C. D. (2007). Predictive coding: an account of the mirror neuron system. Cognitive Processing, 8(3):159-166.
  10. Napier, J. R. (1956). The prehensile movements of the human hand. The Journal of Bone and Joint Surgery, 38B:902-913.
  11. Palm, R., Iliev, B., and Kadmiry, B. (2009). Recognition of human grasps by time-clustering and fuzzy modeling. Robot. Auton. Syst., 57(5):484-495.
  12. Poppe, R. (2007). Vision-based human motion analysis: An overview. Computer Vision and Image Understanding, 108(1-2):4-18. Special Issue on Vision for Human-Computer Interaction.
  13. Prevete, R., Tessitore, G., Catanzariti, E., and Tamburrini, G. (2010). Perceiving affordances: a computational investigation of grasping affordances. Accepted for publication in Cognitive System Research.
  14. Prevete, R., Tessitore, G., Santoro, M., and Catanzariti, E. (2008). A connectionist architecture for view-independent grip-aperture computation. Brain Research, 1225:133-145.
  15. Romero, J., Kjellstrom, H., and Kragic, D. (2009). Monocular real-time 3d articulated hand pose estimation. In IEEE-RAS International Conference on Humanoid Robots (Humanoids09).
  16. Santello, M., Flanders, M., and Soechting, J. F. (2002). Patterns of hand motion during grasping and the influence of sensory guidance. Journal of Neuroscience, 22(4):1426-1435.
  17. Weinland, D., Ronfard, R., and Boyer, E. (2010). A Survey of Vision-Based Methods for Action Representation, Segmentation and Recognition. Technical report, INRIA.


Paper Citation


in Harvard Style

Tessitore G., Donnarumma F. and Prevete R. (2010). AN ACTION-TUNED NEURAL NETWORK ARCHITECTURE FOR HAND POSE ESTIMATION. In Proceedings of the International Conference on Fuzzy Computation and 2nd International Conference on Neural Computation - Volume 1: ICNC, (IJCCI 2010) ISBN 978-989-8425-32-4, pages 358-363. DOI: 10.5220/0003086403580363


in Bibtex Style

@conference{icnc10,
author={Giovanni Tessitore and Francesco Donnarumma and Roberto Prevete},
title={AN ACTION-TUNED NEURAL NETWORK ARCHITECTURE FOR HAND POSE ESTIMATION},
booktitle={Proceedings of the International Conference on Fuzzy Computation and 2nd International Conference on Neural Computation - Volume 1: ICNC, (IJCCI 2010)},
year={2010},
pages={358-363},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003086403580363},
isbn={978-989-8425-32-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Fuzzy Computation and 2nd International Conference on Neural Computation - Volume 1: ICNC, (IJCCI 2010)
TI - AN ACTION-TUNED NEURAL NETWORK ARCHITECTURE FOR HAND POSE ESTIMATION
SN - 978-989-8425-32-4
AU - Tessitore G.
AU - Donnarumma F.
AU - Prevete R.
PY - 2010
SP - 358
EP - 363
DO - 10.5220/0003086403580363