AN ACTION-TUNED NEURAL NETWORK ARCHITECTURE FOR HAND POSE ESTIMATION
Giovanni Tessitore, Francesco Donnarumma, Roberto Prevete
2010
Abstract
There is growing interest in developing computational models of grasping action recognition, motivated by a wide range of applications in robotics, neuroscience, HCI, motion capture and other research areas. In many of these settings a vision-based approach to grasping action recognition appears particularly promising; in HCI and robotic applications, for example, it often allows for simpler and more natural interaction. However, vision-based grasping action recognition is a challenging problem: the large number of hand self-occlusions makes the mapping from hand visual appearance to hand pose an ill-posed inverse problem. The approach proposed here builds on the work of Santello and co-workers, which demonstrates a reduction in hand variability within a given class of grasping actions. The proposed neural network architecture introduces specialized modules for each class of grasping actions and for each viewpoint, allowing for a more robust hand pose estimation. A quantitative analysis of the proposed architecture, obtained on a synthetic data set, is presented and discussed as a basis for further work.
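The abstract does not spell out the internals of the architecture, so the following is only an illustrative sketch of the general idea of action-tuned modules: one pose regressor per grasp class plus a gating classifier that routes each input to the module of its grasp class. The class name ActionTunedPoseEstimator, the scikit-learn stand-in models, and all parameters are assumptions made for the sketch, not the paper's actual method.

```python
# A minimal sketch, NOT the authors' implementation: it only illustrates the
# idea of action-tuned modules, with one pose regressor per grasp class and a
# gating classifier that routes each input to the module of its grasp class.
# All names (ActionTunedPoseEstimator, hidden_units, ...) are hypothetical,
# and scikit-learn models stand in for the paper's neural network modules.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPRegressor


class ActionTunedPoseEstimator:
    """One pose-regression module per grasp class; a gate selects the module."""

    def __init__(self, hidden_units=32):
        self.hidden_units = hidden_units
        self.gate = LogisticRegression(max_iter=1000)  # grasp-class selector
        self.modules = {}                              # class label -> regressor

    def fit(self, X, poses, labels):
        # X: (n, d) visual features (e.g. HOG descriptors per frame),
        # poses: (n, k) hand joint angles, labels: (n,) grasp-class labels.
        self.gate.fit(X, labels)
        for c in np.unique(labels):
            m = MLPRegressor(hidden_layer_sizes=(self.hidden_units,), max_iter=2000)
            m.fit(X[labels == c], poses[labels == c])  # class-specific mapping
            self.modules[c] = m
        return self

    def predict(self, X):
        # Route each sample to the module of its predicted grasp class.
        chosen = self.gate.predict(X)
        return np.vstack([self.modules[c].predict(x[None, :])
                          for x, c in zip(X, chosen)])


# Toy usage on random data (a stand-in for the paper's synthetic data set).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))           # hypothetical image feature vectors
labels = rng.integers(0, 2, size=200)    # two grasp classes
poses = rng.normal(size=(200, 20))       # 20 hypothetical joint angles
estimator = ActionTunedPoseEstimator().fit(X, poses, labels)
print(estimator.predict(X[:5]).shape)    # -> (5, 20)
```

The per-class split mirrors the observation the abstract attributes to Santello and co-workers: within a single grasping class, hand-pose variability is reduced, so a class-specific regressor faces an easier, better-conditioned mapping than a single global one.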
References
- Aleotti, J. and Caselli, S. (2006). Grasp recognition in virtual reality for robot pregrasp planning by demonstration. In ICRA 2006, pages 2801-2806.
- Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Oxford University Press.
- Chang, L. Y., Pollard, N., Mitchell, T., and Xing, E. P. (2007). Feature selection for grasp recognition from optical markers. In IROS 2007, pages 2944-2950.
- Dalal, N. and Triggs, B. (2005). Histograms of oriented gradients for human detection. In CVPR'05 - Volume 1, pages 886-893, Washington, DC, USA. IEEE Computer Society.
- Erol, A., Bebis, G., Nicolescu, M., Boyle, R. D., and Twombly, X. (2007). Vision-based hand pose estimation: A review. Computer Vision and Image Understanding, 108(1-2):52-73.
- Friston, K. (2005). A theory of cortical responses. Philos Trans R Soc Lond B Biol Sci, 360(1456):815-836.
- Ju, Z., Liu, H., Zhu, X., and Xiong, Y. (2008). Dynamic grasp recognition using time clustering, Gaussian mixture models and hidden Markov models. In ICIRA '08, pages 669-678, Berlin, Heidelberg. Springer-Verlag.
- Bernardin, K., Ogawara, K., Ikeuchi, K., and Dillmann, R. (2003). A hidden Markov model based sensor fusion approach for recognizing continuous human grasping sequences. In Third IEEE Int. Conf. on Humanoid Robots.
- Kilner, J. M., Friston, K. J., and Frith, C. D. (2007). Predictive coding: an account of the mirror neuron system. Cognitive Processing, 8(3):159-166.
- Napier, J. R. (1956). The prehensile movements of the human hand. The Journal of Bone and Joint Surgery, 38B:902-913.
- Palm, R., Iliev, B., and Kadmiry, B. (2009). Recognition of human grasps by time-clustering and fuzzy modeling. Robot. Auton. Syst., 57(5):484-495.
- Poppe, R. (2007). Vision-based human motion analysis: An overview. Computer Vision and Image Understanding, 108(1-2):4-18. Special Issue on Vision for Human-Computer Interaction.
- Prevete, R., Tessitore, G., Catanzariti, E., and Tamburrini, G. (2010). Perceiving affordances: a computational investigation of grasping affordances. Accepted for publication in Cognitive System Research.
- Prevete, R., Tessitore, G., Santoro, M., and Catanzariti, E. (2008). A connectionist architecture for view-independent grip-aperture computation. Brain Research, 1225:133-145.
- Romero, J., Kjellstrom, H., and Kragic, D. (2009). Monocular real-time 3D articulated hand pose estimation. In IEEE-RAS International Conference on Humanoid Robots (Humanoids09).
- Santello, M., Flanders, M., and Soechting, J. F. (2002). Patterns of hand motion during grasping and the influence of sensory guidance. Journal of Neuroscience, 22(4):1426-1435.
- Weinland, D., Ronfard, R., and Boyer, E. (2010). A Survey of Vision-Based Methods for Action Representation, Segmentation and Recognition. Technical report, INRIA.
Paper Citation
in Harvard Style
Tessitore G., Donnarumma F. and Prevete R. (2010). AN ACTION-TUNED NEURAL NETWORK ARCHITECTURE FOR HAND POSE ESTIMATION. In Proceedings of the International Conference on Fuzzy Computation and 2nd International Conference on Neural Computation - Volume 1: ICNC, (IJCCI 2010), ISBN 978-989-8425-32-4, pages 358-363. DOI: 10.5220/0003086403580363
in Bibtex Style
@conference{icnc10,
author={Giovanni Tessitore and Francesco Donnarumma and Roberto Prevete},
title={AN ACTION-TUNED NEURAL NETWORK ARCHITECTURE FOR HAND POSE ESTIMATION},
booktitle={Proceedings of the International Conference on Fuzzy Computation and 2nd International Conference on Neural Computation - Volume 1: ICNC, (IJCCI 2010)},
year={2010},
pages={358-363},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003086403580363},
isbn={978-989-8425-32-4},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Fuzzy Computation and 2nd International Conference on Neural Computation - Volume 1: ICNC, (IJCCI 2010)
TI - AN ACTION-TUNED NEURAL NETWORK ARCHITECTURE FOR HAND POSE ESTIMATION
SN - 978-989-8425-32-4
AU - Tessitore G.
AU - Donnarumma F.
AU - Prevete R.
PY - 2010
SP - 358
EP - 363
DO - 10.5220/0003086403580363