Towards Human Pose Semantic Synthesis in 3D based on Query Keywords

Mo'taz Al-Hami, Rolf Lakaemper

Abstract

The work presented in this paper is part of a project to enable humanoid robots to build a semantic understanding of their environment by adopting unsupervised self-learning techniques. Here, we propose an approach to learn 3-dimensional human-pose conformations, i.e. structural arrangements of a (simplified) human skeleton model, given only a minimal verbal description of a human posture (e.g. "sitting", "standing", "tree pose"). The only tools given to the robot are knowledge of the skeleton model and a connection to the labeled image database Google Images. Hence, the main contribution of this work is to filter relevant results from an image database, given human-pose-specific query keywords, and to transform the information in these (2D) images into the 3D pose most likely to fit the human understanding of the keywords. Steps to achieve this goal integrate available 2D human-pose estimators operating on still images, clustering techniques to extract representative 2D human skeleton poses, and 3D-pose estimation from 2D poses. We evaluate the approach using different query keywords representing different postures.
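The pipeline sketched in the abstract (retrieve 2D pose estimates for a keyword, cluster them to extract a representative 2D skeleton, then lift it to 3D) can be illustrated with a minimal sketch. All names here are hypothetical, the clustering step is reduced to a simple medoid selection, and the 3D lifting is a placeholder; none of this is the paper's actual implementation.

```python
# Hedged sketch of the keyword-to-3D-pose pipeline: from a set of 2D pose
# estimates (as would be produced by a 2D pose estimator on retrieved images),
# pick a representative pose, then lift it to 3D. Names are illustrative.
import math

def pose_distance(a, b):
    """Mean Euclidean distance between corresponding joints of two 2D poses."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def representative_pose(poses):
    """Medoid: the pose minimizing total distance to all others.

    Stands in for the clustering step that extracts a representative
    2D skeleton from the retrieved image set.
    """
    return min(poses, key=lambda p: sum(pose_distance(p, q) for q in poses))

def lift_to_3d(pose2d, depth=0.0):
    """Placeholder for 2D-to-3D pose estimation: attach a depth coordinate."""
    return [(x, y, depth) for x, y in pose2d]

# Toy example: three noisy 2D poses for one keyword, 3 joints each (x, y).
poses = [
    [(0.0, 0.0), (0.1, 1.0), (0.5, 1.5)],
    [(0.0, 0.1), (0.1, 1.1), (0.5, 1.4)],
    [(0.3, 0.0), (0.4, 1.0), (0.9, 1.5)],
]
rep = representative_pose(poses)
pose3d = lift_to_3d(rep)
```

In the paper's setting the medoid would be replaced by a proper clustering of many retrieved poses, and `lift_to_3d` by a genuine 3D-from-2D estimator such as the landmark-based reconstruction cited in the references.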



Paper Citation


in Harvard Style

Al-Hami M. and Lakaemper R. (2015). Towards Human Pose Semantic Synthesis in 3D based on Query Keywords. In Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 3: VISAPP, (VISIGRAPP 2015) ISBN 978-989-758-091-8, pages 420-427. DOI: 10.5220/0005258704200427


in Bibtex Style

@conference{visapp15,
author={Mo'taz Al-Hami and Rolf Lakaemper},
title={Towards Human Pose Semantic Synthesis in 3D based on Query Keywords},
booktitle={Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 3: VISAPP, (VISIGRAPP 2015)},
year={2015},
pages={420-427},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005258704200427},
isbn={978-989-758-091-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 3: VISAPP, (VISIGRAPP 2015)
TI - Towards Human Pose Semantic Synthesis in 3D based on Query Keywords
SN - 978-989-758-091-8
AU - Al-Hami M.
AU - Lakaemper R.
PY - 2015
SP - 420
EP - 427
DO - 10.5220/0005258704200427