the estimated poses in this case would not produce
a sufficiently representative cluster.
6 CONCLUSION
In this paper, we presented a hierarchical binary
clustering approach that bridges the gap between 2D
human-pose estimation in an image and 3D human-pose
reconstruction from a 2D human-pose skeleton.
Using this approach we are able to translate query
keywords representing a human-pose conformation
into an approximate 3D human-pose skeleton. The
work in (Al-Hami and Lakaemper, 2014) uses a genetic
algorithm to adjust a humanoid robot's pose so that
the robot fits well on an unknown sittable object.
That approach could be extended by allowing the
humanoid robot to adopt self-learning from simple
verbal descriptions of a specific human-pose.
This style of query analysis allows us to extend
humanoid robots' abilities toward self-motivated
learning, which could benefit many applications.
We discussed a hierarchical binary clustering
approach to extract a consistent representative subset
of human-poses. The silhouette value was used to
measure cluster consistency, and the cost transformation
was used to rank poses within a cluster according
to their closeness to the approximate model.
For future work, we want to improve pose estimation
accuracy in 2D images. We also want to improve
3D reconstruction performance by enforcing valid
joint rotation ranges, so that every joint of the
reconstructed 3D pose stays within its valid rotation range.
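To illustrate the silhouette value used above as the cluster-consistency measure, the following is a minimal Python sketch of the standard definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from pose i to the other members of its own cluster and b(i) is the smallest mean distance from pose i to the members of any other cluster. The function names, the Euclidean distance, and the toy 2D points standing in for pose descriptors are assumptions made for illustration only; this is not the implementation used in this work.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_values(points, labels, dist=euclidean):
    """Return the silhouette value of every point (standard definition)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    values = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Common convention: a singleton cluster gets silhouette 0.
            values.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


def mean_silhouette(points, labels, dist=euclidean):
    """Average silhouette value, used here as a cluster-consistency score."""
    vals = silhouette_values(points, labels, dist)
    return sum(vals) / len(vals)


if __name__ == "__main__":
    # Hypothetical toy descriptors: two well-separated groups.
    toy = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(mean_silhouette(toy, [0, 0, 1, 1]))  # consistent split
    print(mean_silhouette(toy, [0, 1, 0, 1]))  # inconsistent split
```

A mean value close to 1 indicates a compact, well-separated split, while values near or below 0 indicate that poses sit closer to another cluster than to their own. In a binary clustering hierarchy, such a score could be one way to decide whether a split yields consistent sub-clusters.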
REFERENCES
Al-Hami, M. and Lakaemper, R. (2014). Sitting pose
generation using genetic algorithm for NAO humanoid robot.
In IEEE Workshop on Advanced Robotics and its Social
Impacts (ARSO), 2014, pages 137–142. IEEE.
Eichner, M., Marin-Jimenez, M., Zisserman, A., and
Ferrari, V. (2012). 2D articulated human pose estimation
and retrieval in (almost) unconstrained still images.
International Journal of Computer Vision, 99(2):190–214.
Fergus, R., Perona, P., and Zisserman, A. (2003).
Object class recognition by unsupervised scale-invariant
learning. In IEEE Computer Society Conference on
Computer Vision and Pattern Recognition (CVPR),
2003, volume 2, pages II–264. IEEE.
Ferrari, V., Marin-Jimenez, M., and Zisserman, A. (2008).
Progressive search space reduction for human pose es-
timation. In IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), 2008, pages 1–8.
IEEE.
Fischler, M. A. and Elschlager, R. A. (1973). The repre-
sentation and matching of pictorial structures. IEEE
Transactions on Computers, 22(1):67–92.
Gavrila, D. (2000). Pedestrian detection from a moving
vehicle. In Computer Vision – ECCV 2000, pages 37–49.
Springer.
Ikemoto, S., Amor, H. B., Minato, T., Jung, B., and
Ishiguro, H. (2012). Physical human-robot interaction:
Mutual learning and adaptation. IEEE Robotics &
Automation Magazine, 19(4):24–35.
Ioffe, S. and Forsyth, D. A. (2001). Probabilistic methods
for finding people. International Journal of Computer
Vision, 43(1):45–68.
Jokinen, K. and Wilcock, G. (2014). Multimodal
open-domain conversations with the NAO robot. In Natural
Interaction with Robots, Knowbots and Smartphones,
pages 213–224. Springer.
Lan, X. and Huttenlocher, D. P. (2004). A unified spatio-
temporal articulated model for tracking. In IEEE
Computer Society Conference on Computer Vision
and Pattern Recognition (CVPR), 2004, volume 1,
pages I–722. IEEE.
Lan, X. and Huttenlocher, D. P. (2005). Beyond trees:
Common-factor models for 2D human pose recovery.
In Tenth IEEE International Conference on Computer
Vision (ICCV), 2005, volume 1, pages 470–477. IEEE.
Mori, G. and Malik, J. (2002). Estimating human body
configurations using shape context matching. In
Computer Vision – ECCV 2002, pages 666–680. Springer.
Mu, Y. and Yin, Y. (2010). Human-humanoid robot inter-
action system based on spoken dialogue and vision.
In 3rd IEEE International Conference on Computer
Science and Information Technology (ICCSIT), 2010,
volume 6, pages 328–332. IEEE.
Ramakrishna, V., Kanade, T., and Sheikh, Y. (2012).
Reconstructing 3D human pose from 2D image landmarks.
In Computer Vision – ECCV 2012, pages 573–586.
Springer.
Ramanan, D. (2006). Learning to parse images of articu-
lated bodies. In Advances in Neural Information Pro-
cessing Systems, pages 1129–1136.
Ramanan, D. and Sminchisescu, C. (2006). Training de-
formable models for localization. In IEEE Computer
Society Conference on Computer Vision and Pattern
Recognition, 2006, volume 1, pages 206–213. IEEE.
Ren, X., Berg, A. C., and Malik, J. (2005). Recovering
human body configurations using pairwise constraints
between parts. In Tenth IEEE International Confer-
ence on Computer Vision (ICCV), 2005, volume 1,
pages 824–831. IEEE.
Yao, B. and Fei-Fei, L. (2010). Modeling mutual context
of object and human pose in human-object interaction
activities. In IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), 2010, pages 17–24.
IEEE.
Towards Human Pose Semantic Synthesis in 3D based on Query Keywords