of the small appearance change. In contrast, features extracted from Pose-C-Net show better results than features obtained by PCA for such poses. We consider this is because Pose-C-Net was trained with pose information, so the features extracted from it can handle a pose change accompanied by only a slight appearance change without degrading the pose estimation accuracy for the other pose changes.
These results confirm the effectiveness of the proposed method.
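To make the pipeline concrete, the following minimal sketch shows manifold-based pose estimation around a single rotation axis. All identifiers here are illustrative, not the paper's implementation; in particular, the training features are assumed to come from an external extractor (e.g., the penultimate-layer activations of a trained DCNN such as Pose-C-Net, or PCA coefficients for the conventional baseline).

import numpy as np
from scipy.interpolate import CubicSpline

def build_pose_manifold(features, angles_deg):
    # Fit a parametric curve through the pose-labeled training
    # features: one cubic spline per feature dimension,
    # parameterized by the rotation angle.
    order = np.argsort(angles_deg)
    return CubicSpline(angles_deg[order], features[order], axis=0)

def estimate_pose(manifold, query_feature, step_deg=0.5):
    # Estimate pose as the angle whose manifold point is nearest
    # to the query feature (brute-force sampling along the curve).
    candidates = np.arange(0.0, 360.0, step_deg)
    points = manifold(candidates)  # (N, D) points on the curve
    dists = np.linalg.norm(points - query_feature, axis=1)
    return candidates[np.argmin(dists)]

Substituting PCA coefficients for the DCNN features in this sketch yields the conventional parametric-eigenspace baseline against which the proposed method is compared.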
5 CONCLUSION
In this paper, we proposed an accurate pose estimation method named “Deep Manifold Embedding,” a supervised feature extraction method for pose manifolds using a deep learning technique. We obtained pose-discriminative features from a DCNN trained with pose information. Manifolds constructed from these features were effective for pose estimation, especially in the case of a pose change with a slight appearance change. Experimental results showed that the proposed method is more effective than the conventional method, which constructs manifolds from features obtained by PCA. Here we conducted pose estimation experiments only around a specific rotation axis, but the method can estimate poses around arbitrary rotation axes if corresponding training data are available.
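As a rough illustration of this multi-axis extension (our own sketch under stated assumptions, not an implementation from the paper), the one-dimensional manifold above can be replaced by a nearest-neighbor lookup over features labeled with multi-dimensional poses:

import numpy as np
from sklearn.neighbors import NearestNeighbors

def estimate_pose_multi_axis(train_features, train_poses, query_feature):
    # train_poses holds one pose label per training feature, e.g. an
    # (azimuth, elevation) pair; the estimate is the label of the
    # nearest training feature. Identifiers are illustrative.
    nn = NearestNeighbors(n_neighbors=1).fit(train_features)
    _, idx = nn.kneighbors(np.asarray(query_feature)[None, :])
    return train_poses[idx[0, 0]]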
As future work, we will consider a more suitable DCNN architecture, investigate the robustness to complex backgrounds and various illumination conditions, and compare the method with other state-of-the-art approaches.
ACKNOWLEDGEMENTS
Parts of this research were supported by a MEXT Grant-in-Aid for Scientific Research.