Optimized 4D DPM for Pose Estimation on RGBD Channels using Polisphere Models

Enrique Martinez; Oliver Nina; Antonio J. Sánchez; Carlos Ricolfe

doi:10.5220/0006133702810288

2017

Abstract

The Deformable Parts Model (DPM) is a standard method for human pose estimation on three-channel RGB images. Although much work has gone into improving the method, little has been done to apply DPM to other types of imagery such as RGBD data. In this paper, we describe a formulation of the DPM that uses depth information to improve joint detection and pose estimation on four channels. To offset the time complexity and overhead that the additional depth channel adds to the model, we propose an optimization based on solving direct and inverse kinematic equations; in this way we reduce the number of interest points and, with it, the time complexity. Our results show a significant improvement in pose estimation over the standard DPM on our own RGBD dataset and on the public CAD60 dataset.

Paper Citation

in Harvard Style

Martinez E., Nina O., Sánchez A. and Ricolfe C. (2017). Optimized 4D DPM for Pose Estimation on RGBD Channels using Polisphere Models. In Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, (VISIGRAPP 2017) ISBN 978-989-758-226-4, pages 281-288. DOI: 10.5220/0006133702810288

in Bibtex Style

@conference{visapp17,
author={Enrique Martinez and Oliver Nina and Antonio J. Sánchez and Carlos Ricolfe},
title={Optimized 4D DPM for Pose Estimation on RGBD Channels using Polisphere Models},
booktitle={Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, (VISIGRAPP 2017)},
year={2017},
pages={281-288},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006133702810288},
isbn={978-989-758-226-4},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, (VISIGRAPP 2017)
TI - Optimized 4D DPM for Pose Estimation on RGBD Channels using Polisphere Models
SN - 978-989-758-226-4
AU - Martinez E.
AU - Nina O.
AU - Sánchez A.
AU - Ricolfe C.
PY - 2017
SP - 281
EP - 288
DO - 10.5220/0006133702810288