Human Recognition in RGBD Combining Object Detectors and Conditional Random Fields

Konstantinos Amplianitis; Ronny Hänsch; Ralf Reulke

doi:10.5220/0005786006550663

2016

Abstract

This paper addresses the problem of detecting and segmenting human instances in a point cloud. Both problems have been studied extensively over the last decades, with impressive results in accuracy as well as computational performance. With the rapid adoption of depth sensors, the need to improve existing state-of-the-art algorithms by integrating depth information as an additional constraint has become more apparent. Current challenges involve combining RGB and depth information to reason about the location and spatial extent of the object of interest. We use an improved deformable part model algorithm, which allows the individual parts to be deformed across multiple scales, to approximate the location of the person in the scene, and a conditional random field energy function to specify the object's spatial extent. Our proposed energy function models up to pairwise relations defined in the RGBD domain, enforcing label consistency for regions that share similar unary and pairwise measurements. Experimental results show that the proposed energy function provides a fairly precise segmentation even when the resulting detection box is imprecise. Reasoning about the detection algorithm could potentially enhance the quality of the detection box, allowing the object of interest to be captured as a whole.
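
To make the idea of an energy with unary and pairwise terms concrete, the following is a minimal sketch of a generic pairwise energy on a 4-connected pixel grid. It is not the energy function proposed in the paper: the contrast-sensitive Potts penalty, the Gaussian weighting of RGBD differences, and all names (`pairwise_weight`, `crf_energy`, `lam`, `sigma`) are assumptions chosen purely for illustration.

```python
import math


def pairwise_weight(f_a, f_b, sigma=1.0):
    """Hypothetical contrast-sensitive weight between two RGBD feature tuples.

    Returns 1.0 for identical features and decays towards 0 as they differ.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(f_a, f_b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def crf_energy(labels, unary, features, lam=1.0, sigma=1.0):
    """Energy of a labelling on a 4-connected grid (illustrative only).

    labels[r][c]      -- integer label of pixel (r, c), e.g. 0 = background, 1 = person
    unary[r][c][k]    -- cost of assigning label k to pixel (r, c)
    features[r][c]    -- RGBD tuple (r, g, b, d) of pixel (r, c)
    lam               -- assumed trade-off between unary and pairwise terms
    """
    rows, cols = len(labels), len(labels[0])
    energy = 0.0
    for r in range(rows):
        for c in range(cols):
            # Unary term: how well the label fits this pixel on its own.
            energy += unary[r][c][labels[r][c]]
            # Pairwise term: visit the right and bottom neighbours so each
            # edge is counted once; penalise label changes between pixels
            # that look similar in RGBD space.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols and labels[r][c] != labels[rr][cc]:
                    energy += lam * pairwise_weight(
                        features[r][c], features[rr][cc], sigma
                    )
    return energy
```

Under these assumptions, a label boundary between two pixels with near-identical colour and depth adds almost the full penalty `lam`, whereas a boundary across a strong colour or depth discontinuity is nearly free. This is the sense in which a pairwise term of this kind enforces label consistency for regions with similar measurements.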

Paper Citation

in Harvard Style

Amplianitis K., Hänsch R. and Reulke R. (2016). Human Recognition in RGBD Combining Object Detectors and Conditional Random Fields. In Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016), ISBN 978-989-758-175-5, pages 655-663. DOI: 10.5220/0005786006550663

in Bibtex Style

@conference{visapp16,
author={Konstantinos Amplianitis and Ronny Hänsch and Ralf Reulke},
title={Human Recognition in RGBD Combining Object Detectors and Conditional Random Fields},
booktitle={Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)},
year={2016},
pages={655-663},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005786006550663},
isbn={978-989-758-175-5},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)
TI - Human Recognition in RGBD Combining Object Detectors and Conditional Random Fields
SN - 978-989-758-175-5
AU - Amplianitis K.
AU - Hänsch R.
AU - Reulke R.
PY - 2016
SP - 655
EP - 663
DO - 10.5220/0005786006550663