5 CONCLUSIONS AND FUTURE WORK
In this paper, we proposed an approach for detecting
and segmenting human instances in a point cloud,
based on an accelerated version of the deformable part
model algorithm and a pairwise CRF energy function
defined over different RGBD features. Experiments
showed that the quality of the segmentation depends
strongly on the bounding box provided by the detection
algorithm. In addition, the evaluation metrics showed
no significant difference between the different edge
potentials.
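To make the role of the edge potentials concrete, the following is a minimal sketch of how a pairwise CRF energy over RGBD features can be evaluated for a binary labeling. It is a generic contrast-sensitive formulation, not the exact energy used in this paper: the function name `pairwise_crf_energy`, the weights `w_color` and `w_depth`, and the bandwidths `beta_c` and `beta_d` are assumptions introduced only for illustration.

```python
import numpy as np


def pairwise_crf_energy(labels, unary, color, depth,
                        w_color=1.0, w_depth=1.0, beta_c=1.0, beta_d=1.0):
    """Energy of a binary labeling on a 4-connected pixel grid.

    labels: (H, W) integer array with values in {0, 1}
    unary:  (H, W, 2) cost of assigning label 0 or 1 to each pixel
    color:  (H, W, 3) float colour image
    depth:  (H, W) float depth map
    All weights and bandwidths are hypothetical parameters.
    """
    h, w = labels.shape
    rows, cols = np.indices((h, w))
    # Unary term: pick the cost of the chosen label at every pixel.
    energy = unary[rows, cols, labels].sum()

    # Pairwise term over right and bottom neighbours (each edge counted once).
    for dy, dx in ((0, 1), (1, 0)):
        a = (slice(0, h - dy), slice(0, w - dx))
        b = (slice(dy, h), slice(dx, w))
        differ = labels[a] != labels[b]
        dc = ((color[a] - color[b]) ** 2).sum(axis=-1)
        dd = (depth[a] - depth[b]) ** 2
        # Cutting an edge is cheap where colour or depth changes sharply.
        weight = w_color * np.exp(-beta_c * dc) + w_depth * np.exp(-beta_d * dd)
        energy += (weight * differ).sum()
    return float(energy)
```

In this sketch a label boundary that coincides with a depth discontinuity costs less than one that cuts through a flat region, which is the behaviour the depth-based edge potentials are meant to encourage.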
Work in progress focuses on improving the unary
potentials, both by incorporating depth-based features
into the decision tree ensemble and by generating a
score map that takes into account the scores returned
by the detector.
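One possible way to build such a score map is sketched below, purely as an illustration. It assumes that each detection is an axis-aligned box with a real-valued score, that scores are squashed to (0, 1) with a logistic function, and that overlapping boxes keep the maximum value. The function `detector_score_map` and its parameters `a` and `b` are hypothetical and are not taken from the method described in this paper.

```python
import numpy as np


def detector_score_map(shape, boxes, scores, a=-1.0, b=0.0):
    """Paint detector confidences into a per-pixel map.

    shape:  (H, W) of the image
    boxes:  iterable of (x0, y0, x1, y1) pixel boxes, x1 and y1 exclusive
    scores: raw detector score for each box
    a, b:   hypothetical logistic parameters, p = 1 / (1 + exp(a * s + b))
    """
    h, w = shape
    score_map = np.zeros((h, w), dtype=float)
    for (x0, y0, x1, y1), s in zip(boxes, scores):
        # Clip the box to the image so slicing never wraps around.
        x0, y0 = max(x0, 0), max(y0, 0)
        x1, y1 = min(x1, w), min(y1, h)
        if x0 >= x1 or y0 >= y1:
            continue
        p = 1.0 / (1.0 + np.exp(a * s + b))
        region = score_map[y0:y1, x0:x1]  # view into score_map
        np.maximum(region, p, out=region)  # keep the strongest detection
    return score_map
```

A map of this kind could then be combined with the classifier output when forming the unary potentials, so that pixels outside every detection box receive no detector support.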
In the future, we plan to extend the proposed energy
function with higher order potentials (defined over a
set of pixels) using appearance or depth information.
We believe that adding shape constraints will deliver
better segmentation results than models limited to
pairwise relations. Furthermore, we are interested in
additional ways of improving the quality of the
detection boxes. Last but not least, the proposed
algorithm will be tested and evaluated on other object
classes to verify its robustness, using pairwise as
well as higher order potentials in the energy function.
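For illustration, an energy extended with higher order terms can be written in the generic form below. The notation is an assumption made for this sketch and may differ from the symbols used earlier in the paper: $\mathcal{V}$ is the set of pixels, $\mathcal{E}$ the set of neighbouring pixel pairs, $\mathcal{C}$ a set of pixel groups (for example superpixels or shape-consistent regions), and $\mathbf{x}_c$ the labels of the pixels in group $c$.

```latex
E(\mathbf{x}) = \sum_{i \in \mathcal{V}} \psi_i(x_i)
              + \sum_{(i,j) \in \mathcal{E}} \psi_{ij}(x_i, x_j)
              + \sum_{c \in \mathcal{C}} \psi_c(\mathbf{x}_c)
```

The first two sums correspond to the pairwise model, and the third sum is where appearance, depth, or shape constraints over whole groups of pixels would enter.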
Human Recognition in RGBD Combining Object Detectors and Conditional Random Fields