In future work, we plan to test the part-based perception
approach on a larger data set with more advanced network
architectures, and to extend it to infer not only objects
but also their functions (affordances). The performance of
the method will be compared with existing techniques and
validated on real RGB-D data. To make predictions more
robust, more rotation axes and view angles could be
included in the training. Additionally, artificial noise and
visual obstacles could be applied to the training data to
increase robustness even further.
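
As an illustration of the last point, the following is a minimal sketch of how noise and a visual obstacle might be added to a synthetic depth image. It is not part of the presented method. The function name `augment_depth`, the Gaussian noise model, the rectangular occluder, and all parameter values are assumptions chosen for illustration only.

```python
import random


def augment_depth(depth, noise_sigma=0.01, occluder_size=(8, 8),
                  occluder_depth=0.3, rng=None):
    """Return a noisy, partially occluded copy of a depth image.

    `depth` is a list of rows holding depth values in metres.
    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = rng or random.Random()
    height, width = len(depth), len(depth[0])

    # Additive zero-mean Gaussian noise on every pixel (assumed noise model),
    # clamped so that depth never becomes negative.
    noisy = [[max(0.0, d + rng.gauss(0.0, noise_sigma)) for d in row]
             for row in depth]

    # Rectangular occluder at a random position, clipped to the image size.
    occ_h = min(occluder_size[0], height)
    occ_w = min(occluder_size[1], width)
    top = rng.randint(0, height - occ_h)
    left = rng.randint(0, width - occ_w)

    for r in range(top, top + occ_h):
        for c in range(left, left + occ_w):
            # The occluder only hides surfaces that lie behind it.
            noisy[r][c] = min(noisy[r][c], occluder_depth)
    return noisy


if __name__ == "__main__":
    # A flat synthetic surface one metre from the sensor.
    flat = [[1.0] * 32 for _ in range(24)]
    augmented = augment_depth(flat, rng=random.Random(0))
    hidden = sum(v == 0.3 for row in augmented for v in row)
    print("occluded pixels:", hidden)
```

Passing a seeded `random.Random` instance keeps the augmentation reproducible between training runs. A sensor-specific noise model could replace the Gaussian term without changing the rest of the sketch.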