to improve both its object detection performance and
its runtime. By exploiting fast and descriptive depth
features, data reduction, and parallel processing, the
final implementation runs in near real time and can
compete with state-of-the-art methods.
The output of the current detector is a set of
two-dimensional axis-aligned bounding boxes in the
image coordinate system. To retrieve the full
six-degree-of-freedom pose, the next step is to extract
the region of the point cloud that corresponds to the
bounding box and run ICP (Rusinkiewicz and Levoy, 2001)
between a (learnt) 3D model of the object and that point
cloud region. Both the runtime and the result of ICP
often improve when a good initial transformation is
provided, which the Hough Forest could generate. To this
end, the 2D Hough voting scheme needs to be extended to
3D, so that object hypotheses are accumulated in
real-world rather than image coordinates.
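As a rough illustration of the refinement step outlined above, the following Python sketch back-projects the depth pixels inside a detected bounding box into a point cloud region and aligns a model to that region with a basic point-to-point ICP. It is not the implementation used in this work: the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, all function names, the brute-force nearest-neighbour search and the fixed iteration count are assumptions made for illustration, and `R_init`, `t_init` stand in for the initial transformation that a 3D Hough voting stage might supply.

```python
import numpy as np


def backproject_box(depth, box, fx, fy, cx, cy):
    """Return the 3D points (N, 3) of all valid depth pixels inside a 2D box.

    depth: (H, W) array of metric depth, 0 or NaN where invalid (assumed).
    box:   (u_min, v_min, u_max, v_max) in pixels, max bounds exclusive.
    fx, fy, cx, cy: pinhole camera intrinsics (assumed known).
    """
    u_min, v_min, u_max, v_max = box
    v, u = np.mgrid[v_min:v_max, u_min:u_max]
    z = depth[v_min:v_max, u_min:u_max]
    valid = np.isfinite(z) & (z > 0)
    z, u, v = z[valid], u[valid], v[valid]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def best_rigid_transform(src, dst):
    """Least-squares R, t such that R @ src[i] + t approximates dst[i]."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection in the SVD solution.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = dst_mean - R @ src_mean
    return R, t


def icp(model, scene, R_init=np.eye(3), t_init=np.zeros(3), iterations=30):
    """Point-to-point ICP aligning model (M, 3) to scene (N, 3)."""
    R, t = R_init.copy(), t_init.copy()
    for _ in range(iterations):
        moved = model @ R.T + t
        # Brute-force nearest neighbours, O(M * N) per iteration.
        d2 = ((moved[:, None, :] - scene[None, :, :]) ** 2).sum(axis=2)
        nearest = scene[d2.argmin(axis=1)]
        # Re-estimate the full model-to-scene transform from the matches.
        R, t = best_rigid_transform(model, nearest)
    return R, t
```

A call such as `icp(model, backproject_box(depth, box, fx, fy, cx, cy), R_init, t_init)` would then return the refined rotation and translation. For realistic point counts, a k-d tree and a convergence criterion would be the usual replacements for the brute-force search and the fixed iteration count.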
REFERENCES
Arbeiter, G., Fuchs, S., Bormann, R., Fischer, J., and Verl,
A. (2012). Evaluation of 3D feature descriptors for
classification of surface geometries in point clouds. In
IROS 2012, pages 1644–1650.
Badami, I., Stückler, J., and Behnke, S. (2013). Depth-
enhanced Hough forests for object-class detection and
continuous pose estimation. In SPME 2013, pages
1168–1174.
Ballard, D. (1981). Generalizing the Hough transform to
detect arbitrary shapes. Pattern Recognition, 13(2):111–
122.
Bo, L., Ren, X., and Fox, D. (2013). Unsupervised feature
learning for RGB-D based object recognition. In International
Symposium on Experimental Robotics, pages
387–402.
Bo, L., Ren, X., and Fox, D. (2014). Learning hierarchical
sparse features for RGB-(D) object recognition. I. J.
Robotics Res., 33(4):581–599.
Breiman, L. (2001). Random forests. Machine Learning,
45(1):5–32.
Couprie, C., Farabet, C., Najman, L., and LeCun, Y. (2013).
Indoor semantic segmentation using depth informa-
tion. CoRR.
Farabet, C., Couprie, C., Najman, L., and LeCun, Y.
(2013). Learning hierarchical features for scene la-
beling. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 35(8):1915–1929.
Gall, J., Razavi, N., and Van Gool, L. (2012). An introduc-
tion to random forests for multi-class object detection.
In Outdoor and Large-Scale Real-World Scene Anal-
ysis, pages 243–263.
Gupta, S., Girshick, R., Arbelaez, P., and Malik, J. (2014).
Learning rich features from RGB-D images for object
detection and segmentation. In ECCV 2014.
Hinterstoisser, S., Holzer, S., Cagniart, C., Ilic, S., Kono-
lige, K., Navab, N., and Lepetit, V. (2011). Multi-
modal templates for real-time detection of texture-less
objects in heavily cluttered scenes. In ICCV 2011,
pages 858–865.
Hinterstoisser, S., Lepetit, V., Ilic, S., Fua, P., and Navab,
N. (2010). Dominant orientation templates for real-
time detection of texture-less objects. In CVPR 2010,
pages 2257–2264.
Hinterstoisser, S., Lepetit, V., Ilic, S., Holzer, S., Bradski,
G., Konolige, K., and Navab, N. (2013). Model based
training, detection and pose estimation of texture-less
3D objects in heavily cluttered scenes. In ACCV 2012,
pages 548–562.
Janoch, A., Karayev, S., Jia, Y., Barron, J., Fritz, M.,
Saenko, K., and Darrell, T. (2011). A category-level
3-D object dataset: Putting the Kinect to work. In ICCV
2011, pages 1168–1174.
Knopp, J., Prasad, M., Willems, G., Timofte, R., and
Van Gool, L. (2010). Hough transform and 3D SURF
for robust three dimensional classification. In ECCV
2010, pages 589–602.
Lai, K., Bo, L., Ren, X., and Fox, D. (2011a). A large-scale
hierarchical multi-view RGB-D object dataset. In ICRA,
pages 1817–1824. IEEE.
Lai, K., Bo, L., Ren, X., and Fox, D. (2011b). A scalable
tree-based approach for joint object and pose recogni-
tion. In AAAI 2011.
Leibe, B., Leonardis, A., and Schiele, B. (2006). An im-
plicit shape model for combined object categorization
and segmentation. In Toward Category-Level Object
Recognition, pages 508–524.
Mörwald, T., Prankl, J., Richtsfeld, A., Zillich, M., and
Vincze, M. (2010). BLORT - The Blocks World
Robotic Vision Toolbox. In Best Practice in 3D Per-
ception and Modeling for Mobile Manipulation (in
conjunction with ICRA 2010).
Rios-Cabrera, R. and Tuytelaars, T. (2013). Discriminatively
trained templates for 3D object detection: A real
time scalable approach. In ICCV 2013, pages 2048–
2055.
Rusinkiewicz, S. and Levoy, M. (2001). Efficient variants
of the ICP algorithm. In International Conference on
3-D Digital Imaging and Modeling.
Rusu, R., Blodow, N., and Beetz, M. (2009). Fast point
feature histograms (FPFH) for 3D registration. In ICRA
2009, pages 3212–3217.
Rusu, R., Bradski, G., Thibaux, R., and Hsu, J. (2010). Fast
3D recognition and pose using the viewpoint feature
histogram. In IROS 2010, pages 2155–2162.
Tang, S., Wang, X., Lv, X., Han, T. X., Keller, J., He,
Z., Skubic, M., and Lao, S. (2013). Histogram of
oriented normal vectors for object recognition with a
depth sensor. In ACCV 2012, pages 525–538.
Tombari, F. and Di Stefano, L. (2010). Object recognition in
3D scenes with occlusions and clutter by Hough voting.
In PSIVT 2010, pages 349–355.
Vergnaud, D. (2011). Efficient and secure generalized
pattern matching via fast Fourier transform. In
AFRICACRYPT 2011, pages 41–58, Berlin, Heidel-
berg. Springer-Verlag.
Wang, W., Chen, L., Chen, D., Li, S., and Kuhnlenz, K.
(2013). Fast object recognition and 6D pose estimation
using viewpoint oriented color-shape histogram. In
ICME, pages 1–6.
VISAPP 2017 - International Conference on Computer Vision Theory and Applications