better addressing the problem of the non-linear rela-
tionship between depth resolution and distance from
the sensor. For the 3D reconstruction of the scene,
we propose to fit cuboids to the objects composing
the scene, since this shape is well suited to most
indoor objects. Unlike the state-of-the-art method
(Jiang and Xiao, 2013), which runs a global optimization
process over sets of cuboids with strong constraints,
we first segment the image automatically and then
fit a cuboid locally to each extracted object.
It is shown that our
method is robust to variations in viewpoint and object
orientation, and that it provides meaningful interpretations
even in scenes with strong clutter and occlusion.
More importantly, it outperforms the state-of-the-art
approach not only in accuracy but also in robustness
and time complexity. Finally, we provide a ground-truth
dataset in which the exact 3D positions of the objects
have been measured; it can be used for future comparisons.
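To illustrate what fitting a cuboid locally to one segmented object can look like, the following is a minimal sketch, not the procedure evaluated in this paper. It fits an oriented cuboid to a 3D point set by principal component analysis; the function name fit_cuboid_pca and its return convention are assumptions made for this example. Since a single depth image observes only the visible faces of an object, a bounding cuboid of this kind would likely serve only as a rough initialization.

```python
import numpy as np


def fit_cuboid_pca(points):
    """Fit an oriented bounding cuboid to an (N, 3) array of 3D points.

    Hypothetical helper for illustration only. Returns (center, axes,
    half_extents), where the rows of `axes` are orthonormal directions
    ordered from largest to smallest variance.
    """
    points = np.asarray(points, dtype=float)
    mean = points.mean(axis=0)
    centered = points - mean

    # Principal directions are the eigenvectors of the covariance matrix.
    cov = np.cov(centered, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    axes = eigvecs.T[::-1]             # rows, largest variance first

    # Express the points in the cuboid frame and take their extent.
    local = centered @ axes.T
    lo, hi = local.min(axis=0), local.max(axis=0)
    half_extents = (hi - lo) / 2.0

    # The box center is the midpoint of the extent, mapped back to 3D.
    center = mean + ((lo + hi) / 2.0) @ axes
    return center, axes, half_extents


if __name__ == "__main__":
    # Synthetic object: a 4 x 2 x 1 box sampled on a grid, rotated and shifted.
    grid = np.stack(
        np.meshgrid(
            np.linspace(-2.0, 2.0, 9),
            np.linspace(-1.0, 1.0, 9),
            np.linspace(-0.5, 0.5, 9),
            indexing="ij",
        ),
        axis=-1,
    ).reshape(-1, 3)
    angle = np.pi / 6
    rot = np.array(
        [
            [np.cos(angle), -np.sin(angle), 0.0],
            [np.sin(angle), np.cos(angle), 0.0],
            [0.0, 0.0, 1.0],
        ]
    )
    cloud = grid @ rot.T + np.array([1.0, -2.0, 3.0])
    center, axes, half_extents = fit_cuboid_pca(cloud)
    print(center, half_extents)
```

In practice the input would be the back-projected depth pixels of one segment, and the signs of the returned axes are arbitrary.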
REFERENCES
Andersen, M. R., Jensen, T., Lisouski, P., Mortensen, A.,
Hansen, M., Gregersen, T., and Ahrendt, P. (2012).
Kinect depth sensor evaluation for computer vision
applications. Technical Report ECE-TR-06, Aarhus
University.
Bazin, J. C., Seo, Y., Demonceaux, C., Vasseur, P., Ikeuchi,
K., Kweon, I., and Pollefeys, M. (2012). Globally
optimal line clustering and vanishing point estimation
in Manhattan world. In CVPR, pages 638–645.
Coughlan, J. M. and Yuille, A. L. (1999). Manhattan world:
Compass direction from a single image by Bayesian
inference. In ICCV, pages 941–947.
Cupec, R., Nyarko, E. K., and Filko, D. (2011). Fast 2.5D
mesh segmentation to approximately convex surfaces.
In ECMR, pages 49–54.
Dou, M., Guan, L., Frahm, J.-M., and Fuchs, H. (2013).
Exploring high-level plane primitives for indoor 3D
reconstruction with a hand-held RGB-D camera. In ACCV
2012 Workshops, volume 7729 of Lecture Notes in
Computer Science, pages 94–108. Springer Berlin
Heidelberg.
Dwibedi, D., Malisiewicz, T., Badrinarayanan, V., and Ra-
binovich, A. (2016). Deep cuboid detection: Beyond
2D bounding boxes.
Fischler, M. A. and Bolles, R. C. (1981). Random sample
consensus: A paradigm for model fitting with appli-
cations to image analysis and automated cartography.
Commun. ACM, 24(6):381–395.
Hedau, V., Hoiem, D., and Forsyth, D. (2009). Recovering
the spatial layout of cluttered rooms. In ICCV, pages
1849–1856.
Izadi, S., Kim, D., Hilliges, O., Molyneaux, D., Newcombe,
R., Kohli, P., Shotton, J., Hodges, S., Freeman, D.,
Davison, A., and Fitzgibbon, A. (2011). KinectFusion:
real-time 3D reconstruction and interaction using
a moving depth camera. In UIST, pages 559–568.
Jia, Z., Gallagher, A., Saxena, A., and Chen, T. (2013).
3D-based reasoning with blocks, support, and stability. In
CVPR.
Jiang, H. (2014). Finding Approximate Convex Shapes in
RGBD Images, pages 582–596. Springer International
Publishing, Cham.
Jiang, H. and Xiao, J. (2013). A linear approach to matching
cuboids in RGBD images. In CVPR.
Lin, D., Fidler, S., and Urtasun, R. (2013). Holistic scene
understanding for 3D object detection with RGBD
cameras. In ICCV, pages 1417–1424.
Mirzaei, F. and Roumeliotis, S. (2011). Optimal estimation
of vanishing points in a Manhattan world. In ICCV,
pages 2454–2461.
Neumann, D., Lugauer, F., Bauer, S., Wasza, J., and
Hornegger, J. (2011). Real-time RGB-D mapping and
3-D modeling on the GPU using the random ball cover
data structure. In ICCV Workshops, pages 1161–1167.
Neverova, N., Muselet, D., and Trémeau, A. (2013). 2 1/2 D
scene reconstruction of indoor scenes from single
RGB-D images. In CCIW, pages 281–295.
Piegl, L. (1991). On NURBS: a survey. IEEE Computer
Graphics and Applications, 11(1):55–71.
Ren, Z. and Sudderth, E. B. (2016). Three-dimensional ob-
ject detection and layout prediction using clouds of
oriented gradients. IEEE CVPR.
Richtsfeld, A., Mörwald, T., Prankl, J., Balzer, J., Zillich,
M., and Vincze, M. (2012). Towards scene understanding
- object segmentation using RGBD-images. In
CVWW.
Schwing, A. and Urtasun, R. (2012). Efficient exact infer-
ence for 3D indoor scene understanding. In ECCV,
volume 7577 of Lecture Notes in Computer Science,
pages 299–313. Springer Berlin Heidelberg.
Silberman, N., Hoiem, D., Kohli, P., and Fergus, R. (2012).
Indoor segmentation and support inference from RGBD
images. In ECCV, pages 746–760. Springer.
Taylor, C. and Cowley, A. (2011). Fast scene analysis using
image and range data. In ICRA, pages 3562–3567.
Taylor, C. and Cowley, A. (2012). Parsing indoor scenes
using RGB-D imagery. In RSS.
Zhang, J., Kan, C., Schwing, A. G., and Urtasun, R. (2013).
Estimating the 3D layout of indoor scenes and its
clutter from depth sensors. In ICCV, pages 1273–1280.
Zhang, Y., Song, S., Tan, P., and Xiao, J. (2014). PanoCon-
text: A Whole-Room 3D Context Model for Panoramic
Scene Understanding, pages 668–686. Springer Inter-
national Publishing, Cham.