ants, we demonstrated that vanilla SfM outperforms the other formulations in terms of the accuracy of the reconstructed objects. Our main conclusion is that a lower re-projection error does not necessarily correspond to a better structure, which calls into question the reliability of this metric as a measure of the structure estimate when dealing with specific objects rather than the entire scene. In future work, we will study the effect of semantics on the reconstruction of the object of interest, and whether additional prior information about the nature of the object could improve on the vanilla SfM result.
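The gap between re-projection error and structure accuracy can be illustrated with a toy example. The sketch below is not the evaluation pipeline used in this work. It assumes a single ideal pinhole camera at the origin with unit focal length, and the point coordinates, the depth factors, and all function names are hypothetical. Each object point is moved along its viewing ray, which leaves its projection unchanged while distorting the 3D shape. With several views the effect is less extreme, but the two metrics still measure different quantities.

```python
import math

def project(point, focal=1.0):
    """Pinhole projection of a 3D point (camera at origin, looking along +z)."""
    x, y, z = point
    return (focal * x / z, focal * y / z)

def mean_reprojection_error(points_3d, observations, focal=1.0):
    """Mean Euclidean distance between projected points and observed pixels."""
    total = 0.0
    for p, obs in zip(points_3d, observations):
        u, v = project(p, focal)
        total += math.hypot(u - obs[0], v - obs[1])
    return total / len(points_3d)

def rms_structure_error(estimated, ground_truth):
    """Root-mean-square 3D distance between estimated and true object points."""
    sq = sum(math.dist(e, g) ** 2 for e, g in zip(estimated, ground_truth))
    return math.sqrt(sq / len(estimated))

# Hypothetical ground-truth object points and their noise-free observations.
truth = [(0.0, 0.0, 4.0), (1.0, 0.5, 5.0), (-1.0, 1.0, 6.0)]
observations = [project(p) for p in truth]

# Move each point along its viewing ray by a different (assumed) factor:
# the projections stay the same, but the object shape is distorted.
factors = [1.0, 1.2, 0.8]
distorted = [tuple(s * c for c in p) for p, s in zip(truth, factors)]

reproj = mean_reprojection_error(distorted, observations)
struct = rms_structure_error(distorted, truth)
print(f"re-projection error: {reproj:.3e}")
print(f"structure RMS error: {struct:.3f}")
```

In this toy setting the re-projection error is zero up to floating-point rounding, while the structure error is clearly non-zero, which is the kind of mismatch that motivates evaluating object reconstructions against ground-truth geometry.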
ACKNOWLEDGEMENTS
This work was supported by the Lebanese National
Council for Scientific Research (LNCSR).
VISAPP 2018 - International Conference on Computer Vision Theory and Applications