5 CONCLUSION
In this paper, we have shown that imposing the proposed multi-view planarity constraints improves the depth and orientation estimates of piecewise planar structures in city scale urban environments.
By training PlaneRCNN to detect the buildings' planar facades, the geometric information of each visible facade can be extracted. Imposing the deduced multi-view geometric constraints through a modified standard bundle adjustment yields improved depth and orientation estimates. The dense reconstruction of the facades is obtained using the facade masks generated by the neural network. In some cases, an increase in depth error is compensated for by a decrease in orientation error, still ensuring a structural improvement.
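As a concrete illustration of this modification, the sketch below shows one plausible way a planarity constraint could be attached to a Ceres-based bundle adjustment (the Ceres solver (Agarwal et al.) is cited in this work, but the exact residual used here is not reproduced). A point-on-plane residual is added for every triangulated point that a facade mask assigns to a detected plane; the plane parameterisation, weight, and robust loss are illustrative assumptions rather than the paper's formulation.

// Hedged sketch: point-on-plane residual for a Ceres bundle adjustment.
// Plane is parameterised as (n, d) with n.X + d = 0; weight and loss are assumed.
#include <ceres/ceres.h>

struct PointOnPlaneResidual {
  explicit PointOnPlaneResidual(double weight) : weight_(weight) {}

  template <typename T>
  bool operator()(const T* const point,   // 3 params: facade point X
                  const T* const plane,   // 4 params: normal (3) + offset (1)
                  T* residual) const {
    using std::sqrt;  // resolves to ceres::sqrt for Jet types via ADL
    // Signed point-to-plane distance, normalised so the residual stays
    // metric even if the optimiser drifts off the unit-normal constraint.
    T norm = sqrt(plane[0] * plane[0] + plane[1] * plane[1] +
                  plane[2] * plane[2]);
    T dist = (plane[0] * point[0] + plane[1] * point[1] +
              plane[2] * point[2] + plane[3]) / norm;
    residual[0] = T(weight_) * dist;
    return true;
  }

  static ceres::CostFunction* Create(double weight) {
    return new ceres::AutoDiffCostFunction<PointOnPlaneResidual, 1, 3, 4>(
        new PointOnPlaneResidual(weight));
  }

  double weight_;
};

// Usage inside an existing BA problem: for each triangulated point that the
// facade mask assigns to plane k, add one extra residual block alongside the
// usual reprojection residuals (weight w and Huber scale are assumptions):
// problem.AddResidualBlock(PointOnPlaneResidual::Create(w),
//                          new ceres::HuberLoss(0.1), point_xyz, plane_k);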
The skyline thus retrieved from the dense reconstruction can be used for navigation and path planning.
ACKNOWLEDGEMENTS
We thank Shivaan Sehgal and Sidhant Subramanian for annotating the building facades in the SYNTHIA dataset, and Mukul Khanna for helping with the facade detection network experiments. We also thank Krishna Murthy J. at the Real and Embodied AI Lab, Université de Montréal, for valuable feedback and advice during the brainstorming sessions.
REFERENCES
Agarwal, S., Mierle, K., and Others. Ceres solver. http://ceres-solver.org.
Akbulut, Z., Özdemir, S., Acar, H., and Karsli, F. (2018). Automatic building extraction from image and lidar data with active contour segmentation. Journal of the Indian Society of Remote Sensing, 46(12):2057–2068.
Dai, A., Chang, A. X., Savva, M., Halber, M., Funkhouser,
T., and Nießner, M. (2017). Scannet: Richly-
annotated 3d reconstructions of indoor scenes.
Gomez-Ojeda, R., Moreno, F., Zuñiga-Noël, D., Scaramuzza, D., and Gonzalez-Jimenez, J. (2019). Pl-slam: A stereo slam system through the combination of points and line segments. IEEE Transactions on Robotics, 35(3):734–746.
Grompone von Gioi, R., Jakubowicz, J., Morel, J.-M., and
Randall, G. (2012). LSD: a Line Segment Detector.
Image Processing On Line, 2:35–55.
He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017). Mask r-cnn.
Joo, K., Oh, T., Kim, J., and Kweon, I. S. (2019). Robust
and globally optimal manhattan frame estimation in
near real time. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 41(3):682–696.
Khurana, D., Sankhla, S., Shukla, A., Varshney, R., Kalra,
P., and Banerjee, S. (2012). A grammar-based gui
for single view reconstruction. In Proceedings of
the Eighth Indian Conference on Computer Vision,
Graphics and Image Processing, ICVGIP ’12, New
York, NY, USA. Association for Computing Machin-
ery.
Lezama, J., Randall, G., and Grompone von Gioi, R.
(2017). Vanishing Point Detection in Urban Scenes
Using Point Alignments. Image Processing On Line,
7:131–164.
Li, H., Xing, Y., Zhao, J., Bazin, J., Liu, Z., and Liu,
Y. (2019). Leveraging structural regularity of atlanta
world for monocular slam. In 2019 International Con-
ference on Robotics and Automation (ICRA), pages
2412–2418.
Li, H., Yao, J., Bazin, J., Lu, X., Xing, Y., and Liu, K.
(2018). A monocular slam system leveraging struc-
tural regularity in manhattan world. In 2018 IEEE In-
ternational Conference on Robotics and Automation
(ICRA), pages 2518–2525.
Liu, C., Kim, K., Gu, J., Furukawa, Y., and Kautz, J.
(2018a). Planercnn: 3d plane detection and recon-
struction from a single image.
Liu, C., Yang, J., Ceylan, D., Yumer, E., and Furukawa, Y.
(2018b). Planenet: Piece-wise planar reconstruction
from a single rgb image.
Majdik, A. L., Till, C., and Scaramuzza, D. (2017). The
Zurich urban micro aerial vehicle dataset. The In-
ternational Journal of Robotics Research, 36(3):269–
273.
Ramalingam, S. and Brand, M. (2013). Lifting 3d manhat-
tan lines from a single image. In 2013 IEEE Inter-
national Conference on Computer Vision, pages 497–
504.
Ranade, S. and Ramalingam, S. (2018). Novel single view
constraints for manhattan 3d line reconstruction. In
2018 International Conference on 3D Vision (3DV),
pages 625–633.
Ronneberger, O., Fischer, P., and Brox, T. (2015). U-net:
Convolutional networks for biomedical image seg-
mentation.
Ros, G., Sellart, L., Materzynska, J., Vazquez, D., and
Lopez, A. M. (2016). The SYNTHIA Dataset: A
Large Collection of Synthetic Images for Semantic
Segmentation of Urban Scenes. In 2016 IEEE Con-
ference on Computer Vision and Pattern Recognition
(CVPR), pages 3234–3243, Las Vegas, NV, USA.
IEEE.
Schindler, G. and Dellaert, F. (2004). Atlanta world: an
expectation maximization framework for simultane-
ous low-level edge grouping and camera calibration in
complex man-made environments. In Proceedings of
the 2004 IEEE Computer Society Conference on Com-
puter Vision and Pattern Recognition, 2004. CVPR
2004., volume 1, pages I–I.
Straub, J., Freifeld, O., Rosman, G., Leonard, J. J.,
and Fisher, J. W. (2018). The manhattan frame