edges. The geometric consistency of our predictions is improved with a view-synthesis loss that penalises inconsistencies between views. Our experiments show that the proposed method reduces gross errors in inverse-depth views of the mesh by up to 77.5%.
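To make the role of the view-synthesis loss concrete, the sketch below warps a source frame into the target view using the predicted inverse depth, the camera intrinsics K, and the relative pose (R, t), and penalises the photometric difference. This is a minimal PyTorch-style sketch under illustrative assumptions (the function name, tensor shapes, and plain L1 penalty are our own choices for exposition); it is not the exact formulation used in the paper.

# Minimal sketch of a view-synthesis (photometric reprojection) loss.
# Assumptions for illustration only: batched tensors, a pinhole model,
# and a plain L1 penalty; the paper's exact loss may differ.
import torch
import torch.nn.functional as F


def view_synthesis_loss(target, source, inv_depth, K, R, t):
    """target, source: (B, 3, H, W); inv_depth: (B, 1, H, W);
    K, R: (B, 3, 3); t: (B, 3, 1)."""
    B, _, H, W = target.shape
    device = target.device

    # Pixel grid in homogeneous coordinates, shape (B, 3, H*W).
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device, dtype=target.dtype),
        torch.arange(W, device=device, dtype=target.dtype),
        indexing="ij",
    )
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).view(1, 3, -1).expand(B, -1, -1)

    # Back-project into the target camera: X = depth * K^{-1} p.
    depth = 1.0 / inv_depth.clamp(min=1e-6)
    cam_points = torch.linalg.inv(K) @ pix * depth.view(B, 1, -1)

    # Transform into the source camera and project with K.
    src_points = R @ cam_points + t
    proj = K @ src_points
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)

    # Normalise to [-1, 1] and warp the source image into the target view.
    u = 2.0 * uv[:, 0] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).view(B, H, W, 2)
    warped = F.grid_sample(source, grid, align_corners=True,
                           padding_mode="border")

    # Simple L1 photometric penalty (SSIM and occlusion masking omitted).
    return (target - warped).abs().mean()

In practice, losses of this kind are usually combined with robust photometric terms (e.g. SSIM) and masking of occluded or out-of-view pixels; the sketch omits these for brevity.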
While in this paper we focus on correcting meshes
of urban scenes built from street-level sensors, our
method is generally applicable to environments with
strong priors that can be learnt from data.
ACKNOWLEDGEMENTS
The authors would like to acknowledge the support
of the UK’s Engineering and Physical Sciences Re-
search Council (EPSRC) through the Centre for Doc-
toral Training in Autonomous Intelligent Machines
and Systems (AIMS) Programme Grant EP/L015897/1.
Paul Newman is supported by EPSRC Programme
Grant EP/M019918/1.
REFERENCES
Bredies, K., Kunisch, K., and Pock, T. (2010). Total gener-
alized variation. SIAM Journal on Imaging Sciences,
3(3):492–526.
Canny, J. F. (1986). A computational approach to edge
detection. IEEE Transactions on Pattern Analysis and
Machine Intelligence, PAMI-8:679–698.
Chambolle, A. and Pock, T. (2011). A first-order primal-
dual algorithm for convex problems with applications
to imaging. Journal of Mathematical Imaging and
Vision, 40(1):120–145.
Dharmasiri, T., Spek, A., and Drummond, T. (2019). ENG:
End-to-end neural geometry for robust depth and pose
estimation using CNNs. In Proceedings of the Asian
Conference on Computer Vision (ACCV), pages 625–
642.
Eldesokey, A., Felsberg, M., and Khan, F. S. (2018). Prop-
agating confidences through CNNs for sparse data re-
gression. In Proceedings of the British Machine Vision
Conference (BMVC).
Felzenszwalb, P. F. and Huttenlocher, D. P. (2012). Distance
transforms of sampled functions. Theory of Computing,
8:415–428.
Godard, C., Mac Aodha, O., and Brostow, G. J. (2017). Un-
supervised monocular depth estimation with left-right
consistency. In Proceedings of the IEEE International
Conference on Computer Vision and Pattern Recogni-
tion (CVPR).
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual
learning for image recognition. In Proceedings of the
IEEE International Conference on Computer Vision
and Pattern Recognition (CVPR), pages 770–778.
Hua, J. and Gong, X. (2018). A normalized convolutional
neural network for guided sparse depth upsampling. In
Proceedings of the International Joint Conference on
Artificial Intelligence (IJCAI).
Jeon, J. and Lee, S. (2018). Reconstruction-based pairwise
depth dataset for depth image enhancement using CNN.
In Proceedings of the European Conference on Com-
puter Vision (ECCV).
Kingma, D. and Ba, J. (2015). Adam: A method for stochas-
tic optimization. In Proceedings of the International
Conference on Learning Representations (ICLR).
Klodt, M. and Vedaldi, A. (2018). Supervising the new with
the old: Learning SfM from SfM. In Proceedings of
the European Conference on Computer Vision (ECCV).
Knutsson, H. and Westin, C.-F. (1993). Normalized and
differential convolution: Methods for interpolation and
filtering of incomplete and uncertain data. In Proceed-
ings of the IEEE International Conference on Com-
puter Vision and Pattern Recognition (CVPR).
Kopf, J., Cohen, M. F., Lischinski, D., and Uyttendaele, M.
(2007). Joint bilateral upsampling. ACM Transactions
on Graphics, 26:96.
Kwon, H., Tai, Y.-W., and Lin, S. (2015). Data-driven depth
map refinement via multi-scale sparse representation.
In Proceedings of the IEEE International Conference
on Computer Vision and Pattern Recognition (CVPR),
pages 159–167.
Laina, I., Rupprecht, C., Belagiannis, V., Tombari, F., and
Navab, N. (2016). Deeper depth prediction with fully
convolutional residual networks. In Proceedings of the
IEEE International Conference on 3D Vision (3DV),
pages 239–248.
Ma, F. and Karaman, S. (2018). Sparse-to-dense: Depth pre-
diction from sparse depth samples and a single image.
In Proceedings of the IEEE International Conference
on Robotics and Automation (ICRA).
Mahjourian, R., Wicke, M., and Angelova, A. (2018). Un-
supervised learning of depth and ego-motion from
monocular video using 3D geometric constraints. In
Proceedings of the IEEE International Conference
on Computer Vision and Pattern Recognition (CVPR),
pages 5667–5675.
Matsuo, T., Fukushima, N., and Ishibashi, Y. (2013).
Weighted joint bilateral filter with slope depth compen-
sation filter for depth map refinement. In Proceedings
of the International Conference on Computer Vision
Theory and Applications (VISAPP).
Newcombe, R. A., Izadi, S., Hilliges, O., Molyneaux, D.,
Kim, D., Davison, A. J., Kohli, P., Shotton, J., Hodges,
S., and Fitzgibbon, A. W. (2011). KinectFusion: Real-
time dense surface mapping and tracking. In Proceed-
ings of the IEEE International Symposium on Mixed
and Augmented Reality (ISMAR), pages 127–136.
Owen, A. B. (2007). A robust hybrid of lasso and ridge
regression. Contemporary Mathematics, 443:59–72.
Riegler, G., Rüther, M., and Bischof, H. (2016). ATGV-Net:
Accurate depth super-resolution. In Proceedings of the
European Conference on Computer Vision (ECCV).
Ronneberger, O., Fischer, P., and Brox, T. (2015). U-Net:
Convolutional networks for biomedical image segmen-
tation. In Proceedings of the International Conference
on Medical Image Computing and Computer-Assisted
Intervention (MICCAI).