for plausible image-based navigation. ACM Transactions
on Graphics (TOG), 32(3):30.
Chen, S. E. and Williams, L. (1993). View interpolation
for image synthesis. In Proceedings of the 20th
annual conference on Computer graphics and interactive
techniques, pages 279–288. ACM.
Dosovitskiy, A., Springenberg, J. T., and Brox, T.
(2015). Learning to generate chairs with convolutional
neural networks. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition,
pages 1538–1546.
Fitzgibbon, A., Wexler, Y., and Zisserman, A. (2005).
Image-based rendering using image-based priors.
International Journal of Computer Vision, 63(2):141–151.
Flynn, J., Neulander, I., Philbin, J., and Snavely, N. (2016).
Deepstereo: Learning to predict new views from the
world’s imagery. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition,
pages 5515–5524.
Geiger, A., Lenz, P., and Urtasun, R. (2012). Are we ready
for autonomous driving? the kitti vision benchmark
suite. In Computer Vision and Pattern Recognition
(CVPR), 2012 IEEE Conference on, pages 3354–3361.
IEEE.
Godard, C., Mac Aodha, O., and Brostow, G. J. (2016).
Unsupervised monocular depth estimation with
left-right consistency. arXiv preprint arXiv:1609.03677.
Goesele, M., Ackermann, J., Fuhrmann, S., Haubold, C.,
Klowsky, R., Steedly, D., and Szeliski, R. (2010). Ambient
point clouds for view interpolation. In ACM
Transactions on Graphics (TOG), volume 29, page 95.
ACM.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual
learning for image recognition. In Proceedings of
the IEEE conference on computer vision and pattern
recognition, pages 770–778.
Hosni, A., Bleyer, M., Rhemann, C., Gelautz, M., and
Rother, C. (2011). Real-time local stereo matching
using guided image filtering. In Multimedia and Expo
(ICME), 2011 IEEE International Conference on,
pages 1–6. IEEE.
Huang, G., Liu, Z., Weinberger, K. Q., and van der Maaten,
L. (2017). Densely connected convolutional networks.
In Proceedings of the IEEE conference on computer
vision and pattern recognition, volume 1, page 3.
Jaderberg, M., Simonyan, K., Zisserman, A., et al. (2015).
Spatial transformer networks. In Advances in neural
information processing systems, pages 2017–2025.
Kalantari, N. K., Wang, T.-C., and Ramamoorthi, R. (2016).
Learning-based view synthesis for light field cameras.
ACM Transactions on Graphics (TOG), 35(6):193.
Kendall, A., Martirosyan, H., Dasgupta, S., Henry, P.,
Kennedy, R., Bachrach, A., and Bry, A. (2017).
End-to-end learning of geometry and context for deep stereo
regression. arXiv preprint arXiv:1703.04309.
Kingma, D. P. and Ba, J. (2014). Adam: A method for
stochastic optimization. arXiv preprint arXiv:1412.6980.
Liu, M., He, X., and Salzmann, M. (2018). Geometry-aware
deep network for single-image novel view synthesis.
arXiv preprint arXiv:1804.06008.
Penner, E. and Zhang, L. (2017). Soft 3d reconstruction
for view synthesis. ACM Transactions on Graphics
(TOG), 36(6):235.
Ronneberger, O., Fischer, P., and Brox, T. (2015). U-net:
Convolutional networks for biomedical image
segmentation. In International Conference on Medical
image computing and computer-assisted intervention,
pages 234–241. Springer.
Scharstein, D. (1996). Stereo vision for view synthesis.
In Computer Vision and Pattern Recognition, 1996.
Proceedings CVPR’96, 1996 IEEE Computer Society
Conference on, pages 852–858. IEEE.
Seitz, S. M. and Dyer, C. R. (1995). Physically-valid view
synthesis by image interpolation. In Representation
of Visual Scenes, 1995. (In Conjunction with ICCV’95),
Proceedings IEEE Workshop on, pages 18–25. IEEE.
Seitz, S. M. and Dyer, C. R. (1996). View morphing. In
Proceedings of the 23rd annual conference on Computer
graphics and interactive techniques, pages 21–30.
ACM.
Tatarchenko, M., Dosovitskiy, A., and Brox, T. (2016).
Multi-view 3d models from single images with a
convolutional network. In European Conference on
Computer Vision, pages 322–337. Springer.
Wang, Z., Bovik, A. C., Sheikh, H. R., and Simoncelli, E. P.
(2004). Image quality assessment: from error
visibility to structural similarity. IEEE transactions on
image processing, 13(4):600–612.
Yang, J., Reed, S. E., Yang, M.-H., and Lee, H. (2015).
Weakly-supervised disentangling with recurrent
transformations for 3d view synthesis. In Advances in
Neural Information Processing Systems, pages 1099–1107.
Yin, X., Wei, H., Wang, X., Chen, Q., et al. (2018). Novel
view synthesis for large-scale scene using adversarial
loss. arXiv preprint arXiv:1802.07064.
Zhong, Y., Dai, Y., and Li, H. (2017). Self-supervised
learning for stereo matching with self-improving ability.
arXiv preprint arXiv:1709.00930.
Zhou, T., Tulsiani, S., Sun, W., Malik, J., and Efros, A. A.
(2016). View synthesis by appearance flow. arXiv
preprint arXiv:1605.03557.
Zitnick, C. L., Kang, S. B., Uyttendaele, M., Winder, S., and
Szeliski, R. (2004). High-quality video view
interpolation using a layered representation. In ACM
transactions on graphics (TOG), volume 23, pages 600–608.
ACM.
Fast View Synthesis with Deep Stereo Vision