transactions on pattern analysis and machine intelli-
gence.
Eichhardt, I., Chetverikov, D., and Janko, Z. (2017). Image-
guided tof depth upsampling: a survey. Machine Vi-
sion and Applications, 28(3-4):267–282.
Geiger, A., Lenz, P., and Urtasun, R. (2012). Are we ready
for autonomous driving? the kitti vision benchmark
suite. In 2012 IEEE Conference on Computer Vision
and Pattern Recognition, pages 3354–3361. IEEE.
Girshick, R. (2015). Fast r-cnn. In Proceedings of the IEEE
international conference on computer vision, pages
1440–1448.
Hamzah, R. A. and Ibrahim, H. (2016). Literature survey
on stereo vision disparity map algorithms. Journal of
Sensors, 2016.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. In Proceedings of
the IEEE conference on computer vision and pattern
recognition, pages 770–778.
Howard, A., Sandler, M., Chu, G., Chen, L.-C., Chen, B.,
Tan, M., Wang, W., Zhu, Y., Pang, R., Vasudevan, V.,
et al. (2019). Searching for mobilenetv3. In Proceed-
ings of the IEEE International Conference on Com-
puter Vision, pages 1314–1324.
Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D.,
Wang, W., Weyand, T., Andreetto, M., and Adam,
H. (2017). Mobilenets: Efficient convolutional neu-
ral networks for mobile vision applications. arXiv
preprint arXiv:1704.04861.
Hui, T.-W., Loy, C. C., and Tang, X. (2016). Depth map
super-resolution by deep multi-scale guidance. In Eu-
ropean conference on computer vision, pages 353–
369. Springer.
Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K.,
Dally, W. J., and Keutzer, K. (2016). Squeezenet:
Alexnet-level accuracy with 50x fewer parameters
and <0.5 MB model size. arXiv preprint
arXiv:1602.07360.
Izadi, S., Kim, D., Hilliges, O., Molyneaux, D., Newcombe,
R., Kohli, P., Shotton, J., Hodges, S., Freeman, D.,
Davison, A., et al. (2011). Kinectfusion: real-time 3d
reconstruction and interaction using a moving depth
camera. In Proceedings of the 24th annual ACM sym-
posium on User interface software and technology,
pages 559–568.
Jia, X., De Brabandere, B., Tuytelaars, T., and Gool, L. V.
(2016). Dynamic filter networks. In Advances in neu-
ral information processing systems, pages 667–675.
Kendall, A., Martirosyan, H., Dasgupta, S., Henry, P.,
Kennedy, R., Bachrach, A., and Bry, A. (2017). End-
to-end learning of geometry and context for deep
stereo regression. In Proceedings of the IEEE Interna-
tional Conference on Computer Vision, pages 66–75.
Khamis, S., Fanello, S., Rhemann, C., Kowdle, A.,
Valentin, J., and Izadi, S. (2018). Stereonet:
Guided hierarchical refinement for real-time edge-
aware depth prediction. In Proceedings of the Euro-
pean Conference on Computer Vision (ECCV), pages
573–590.
Kingma, D. P. and Ba, J. (2014). Adam: A
method for stochastic optimization. arXiv preprint
arXiv:1412.6980.
Li, Y., Huang, J.-B., Ahuja, N., and Yang, M.-H. (2016).
Deep joint image filtering. In European Conference
on Computer Vision, pages 154–169. Springer.
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B.,
and Belongie, S. (2017). Feature pyramid networks
for object detection. In Proceedings of the IEEE con-
ference on computer vision and pattern recognition,
pages 2117–2125.
Lucas, B. D., Kanade, T., et al. (1981). An iterative image
registration technique with an application to stereo vi-
sion.
Mancini, M., Costante, G., Valigi, P., and Ciarfuglia, T. A.
(2016). Fast robust monocular depth estimation for
obstacle detection with fully convolutional networks.
In 2016 IEEE/RSJ International Conference on Intel-
ligent Robots and Systems (IROS), pages 4296–4303.
IEEE.
Mayer, N., Ilg, E., Hausser, P., Fischer, P., Cremers, D.,
Dosovitskiy, A., and Brox, T. (2016). A large dataset
to train convolutional networks for disparity, optical
flow, and scene flow estimation. In Proceedings of
the IEEE Conference on Computer Vision and Pattern
Recognition, pages 4040–4048.
Menze, M. and Geiger, A. (2015). Object scene flow for au-
tonomous vehicles. In Proceedings of the IEEE Con-
ference on Computer Vision and Pattern Recognition,
pages 3061–3070.
Quam, L. H. (1987). Hierarchical warp stereo. In Readings
in computer vision, pages 80–86. Elsevier.
Ronneberger, O., Fischer, P., and Brox, T. (2015). U-net:
Convolutional networks for biomedical image seg-
mentation. In International Conference on Medical
image computing and computer-assisted intervention,
pages 234–241. Springer.
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and
Chen, L.-C. (2018). Mobilenetv2: Inverted residu-
als and linear bottlenecks. In Proceedings of the IEEE
conference on computer vision and pattern recogni-
tion, pages 4510–4520.
Scharstein, D. and Szeliski, R. (2002). A taxonomy and
evaluation of dense two-frame stereo correspondence
algorithms. International journal of computer vision,
47(1-3):7–42.
Schmid, K., Tomic, T., Ruess, F., Hirschmüller, H., and
Suppa, M. (2013). Stereo vision based indoor/outdoor
navigation for flying robots. In 2013 IEEE/RSJ In-
ternational Conference on Intelligent Robots and Sys-
tems, pages 3955–3962. IEEE.
Su, H., Jampani, V., Sun, D., Gallo, O., Learned-Miller,
E., and Kautz, J. (2019). Pixel-adaptive convolutional
neural networks. In Proceedings of the IEEE Con-
ference on Computer Vision and Pattern Recognition,
pages 11166–11175.
Sun, K., Xiao, B., Liu, D., and Wang, J. (2019). Deep high-
resolution representation learning for human pose es-
timation. In Proceedings of the IEEE conference on
A Lightweight Real-time Stereo Depth Estimation Network with Dynamic Upsampling Modules