Authors:
Yong Deng (1); Jimin Xiao (2) and Steven Zhiying Zhou (1, 3)
Affiliations:
(1) Department of Electrical and Computer Engineering, National University of Singapore, 117583, Singapore
(2) Department of Electrical and Electronic Engineering, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu, 215123, P.R. China
(3) National University of Singapore Suzhou Research Institute, Suzhou, Jiangsu, 215123, P.R. China
Keyword(s):
Stereo Matching, Depth Estimation, Deep Learning, Dynamic Upsampling.
Abstract:
Deep learning-based stereo matching networks have achieved great success in depth estimation from stereo image pairs.
However, current state-of-the-art methods are usually computationally intensive, which prevents them from being applied in real-time scenarios or on mobile platforms with limited computational resources.
In order to tackle this shortcoming, we propose a lightweight real-time stereo matching network for disparity estimation.
Our network adopts the efficient hierarchical Coarse-To-Fine (CTF) matching scheme, which starts matching on low-resolution feature maps and then upsamples and refines the previous disparity stage by stage until full resolution is reached. The result of any stage can be taken as the output to trade off accuracy against runtime.
We propose an efficient hourglass-shaped feature extractor based on the latest MobileNet V3 to extract multi-resolution feature maps from stereo image pairs. We also propose to replace the traditional upsampling method in the CTF matching scheme with learning-based dynamic upsampling modules to avoid the blurring effects caused by conventional upsampling methods.
Our model processes 1242 × 375 resolution images at 35-68 FPS on a GeForce GTX 1660 GPU, and outperforms all competitive baselines with comparable runtime on the KITTI 2012/2015 datasets.
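The coarse-to-fine idea described in the abstract can be illustrated with a minimal sketch on a single scanline. This is not the authors' network: it assumes a classical per-pixel absolute-difference cost in place of learned features, and nearest-neighbour upsampling in place of the proposed dynamic upsampling modules. All names (`downsample`, `match_cost`, `search`, `coarse_to_fine`) and parameters (`levels`, `max_disp_coarse`, `radius`) are hypothetical and chosen only for illustration.

```python
def downsample(row):
    """Halve a scanline by averaging adjacent pixel pairs."""
    return [(row[i] + row[i + 1]) / 2.0 for i in range(0, len(row) - 1, 2)]


def match_cost(left, right, x, d):
    """Absolute difference between left[x] and right[x - d] (infinite if out of range)."""
    xr = x - d
    if xr < 0 or xr >= len(right):
        return float("inf")
    return abs(left[x] - right[xr])


def search(left, right, x, candidates):
    """Return the candidate disparity with the lowest cost (first one wins on ties)."""
    return min(candidates, key=lambda d: match_cost(left, right, x, d))


def coarse_to_fine(left, right, levels=3, max_disp_coarse=4, radius=1):
    """Return one disparity list per stage, ordered from coarsest to full resolution."""
    # Build image pyramids; index 0 is full resolution.
    pyr_l, pyr_r = [left], [right]
    for _ in range(levels - 1):
        pyr_l.append(downsample(pyr_l[-1]))
        pyr_r.append(downsample(pyr_r[-1]))

    # Stage 1: full disparity search at the coarsest level only.
    l, r = pyr_l[-1], pyr_r[-1]
    disp = [search(l, r, x, range(max_disp_coarse + 1)) for x in range(len(l))]
    stages = [disp]

    # Later stages: upsample the previous disparity, then search a small residual range.
    for lvl in range(levels - 2, -1, -1):
        l, r = pyr_l[lvl], pyr_r[lvl]
        # Nearest-neighbour upsampling; disparities double when resolution doubles.
        up = [2 * disp[min(x // 2, len(disp) - 1)] for x in range(len(l))]
        disp = [
            search(l, r, x, [max(0, up[x] + dd) for dd in range(-radius, radius + 1)])
            for x in range(len(l))
        ]
        stages.append(disp)
    return stages
```

Calling `coarse_to_fine(left, right)` on two scanlines of length 16 returns three disparity lists of lengths 4, 8 and 16. Stopping at an earlier list corresponds to taking an intermediate stage as the output, which is the accuracy/runtime trade-off the abstract mentions: only the coarsest stage searches the full disparity range, while each later stage checks just `2 * radius + 1` candidates per pixel.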