defined in (Wang et al., 2015).
Features that could improve the system's efficacy include data augmentation, custom loss function design, and further hyperparameter tuning. Quick gains can be made by experimenting with additional segmentation techniques, and further improvement by incorporating detection and segmentation systems such as YOLOv3 or other Faster R-CNN-based networks, as demonstrated in (Jiao et al., 2018) and (Wang et al., 2015).
Further architectures to be tested and evaluated include the addition of LSTM/GRU hidden units, such as a W-Net Connected + LSTM variant. Preliminary research in this paper and in the accompanying references suggests that the time-series nature of moving depth images, together with the interpolated data points in the current datasets, can benefit from memory units when deducing depth from sparsely populated depth maps. Another path illuminated by the work in this paper is the use of autoencoders for representation learning of depth data, which could improve the inference time of this system.
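As an illustrative sketch of the representation-learning idea, a linear autoencoder can compress flattened depth patches into a small latent code by gradient descent on reconstruction error. The patch size, latent width, and learning rate below are assumptions for illustration, not values from this work:

```python
import numpy as np

# Sketch: linear autoencoder compressing flattened depth patches into a
# small latent code. A compact learned code like this could serve as a
# cheaper input representation for a downstream depth network.
rng = np.random.default_rng(0)
n, d, k = 256, 64, 8                 # patches, pixels per patch, latent size (assumed)
X = rng.random((n, d))               # stand-in for flattened depth-map patches

W_enc = rng.normal(0.0, 0.1, (d, k)) # encoder weights
W_dec = rng.normal(0.0, 0.1, (k, d)) # decoder weights
lr = 0.1

def forward(X):
    Z = X @ W_enc                    # latent codes, shape (n, k)
    return Z, Z @ W_dec              # reconstructions, shape (n, d)

_, X_hat = forward(X)
loss_before = ((X_hat - X) ** 2).mean()

for _ in range(200):                 # plain gradient descent on MSE reconstruction
    Z, X_hat = forward(X)
    G = 2.0 * (X_hat - X) / X.size   # dLoss/dX_hat
    grad_dec = Z.T @ G
    grad_enc = X.T @ (G @ W_dec.T)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

Z, X_hat = forward(X)
loss_after = ((X_hat - X) ** 2).mean()
```

In a real system the encoder would be convolutional and trained once, after which only the frozen encoder's compact codes need to be processed at inference time.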
Finally, a review of appropriate loss functions will be conducted. While MSE is a standard measure of success in depth estimation, the W-Net Connected model produces visibly more coherent results than U-Net yet scored lower during training and evaluation. Motivated by this result, we will look to scoring functions such as scale-invariant loss, MSLE (mean squared logarithmic error), and possibly custom loss functions that take into account more than the relative or absolute difference between ground-truth and predicted images.
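As a sketch of this direction, the scale-invariant loss of (Eigen et al., 2014) and MSLE can be written directly in NumPy; the `lam` weight and the epsilon guard below are our assumptions for illustration, not values from this paper:

```python
import numpy as np

def scale_invariant_loss(pred, target, lam=0.5, eps=1e-6):
    """Scale-invariant log loss (Eigen et al., 2014).

    d_i = log(pred_i) - log(target_i); the second term discounts a
    global log-scale offset shared by all pixels.
    """
    d = np.log(pred + eps) - np.log(target + eps)
    n = d.size
    return (d ** 2).mean() - lam * (d.sum() ** 2) / (n ** 2)

def msle(pred, target):
    """Mean squared logarithmic error over all pixels."""
    d = np.log1p(pred) - np.log1p(target)
    return (d ** 2).mean()
```

With `lam=1.0` the loss is fully scale-invariant: a prediction that is everywhere twice the true depth incurs essentially zero penalty, which matches the relative-depth intuition above; smaller `lam` values reintroduce a penalty on global scale error.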
The end result of these improvements will be the practical, real-time production of depth data that feeds a generic obstacle detection and avoidance package for autonomous robotic systems.
ACKNOWLEDGEMENTS
We thank Worcester Polytechnic Institute (WPI) for
providing computing resources and funding through-
out this project.
REFERENCES
Eigen, D., Puhrsch, C., and Fergus, R. (2014). Depth map prediction from a single image using a multi-scale deep network. In Advances in Neural Information Processing Systems (NIPS).
Farabet, C., Couprie, C., Najman, L., and LeCun, Y. (2012). Learning hierarchical features for scene labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1915–1929.
Geiger, A., Lenz, P., Stiller, C., and Urtasun, R. (2013). Vision meets robotics: The KITTI dataset. The International Journal of Robotics Research, 32(11):1231–1237.
Godard, C., Aodha, O. M., and Brostow, G. J. (2017). Unsupervised monocular depth estimation with left-right consistency. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Jiao, J., Cao, Y., Song, Y., and Lau, R. (2018). Look deeper into depth: Monocular depth estimation with semantic booster and attention-driven loss. In European Conference on Computer Vision (ECCV).
Lee, J. and Kim, C. (2019). Monocular depth estimation using relative depth maps. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Lee, J. et al. (2019). From big to small: Multi-scale local planar guidance for monocular depth estimation. arXiv preprint arXiv:1907.10326.
Liang, M. and Hu, X. (2015). Recurrent convolutional neural network for object recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3367–3375.
Ronneberger, O., Fischer, P., and Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). Springer.
Silberman, N., Hoiem, D., Kohli, P., and Fergus, R. (2012). Indoor segmentation and support inference from RGBD images. In Computer Vision – ECCV 2012, Lecture Notes in Computer Science, pp. 746–760.
Wang, P., Shen, X., Lin, Z., Cohen, S., Price, B., and Yuille, A. (2015). Towards unified depth and semantic prediction from a single image. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
APPENDIX
All network models discussed and developed in
this research are available at: https://github.com/
mech0ctopus/depth-estimation.
VEHITS 2020 - 6th International Conference on Vehicle Technology and Intelligent Transport Systems