Shallow Networks for High-accuracy Road Object-detection

Khalid Ashraf, Bichen Wu, Forrest N. Iandola, Matthew W. Moskewicz, Kurt Keutzer


The ability to automatically detect other vehicles on the road is vital to the safety of partially-autonomous and fully-autonomous vehicles. Most high-accuracy techniques for this task are based on R-CNN or one of its faster variants. In the research community, much emphasis has been placed on using 3D vision or complex R-CNN variants to achieve higher accuracy. However, are there more straightforward modifications that could deliver higher accuracy? Yes. We show that increasing input image resolution (i.e., upsampling) offers up to 12 percentage points higher accuracy compared to an off-the-shelf baseline. We also find situations where earlier/shallower layers of a CNN provide higher accuracy than later/deeper layers. We further show that shallow models and upsampled images yield competitive accuracy. Our findings contrast with the current trend towards deeper and larger models to achieve high accuracy in domain-specific detection tasks.
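To make the notion of upsampling concrete, the following is a minimal sketch of enlarging an image before it is passed to a detector. The abstract does not say which interpolation method or scale factor the authors used, so nearest-neighbor interpolation with an integer factor is assumed here purely for illustration, and `upsample_nearest` is a hypothetical helper name, not code from the paper.

```python
def upsample_nearest(image, factor):
    """Upsample a 2D grid of pixel values by an integer factor.

    Assumption: nearest-neighbor interpolation, chosen only for
    illustration; the paper's actual method is not given here.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    upsampled = []
    for row in image:
        # Repeat each pixel `factor` times horizontally.
        wide_row = [pixel for pixel in row for _ in range(factor)]
        # Repeat the widened row `factor` times vertically.
        upsampled.extend(list(wide_row) for _ in range(factor))
    return upsampled


# A tiny 2x2 "image" becomes 4x4 when upsampled by a factor of 2.
image = [[1, 2], [3, 4]]
result = upsample_nearest(image, 2)
```

In this sketch each source pixel covers a `factor` x `factor` block in the output, so objects that occupy only a few pixels in the original occupy proportionally more pixels in the detector's input.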



Paper Citation

in Harvard Style

Ashraf K., Wu B., Iandola F., Moskewicz M. and Keutzer K. (2017). Shallow Networks for High-accuracy Road Object-detection. In Proceedings of the 3rd International Conference on Vehicle Technology and Intelligent Transport Systems - Volume 1: VEHITS, ISBN 978-989-758-242-4, pages 33-40. DOI: 10.5220/0006214900330040

in Bibtex Style

author={Khalid Ashraf and Bichen Wu and Forrest N. Iandola and Matthew W. Moskewicz and Kurt Keutzer},
title={Shallow Networks for High-accuracy Road Object-detection},
booktitle={Proceedings of the 3rd International Conference on Vehicle Technology and Intelligent Transport Systems - Volume 1: VEHITS},

in EndNote Style

JO - Proceedings of the 3rd International Conference on Vehicle Technology and Intelligent Transport Systems - Volume 1: VEHITS
TI - Shallow Networks for High-accuracy Road Object-detection
SN - 978-989-758-242-4
AU - Ashraf K.
AU - Wu B.
AU - Iandola F.
AU - Moskewicz M.
AU - Keutzer K.
PY - 2017
SP - 33
EP - 40
DO - 10.5220/0006214900330040