Convolutional Patch Networks with Spatial Prior for Road Detection and Urban Scene Understanding

Clemens-Alexander Brust, Sven Sickert, Marcel Simon, Erik Rodner, Joachim Denzler

2015

Abstract

Classifying single image patches is important in many different applications, such as road detection or scene understanding. In this paper, we present convolutional patch networks, which are convolutional networks trained to distinguish different image patches and which can be used for pixel-wise labeling. We also show how to incorporate spatial information of the patch as an input to the network, which allows for learning spatial priors for certain categories jointly with an appearance model. In particular, we focus on road detection and urban scene understanding, two application areas where we are able to achieve state-of-the-art results on the KITTI and LabelMeFacade datasets. Furthermore, our paper offers a guideline for people working in the area who are struggling with the many painstaking details that make training convolutional networks (CNs) on image patches extremely difficult.
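
To illustrate the idea of feeding the patch position to the network alongside its appearance, the following is a minimal sketch and not the authors' implementation. It assumes an image stored as a nested list `image[y][x]`, an odd `patch_size`, border handling by clamping coordinates, and a position encoded as the patch center normalized to [0, 1]. The function name `patch_with_position` and all of these choices are hypothetical.

```python
def patch_with_position(image, cx, cy, patch_size):
    """Return the patch centered at (cx, cy) and its normalized position.

    Assumptions (not taken from the paper): `image` is a nested list
    indexed as image[y][x], `patch_size` is odd, and pixels outside the
    image are filled by clamping to the nearest border pixel.
    """
    if patch_size % 2 == 0:
        raise ValueError("patch_size must be odd in this sketch")
    height = len(image)
    width = len(image[0])
    half = patch_size // 2

    patch = []
    for dy in range(-half, half + 1):
        # Clamp the row index so that border patches keep full size.
        y = min(max(cy + dy, 0), height - 1)
        row = []
        for dx in range(-half, half + 1):
            x = min(max(cx + dx, 0), width - 1)
            row.append(image[y][x])
        patch.append(row)

    # Normalized center coordinates: (0, 0) is the top-left pixel,
    # (1, 1) the bottom-right pixel.
    position = (cx / max(width - 1, 1), cy / max(height - 1, 1))
    return patch, position


# Small demonstration image: 4 rows, 6 columns, value = 10 * y + x.
demo_image = [[10 * y + x for x in range(6)] for y in range(4)]
demo_patch, demo_position = patch_with_position(demo_image, 0, 0, 3)
```

A classifier would then receive both `demo_patch` (appearance) and `demo_position` (location), so that it can learn, for example, that road pixels tend to occur in the lower part of an image. How and where the position enters the network is a design decision that this sketch does not cover.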



Paper Citation


in Harvard Style

Brust C., Sickert S., Simon M., Rodner E. and Denzler J. (2015). Convolutional Patch Networks with Spatial Prior for Road Detection and Urban Scene Understanding. In Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2015), ISBN 978-989-758-090-1, pages 510-517. DOI: 10.5220/0005355105100517


in Bibtex Style

@conference{visapp15,
author={Clemens-Alexander Brust and Sven Sickert and Marcel Simon and Erik Rodner and Joachim Denzler},
title={Convolutional Patch Networks with Spatial Prior for Road Detection and Urban Scene Understanding},
booktitle={Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2015)},
year={2015},
pages={510-517},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005355105100517},
isbn={978-989-758-090-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2015)
TI - Convolutional Patch Networks with Spatial Prior for Road Detection and Urban Scene Understanding
SN - 978-989-758-090-1
AU - Brust C.
AU - Sickert S.
AU - Simon M.
AU - Rodner E.
AU - Denzler J.
PY - 2015
SP - 510
EP - 517
DO - 10.5220/0005355105100517