works can be used for the task of semantic segmenta-
tion. Our approach performs classification of image
patches at each pixel position. We analyzed differ-
ent popular network architectures along with differ-
ent techniques to improve the training. Furthermore,
we demonstrated how spatial prior information such as
pixel positions can be incorporated into the learning
process, leading to a significant performance gain.
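
The patch-wise setup summarized above can be illustrated with a minimal sketch. It only shows how one patch per pixel position might be gathered together with a normalized pixel position as spatial prior; it is not the network or the training procedure itself. The function name, the reflect padding, and the choice of normalizing coordinates to [0, 1] are assumptions made for illustration and are not taken from the text.

```python
import numpy as np

def extract_patches_with_positions(image, patch_size=5, stride=1):
    """Return (patches, positions) for the sampled pixels of an H x W x C image.

    patches:   (N, patch_size, patch_size, C), one patch centered on each pixel
    positions: (N, 2), normalized (y, x) coordinates in [0, 1]

    Hypothetical helper: padding mode and normalization are assumptions.
    """
    if patch_size % 2 == 0:
        raise ValueError("patch_size must be odd so each patch has a center pixel")
    h, w, _ = image.shape
    r = patch_size // 2
    # Reflect-pad so that border pixels also receive a full patch.
    padded = np.pad(image, ((r, r), (r, r), (0, 0)), mode="reflect")
    patches, positions = [], []
    for y in range(0, h, stride):
        for x in range(0, w, stride):
            # In padded coordinates this window is centered on pixel (y, x).
            patches.append(padded[y:y + patch_size, x:x + patch_size])
            positions.append((y / max(h - 1, 1), x / max(w - 1, 1)))
    return np.stack(patches), np.asarray(positions, dtype=np.float32)
```

In such a setup, each patch would be passed to the classifier, and the corresponding position vector could be supplied as an additional input feature. With `stride=1` the number of patches equals the number of pixels, which is why dense per-pixel inference of this kind is costly.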
For evaluation, we used two different application
scenarios: road detection and urban scene understand-
ing. We were able to achieve very good results in the
road detection challenge of the popular KITTI Vision
Benchmark Suite. In this scenario we outperformed
several competitors, even those that use stereo images
or laser data.
For a second set of experiments, we used the
LabelMeFacade dataset (Fröhlich et al., 2010),
which poses a multi-class classification task and shows
very diverse urban scenes. We were again able to
achieve state-of-the-art results. Future work will focus
on speeding up the prediction phase, since we currently
need around 30 s per image to infer the label at each
position.
