This research also shows that, in the continuing effort to improve state-of-the-art deep neural networks, the design of loss functions has not been fully explored. This agrees with conclusions from other work (Janocha and Czarnecki, 2017), which states that while cross entropy has been an unquestioned favourite, adopting one of the various other losses can be equally, if not more, effective. Together with the findings of this and other research on the effectiveness of IoU loss, these conclusions point in the same direction: much effort is directed at architectures, creating deeper or otherwise different convolutional networks, while a significant improvement can already be made simply by choosing a different loss function.
While this research shows that performance improves significantly for the models that were used, this cannot be claimed for every model. As such, research on IoU loss with other models, such as the well-established SegNet (Badrinarayanan et al., 2017), may support our hypothesis that IoU loss performs better in general for semantic segmentation tasks.
Section 1 explains why the loss function of Rahman and Wang (2016) is preferred over the loss functions proposed by Yuan et al. (2017) and Berman and Blaschko (2017). In future research, it would also be interesting to compare these IoU-based loss functions directly.
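As a sketch of what such an IoU-based loss looks like, the following implements the soft relaxation used by Rahman and Wang (2016), in which intersection and union are approximated by sums over predicted probabilities so that the loss becomes differentiable. This NumPy version is an illustrative reimplementation, not the authors' code; a binary cross entropy baseline is included for comparison.

```python
import numpy as np

def soft_iou_loss(probs, targets, eps=1e-7):
    """Soft IoU loss in the style of Rahman and Wang (2016).

    Intersection and union are relaxed to sums over predicted
    probabilities, giving a differentiable surrogate for 1 - IoU.
    """
    probs = probs.ravel()
    targets = targets.ravel()
    inter = np.sum(probs * targets)
    union = np.sum(probs + targets - probs * targets)
    return 1.0 - inter / (union + eps)

def binary_cross_entropy(probs, targets, eps=1e-7):
    """Standard pixel-wise binary cross entropy baseline."""
    probs = np.clip(probs.ravel(), eps, 1.0 - eps)
    targets = targets.ravel()
    return -np.mean(targets * np.log(probs)
                    + (1.0 - targets) * np.log(1.0 - probs))
```

In a deep learning framework the same two sums would be written with the framework's tensor operations so that gradients flow through `probs`.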
Finally, the claim has been made that the benefit of training on IoU directly will only grow when a model is presented with sparse data. This has not been evaluated in this research; it could be done by extending the models presented here to a segmentation task with multiple classes. This would significantly reduce the number of positive samples per class in a dataset and thus provide a way to test the hypothesis that an IoU loss function outperforms binary cross entropy on sparser data. Previous work (Rahman and Wang, 2016) already uses sparse data; however, the sparsity of the data is taken as given and not isolated to determine its effect on the performance of the IoU loss function. Future work could therefore focus specifically on selected datasets, comparing performance on sparse and dense data.
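The intuition behind this hypothesis can be made concrete with a small synthetic example (the 1% foreground ratio is an arbitrary illustrative choice, not a figure from this research): on a sparse mask, a degenerate "background everywhere" prediction already achieves a low average cross entropy, while a soft IoU loss penalises it almost maximally.

```python
import numpy as np

# Synthetic sparse mask: 10,000 pixels, only 1% foreground
# (the sparsity level is an arbitrary illustrative choice).
y = np.zeros(10_000)
y[:100] = 1.0

# Degenerate prediction: a low constant foreground probability,
# i.e. the model essentially predicts "background everywhere".
p = np.full(10_000, 0.01)

# Binary cross entropy averages over all pixels, so the vast
# background majority dominates and the loss looks small.
bce = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

# The soft IoU loss only rewards overlap with the foreground,
# so the same prediction is penalised almost maximally.
inter = np.sum(p * y)
union = np.sum(p + y - p * y)
iou_loss = 1.0 - inter / union

print(f"BCE: {bce:.3f}, IoU loss: {iou_loss:.3f}")
```

Here the cross entropy comes out well below 0.1 while the IoU loss exceeds 0.9, illustrating why the gap between the two losses is expected to widen as data becomes sparser.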
REFERENCES
Ahmed, F., Tarlow, D., and Batra, D. (2015). Optimiz-
ing expected intersection-over-union with candidate-
constrained CRFs. In 2015 IEEE International Con-
ference on Computer Vision (ICCV), pages 1850–
1858.
Badrinarayanan, V., Kendall, A., and Cipolla, R. (2017).
Segnet: A deep convolutional encoder-decoder ar-
chitecture for image segmentation. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence,
39(12):2481–2495.
Berman, M. and Blaschko, M. B. (2017). Optimization
of the Jaccard index for image segmentation with the
Lovász hinge. CoRR, abs/1705.08790.
Chollet, F. et al. (2015). Keras. https://keras.io.
Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., and Fei-
Fei, L. (2009). ImageNet: A large-scale hierarchical
image database. In 2009 IEEE Conference on Com-
puter Vision and Pattern Recognition (CVPR).
Everingham, M., Van Gool, L., Williams, C. K. I., Winn,
J., and Zisserman, A. (2011). The PASCAL Visual
Object Classes Challenge 2011 (VOC2011) Results.
He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep
residual learning for image recognition. CoRR,
abs/1512.03385.
Janocha, K. and Czarnecki, W. M. (2017). On loss func-
tions for deep neural networks in classification. CoRR,
abs/1702.05659.
Kae, A., Sohn, K., Lee, H., and Learned-Miller, E. (2013).
Augmenting CRFs with Boltzmann machine shape
priors for image labeling. In the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR).
Le, V., Brandt, J., Lin, Z., Bourdev, L., and Huang, T. S.
(2012). Interactive facial feature localization. In Pro-
ceedings of the 12th European Conference on Com-
puter Vision - Volume Part III, ECCV’12, pages 679–
692, Berlin, Heidelberg. Springer-Verlag.
Noh, H., Hong, S., and Han, B. (2015). Learning de-
convolution network for semantic segmentation. In
Proceedings of the IEEE International Conference on
Computer Vision, pages 1520–1528.
Nowozin, S. (2014). Optimal decisions from probabilistic
models: The intersection-over-union case. In 2014
IEEE Conference on Computer Vision and Pattern
Recognition, pages 548–555.
Rahman, M. A. and Wang, Y. (2016). Optimizing
intersection-over-union in deep neural networks for
image segmentation. In International Symposium on
Visual Computing.
Shelhamer, E., Long, J., and Darrell, T. (2017). Fully con-
volutional networks for semantic segmentation. IEEE
Transactions on Pattern Analysis and Machine Intel-
ligence, 39(4):640–651.
Siam, M., Gamal, M., Abdel-Razek, M., Yogamani, S.,
and Jägersand, M. (2018). RTSeg: Real-time se-
mantic segmentation comparative study. CoRR,
abs/1803.02758.
Simonyan, K. and Zisserman, A. (2014). Very deep con-
volutional networks for large-scale image recognition.
CoRR, abs/1409.1556.
Yuan, Y., Chao, M., and Lo, Y. C. (2017). Automatic
skin lesion segmentation using deep fully convolu-
tional networks with Jaccard distance. IEEE Trans-
actions on Medical Imaging, 36(9):1876–1886.
Zhao, H., Gallo, O., Frosio, I., and Kautz, J. (2017).
Loss functions for image restoration with neural net-
works. IEEE Transactions On Computational Imag-
ing, 3(1):47–57.
Deep Neural Networks with Intersection over Union Loss for Binary Image Segmentation