Figure 5 can be compared to Figure 2. The comparison shows that the white lines in the weighted BCE output are over-represented, especially in the bottom region. This indicates that weighted BCE assigns so much weight to the underrepresented class that the model overfits on it, predicting that class more often than is realistic.
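To make the mechanism concrete, the following is a minimal sketch of pixel-wise weighted binary cross-entropy in Keras/TensorFlow; it illustrates the general form, not the exact weighting used in these experiments, and pos_weight is a hypothetical parameter name. A large pos_weight lowers the relative penalty for false positives on the foreground class, which is consistent with the over-segmentation observed here.

```python
import tensorflow as tf

def weighted_bce(pos_weight):
    """Returns a BCE loss that up-weights foreground (label 1) pixels."""
    def loss(y_true, y_pred):
        # Clip to avoid log(0).
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
        # Standard BCE, with the positive term scaled by pos_weight.
        bce = -(pos_weight * y_true * tf.math.log(y_pred)
                + (1.0 - y_true) * tf.math.log(1.0 - y_pred))
        return tf.reduce_mean(bce)
    return loss
```

The closure can be passed directly to model.compile, e.g. loss=weighted_bce(10.0).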
6 CONCLUSION
From the discussion of the experimental results, several conclusions can be drawn. First, by extending the application of IoU loss to capsule networks, this work, together with previous work (van Beers et al., 2019; Rahman and Wang, 2016), supports the more general claim that the IoU loss function is a reasonable option to consider when optimizing any segmentation network. In addition, capsule layers do not respond differently to count-based, rather than logarithmic, loss functions than the models in these previous works do. The results presented here do not prove that IoU is objectively the better option across domains, nor that IoU should become the new baseline. Rather, a loss function based on IoU should be part of a segmentation network developer's toolkit. To further enhance this toolkit, similar research can be done into other loss functions, so that a parameter sweep over this particular aspect of a network will reliably yield optimal results. One example is a loss function based on the Dice metric (Lguensat et al., 2018; Yuan et al., 2017), which can be used in cases where true positives are much more important than avoiding false positives and false negatives.
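For illustration, a minimal sketch of differentiable (soft) IoU and Dice losses for binary segmentation is given below, assuming a Keras/TensorFlow setup; this shows the general form of such losses, not the exact implementation evaluated in this work.

```python
import tensorflow as tf

def soft_iou_loss(y_true, y_pred, eps=1e-7):
    """1 - IoU, computed on soft predictions so it stays differentiable."""
    intersection = tf.reduce_sum(y_true * y_pred)
    union = tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) - intersection
    return 1.0 - (intersection + eps) / (union + eps)

def soft_dice_loss(y_true, y_pred, eps=1e-7):
    """1 - Dice; counts the intersection twice, so true positives
    weigh more heavily than in IoU."""
    intersection = tf.reduce_sum(y_true * y_pred)
    return 1.0 - (2.0 * intersection + eps) / (
        tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + eps)
```

Either function can be passed as the loss argument of model.compile, which makes including them in a hyperparameter sweep straightforward.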
Secondly, adapting a dataset strongly affects the results obtained on it. The removal of 10 CT scans out of 880 from the LUNA16 dataset in previous work (LaLonde and Bagci, 2018) hampers any attempt to reproduce those studies, but also appears to reduce the average error by 92.35%, a staggering amount. While preprocessing, or manual labor, can be used to adapt real-world samples in similar ways so that a model trained on an adapted dataset retains its high scores, the goal of machine learning should always be real-world applicability, regardless of the noise in real-world data. As such, future research would benefit from optimizing the SegCaps network on the full, noisier dataset.
Finally, the results show that significant differences remain between domains, such as lung segmentation and face detection. This can be attributed to a number of factors: the complexity of the data, the size of the dataset, the balance between foreground and background pixels, and so on. It would be beneficial to obtain a clearer view of the effect each of these factors has on the effectiveness of the SegCaps model, as well as other models. This could be explored further by comparing IoU loss with other loss functions across a broader selection of domains, in an attempt to detect a pattern of which loss function predictably performs better on which task. If such a pattern generalizes, the parameter sweeps required to determine optimal loss functions can be greatly reduced in complexity.
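As a rough illustration of such a sweep, the following sketch trains one model per candidate loss and compares the results on a common held-out metric. Here build_model, the data arrays, and evaluate_mean_iou are hypothetical placeholders, and soft_iou_loss and soft_dice_loss refer to the sketches above.

```python
# Candidate losses to sweep over; the string form uses Keras' built-in BCE.
losses = {
    "bce": "binary_crossentropy",
    "soft_iou": soft_iou_loss,
    "soft_dice": soft_dice_loss,
}

results = {}
for name, loss_fn in losses.items():
    model = build_model()  # hypothetical factory; fresh weights per run
    model.compile(optimizer="adam", loss=loss_fn)
    model.fit(x_train, y_train, epochs=10, verbose=0)
    # Compare all runs on the same held-out metric rather than their own
    # (incomparable) training losses; evaluate_mean_iou is hypothetical.
    results[name] = evaluate_mean_iou(model, x_val, y_val)
```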
REFERENCES
Badrinarayanan, V., Kendall, A., and Cipolla, R. (2017). SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12):2481–2495.
Chollet, F. et al. (2015). Keras. https://keras.io.
Haralick, R. M. and Shapiro, L. G. (1985). Image segmentation techniques. Computer Vision, Graphics, and Image Processing, 29(1):100–132.
Jégou, S., Drozdzal, M., Vazquez, D., Romero, A., and Bengio, Y. (2017). The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1175–1183.
Kae, A., Sohn, K., Lee, H., and Learned-Miller, E. (2013). Augmenting CRFs with Boltzmann machine shape priors for image labeling. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6):84–90.
LaLonde, R. and Bagci, U. (2018). Capsules for object segmentation. ArXiv, abs/1804.04241.
Lguensat, R., Sun, M., Fablet, R., Tandeo, P., Mason, E., and Chen, G. (2018). EddyNet: A deep neural network for pixel-wise classification of oceanic eddies. In IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, pages 1764–1767.
Liu, S. and Deng, W. (2015). Very deep convolutional neural network based image classification using small training sample size. In 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), pages 730–734.
Nowozin, S. (2014). Optimal decisions from probabilistic models: The intersection-over-union case. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 548–555.
Rahman, M. A. and Wang, Y. (2016). Optimizing intersection-over-union in deep neural networks for image segmentation. In International Symposium on Visual Computing.