Figure 1: Plot of mean ISD against Dice score with regression
line y = 1.66x − 0.82, where y is the Dice score and x is the
mean ISD. The F-test yielded a p-value of 1.4 × 10^−7.
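As a small worked illustration of the regression line reported in the caption of Figure 1, the sketch below evaluates it at a few mean ISD values. Only the two coefficients come from the caption; the function name and the chosen input values are hypothetical, and the fit should not be read as valid outside the range of the plotted data.

```python
def predicted_dice(mean_isd: float) -> float:
    """Evaluate the fitted line y = 1.66x - 0.82 from the Figure 1 caption."""
    return 1.66 * mean_isd - 0.82


# Hypothetical mean ISD values, chosen only to show the arithmetic.
for isd in (0.6, 0.8, 1.0):
    print(f"mean ISD = {isd:.1f} -> predicted Dice = {predicted_dice(isd):.3f}")

# Mean ISD at which the fitted line crosses a Dice score of zero.
zero_crossing = 0.82 / 1.66
```

Under this line, a perfectly self-consistent model (mean ISD of 1) maps to a predicted Dice of about 0.84, and the prediction reaches zero near a mean ISD of 0.49.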
4 DISCUSSION
It is not entirely clear why ISD is highly correlated with
Dice performance for discriminative CNNs, or why it can
be used to detect weak segmentations. A high ISD does not
by itself guarantee a good segmentation. As a counter-example,
consider a model that always predicts P(lesion) = 0 for all
but one voxel, which instead has P(lesion) = 1. The ISD
would then always be 1, since the model produces exactly
the same segmentation with probability 1, regardless
of the input. However, a high Dice score would not be
expected for this model. The coupling of CNN outputs and
ISD for detecting uncertain segmentations requires further
investigation before its performance can be properly
understood.
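The counter-example can be checked numerically with a minimal sketch. It assumes that ISD is the mean pairwise Dice score between sampled segmentations, which is not restated in this section; the masks, the number of samples, and the ground truth below are hypothetical.

```python
from itertools import combinations


def dice(a: set, b: set) -> float:
    """Dice overlap of two binary masks given as sets of lesion voxel indices."""
    if not a and not b:
        return 1.0  # convention assumed here for two empty masks
    return 2 * len(a & b) / (len(a) + len(b))


def mean_isd(samples: list) -> float:
    """Mean pairwise Dice between sampled segmentations (assumed ISD definition)."""
    pairs = list(combinations(samples, 2))
    return sum(dice(a, b) for a, b in pairs) / len(pairs)


# Degenerate model: every sample marks the same single voxel as lesion.
samples = [{0} for _ in range(10)]

# Hypothetical ground truth: a 50-voxel lesion that does not contain voxel 0.
ground_truth = set(range(100, 150))

isd = mean_isd(samples)                          # samples agree perfectly
dice_vs_truth = dice(samples[0], ground_truth)   # no overlap with the lesion

print(f"mean ISD = {isd:.2f}, Dice vs. ground truth = {dice_vs_truth:.2f}")
```

All samples are identical, so the mean ISD is 1, while the Dice score against this ground truth is 0.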
Importantly, the Dice metric can be replaced by other
metrics, such as sensitivity, specificity, mean squared
error, or precision; the preceding analysis carries over
to the distributions of these metrics, so hypothesis testing
remains possible. Moreover, the computations in this work
were performed over the entire brain, but they could also
be carried out on specific regions of interest (ROIs).
Choosing which metrics to use and applying them to more
detailed brain sub-regions could improve the decision-making
potential of the method, and is a possible area of development.
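A minimal sketch of how the inter-sample computation might be generalized to other overlap metrics and restricted to an ROI is given below. The function names, the toy masks, the averaging over ordered pairs (used because sensitivity and precision are not symmetric), and the value returned for an empty denominator are all assumptions made for illustration, not part of the original method.

```python
from itertools import permutations


def confusion(pred, ref, roi=None):
    """Count TP, FP, FN, TN over flat 0/1 masks, optionally inside an ROI mask."""
    idx = [i for i in range(len(pred)) if roi is None or roi[i]]
    tp = sum(1 for i in idx if pred[i] and ref[i])
    fp = sum(1 for i in idx if pred[i] and not ref[i])
    fn = sum(1 for i in idx if not pred[i] and ref[i])
    tn = len(idx) - tp - fp - fn
    return tp, fp, fn, tn


def ratio(num, den):
    """Return num / den, or 1.0 for an empty denominator (assumed convention)."""
    return num / den if den else 1.0


def dice(tp, fp, fn, tn):
    return ratio(2 * tp, 2 * tp + fp + fn)


def sensitivity(tp, fp, fn, tn):
    return ratio(tp, tp + fn)


def specificity(tp, fp, fn, tn):
    return ratio(tn, tn + fp)


def precision(tp, fp, fn, tn):
    return ratio(tp, tp + fp)


def inter_sample_mean(samples, metric, roi=None):
    """Average a metric over all ordered pairs of samples, one acting as reference."""
    scores = [metric(*confusion(a, b, roi)) for a, b in permutations(samples, 2)]
    return sum(scores) / len(scores)


# Hypothetical sampled segmentations of an 8-voxel volume.
samples = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
]
# Hypothetical ROI in which all samples happen to agree.
roi = [1, 1, 0, 0, 1, 1, 0, 0]

whole_dice = inter_sample_mean(samples, dice)
roi_dice = inter_sample_mean(samples, dice, roi)
whole_specificity = inter_sample_mean(samples, specificity)
```

In this toy case the samples disagree over the whole volume but agree perfectly inside the ROI, which shows how the choice of region can change the resulting agreement score.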
The proposed method can also be used to rigorously compare
competing discriminative models on the basis of their
respective inter-sample mean Dice confidence intervals, and
to select the most robust model for each individual patient.
Applying this unifying technique to all competing CNNs in a
brain lesion challenge may yield the best achievable
performance without any reference to the ground truth.
Segmentation challenges have recently begun to incorporate
uncertainty analysis, but further work is required to apply
these techniques to the various types of brain lesion structures.
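The per-patient model selection described above could be sketched as follows. This is only an illustration under stated assumptions: the confidence interval is a simple normal approximation over pairwise Dice scores (which are not independent, so the interval is rough), the selection rule picks the model with the highest lower bound, and all names and sample masks are hypothetical. None of these choices is claimed to be the construction used in the preceding analysis.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def dice(a: set, b: set) -> float:
    """Dice overlap of two binary masks given as sets of lesion voxel indices."""
    if not a and not b:
        return 1.0  # convention assumed here for two empty masks
    return 2 * len(a & b) / (len(a) + len(b))


def isd_confidence_interval(samples, z=1.96):
    """Approximate 95% interval for the mean pairwise Dice (normal approximation)."""
    scores = [dice(a, b) for a, b in combinations(samples, 2)]
    centre = mean(scores)
    half_width = z * stdev(scores) / sqrt(len(scores))
    return centre - half_width, centre + half_width


def select_model(samples_by_model):
    """Pick the model whose interval has the highest lower bound (assumed rule)."""
    intervals = {
        name: isd_confidence_interval(samples)
        for name, samples in samples_by_model.items()
    }
    best = max(intervals, key=lambda name: intervals[name][0])
    return best, intervals


# Hypothetical sampled segmentations for one patient from two competing models.
samples_by_model = {
    "model_a": [{1, 2, 3, 4}, {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4}],
    "model_b": [{1, 2}, {3, 4, 5}, {1, 5, 6}, {2, 7}],
}

best, intervals = select_model(samples_by_model)
print(best, intervals)
```

No ground truth enters the selection; only the agreement between each model's own samples is used, which matches the point made in the text.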
REFERENCES
Bakas, S., Reyes, M., Jakab, A., Bauer, S., Rempfler,
M., Crimi, A., Shinohara, R. T., Berger, C., Ha,
S. M., Rozycki, M., et al. (2018). Identifying
the best machine learning algorithms for brain tumor
segmentation, progression assessment, and overall
survival prediction in the BRATS challenge. arXiv
preprint arXiv:1811.02629.
Dice, L. R. (1945). Measures of the amount of ecologic
association between species. Ecology, 26(3):297–302.
Havaei, M., Davy, A., Warde-Farley, D., Biard, A.,
Courville, A., Bengio, Y., Pal, C., Jodoin, P.-M., and
Larochelle, H. (2017). Brain tumor segmentation
with deep neural networks. Medical image analysis,
35:18–31.
Kamnitsas, K., Ledig, C., Newcombe, V. F., Simpson,
J. P., Kane, A. D., Menon, D. K., Rueckert, D., and
Glocker, B. (2017). Efficient multi-scale 3D CNN
with fully connected CRF for accurate brain lesion
segmentation. Medical image analysis, 36:61–78.
Lê, M., Unkelbach, J., Ayache, N., and Delingette, H.
(2016). Sampling image segmentations for uncertainty
quantification. Medical image analysis, 34:42–51.
Maier, O., , B. H., von der Gablentz, J., Häni, L., Heinrich,
M. P., Liebrand, M., Winzeck, S., Basit, A., Bentley,
P., Chen, L., et al. (2017). ISLES 2015 - a public eval-
uation benchmark for ischemic stroke lesion segmen-
tation from multispectral MRI. Medical image analy-
sis, 35:250–269.
Raina, K., Yahorau, U., and Schmah, T. (2020). Exploiting
bilateral symmetry in brain lesion segmentation
with reflective registration. In Proceedings of
the 13th International Joint Conference on Biomedical
Engineering Systems and Technologies - Volume 2:
BIOIMAGING, pages 116–122. INSTICC,
SciTePress.
Ronneberger, O., Fischer, P., and Brox, T. (2015). U-net:
Convolutional networks for biomedical image seg-
mentation. In International Conference on Medical
image computing and computer-assisted intervention,
pages 234–241. Springer.
Roy, A. G., Conjeti, S., Navab, N., and Wachinger, C.
(2018). Inherent brain segmentation quality con-
trol from fully convnet monte carlo sampling. In
International Conf. on Medical Image Computing
and Computer-Assisted Intervention, pages 664–672.
Springer.
Sørensen, T. J. (1948). A method of establishing groups of
equal amplitude in plant sociology based on similarity
of species content and its application to analyses of
the vegetation on Danish commons. I kommission hos
E. Munksgaard.
Winzeck, S., Hakim, A., McKinley, R., Pinto, J. A., Alves,
V., Silva, C., Pisov, M., Krivov, E., Belyaev, M.,
BIOIMAGING 2021 - 8th International Conference on Bioimaging