riched dataset, given otherwise identical training configurations. We also argue that aggressive augmentation, for a consistent number of epochs, achieves results similar to or better than enrichment, despite the class balance achieved by the latter. Moreover, a combination of several state-of-the-art techniques, such as a modified Stochastic Weight Averaging with a Cosine Annealing scheduler used for training the segmentation module, further improves the performance.
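To make this training recipe concrete, the sketch below shows a generic combination of Stochastic Weight Averaging with a cosine-annealed learning rate, using the utilities shipped with PyTorch (>= 1.6). It is an illustrative baseline under assumed settings, not the exact modified SWA variant used in this work; the tiny network, random data, epoch counts, and learning rates are placeholders.

```python
# Minimal sketch (assumed settings): SWA + cosine annealing in PyTorch.
import torch
import torch.nn as nn
from torch.optim.swa_utils import AveragedModel, update_bn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 1, 1))                  # stand-in segmenter
data = TensorDataset(torch.randn(32, 1, 64, 64),
                     torch.randint(0, 2, (32, 1, 64, 64)).float())
loader = DataLoader(data, batch_size=8)
criterion = nn.BCEWithLogitsLoss()

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)
swa_model = AveragedModel(model)   # keeps the running average of the weights
swa_start = 5                      # epoch after which averaging begins (assumed)

for epoch in range(10):
    for images, masks in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), masks)
        loss.backward()
        optimizer.step()
    scheduler.step()               # cosine-annealed learning rate
    if epoch >= swa_start:
        swa_model.update_parameters(model)

update_bn(loader, swa_model)       # recompute BatchNorm stats (no-op: no BN here)
```

In the standard SWA recipe it is the averaged weights (swa_model) that are evaluated at test time, rather than the last SGD iterate.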
A powerful characteristic of our pipeline resides in its replaceable modules: as the state of the art advances, both the classification and the segmentation modules can be swapped for improved versions, potentially leading to better results. We recommend such a pipeline for medical segmentation problems in general; while we consider the deep learning modules indispensable (given a sufficiently complex dataset), the inclusion of the small-regions-of-interest elimination component is debatable: depending on the exact nature of the medical problem, the elimination threshold can vary considerably if the module is implemented.
As possible improvements, we consider that the Tversky Loss (Salehi et al., 2017) could improve the final results, since it has shown promise on both 2D and 3D image segmentation; a minimal sketch is given below. In addition, it would be relevant to investigate whether class weights that penalize false negative errors more heavily could also contribute to an increased recall. Last but not least, test-time augmentation has been widely used recently and could further increase performance; a sketch of this technique follows as well.
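As an illustration of the first two suggestions, the following is a minimal sketch of a binary Tversky loss in PyTorch. The alpha and beta hyper-parameters weight false positives and false negatives respectively, so choosing beta greater than alpha penalizes false negatives more heavily and should favour recall. The default values and the per-sample flattening are illustrative assumptions, not settings validated in this work.

```python
# Minimal sketch of a binary Tversky loss (after Salehi et al., 2017).
import torch

def tversky_loss(logits, targets, alpha=0.3, beta=0.7, eps=1e-6):
    probs = torch.sigmoid(logits).flatten(1)       # (batch, pixels)
    targets = targets.flatten(1)
    tp = (probs * targets).sum(dim=1)              # soft true positives
    fp = (probs * (1 - targets)).sum(dim=1)        # soft false positives
    fn = ((1 - probs) * targets).sum(dim=1)        # soft false negatives
    tversky = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return (1 - tversky).mean()

# usage on dummy predictions and masks
logits = torch.randn(4, 1, 64, 64)
masks = torch.randint(0, 2, (4, 1, 64, 64)).float()
print(tversky_loss(logits, masks))
```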
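Similarly, a minimal sketch of test-time augmentation for a binary segmentation model is given below: predictions are averaged over the original image and its horizontal flip, and further transforms (scaling, small rotations) could be added in the same way. The stand-in model and random input are placeholder assumptions.

```python
# Minimal sketch of test-time augmentation via horizontal flipping.
import torch
import torch.nn as nn

def predict_tta(model, images):
    model.eval()
    with torch.no_grad():
        pred = torch.sigmoid(model(images))
        flipped = torch.sigmoid(model(torch.flip(images, dims=[-1])))
        pred_flip = torch.flip(flipped, dims=[-1])   # undo the flip
    return (pred + pred_flip) / 2                    # average the two views

model = nn.Conv2d(1, 1, 3, padding=1)                # stand-in segmenter
masks = predict_tta(model, torch.randn(2, 1, 64, 64))
```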
REFERENCES
Chen, L., Zhu, Y., Papandreou, G., Schroff, F., and Adam,
H. (2018). Encoder-decoder with atrous separable
convolution for semantic image segmentation. CoRR,
abs/1802.02611.
Chollet, F. (2016). Xception: Deep learning with depthwise
separable convolutions. CoRR, abs/1610.02357.
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler,
M., Benenson, R., Franke, U., Roth, S., and Schiele,
B. (2016). The cityscapes dataset for semantic urban
scene understanding. In Proc. of the IEEE Conference
on Computer Vision and Pattern Recognition (CVPR).
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-
Fei, L. (2009). Imagenet: A large-scale hierarchical
image database. In 2009 IEEE conference on com-
puter vision and pattern recognition, pages 248–255.
IEEE.
Geiger, A., Lenz, P., Stiller, C., and Urtasun, R. (2013).
Vision meets robotics: The kitti dataset. International
Journal of Robotics Research (IJRR).
Hu, J., Shen, L., and Sun, G. (2017). Squeeze-and-
excitation networks. CoRR, abs/1709.01507.
Irvin, J., Rajpurkar, P., Ko, M., Yu, Y., Ciurea-Ilcus, S.,
Chute, C., Marklund, H., Haghgoo, B., Ball, R. L.,
Shpanskaya, K. S., Seekins, J., Mong, D. A., Halabi,
S. S., Sandberg, J. K., Jones, R., Larson, D. B., Lan-
glotz, C. P., Patel, B. N., Lungren, M. P., and Ng, A. Y.
(2019). Chexpert: A large chest radiograph dataset
with uncertainty labels and expert comparison. CoRR,
abs/1901.07031.
Izmailov, P., Podoprikhin, D., Garipov, T., Vetrov, D. P.,
and Wilson, A. G. (2018). Averaging weights leads
to wider optima and better generalization. CoRR,
abs/1803.05407.
Kingma, D. P. and Ba, J. (2014). Adam: A method for
stochastic optimization. CoRR, abs/1412.6980.
Lin, T., Dollár, P., Girshick, R. B., He, K., Hariharan, B., and Belongie, S. J. (2016). Feature pyramid networks for object detection. CoRR, abs/1612.03144.
Lin, T., Goyal, P., Girshick, R. B., He, K., and Dollár, P. (2017). Focal loss for dense object detection. CoRR, abs/1708.02002.
Loshchilov, I. and Hutter, F. (2016). SGDR: Stochastic gradient descent with warm restarts. CoRR, abs/1608.03983.
Neuhold, G., Ollmann, T., Bulo, S. R., and Kontschieder,
P. (2018). The mapillary vistas dataset for semantic
understanding of street scenes. In 2017 IEEE Interna-
tional Conference on Computer Vision (ICCV), vol-
ume 00, pages 5000–5009.
Ronneberger, O., Fischer, P., and Brox, T. (2015). U-
net: Convolutional networks for biomedical image
segmentation. In Medical Image Computing and
Computer-Assisted Intervention (MICCAI), volume
9351 of LNCS, pages 234–241. Springer. (available
on arXiv:1505.04597 [cs.CV]).
Salehi, S. S. M., Erdogmus, D., and Gholipour, A. (2017).
Tversky loss function for image segmentation us-
ing 3d fully convolutional deep networks. CoRR,
abs/1706.05721.
Society for Imaging Informatics in Medicine (SIIM), American College of Radiology (ACR), S. o. T. R. S. M. (2019). SIIM-ACR pneumothorax segmentation. https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation.
Szegedy, C., Ioffe, S., and Vanhoucke, V. (2016). Inception-
v4, inception-resnet and the impact of residual con-
nections on learning. CoRR, abs/1602.07261.
Tan, M. and Le, Q. V. (2019). Efficientnet: Rethink-
ing model scaling for convolutional neural networks.
CoRR, abs/1905.11946.
Timbus, C., Miclea, V.-C., and Lemnaru, C. (2018). Se-
mantic segmentation-based traffic sign detection and
recognition using deep learning techniques. pages
325–331.
Wu, Y. and He, K. (2018). Group normalization. CoRR,
abs/1803.08494.
Xie, S., Girshick, R. B., Dollár, P., Tu, Z., and He, K. (2016). Aggregated residual transformations for deep neural networks. CoRR, abs/1611.05431.
Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2016). Pyra-
mid scene parsing network. CoRR, abs/1612.01105.
Zoph, B., Vasudevan, V., Shlens, J., and Le, Q. V. (2017).
Learning transferable architectures for scalable image
recognition. CoRR, abs/1707.07012.