clinical records seems to bring promising opportuni-
ties to create deep and rich datasets in order to inves-
tigate how well this transfer learning approach gen-
eralizes. To make sure that the machine learning al-
gorithms are functioning as intended, data cleaning
is necessary for the EyePACS dataset, discarding images
that were graded inaccurately or inconsistently by
the human graders or the machine learning methods,
as done by Gulshan et al. (2016).
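As an illustration of the data-cleaning step described above, the following is a minimal sketch and not the procedure used by Gulshan et al. (2016). It assumes each image has a list of integer grades from several graders and keeps only images whose most frequent grade reaches a minimum agreement fraction. The function name `filter_consistent`, the `min_agreement` parameter and the example image identifiers are hypothetical.

```python
from collections import Counter


def filter_consistent(grades_by_image, min_agreement=0.8):
    """Keep images whose most common grade is shared by at least
    `min_agreement` of the graders; return {image_id: consensus_grade}.

    The agreement rule and the default threshold are assumptions made
    for illustration only.
    """
    kept = {}
    for image_id, grades in grades_by_image.items():
        if not grades:
            continue  # no grades at all: nothing to agree on, so discard
        grade, count = Counter(grades).most_common(1)[0]
        if count / len(grades) >= min_agreement:
            kept[image_id] = grade
    return kept


# Hypothetical grades (0-4 severity scale) from several graders per image.
example = {
    "img_001": [0, 0, 0],     # unanimous
    "img_002": [0, 2, 4],     # graders disagree
    "img_003": [2, 2, 2, 3],  # 75% agreement
}

print(filter_consistent(example, min_agreement=0.75))
```

With a threshold of 0.75 the sketch keeps `img_001` and `img_003` and discards `img_002`; a stricter threshold trades dataset size for label consistency.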
In light of all this, it is worth considering
the possibilities and consequences of the widespread
deployment of these algorithms in DR screening pro-
grams. The biggest challenge will be the poor under-
standing of how the algorithm reaches its final predic-
tion. Although accurate and precise, DL algorithms
are still considered a "black box" due to their scale
and complexity, whereas the retina specialist inter-
prets the images based on recognizable features, more
in line with feature-based machine learning. Therefore,
combining machine learning algorithms that show
strong agreement for the initial screening with human
grading of the positive predictions would likely
yield a system with high sensitivity and specificity,
reducing the number of patients referred
unnecessarily.
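The two-stage arrangement suggested above can be sketched as follows. This is only one possible reading, under stated assumptions: a case is cleared automatically only when every model scores it below a threshold, and any other case is passed to a human grader, whose decision determines referral. The names `triage` and `screen`, the 0.5 threshold, and the example scores and grades are all hypothetical.

```python
def triage(model_probs, threshold=0.5):
    """First stage: clear a case only if every model scores it below
    the threshold; otherwise send it to a human grader."""
    if all(p < threshold for p in model_probs):
        return "cleared"
    return "human_review"


def screen(cases, human_labels, threshold=0.5):
    """Second stage: refer only the flagged cases that a human grader
    confirms as positive."""
    referred = []
    for case_id, probs in cases.items():
        if triage(probs, threshold) == "human_review" and human_labels[case_id]:
            referred.append(case_id)
    return referred


# Hypothetical scores from three models and hypothetical human grades.
cases = {
    "patient_a": [0.05, 0.10, 0.08],  # all models agree: negative
    "patient_b": [0.91, 0.85, 0.88],  # all models agree: positive
    "patient_c": [0.20, 0.65, 0.30],  # models disagree
}
human_labels = {"patient_a": False, "patient_b": True, "patient_c": False}

print(screen(cases, human_labels))
```

In this sketch only `patient_b` is referred: `patient_a` is cleared by the models, and `patient_c` is flagged by one model but rejected by the human grader. Sending every non-unanimous case to review favors sensitivity at the cost of more human grading.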
In future work, we intend to explore existing
methodologies for explaining model predictions in depth,
as well as develop new ones, so that disease diagnosis
through DL can be more readily accepted by the medical
community.
ACKNOWLEDGEMENTS
We would like to acknowledge the financial support
obtained from the North Portugal Regional Operational
Programme (NORTE 2020), Portugal 2020 and
the European Regional Development Fund (ERDF)
from the European Union through the project Symbiotic
technology for societal efficiency gains: Deus
ex Machina (DEM), NORTE-01-0145-FEDER-000026.
The experimental data were kindly
provided by the Messidor program partners (see
http://www.adcis.net/en/DownloadThirdParty/Messidor.html)
and by EyePACS LLC (see
http://www.eyepacs.com). It is also important
to acknowledge Telmo Barbosa and Silvia Rêgo,
from Fraunhofer Portugal AICOS for the development
of the annotation tool and management of the
dataset with multiple graders, and finally, the medical
doctors Tânia Borges from Centro Hospitalar do
Porto and Gustavo Bacelar from CINTESIS, who
kindly annotated the images.
REFERENCES
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean,
J., Devin, M., Ghemawat, S., Irving, G., Isard, M.,
et al. (2016). Tensorflow: a system for large-scale
machine learning. 16(1):265 – 283.
Andreotti, F., Carr, O., Pimentel, M. A., Mahdi, A., and
De Vos, M. (2017). Comparing feature-based classi-
fiers and convolutional neural networks to detect ar-
rhythmia from short segments of ecg. Computing,
44(1):1 – 4.
Arianti, A. and Andayani, G. A. (2016). Inter-
observer agreement in fundus photography for dia-
betic retinopathy screening in primary health care.
Ophthalmologica Indonesiana, 42(2):84 – 85.
Bourne, R. R., Stevens, G. A., White, R. A., Smith, J. L.,
Flaxman, S. R., Price, H., Jonas, J. B., Keeffe, J.,
Leasher, J., Naidoo, K., et al. (2013). Causes of vi-
sion loss worldwide, 1990–2010: a systematic analy-
sis. The lancet global health, 1(6):339 – 349.
Chollet, F. et al. (2015). Keras: Deep learning library for
theano and tensorflow. URL: https://keras.io/k, 7(8).
Costa, J., Sousa, I., and Soares, F. (2016). Smartphone-
based decision support system for elimination of
pathology-free cases in diabetic retinopathy screen-
ing.
Costa, P., Galdran, A., Smailagic, A., and Campilho, A.
(2018). A Weakly-Supervised Framework for Inter-
pretable Diabetic Retinopathy Detection on Retinal
Images. IEEE Access, 6:18747 – 18758.
Felgueiras, S., Costa, J., Soares, F., and Monteiro, M. P.
(2016). Cotton wool spots in eye fundus scope. Master's
thesis, Faculdade de Engenharia da Universidade
do Porto.
Feurer, M., Klein, A., Eggensperger, K., Springenberg, J.,
Blum, M., and Hutter, F. (2015). Efficient and robust
automated machine learning. In Cortes, C., Lawrence,
N. D., Lee, D. D., Sugiyama, M., and Garnett, R., edi-
tors, Advances in Neural Information Processing Sys-
tems 28, pages 2962 – 2970. Curran Associates, Inc.
Fleiss, J. L., Cohen, J., and Everitt, B. (1969). Large sample
standard errors of kappa and weighted kappa. Psycho-
logical bulletin, 72(5):323.
Gondal, W. M., Köhler, J. M., Grzeszick, R., Fink, G. A.,
and Hirsch, M. (2017). Weakly-supervised localiza-
tion of diabetic retinopathy lesions in retinal fundus
images. arXiv preprint arXiv:1706.09634.
Gulshan, V., Peng, L., Coram, M., Stumpe, M. C., Wu,
D., Narayanaswamy, A., Venugopalan, S., Widner, K.,
Madams, T., Cuadros, J., et al. (2016). Development
and validation of a deep learning algorithm for
detection of diabetic retinopathy in retinal fundus
photographs. JAMA, 316(22):2402–2410.
Huang, G., Liu, Z., van der Maaten, L., and Weinberger,
K. Q. (2016). Densely connected convolutional net-
works. arXiv preprint arXiv:1608.06993.
Ioffe, S. and Szegedy, C. (2015). Batch normalization: Ac-
celerating deep network training by reducing internal
covariate shift. arXiv preprint arXiv:1502.03167.
HEALTHINF 2019 - 12th International Conference on Health Informatics