Glaucoma Detection Using Transfer Learning with the Faster R-CNN
Model and a ResNet-50-FPN Backbone
Noirane Getirana de Sá (https://orcid.org/0009-0006-6767-292X), Daniel Oliveira Dantas (https://orcid.org/0000-0002-0142-891X) and Gilton José Ferreira da Silva (https://orcid.org/0000-0002-2281-9426)
Departamento de Computação, Universidade Federal de Sergipe, São Cristóvão, SE, Brazil
Keywords:
Ophthalmology, Diagnosis, Machine Learning, Deep Learning, Region-Based.
Abstract:
Early detection of glaucoma has the potential to prevent vision loss. The application of artificial intelligence
can enhance the cost-effectiveness of glaucoma detection by reducing the need for manual intervention. Glau-
coma is the second leading cause of blindness and, due to its asymptomatic nature until advanced stages,
diagnosis is often delayed. Having a general understanding of the disease’s pathophysiology, diagnosis, and
treatment can assist primary care physicians in referring high-risk patients for comprehensive ophthalmo-
logic examinations and actively participating in the care of individuals affected by this condition. This article
describes a method for glaucoma detection with the Faster R-CNN model and a ResNet-50-FPN backbone.
Our experiments demonstrated greater accuracy compared to models such as AlexNet, VGG-11, VGG-16,
VGG-19, GoogleNet-V1, ResNet-18, ResNet-50, ResNet-101 and ResNet-152.
1 INTRODUCTION
The eye is an important organ of the human body, and blindness is considered one of the most impactful disabilities on individuals' lives (Aljazaeri et al., 2020).
The significant prevalence of foreseeable blindness
cases poses a global health challenge. Cataracts, un-
corrected refractive errors, and glaucoma are identi-
fied as the primary causes of blindness (Furtado et al.,
2012). Early detection of glaucoma is crucial for preventing visual impairment, and screening for this disease can have a significant impact on the general population.
The task of glaucoma detection has gained consid-
erable attention in the field of computerized medical
image analysis in recent years (Shibata et al., 2018).
Glaucoma is a chronic eye condition that causes permanent vision loss (Fu et al., 2018). Given that there is no cure for the disease, timely identification and diagnosis are crucial (Chai et al., 2018). Glaucoma affects more than 70 million people worldwide, with approximately 10% being bilaterally blind (Quigley and Broman, 2006).
Estimates indicate that the global number of individuals affected by glaucoma will increase to
111.8 million by the year 2040, with a dispropor-
tionately higher incidence in regions of Asia and
Africa (Tham et al., 2014). Technological advance-
ments have been made in the application of artifi-
cial intelligence techniques to assist in the detection
and diagnosis of glaucoma. Machine learning algo-
rithms and convolutional neural networks have been
employed to analyze eye fundus images, identifying
characteristic glaucoma-related changes and aiding in
patient screening.
Glaucoma is primarily a neuropathy, not a
retinopathy, and it affects the retina by damaging gan-
glion cells and their axons (Abràmoff et al., 2010). A
characteristic feature of glaucoma is the development of a cupped area in the optic disc, the visible part of the optic nerve head. The ratio between the cupped area of the optic disc and the surface area of the neuroretinal rim serves as an important indicator for assessing the presence and progression of glaucoma.
In this study, a glaucoma detection system was de-
veloped for ophthalmic images using the Faster R-
CNN model with a ResNet-50-FPN (feature pyra-
mid network) backbone, trained on the Common Ob-
jects in Context (COCO) dataset. Our experiments
produced superior accuracy when compared to pre-
trained networks such as AlexNet, VGG-11, VGG-
16, VGG-19, GoogleNet-V1, ResNet-18, ResNet-50,
ResNet-101, and ResNet-152, as referenced in prior
research (Sallam et al., 2021). The approach em-
ployed in our study demonstrates significant poten-
tial for early glaucoma detection, contributing to im-
proved clinical outcomes and preservation of vision.
Our study aims to answer two research questions:
1. Is it better to have more images with fewer pixels or fewer images with more pixels?
2. Is it better to apply histogram equalization to high-quality images or leave them unchanged?
To answer the research questions, three experiments were carried out. All experiments used images from the Artificial Intelligence for Robust Glaucoma Detection (AIROGS) dataset (de Vente et al., 2023), whose fundus images have varying dimensions and are provided in high quality. Since no information about the location of the optic disc is provided, it was necessary to first segment the optic disc and then classify glaucoma.
To answer the first research question, we used Experiment 1 and Experiment 3. In Experiment 1, Detectron 2 (Wu et al., 2019), a library from Facebook AI Research that provides state-of-the-art segmentation algorithms, was used to generate 4,177 images of segmented optic discs with dimensions of 390×390 pixels. In Experiment 3, 1,000 images were used without any change in dimensions, containing considerably more pixels than the images in Experiment 1; the optic discs of these 1,000 images were manually labeled using the LabelImg application. We used histogram equalization in both experiments 1 and 3.
To answer the second research question, we used Experiment 2 and Experiment 3. The same set of 1,000 images used in Experiment 3 was also used in Experiment 2. These experiments are similar; the only difference is that histogram equalization is applied in Experiment 3, while the images in Experiment 2 remain identical to those provided by the AIROGS dataset (de Vente et al., 2023). In all experiments, we used the Faster R-CNN model with a ResNet-50-FPN backbone for glaucoma detection.
2 FASTER R-CNN
Fast R-CNN utilizes a Convolutional Neural Network
(CNN) to extract features from the image and then
feeds these features into a classifier to perform object
detection. Compared to image classification, object
detection is a more challenging task that requires so-
phisticated methods, presenting two main challenges.
First, it is necessary to process multiple candidate object locations, often referred to as proposals.
Second, these candidates provide only an approxi-
mate location that needs to be refined for precise lo-
calization (Girshick, 2015).
Faster R-CNN combines the Fast R-CNN
model with an additional region proposal network
(RPN) (Oliveira et al., 2021; Nazir et al., 2020).
The RPN is a type of CNN that has the ability to
predict object boundaries and assign confidence
scores to each position within an image. By utilizing
shared convolutional computations, it achieves a
considerably faster detection system compared to
the methods of selective search (SS) or edge boxes
(EB). The RPN’s approach of generating a reduced
number of proposals also leads to a decrease in the
computational workload required for region-wise
fully connected processing (Ren et al., 2015).
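In TorchVision's implementation, the components described above are exposed as separate submodules of the detection model. The following sketch (assuming a recent TorchVision version; the weights argument triggers a download of COCO pre-trained parameters) illustrates this structure:

```python
import torchvision

# Faster R-CNN with a ResNet-50-FPN backbone, pre-trained on COCO.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# The stages described above are separate submodules of the model:
print(type(model.backbone).__name__)   # shared convolutional features (ResNet-50 + FPN)
print(type(model.rpn).__name__)        # RegionProposalNetwork: proposals and objectness scores
print(type(model.roi_heads).__name__)  # region-wise classification and box refinement
```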
The choice of Faster R-CNN is due to its ability to achieve high accuracy in object detection. Huang et al. (2017) analyze the performance of various object detection networks, including the single shot detector (SSD), Faster R-CNN, and R-FCN. According to the results presented in their study, Faster R-CNN emerged as the most accurate model, albeit slower: it requires at least 100 ms per image, but offers greater precision in object detection than the R-FCN and SSD models. Despite its slower speed, Faster R-CNN still ensures an appropriate speed for implementation in this project.
Faster R-CNN and RPN have been used by several winning teams in different object detection categories in competitions such as ILSVRC and COCO 2015, suggesting that the method is not only an economical solution for practical use but also an effective way to improve object detection accuracy (Ren et al., 2015).
3 METHODOLOGY
The methodology of this study is divided into seven
subsections. The first subsection describes the AIROGS dataset, which contains 113,893 color images of the fundus of the eye with varying dimensions. The
second subsection discusses the use of Detectron 2,
an open-source platform that was utilized for optic
disc detection. In the third subsection, the Faster R-
CNN model and the ResNet-50-FPN backbone are
presented, which were used for glaucoma detection.
The fourth subsection covers Experiment 1, using re-
sized images that underwent histogram equalization.
The fifth subsection addresses Experiment 2, where
original images were used. The sixth subsection de-
scribes Experiment 3, where histogram equalization
was applied to the images with original dimensions.
Finally, in the seventh subsection, the details of the model training are presented. The code was developed in Python and is available at https://github.com/ddantas-ufs/2024_glaucoma.
3.1 AIROGS Dataset
The AIROGS Dataset (de Vente et al., 2023) consists
of a collection of color images of the fundus of the
eye from 60,357 individuals, totaling 113,893 images,
with various dimensions. The dataset includes sub-
jects from approximately 500 different locations, rep-
resenting a diverse ethnic population. A table is pro-
vided, containing two columns: challenge id, which
includes the image names, and class, which indicates
the classification as either referable glaucoma (RG) or
non-referable glaucoma (NRG).
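As an illustration, the label table can be inspected with pandas; the file name below is hypothetical, and the exact column names may differ from the description above:

```python
import pandas as pd

# Hypothetical file name for the AIROGS label table.
labels = pd.read_csv("train_labels.csv")

print(labels.columns.tolist())         # expected: ["challenge_id", "class"]
print(labels["class"].value_counts())  # distribution of RG vs. NRG labels
```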
The label quality was ensured by a carefully se-
lected group of evaluators. This group included oph-
thalmologists, glaucoma specialists, ophthalmology
residents, and optometrists, totaling 32 professionals.
Each image was evaluated twice by different eval-
uators. If there was agreement between the evalu-
ators, the agreed-upon label became the final label.
In case of disagreement, the image was assessed by
an experienced glaucoma specialist, and their deter-
mined label served as the final label for the image.
Throughout the evaluation process, the performance
of all evaluators was monitored. Those who exhibited
sensitivity below 80% and/or specificity below 95%
were removed from the group of evaluators. Thus,
the AIROGS dataset has been meticulously labeled
and provides a set of high-quality labels for analysis
and research in the field of glaucoma detection.
As the training labels did not contain information about the location of the optic disc, it was necessary to perform manual segmentation. In Experiment 1, 4,177 images were resized to 512×512 pixels. Then, 500 of these images underwent manual labeling of the optic disc using the LabelImg application. Each labeled image had an XML annotation indicating the location of the optic disc. The 500 images, along with their XML annotations, were used as input for Detectron 2 to generate 4,177 cropped optic discs with dimensions of 390×390 pixels. For experiments 2 and 3, the image dataset retained its original dimensions. A total of 1,000 images were manually labeled using the LabelImg application for these two experiments.
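By default, LabelImg saves annotations in the Pascal VOC XML format. A minimal sketch for reading a bounding box from such a file, assuming a single labeled optic disc per image and a hypothetical file name, is:

```python
import xml.etree.ElementTree as ET

def read_optic_disc_box(xml_path):
    """Return (xmin, ymin, xmax, ymax) from a LabelImg (Pascal VOC) annotation."""
    root = ET.parse(xml_path).getroot()
    box = root.find("object/bndbox")  # assumes one labeled object per image
    return tuple(int(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax"))

print(read_optic_disc_box("fundus_0001.xml"))  # hypothetical file name
```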
In Experiment 1, the 4,177 images of the optic disc were divided as follows: the test set comprised 10% of the total (417 images), the validation set 20% (835 images), and the remaining 2,925 images were allocated for training.
In experiments 2 and 3, 1,000 images of the eye fundus with manually marked optic discs were used. The test set comprised 10% of the total (100 images), the validation set 20% (200 images), and the remaining 700 images were allocated for training.
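These proportions can be reproduced, for example, with scikit-learn; the sketch below is illustrative and uses placeholder identifiers and an arbitrary random seed:

```python
from sklearn.model_selection import train_test_split

image_names = [f"img_{i:04d}" for i in range(1000)]  # placeholder identifiers

# 10% test; then 2/9 of the remaining 90% gives the 20% validation share.
train_val, test = train_test_split(image_names, test_size=0.10, random_state=0)
train, val = train_test_split(train_val, test_size=2 / 9, random_state=0)

print(len(train), len(val), len(test))  # 700 200 100
```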
3.2 Detectron 2
Detectron 2, developed by Facebook AI Research (FAIR), is an open-source platform that implements state-of-the-art object detection and segmentation algorithms (Pham et al., 2020). It is the successor to Detectron and maskrcnn-benchmark (https://github.com/facebookresearch/maskrcnn-benchmark) and supports various computer vision research projects and production applications at Facebook (Wu et al., 2019). Optic disc detection is a critical step in the development of automated diagnosis systems for various serious ophthalmic pathologies (Aquino et al., 2010), and Detectron 2 was used for this purpose in Experiment 1.
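The exact Detectron 2 configuration used in Experiment 1 is not reproduced here. A minimal sketch of how such a detector is typically configured and fine-tuned, assuming the 500 labeled images were registered under the hypothetical dataset name optic_disc_train, is:

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")  # COCO pre-trained weights
cfg.DATASETS.TRAIN = ("optic_disc_train",)  # hypothetical registered dataset
cfg.DATASETS.TEST = ()
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # a single foreground class: the optic disc

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```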
3.3 Faster R-CNN Model and a
ResNet-50-FPN Backbone
The Faster R-CNN model with a ResNet-50 backbone and feature pyramid network (FPN) requires a list of tensors as input, where each tensor corresponds to an image. This flexible input format allows the model to handle images of different sizes and detect objects at various resolutions. The behavior of the model
varies depending on whether it is in training or evalua-
tion mode (TorchVision maintainers and contributors,
2016). During training, the model uses input tensors
and a target (a list of dictionaries) that contains spe-
cific information. This target information typically
includes basic annotations for input images, such as
bounding boxes of objects present in the images. The bounding boxes are given as a float tensor of dimensions (N, 4), where N is the number of bounding boxes. Each bounding box is defined by coordinates (x1, y1, x2, y2), which represent the two corner points of the rectangular box. Let W be the image width and H be the image height. The conditions 0 ≤ x1 < x2 < W and 0 ≤ y1 < y2 < H ensure that the bounding box lies within the image boundaries. In addition to the
bounding boxes, the target also includes the class label for each box. During inference, the model only requires the input tensors and returns post-processed predictions, one for each input image. The fields of each prediction dictionary are the predicted bounding boxes, the predicted labels for each detection, and the scores for each detection.
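A minimal sketch of this interface, using TorchVision's implementation with dummy data, is shown below:

```python
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

images = [torch.rand(3, 390, 390)]  # one tensor per image; sizes may differ
targets = [{
    "boxes": torch.tensor([[50.0, 60.0, 340.0, 330.0]]),  # shape (N, 4): x1, y1, x2, y2
    "labels": torch.tensor([1]),                          # class label for each box
}]

model.train()
losses = model(images, targets)  # training mode: returns a dict of losses

model.eval()
with torch.no_grad():
    preds = model(images)        # evaluation mode: one dict per input image
print(preds[0].keys())           # dict_keys(['boxes', 'labels', 'scores'])
```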
3.4 Experiment 1: Images Resized and
with Histogram Equalization
Figure 1: Experiment 1: Images resized and with histogram equalization.

Figure 1 illustrates the workflow for glaucoma detection in Experiment 1. Initially, a total of 4,177 fundus images were resized to 512×512 pixels. Of these, 500 were manually annotated using the LabelImg application, each with an XML annotation specifying the location of the optic disc.
The manually annotated images, along with their respective annotations, were used to train Detectron 2, an automated segmentation model. This enabled the automatic segmentation of the optic discs, resulting in 4,177 cropped discs with dimensions of 390×390 pixels.
The 4,177 segmented disc images, along with a processed label table from the AIROGS dataset, were used as input for classification. This classification step employed the Faster R-CNN model with a ResNet-50-FPN backbone.
Figure 2: Histogram equalization example: (a) Fundus of the eye; (b) Optic disc without histogram equalization; (c) Optic disc with histogram equalization.

To enhance the image quality, the histogram equalization technique, as depicted in Figure 2, was applied to all the segmented optic discs.
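The specific equalization variant used is not detailed here. One common approach for color fundus images, sketched below with OpenCV and hypothetical file names, equalizes only the luminance channel:

```python
import cv2

def equalize(img_bgr):
    # Equalize the luminance (Y) channel only, preserving the color balance.
    ycrcb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

disc = cv2.imread("optic_disc.png")  # a segmented 390x390 disc image
cv2.imwrite("optic_disc_eq.png", equalize(disc))
```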
3.5 Experiment 2: Images with Original
Sizes
Figure 3: Experiment 2: Images with original sizes.

Figure 3 illustrates the workflow for glaucoma detection in Experiment 2. Initially, 1,000 images from the AIROGS dataset were manually labeled using the LabelImg application. Each labeled image was accompanied by an XML annotation indicating the location of the optic disc.
The manually labeled images, along with their respective XML labels, were used to train the Faster R-CNN model with a ResNet-50-FPN backbone, responsible for glaucoma classification.
3.6 Experiment 3: Images with Original
Sizes and with Histogram
Equalization
In Experiment 3, the same images and annotations
from Experiment 2 were used. The Faster R-CNN
model with a ResNet-50-FPN backbone was utilized
for glaucoma detection. The only aspect that dis-
tinguishes Experiment 2 from Experiment 3 is that
in Experiment 3 the images went through histogram
equalization before training.
3.7 Training
Transfer learning is used to improve a model by trans-
ferring information from a related domain (Yan et al.,
2017). The Faster R-CNN model with ResNet-50
architecture and FPN was pre-trained on the COCO
(Common Objects in Context) dataset (Lin et al.,
2014). In this study, the model weights are updated
during training using the SGD optimizer with a learn-
ing rate of 0.005, a momentum of 0.9, and a weight
decay rate of 0.0005. In terms of the model’s classifi-
cation layer, the Faster R-CNN model with a ResNet-
50-FPN backbone architecture was designed for 91
classes. However, in the context of glaucoma detec-
tion, the classification layer is replaced by a new layer
adapted to the specified number of classes, which in
this case are 3 classes: 0 for background, 1 for RG,
and 2 for NRG. The training duration in all experi-
ments was 12 epochs.
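A sketch of this setup with TorchVision, using the hyperparameters stated above (the data loading and the 12-epoch loop are omitted), could look like the following:

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Faster R-CNN with a ResNet-50-FPN backbone, pre-trained on COCO.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the 91-class COCO head with a 3-class head: background, RG, NRG.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=3)

# SGD optimizer with the reported hyperparameters.
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)
```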
4 RESULTS AND DISCUSSION
Table 1 displays the values of the metrics in glaucoma
detection. Experiment 2 achieved the highest accu-
racy of 0.8900. In terms of precision, Experiment 1
had the highest value of 0.8967, while Experiment 3
had the lowest precision of 0.8043. Regarding recall,
Experiment 3 obtained the highest value of 0.9250,
while Experiment 1 had the lowest recall of 0.8643. In
terms of the F1-score, Experiment 2 had the best per-
formance with a score of 0.8817, followed by Exper-
iment 1 with 0.8802, and Experiment 3 with 0.8605.
These metrics can be compared with the metrics reported by Sallam et al. (2021). Figure 4 and Figure 5 display examples of glaucoma detection predictions from experiments 1 and 2, respectively.

Figure 4: Prediction of glaucoma detection, Experiment 1.

Figure 5: Prediction of glaucoma detection, Experiment 2.
Accuracy (ACC) is an evaluation metric that mea-
sures the rate of correct predictions made by the
model in relation to the total number of evaluated ex-
amples. In other words, accuracy indicates the per-
centage of correct predictions out of the total predic-
tions made. True positive (TP) represents the cases in
which the model correctly predicted the positive class.
True negative (TN) represents the cases in which the
model correctly predicted the negative class. False
positive (FP) represents the cases in which the model
incorrectly predicted the positive class. False negative
(FN) represents the cases in which the model incor-
rectly predicted a negative class.
ACC = (TP + TN) / (TP + TN + FP + FN)
Precision, also known as positive predictive value,
provides an estimate of how many of the examples
classified as positive by the model are actually posi-
tive.
Precision = TP / (TP + FP)
Recall, also known as true positive rate (TPR) or
sensitivity, is an evaluation metric that measures the
proportion of true positive predictions in relation to
the total number of actual positive examples.
Recall = TP / (TP + FN)
F1-score is the harmonic mean of precision and re-
call, giving equal weight to both metrics. It provides
a balanced evaluation of a model’s performance, par-
ticularly in scenarios where precision and recall are
both important. By using the F1-score, we can assess
the trade-off between precision and recall. A higher
F1-score indicates a better balance between the two
metrics, while a lower score suggests an imbalance in
favor of either precision or recall.
F1-score = 2 × (Precision × Recall) / (Precision + Recall)
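The four metrics can be computed directly from the confusion-matrix counts; the counts in the example below are arbitrary and not taken from the experiments:

```python
def metrics(tp, tn, fp, fn):
    # Compute accuracy, precision, recall, and F1-score from raw counts.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

print(metrics(tp=85, tn=90, fp=10, fn=15))  # illustrative counts only
```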
In this study, three separate experiments, experi-
ment 1, experiment 2, and experiment 3, were car-
ried out to evaluate glaucoma detection using a Faster
R-CNN model and a ResNet-50-FPN backbone. The
accuracy values obtained were 0.8753, 0.8900, and
Table 1: Comparison of metrics.
Experiment Accuracy Precision Recall F1-score
Experiment 1 0.8753 0.8967 0.8643 0.8802
Experiment 2 0.8900 0.8913 0.8723 0.8817
Experiment 3 0.8800 0.8043 0.9250 0.8605
Models presented in Sallam et al. (2021)
AlexNet 0.814 0.818 0.815 -
VGG-11 0.800 0.800 0.800 -
VGG-16 0.822 0.820 0.820 -
VGG-19 0.809 0.809 0.809 -
GoogleNet-V1 0.829 0.829 0.830 -
ResNet-18 0.867 0.867 0.867 -
ResNet-50 0.856 0.856 0.857 -
ResNet-101 0.862 0.862 0.862 -
ResNet-152 0.869 0.869 0.869 -
0.8800 for experiment 1, experiment 2, and experi-
ment 3, respectively, indicating a promising perfor-
mance in detecting glaucoma in the dataset used in
this study.
Our study aimed to answer two research questions. The first question was whether it is better to have more images with fewer pixels or fewer images with more pixels. Experiment 3, which used fewer images with more pixels, reached an accuracy of 0.8800, greater than the 0.8753 of Experiment 1, which used more images with fewer pixels.
The second research question was whether it is better to apply histogram equalization to high-quality images or leave them unchanged. Experiment 2, which used unaltered high-quality images, reached an accuracy of 0.8900, better than the 0.8800 of Experiment 3, which applied histogram equalization to the same images.
These findings suggest that, in our study, having fewer images with more pixels and leaving high-quality images unaltered led to improved performance in terms of accuracy.
As shown in Table 1, the experiments conducted
in this study exhibited a superior performance in glau-
coma detection compared to the results reported in
the reference study (Sallam et al., 2021). The refer-
ence study employed models such as AlexNet, VGG-
11, VGG-16, VGG-19, GoogleNet-V1, ResNet-18,
ResNet-50, ResNet-101, and ResNet-152, achieving
an accuracy ranging from 0.814 to 0.869.
5 CONCLUSIONS
This study provided compelling evidence of the
promising performance of the Faster R-CNN model
and a ResNet-50-FPN backbone approach in glau-
coma detection. The experimental results demon-
strated higher accuracy compared to reference mod-
els such as AlexNet, VGG-11, VGG-16, VGG-
19, GoogleNet-V1, ResNet-18, ResNet-50, ResNet-
101, and ResNet-152. This disparity suggests that
the combination of the Faster R-CNN model and a
ResNet-50-FPN can serve as a robust and effective
choice for early glaucoma detection.
In our study, we investigated how the number of
images and histogram equalization affect the accu-
racy of glaucoma detection. The first question we
addressed was: is it better to have more images with fewer pixels or fewer images with more pixels? We
found that using fewer images with more pixels re-
sulted in higher accuracy in glaucoma detection com-
pared to using more images with fewer pixels. This
means that having a higher resolution in the images,
even with fewer total images, led to better perfor-
mance in glaucoma detection.
The second question we investigated was whether
histogram equalization affects glaucoma detection in
high-quality images. We found that leaving the high-
quality images unaltered resulted in better accuracy
than applying histogram equalization to those im-
ages. This indicates that histogram equalization did
not bring considerable benefits to glaucoma detection
in high-quality images in our study.
For future studies, we recommend conducting a
Monte Carlo analysis and applying a statistical test to
determine if there is a significant difference between
the results of the different experiments. By perform-
ing Monte Carlo simulations and appropriate statis-
tical tests, it will be possible to obtain more robust
conclusions about which experiment yields superior
results. This statistical analysis will enhance the reli-
ability and validity of the findings, contributing to the
advancement of knowledge in glaucoma detection.
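One possible test for comparing two models evaluated on the same test images is McNemar's test on their paired correct/incorrect decisions; the sketch below uses statsmodels with hypothetical counts:

```python
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical 2x2 table: rows = experiment A correct/incorrect,
# columns = experiment B correct/incorrect, on the same 100 test images.
table = [[80, 9],
         [4, 7]]

result = mcnemar(table, exact=True)
print(result.pvalue)  # a small p-value would indicate a significant difference
```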
The accuracy and recall achieved in the exper-
iments underscore the ability of the Faster R-CNN
model and ResNet-50-FPN backbone approach to
make accurate predictions and identify positive cases
of glaucoma. These findings highlight the potential of the proposed approach to support healthcare professionals in the early diagnosis of the disease.
REFERENCES
Abràmoff, M. D., Garvin, M. K., and Sonka, M. (2010).
Retinal imaging and image analysis. IEEE Reviews in
Biomedical Engineering, 3:169–208.
Aljazaeri, M., Bazi, Y., AlMubarak, H., and Alajlan, N.
(2020). Faster R-CNN and DenseNet regression for
glaucoma detection in retinal fundus images. In 2020
2nd International Conference on Computer and Infor-
mation Sciences (ICCIS), pages 1–4. IEEE.
Aquino, A., Gegúndez-Arias, M. E., and Marín, D. (2010).
Detecting the optic disc boundary in digital fundus im-
ages using morphological, edge detection, and feature
extraction techniques. IEEE Transactions on Medical
Imaging, 29(11):1860–1869.
Chai, Y., Liu, H., and Xu, J. (2018). Glaucoma diagnosis
based on both hidden features and domain knowledge
through deep learning models. Knowledge-Based Sys-
tems, 161:147–156.
de Vente, C., Vermeer, K. A., Jaccard, N., Wang, H., Sun, H., Khader, F., Truhn, D., Aimyshev, T., Zhanibekuly, Y., Le, T.-D., Galdran, A., González Ballester, M. Á., Carneiro, G., G, D. R., S, H. P., Puthussery, D., Liu, H., Yang, Z., Kondo, S., Kasai, S., Wang, E., Durvasula, A., Heras, J., Zapata, M. Á., Araújo, T., Aresta, G., Bogunović, H., Arikan, M., Lee, Y. C., Cho, H. B., Choi, Y. H., Qayyum, A., Razzak, I., van Ginneken, B., Lemij, H. G., and Sánchez, C. I. (2023). AIROGS: Artificial Intelligence for RObust Glaucoma Screening Challenge. arXiv preprint arXiv:2302.01738.
Fu, H., Cheng, J., Xu, Y., Zhang, C., Wong, D. W. K., Liu,
J., and Cao, X. (2018). Disc-aware ensemble network
for glaucoma screening from fundus image. IEEE
Transactions on Medical Imaging, 37(11):2493–2501.
Furtado, J. M., Lansingh, V. C., Carter, M. J., Milanese, M. F., Peña, B. N., Ghersi, H. A., Bote, P. L., Nano,
M. E., and Silva, J. C. (2012). Causes of blindness
and visual impairment in Latin America. Survey of
Ophthalmology, 57(2):149–177.
Girshick, R. (2015). Fast R-CNN. In 2015 IEEE Interna-
tional Conference on Computer Vision, pages 1440–
1448.
Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A.,
Fathi, A., Fischer, I., Wojna, Z., Song, Y., Guadar-
rama, S., et al. (2017). Speed/accuracy trade-offs for
modern convolutional object detectors. In 2017 IEEE
Conference on Computer Vision and Pattern Recogni-
tion, pages 7310–7311.
Lin, T., Maire, M., Belongie, S. J., Bourdev, L. D., Girshick, R. B., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. (2014). Microsoft COCO: common objects in context. CoRR, abs/1405.0312.
Nazir, T., Irtaza, A., Rashid, J., Nawaz, M., and Mehmood,
T. (2020). Diabetic retinopathy lesions detection using
faster-rcnn from retinal images. In 2020 First Inter-
national Conference of Smart Systems and Emerging
Technologies (SMARTTECH), pages 38–42. IEEE.
Oliveira, A. L. C., Carvalho, A. B., and Dantas, D. O.
(2021). Faster R-CNN approach for diabetic foot ul-
cer detection. In 16th International Joint Conference
on Computer Vision, Imaging and Computer Graphics
Theory and Applications (VISIGRAPP 2021) - Volume
4: VISAPP, pages 677–684. INSTICC, SciTePress.
Pham, V., Pham, C., and Dang, T. (2020). Road damage
detection and classification with detectron2 and faster
r-cnn. In 2020 IEEE International Conference on Big
Data (Big Data), pages 5592–5601. IEEE.
Quigley, H. A. and Broman, A. T. (2006). The number of
people with glaucoma worldwide in 2010 and 2020.
British Journal of Ophthalmology, 90(3):262–267.
Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-
CNN: Towards real-time object detection with region
proposal networks. Advances in neural information
processing systems, 28.
Sallam, A., Gaid, A. S., Saif, W. Q., Hana’a, A., Abdulka-
reem, R. A., Ahmed, K. J., Saeed, A. Y., and Radman,
A. (2021). Early detection of glaucoma using transfer
learning from pre-trained cnn models. In 2021 Inter-
national Conference of Technology, Science and Ad-
ministration (ICTSA), pages 1–5. IEEE.
Shibata, N., Tanito, M., Mitsuhashi, K., Fujino, Y., Mat-
suura, M., Murata, H., and Asaoka, R. (2018). Devel-
opment of a deep residual learning algorithm to screen
for glaucoma from fundus photography. Scientific re-
ports, 8(1):14665.
Tham, Y.-C., Li, X., Wong, T. Y., Quigley, H. A., Aung, T.,
and Cheng, C.-Y. (2014). Global prevalence of glau-
coma and projections of glaucoma burden through
2040: a systematic review and meta-analysis. Oph-
thalmology, 121(11):2081–2090.
TorchVision maintainers and contributors (2016). Torchvision: PyTorch's computer vision library. https://github.com/pytorch/vision.
Wu, Y., Kirillov, A., Massa, F., Lo, W.-Y., and Girshick, R. (2019). Detectron2. https://github.com/facebookresearch/detectron2.
Yan, S., Shen, B., Mo, W., and Li, N. (2017). Transfer learn-
ing for cross-platform software crowdsourcing recom-
mendation. In 2017 24th Asia-Pacific Software Engi-
neering Conference (APSEC), pages 269–278. IEEE.