A Deep Learning Approach for Predicting the Response to Anti-VEGF Treatment in Diabetic Macular Edema Patients Using Optical Coherence Tomography Images

Karima Garraoui (1,5,∗,a), Ines Rahmany (1,5,b), Salah Dhahri (1,2), Hedi Tabia, Désiré Sidibé, Hsouna Zgolli and Nawres Khlifa

Faculty of Sciences and Techniques of Sidi Bouzid, University of Kairouan, Tunisia

Electronics and Microelectronics Laboratory, Faculty of Sciences of Monastir, University of Monastir 5000, Tunisia

University of Paris-Saclay, Evry IBISC Evry, France

Department of Ophthalmology Institut Hedi Raies Tunis, Tunisia

Research Laboratory of Biophysics and Medical Technologies, Higher Institute of Medical Technologies of Tunis, University of Tunis El Manar, 1006 Tunis, Tunisia

Keywords: Prediction, Anti-VEGF, DME Patients, OCT Images, Deep Learning, Siamese Network, EfficientNetB2.

Abstract:

Diabetic macular edema (DME) is a serious complication of diabetes that can lead to vision loss, making

the prediction of patient response to anti-vascular endothelial growth factor (anti-VEGF) treatment crucial for

optimizing therapeutic strategies. This study introduces ESSDP (Extended Siam Saves Diabetes Patients), a

novel deep learning approach leveraging a Siamese network architecture with EfﬁcientNetB2 to predict thera-

peutic response in DME patients through optical coherence tomography (OCT) image analysis. By classifying

patients into good or poor responder groups based on central macular thickness reduction after injection, the

proposed framework achieved a predictive performance with an accuracy of 0.80, sensitivity of 0.71, precision

of 0.89, and an F1-Score of 0.74. These ﬁndings highlight the potential of Siamese network-based deep learn-

ing architectures as effective tools for predicting treatment outcomes in DME patients, even when working

with limited datasets, and pave the way for enhancing personalized treatment strategies in ophthalmology.

1 INTRODUCTION

Diabetic macular edema (DME) is a common and serious complication of diabetes, affecting patients' central vision. It is characterized by a thickening of the central retina due to the accumulation of intraretinal fluid (Bhagat et al., 2009). DME represents a major cause of visual impairment in people with diabetes (Yau et al., 2012).

The treatment of DME has evolved significantly with the advent of anti-VEGF (vascular endothelial growth factor) agents. These therapeutic agents, such as ranibizumab and aflibercept, specifically target VEGF, a protein involved in vascular permeability and pathological angiogenesis (Ferrara et al., 2004; Brown et al., 2013).

a https://orcid.org/0009-0006-4435-5884

b https://orcid.org/0000-0001-9086-5080

∗ Corresponding author

Optical coherence tomography (OCT) plays a cru-

cial role in the diagnosis and monitoring of DME.

This non-invasive imaging technique provides cross-

sectional images of the retina with micrometric reso-

lution (Zhang et al., 2022). OCT allows quantiﬁcation

of retinal thickness, visualization of edema morphol-

ogy, and evaluation of treatment response (Browning

et al., 2007).

Recently, artificial intelligence (AI), and more specifically deep learning (DL) and convolutional neural networks (CNNs), have emerged as promising tools in the analysis of OCT images. These technologies allow for automated interpretation of images, early detection of abnormalities, and prediction of disease progression (Ting et al., 2019). CNNs, in particular, have shown great efficiency in the classification of medical images, including those from OCT (Kermany et al., 2018).

Work carried out in the OCTIPA project (CMCU 23G1418), as part of the PHC-Utique program managed by the CMCU of the French Ministry of Europe and Foreign Affairs and the Tunisian Ministry of Higher Education and Scientific Research.

Garraoui, K., Rahmany, I., Dhahri, S., Tabia, H., Sidibé, D., Zgolli, H. and Khlifa, N.

A Deep Learning Approach for Predicting the Response to Anti-VEGF Treatment in Diabetic Macular Edema Patients Using Optical Coherence Tomography Images.

DOI: 10.5220/0013181700003890

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 17th International Conference on Agents and Artificial Intelligence (ICAART 2025) - Volume 2, pages 453-462

ISBN: 978-989-758-737-5; ISSN: 2184-433X

The classiﬁcation of medical imaging, especially

OCT images in the context of DME, is a rapidly ex-

panding ﬁeld. AI algorithms can now classify images

according to various criteria, such as the presence or

absence of edema, the type of edema, or the sever-

ity of the disease (Schlegl et al., 2018). This auto-

mated classiﬁcation offers considerable potential for

improving diagnostic efﬁciency and therapeutic man-

agement of patients with DME (Fauw et al., 2018).

In this study, we propose an approach based on a

Siamese network using the EfﬁcientNetB2 architec-

ture to predict the response to anti-VEGF treatment

in patients with diabetic macular edema (DME) from

OCT images. The Siamese network (Ding and Zhu, 2022) is a neural architecture particularly suited to comparison or similarity

tasks. It consists of two identical subnetworks shar-

ing the same weights, each processing a different in-

put image. These subnetworks, in our case based on

EfﬁcientNetB2, extract relevant features from OCT

images. The uniqueness of the Siamese network lies

in its ability to learn a representation of images that

brings together the characteristics of patients with

similar treatment responses, while distancing those

of patients with different responses. This approach

is particularly effective for limited-size datasets, as is

often the case in medical imaging.

Our approach presents a significant innovation in the field of predicting anti-VEGF treatment response in DME patients using deep learning on OCT images. To our knowledge, this specific method has not been applied to this particular research topic before. This originality provides several notable advantages to our work:

• Our study opens new research avenues in the

ﬁelds of ophthalmology and machine learning ap-

plied to medicine by proposing an innovative ap-

proach based on the Siamese network to a critical

clinical problem.

• Our goal is to use optical coherence tomography images together with deep learning to anticipate how individuals with diabetic macular edema will respond to anti-VEGF medication.

• Experimental verification with public and private OCT datasets shows that this method can effectively predict anti-VEGF treatment response in DME.

• High impact potential: as the first application of this method to predicting anti-VEGF treatment response in DME, our work has the potential to significantly influence future research and clinical practices in this area.

This document is organized as follows: Section 2

presents a state of the art of existing methods for pre-

dicting response to anti-VEGF treatment. Section 3

details our methodological approach, including the

Siamese network architecture, the feature extraction

process, and the learning strategy. Section 4 describes

the experiments carried out, including the description

of the databases used, the evaluation protocols, and

the results obtained. Finally, Section 5 concludes the

study, discusses the clinical implications of our re-

sults, and presents future perspectives for improving

and applying this approach.

2 RELATED WORK

This research project aims to implement a predictive model exploiting deep learning techniques to assess the response of patients with DME to anti-VEGF treatment. Building on established references in medical imaging, particularly in optical coherence tomography (OCT), the goal is to design a model capable of predicting the efficacy of treatment from OCT images.

Earlier studies, such as those reported by (Cao et al., 2021), (Jin et al., 2024), and (Ko et al., 2022), build deep learning (DL) models for image segmentation and response classification, using data from patients with diabetic retinopathy who have not yet received therapy, along with electronic medical records (EMR).

These studies document various patient information, including age, sex, visual acuity, OCT measurements, and other eye disorders.

We begin by reviewing prior work on the application of deep learning to predict the response to treatment.

In their work, (Meng et al., 2024) assessed a prediction model based on a back-propagation neural network (BPNN) applied to OCT-omics to evaluate the effectiveness of anti-VEGF therapy in patients with diabetic macular edema (DME). A retrospective review was conducted on 113 eyes from 82 patients. The classifiers used were logistic regression and support vector machine (SVM). These were applied to a dataset of 34 eyes from a total of 79 eyes. The findings indicated that the classifiers demonstrated strong discriminative power on both the validation and test sets. Similarly, (Jin et al., 2024) developed a deep learning algorithm to quantify the fluid within and beneath the retina in optical coherence tomography (OCT) images, aimed at assessing changes in the condition of patients with DME. A deep learning network based on the U-Net model was used to segment and quantify intraretinal fluid (IRF) and subretinal fluid (SRF). A total of 2,955 OCT scans from DME patients with SRF and IRF who received anti-VEGF therapy were analyzed. The method achieved an area under the ROC curve of 0.993 for IRF and 0.998 for SRF. This deep learning approach enabled the accurate determination of fluid volumes for both IRF and SRF, with high sensitivity and specificity, to assess the condition of patients with DME.

Furthermore, (Liu et al., 2023) aimed to evaluate the accuracy of OCT images generated by generative adversarial networks (GANs) for predicting the response to anti-VEGF therapy in individuals with diabetic macular edema (DME). Clinical and imaging data from 715 patients were used for training, and data from 103 patients were used for validation. Six different GAN models were applied to generate OCT images, aimed at estimating the effectiveness of anti-VEGF therapy. The RegGAN model showed the best predictive performance. The majority of the generated post-treatment OCT images, 95 out of 103, were difficult for experts to differentiate from real OCT images. Such GAN models may help physicians anticipate how patients with DME will respond to anti-VEGF treatment, supporting improved management strategies.

Also, (Ko et al., 2022) aimed to develop a temporal convolutional network (TCN) model to predict changes in visual acuity (VA) one year after three monthly anti-VEGF injections for diabetic macular edema (DME), using optical coherence tomography (OCT) images taken at 1 month and 3 months of follow-up. OCT imaging data from 317 DME patients treated with three anti-VEGF injections were collected retrospectively, with patients classified as "improved" (2-line improvement in VA) or "non-responders." A pre-trained ResNet50 model was applied to extract image features, then fine-tuned on the training set with data augmentation for the "improved" group. Using concatenated OCT images with ResNet50 alone achieved 69.04% accuracy, 70.37% specificity, and 68.05% sensitivity. However, applying the TCN to extract temporal features from serial OCT images improved predictive performance to 81.25% accuracy, 74.40% specificity, and 92.07% sensitivity, showing its potential to predict the response to DME treatment and to identify early non-responders for treatment adjustment.

In their study, (Cao et al., 2021) aimed to predict therapeutic responses to anti-VEGF agents from OCT images of DME patients at the start of treatment, using an explainable machine learning-based system. A total of 712 patients were classified as poor responders (294) or good responders (418) based on the reduction in central macular thickness following three injections. Models were developed to make predictions based on features extracted from the baseline OCT. After 5-fold cross-validation, the best model was a random forest (RF) with a sensitivity of 0.900, a specificity of 0.851, and an AUC of 0.923. Ophthalmologists One and Two achieved sensitivities of 0.775 and 0.750, and specificities of 0.716 and 0.821, respectively. The sum of the hyperreflective points proved to be the most relevant feature. Thus, the RF algorithm accurately predicted the response to anti-VEGF treatment, contributing to personalized therapeutic planning.

Table 1 summarizes the related work.

3 PROPOSED METHOD

In this study, we introduce a novel approach called

ESSDP (Extended Siam Saves Diabetes Patients),

which leverages the Siamese network architecture for

predicting the response to anti-VEGF treatment in

DME patients using OCT images. This name reﬂects

the core objective of the framework: extending the

application of Siamese networks to improve the lives

of diabetes patients through advanced prediction ca-

pabilities.

The Siamese network is a variant of neural network design initially proposed by Bromley et al. (Koch et al., 2015). They developed a pair of identical neural networks with shared parameters, which generated two feature representations when presented with a pair of input signatures. The output for the two signatures is a pair of vectors. These representations are then compared using a similarity metric, which serves as the optimization criterion during the learning process. Over time, the Siamese network has been adapted to other computer vision applications, such as face verification (Taigman et al., 2014) and one-shot image classification (Koch et al., 2015). The core principle of the Siamese architecture is to learn generalized feature encodings together with a similarity (or difference) measure derived from the feature representations extracted from a pair of comparable inputs (retinal scans in our case). Siamese architectures have demonstrated particular effectiveness in scenarios with sparse data, as they can be trained using limited labeled examples and subsequently refined on more extensive datasets (Koch et al., 2015).

Table 1: Summary of related work on the response to anti-VEGF treatment.

Author | Method | Database | Results

(Meng et al., 2024) | Logistic regression, SVM, BPNN | Private, 113 eyes from 82 patients | Sensitivity = 0.962, Specificity = 0.926, F1-score = 0.962, AUC = 0.982

(Jin et al., 2024) | Deep learning (U-Net) | Private, 2955 OCT images from 14 eyes | AUROC 0.993 for IRF, 0.998 for SRF volume

(Liu et al., 2023) | GAN models (RegGAN best) | Private, 715 training, 103 validation | RegGAN showed highest prediction accuracy, MAE 26.74±21.28 µm for CMT

(Ko et al., 2022) | Temporal CNN (TCN) | Private, Taipei Veterans General Hospital, 317 patients | Accuracy = 81.25%, Specificity = 74.40%, Sensitivity = 92.07%

(Cao et al., 2021) | Random Forest | Private, 712 patients | Sensitivity = 0.900, Specificity = 0.851, AUC = 0.923

The Siamese Network architecture is a family of

designs that typically encompasses a pair of equiv-

alent networks. These two networks possess identi-

cal layer counts and structure, featuring shared co-

efﬁcients and weights. Modiﬁcations to the param-

eters of one network are mirrored in the companion

network due to the identical conﬁguration. This ap-

proach has proven effective for dimensionality reduc-

tion in weakly supervised metric learning and identity

veriﬁcation (Koch et al., 2015).
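The weight sharing described above can be illustrated with a minimal sketch. The linear `encode` function and the numeric weights are hypothetical stand-ins for the two convolutional branches, chosen only to keep the example runnable; this is not the authors' implementation.

```python
# Toy Siamese forward pass: ONE set of weights is applied to BOTH inputs.
# The linear encoder is a hypothetical placeholder for a CNN branch.

def encode(weights, x):
    """Project input vector x with the shared weight matrix (one row per output unit)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def siamese_forward(weights, x_a, x_b):
    """Return the two embeddings produced by the same (shared) encoder."""
    return encode(weights, x_a), encode(weights, x_b)


shared_weights = [[0.5, -0.2, 0.1],
                  [0.3, 0.8, -0.5]]
image_a = [1.0, 2.0, 3.0]  # illustrative feature values, not real OCT data
image_b = [1.0, 2.0, 3.0]

emb_a, emb_b = siamese_forward(shared_weights, image_a, image_b)

# Changing a shared weight changes both branches at once.
shared_weights[0][0] = 1.0
new_a, new_b = siamese_forward(shared_weights, image_a, image_b)
```

Because there is only one weight matrix, identical inputs always map to identical embeddings, and any parameter update is reflected in both branches.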

The uppermost layer of these networks incorpo-

rates an objective function that quantiﬁes the sim-

ilarity or divergence score utilizing Euclidean dis-

tance, cosine similarity, or Manhattan distance be-

tween the feature vector representations from the two

networks. Three widely-used objective functions as-

sociated with Siamese networks are contrastive loss,

triplet loss, and binary cross-entropy.
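The three similarity measures named above can be written out as follows. This is a small standard-library sketch; the vectors `u` and `v` are made-up values used only for illustration.

```python
import math


def euclidean_distance(u, v):
    """Square root of the summed squared component differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    """Sum of the absolute component differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


u = [1.0, 0.0, 2.0]
v = [4.0, 4.0, 2.0]
d_euclid = euclidean_distance(u, v)     # component differences are (3, 4, 0)
d_manhattan = manhattan_distance(u, v)
s_cosine = cosine_similarity(u, v)
```

Note that the two distances grow as the vectors become less alike, whereas cosine similarity grows as they become more alike.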

For our investigation, we employed the binary cross-entropy (BCE) loss function, denoted as $L_{\text{BCE}}$ and defined as follows:

\[ L_{\text{BCE}} = -\left[ y \log(p) + (1 - y) \log(1 - p) \right] \tag{1} \]

In this equation:

• $L_{\text{BCE}}$ represents the binary cross-entropy loss.

• y ∈ {0, 1} is the actual class designation (or

ground truth), where y = 1 indicates the positive

class, and y = 0 indicates the negative class.

• p ∈ [0, 1] denotes the likelihood estimated by the

model that the instance belongs to the positive

class (y = 1).
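A direct transcription of the binary cross-entropy for a single sample is sketched below. The clipping constant `eps` is an assumption added for numerical safety and does not appear in the equation.

```python
import math


def binary_cross_entropy(y, p, eps=1e-12):
    """BCE for one sample: y is the true label (0 or 1), p the predicted probability of class 1."""
    p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0 (assumption, not in the paper)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


confident_correct = binary_cross_entropy(1, 0.9)  # small loss
confident_wrong = binary_cross_entropy(1, 0.1)    # large loss
```

A confident correct prediction yields a loss close to zero, while a confident wrong prediction is penalized heavily, which is the behaviour the loss relies on during training.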

The objective of Siamese networks is to bring the feature vectors of sample images sharing the same class label closer together, while pushing apart the feature vectors of sample images with different class labels. With the binary cross-entropy objective (Equation 1), after the learning phase of the model, the resulting feature vectors have the property that the Manhattan distance between images of the same class is smaller than that between images of different classes. To determine whether a pair of images belongs to the same category (label = 0) or to distinct categories (label = 1), a threshold on the distance (cosine divergence) between the stored vector representations must be established. Generally, this threshold is determined during model training by examining the similarity scores obtained for artificial and authentic images. A match within the top K, retrieved from the image collection using the established threshold, is taken as the qualifying criterion.

Figure 1 illustrates the flowchart of the proposed method based on the Siamese network architecture.

The complete treatment response classification process is illustrated in Figure 2.


Figure 1: The ﬂowchart of the complete process of the proposed solution using Siamese networks.

Figure 2: Flowchart of treatment response classiﬁcation for the test dataset.

Feature Extraction Using CNN

During the test phase, the test sample and the training samples are passed through the convolutional neural network to extract relevant features from the OCT scans.

This step captures the most important aspects of the images for predicting the treatment response.

Calculation of Manhattan Distances

After feature extraction, we calculate the Manhattan distances between these features and those of the reference groups (good and poor responders). This step quantifies how close a new patient is to the reference patients using Equation 2, formulated as follows:

\[ D_{\text{Manhattan}}(x, y) = \sum_{i=1}^{n} |x_i - y_i| \tag{2} \]

In this equation:

• $D_{\text{Manhattan}}$ represents the Manhattan distance between two vectors $x$ and $y$; a smaller distance indicates greater similarity.

• $x_i$ and $y_i$ are the i-th components of the respective vectors.

• $n$ denotes the total number of components in each vector.
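As a worked instance of the Manhattan distance, consider two hypothetical three-component feature vectors, $x = (0.2, 0.7, 0.1)$ and $y = (0.5, 0.4, 0.3)$. These values are chosen only for illustration and are not taken from the experiments.

```latex
\begin{aligned}
D_{\text{Manhattan}}(x, y)
  &= |0.2 - 0.5| + |0.7 - 0.4| + |0.1 - 0.3| \\
  &= 0.3 + 0.3 + 0.2 \\
  &= 0.8
\end{aligned}
```

Each component contributes its absolute difference independently, so no single large coordinate gap is amplified by squaring, as it would be with the Euclidean distance.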

Classiﬁcation Using KNN

Finally, we use a k-Nearest Neighbors (KNN)

classiﬁer to determine the class of the new patient

based on their similarity to the training samples.

This similarity-based classiﬁcation method provides

increased interpretability of the results.
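The similarity-based classification step can be sketched as a k-nearest-neighbour vote under the Manhattan distance. The two-dimensional embeddings, the labels, and the choice `k=3` are hypothetical placeholders; the value of k used by the authors is not stated in the text.

```python
from collections import Counter


def manhattan_distance(u, v):
    """Sum of the absolute component differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def knn_predict(query, references, k=3):
    """references: list of (embedding, label). Return the majority label among the k nearest."""
    ranked = sorted(references, key=lambda item: manhattan_distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical reference embeddings for the two responder groups.
reference_set = [
    ([0.1, 0.2], "good"),
    ([0.2, 0.1], "good"),
    ([0.0, 0.3], "good"),
    ([0.9, 0.8], "poor"),
    ([0.8, 0.9], "poor"),
    ([1.0, 0.7], "poor"),
]

pred_near_good = knn_predict([0.15, 0.15], reference_set)
pred_near_poor = knn_predict([0.85, 0.85], reference_set)
```

An odd k avoids tied votes in a two-class setting, and the list of nearest reference patients can be shown alongside the prediction, which is where the interpretability of this step comes from.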

4 EXPERIMENTAL RESULTS

4.1 Dataset

4.1.1 Training Dataset (Kaggle OCT)

To train our model, we used the retinal OCT dataset from Kaggle (https://www.kaggle.com/code/paultimothymooney/detect-retina-damage-from-oct-images). This extensive dataset encompasses 84,495 retinal scans in JPEG format, categorized into four distinct groups: NORMAL, CNV (choroidal neovascularization), DME (diabetic macular edema), and DRUSEN. The collection is organized into three main directories (train, test, and val), each containing subdirectories for each image category. This structure enables the training and evaluation of the system across a diverse range of ocular conditions.

4.1.2 Test Dataset

Our test dataset, for the classification of good and poor responders to anti-VEGF treatment, comes from the Ophthalmology Department A at the Hédi Raies Institute in Tunis. It includes 120 OCT images corresponding to 104 patients with DME who received

anti-VEGF treatment. For each patient, we collected

pre-treatment and post-treatment images. A profes-

sional ophthalmologist analyzed the post-treatment

images to create a database containing the pre-

treatment image associated with a label indicating

whether the patient is a good or poor responder to the

treatment.

4.2 Results and Discussion

Our overall experimental procedure relies on the

Siamese network framework. It begins with feature

extraction from pairs of OCT images, followed by

the calculation of the distance between these features.

The loss function guides the learning process by min-

imizing the distance between images of patients with

similar responses to anti-VEGF treatment while max-

imizing it between patients with different responses.

This approach allows learning directly from pairs of

images and works efﬁciently with relatively small

datasets, which is often the case in medical applica-

tions.
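Learning from pairs requires turning individually labeled images into labeled pairs. The sketch below shows one way this could be done; the image identifiers are invented, and the pair-label convention (0 for the same class, 1 for different classes) follows the one stated in Section 3. It is an illustration, not the authors' data pipeline.

```python
from itertools import combinations


def make_pairs(samples):
    """samples: list of (image_id, class_label).

    Returns a list of (id_a, id_b, pair_label) where pair_label is
    0 when both images share a class and 1 otherwise.
    """
    pairs = []
    for (id_a, lab_a), (id_b, lab_b) in combinations(samples, 2):
        pairs.append((id_a, id_b, 0 if lab_a == lab_b else 1))
    return pairs


# Hypothetical labeled images.
samples = [
    ("oct_01", "good"),
    ("oct_02", "good"),
    ("oct_03", "poor"),
    ("oct_04", "poor"),
]
pairs = make_pairs(samples)
n_same = sum(1 for _, _, label in pairs if label == 0)
```

With n images this produces n(n-1)/2 pairs, which is why pair-based training extracts many more training examples from a small dataset than image-level training does.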

Once the features are extracted, a k-Nearest Neighbors (KNN) algorithm is employed for the final classification. KNN evaluates the Manhattan distance between the features of the test image and those of the training dataset to predict the test image's class based on its closest neighbors in the feature space.

4.2.1 Hyperparameters’ Tuning

In our experimental environment, models were trained and validated using a five-fold cross-validation approach to support the generalization of results. Hyperparameters, including batch size, learning rate, and number of epochs, were optimized using a grid search.
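The five-fold protocol can be sketched as an index split in which every sample is used for validation exactly once. The function name `k_fold_indices` and the sample count of 12 are illustrative assumptions; no shuffling or stratification is applied here, although both are common in practice.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) for k contiguous folds covering all samples."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(12, k=5))
```

In a grid search, each candidate hyperparameter setting would be trained on every `train_idx` and scored on the matching `val_idx`, and the mean validation score would decide which setting is retained.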

Here are the speciﬁc values for the hyperparameters

according to our code:

• Batch size: 32.

• Number of epochs: 10.

• Cross-validation: Five-fold cross-validation ap-

proach.

• Hyperparameter Optimization: Using grid search

to optimize hyperparameters.

• Callback to adjust the learning rate: ReduceLROnPlateau, which multiplies the learning rate by a factor of 0.2 after 3 epochs without improvement in validation loss, with a minimum learning rate of 0.00001.

The Siamese model was trained using these optimized

parameters, thus ensuring a good generalization of the

results.
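The learning-rate schedule listed above can be illustrated with a simplified re-implementation of the reduce-on-plateau rule. This is a sketch of the logic only, not the Keras callback itself (which has further options such as a cooldown period), and the initial learning rate of 0.001 is an assumption since the text does not state it.

```python
def reduce_lr_on_plateau(val_losses, initial_lr, factor=0.2, patience=3, min_lr=1e-5):
    """Return the learning rate in effect at each epoch, given the validation losses."""
    lr = initial_lr
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        history.append(lr)  # rate used during this epoch
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)  # never drop below the floor
                wait = 0
    return history


# Validation loss stalls for three epochs, so the rate is cut before the sixth epoch.
schedule = reduce_lr_on_plateau([0.9, 0.8, 0.8, 0.8, 0.8, 0.7], initial_lr=1e-3)
floor_check = reduce_lr_on_plateau([1.0] * 40, initial_lr=1e-3)
```

With only 10 epochs and a patience of 3, at most three reductions can occur in a run, so the floor of 0.00001 acts mainly as a safeguard.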

Performance evaluation plays a crucial role in every

image classiﬁcation endeavor. Various assessment

criteria exist to gauge the performance of an image

classiﬁcation system. In this work, we focus on: accu-

racy, sensitivity, F1-score, as well as precision (Gran-

dini and Visani, 2020).
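The four metrics can be computed from the entries of a binary confusion matrix, as sketched below. The counts are hypothetical and are not the study's results; the text also does not state which averaging convention (per-class or macro-averaged) was used, so this sketch shows the plain binary definitions.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity (recall), precision and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts: 40 true positives, 10 false positives,
# 10 false negatives, 40 true negatives.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
```

The F1-score is the harmonic mean of precision and sensitivity, so it always lies between the two when they are computed for the same class.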

The area under the ROC curve (AUC) can be interpreted as the probability that, given two randomly chosen patients, one a treatment responder and the other a non-responder, the predictive marker's value is greater for the responder than for the non-responder. Notably, an AUC of 0.5 (50%) indicates that the marker is non-informative. A higher AUC signifies better discriminatory capability of the model, with a maximum of 1.0 (100%).
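This probabilistic reading of the AUC can be computed directly by comparing every responder with every non-responder. The scores below are made-up values for illustration, and ties are counted as one half, which is the usual convention.

```python
def pairwise_auc(responder_scores, non_responder_scores):
    """Fraction of (responder, non-responder) pairs in which the responder scores higher."""
    wins = 0.0
    for r in responder_scores:
        for n in non_responder_scores:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(responder_scores) * len(non_responder_scores))


# Hypothetical marker values.
responders = [0.9, 0.8, 0.4]
non_responders = [0.7, 0.3, 0.2]
auc = pairwise_auc(responders, non_responders)  # 8 of the 9 pairs are ordered correctly
```

Only the pair (0.4, 0.7) is ranked the wrong way round, so the AUC is 8/9; a marker that ranked pairs at random would score about 0.5.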

4.2.2 Experimental Results

We chose to use the EfﬁcientNetB2 as an architecture

for our convolutional neural network (CNN) model.

This architecture has recently demonstrated strong

performance across many image classiﬁcation tasks,

offering a good balance between accuracy and efficiency. Table 2 presents the results of our method.

Table 2: Overall results of the proposed method (EfficientNetB2).

Metric | Value

Accuracy | 0.80

Sensitivity | 0.71

Precision | 0.89

F1-score | 0.74

4.2.3 Classiﬁcation Results

Figure 3 shows the prediction for an OCT image of a DME patient classified as a good responder to anti-VEGF treatment, with associated scores of 0.40 for poor responders and 0.60 for good responders.

Figure 3: Good responder patient.

Figure 4 shows the prediction for an OCT image of a DME patient classified as a poor responder to anti-VEGF treatment, with associated scores of 0.60 for poor responders and 0.40 for good responders.

Figure 4: Bad responder patient.

4.2.4 Training and Validation Loss and

Accuracy Curves

Figure 5 below shows the evolution of the model’s

loss and accuracy during training and validation

across epochs.

The training loss curve decreases fairly steadily,

indicating that the model is learning well during train-

ing. The validation loss curve follows a similar trend,

but with a slight increase at the end, suggesting pos-

sible overﬁtting.

The accuracy curves show an inverse trend, with

an increase in both training accuracy and validation

accuracy over the epochs. This conﬁrms that the

model is improving in its performance.

4.2.5 Comparison with Other Architectures

We evaluated several Convolutional Neural Network

(CNN) architectures for our classiﬁcation task and

compared their effectiveness based on metrics such

as accuracy, sensitivity, precision, and F1-score. Table 3 compares EfficientNetB2 with a CNN trained from scratch, InceptionV3, ResNet50V2, EfficientNetB1, and EfficientNetB3, allowing us to identify the strengths and weaknesses of each architecture.

Table 3: Comparison of evaluation metrics.

Architecture | Accuracy | Sensitivity | Precision | F1-score

CNN from scratch | 0.79 | 0.79 | 0.79 | 0.79

InceptionV3 | 0.68 | 0.68 | 0.80 | 0.64

ResNet50V2 | 0.50 | 0.50 | 0.25 | 0.33

EfficientNetB1 | 0.62 | 0.62 | 0.79 | 0.56

EfficientNetB2 | 0.80 | 0.71 | 0.89 | 0.74

EfficientNetB3 | 0.71 | 0.57 | 0.85 | 0.54

Comparison with Other Methods from the Literature

Table 4 compares our method, based on a Siamese network and EfficientNetB2, with other approaches from the literature. For each study, it lists the method, the database used, and the reported results (accuracy, sensitivity, specificity, precision, F1-score, or AUC, depending on the study).

Figure 5: Training and validation loss of the EfficientNetB2 architecture.

Table 4: Comparison with other methods from the literature.

Author | Method | Database | Results

(Meng et al., 2024) | Logistic regression, SVM, BPNN | Private, 113 eyes from 82 patients | Sensitivity = 0.962, Specificity = 0.926, F1-score = 0.962, AUC = 0.982

(Jin et al., 2024) | Deep learning (U-Net) | Private, 2955 OCT images from 14 eyes | AUROC 0.993 for IRF, 0.998 for SRF volume

(Cao et al., 2021) | Random Forest | Private, 712 patients | Sensitivity = 0.900, Specificity = 0.851, AUC = 0.923

Our method | EfficientNetB2 | Private, 104 OCT images | Accuracy = 0.80, Sensitivity = 0.71, Precision = 0.89, F1-score = 0.74

4.2.6 Discussion

The results obtained, with an accuracy of 80%, a sensitivity of 71%, a precision of 89%, and an F1-score of 74%, indicate the effectiveness of the proposed approach for predicting the anti-VEGF treatment response in patients with DME. These results are likely attributable to the use of the Siamese network combined with the EfficientNetB2 architecture, which allows for efficient feature extraction from OCT images while effectively managing the limited dataset.

The ability of our method to function with a reduced

dataset is one of its main strengths. However, several

limitations should be acknowledged. First, the small

size of our dataset (104 patients) may limit the gener-

alizability of the results to larger or more diverse pop-

ulations. Additionally, variations in OCT image qual-

ity, due to differences in the equipment used or ac-

quisition protocols, could affect the robustness of the

model. Poor-quality or poorly lit images, for exam-

ple, may introduce bias into the model’s predictions.

To enhance the robustness and generalizability of our

method, several avenues are being explored. Testing

our approach on other databases, including similar

retinal pathologies or other medical imaging modal-

ities, would allow us to assess its ability to adapt to

different clinical scenarios. Moreover, integrating ad-

ditional clinical data, such as age, medical history,

or other biological factors of patients, could enrich

the predictions by capturing aspects not visible in the

OCT images, further strengthening the accuracy of

the results. We also plan to expand our study to other

retinal pathologies to test the generalization capability

of our method while maintaining its advantage of ef-

fectively working with limited datasets. These efforts

will help validate the applicability of our approach in

various clinical contexts.

Furthermore, we aim to explore multimodal ap-

proaches, such as combining textual and medical im-

age information. Integrating these different sources

of information could improve the precision and per-

formance of the model, further strengthening the con-

tribution of our work. Lastly, we intend to implement

additional models that leverage the inherent strengths

of machine learning and deep learning methods to fur-

ther improve the prediction of the anti-VEGF treat-

ment response in patients with DME.

5 CONCLUSION

In humans, learning is a continuous process that

evolves throughout life, inﬂuenced by sensory percep-

tions, personal experiences, and recurring events. In

contrast, devices function through processes that rely

on input and output data. Deep learning, a technique

inspired by the human brain, has emerged as a power-

ful tool, achieving levels of accuracy that sometimes

surpass human capabilities. It has proven particularly

effective in the medical ﬁeld, where it can identify

diseases in medical images, characterize them, and

even quantify their progression.


Unlike humans, who continuously process large volumes of data and face a variety of challenges over time, deep learning models typically learn from a more limited dataset tailored to a specific task. This study focused on exploring the impact of an optimized data pipeline on the performance of a deep learning model, highlighting the significant improvements that can be achieved through a data-driven approach. Our findings suggest that combining robust data engineering with a relatively simple convolutional neural network architecture, such as the Siamese network, holds great potential for advancing clinical applications. Specifically, the model can be leveraged to predict responses to anti-VEGF treatment in patients with diabetic macular edema (DME), offering a valuable tool for personalized treatment strategies.

The use of the Siamese network architecture in our study, designed for scenarios involving small datasets, was particularly beneficial given the limited size of our private OCT image dataset. However, several avenues remain for enhancing clinical outcomes. Future research could focus on expanding the dataset by including diverse patient populations to improve the generalizability of the model. Additionally, integrating multimodal data, such as clinical histories or genetic information, could enhance predictive accuracy. Exploring transfer learning or semi-supervised learning techniques could also help overcome the limitations of small datasets and expand the applicability of this approach to other retinal pathologies and diseases beyond DME. These steps could strengthen the model’s potential in clinical settings, ultimately leading to more accurate and timely treatment predictions for patients.
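As a rough illustration of why a pairwise (Siamese) formulation can suit small datasets, the sketch below enumerates every unordered pair of labeled samples: n images yield n(n-1)/2 training pairs. The function name `make_pairs` and the label strings are hypothetical, and this exhaustive pairing is only an assumption for illustration; it is not necessarily the pairing strategy used in ESSDP.

```python
from itertools import combinations


def make_pairs(labels):
    """Return (i, j, same) for every unordered pair of sample indices.

    `same` is 1 when both samples share a label and 0 otherwise.
    Exhaustive pairing is an illustrative assumption, not the
    paper's documented procedure.
    """
    return [
        (i, j, int(labels[i] == labels[j]))
        for i, j in combinations(range(len(labels)), 2)
    ]


# Hypothetical responder labels for five OCT images.
labels = ["good", "good", "poor", "poor", "good"]
pairs = make_pairs(labels)
n_same = sum(same for _, _, same in pairs)
print(f"{len(labels)} images -> {len(pairs)} pairs ({n_same} same-class)")
```

In practice, same-class and different-class pairs are often rebalanced before training, since exhaustive pairing can leave one kind of pair over-represented.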

REFERENCES

Bhagat, N., Grigorian, R. A., Tutela, A., and Zarbin, M. A. (2009). Diabetic macular edema: pathogenesis and treatment. Survey of Ophthalmology, 54(1):1–32.

Brown, D. M., Nguyen, Q. D., Marcus, D. M., Boyer, D. S., Patel, S., Feiner, L., and Ehrlich, J. S. (2013). Long-term outcomes of ranibizumab therapy for diabetic macular edema: the 36-month results from two phase III trials. Ophthalmology, 120(10):2013–2022.

Browning, D. J., Glassman, A. R., Aiello, L. P., Beck, R. W., Brown, D. M., Fong, D. S., and Ferris, F. L. (2007). Relationship between optical coherence tomography–measured central retinal thickness and visual acuity in diabetic macular edema. Ophthalmology, 114(3):525–536.

Cao, J., You, K., Jin, K., Lou, L., Wang, Y., Chen, M., and Ye, J. (2021). Prediction of response to anti-vascular endothelial growth factor treatment in diabetic macular oedema using an optical coherence tomography-based machine learning method. Acta Ophthalmologica, 99(1):e19–e27.

Ding, F. and Zhu, F. (2022). HLifeRL: A hierarchical lifelong reinforcement learning framework. Journal of King Saud University - Computer and Information Sciences, 34(7):4312–4321.

Fauw, J. D., Ledsam, J. R., Romera-Paredes, B., Nikolov, S., Tomasev, N., Blackwell, S., and Ronneberger, O. (2018). Clinically applicable deep learning for diagnosis and referral in retinal disease. Nature Medicine, 24(9):1342–1350.

Ferrara, N., Hillan, K. J., Gerber, H. P., and Novotny, W. (2004). Discovery and development of bevacizumab, an anti-VEGF antibody for treating cancer. Nature Reviews Drug Discovery, 3(5):391–400.

Grandini, E. B. and Visani, G. (2020). Metrics for multi-class classification: an overview. arXiv preprint, 2008.05756.

Jin, Y., Yong, S., Ke, S., Zhang, C., Liu, Y., Wang, J., and Zhang, J. (2024). Deep learning assisted fluid volume calculation for assessing anti-vascular endothelial growth factor effect in diabetic macular edema. Heliyon, 10(8).

Kermany, D. S., Goldbaum, M., Cai, W., Valentim, C. C., Liang, H., Baxter, S. L., and Zhang, K. (2018). Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell, 172(5):1122–1131.

Ko, Y., Peng, C., Ho, H., Chiu, S., Chen, S., and Lee, C. (2022). Deep learning assisted prediction of long-term visual outcome after 3 monthly anti-vascular endothelial growth factor injections in patients with central-involved diabetic macular edema. Investigative Ophthalmology & Visual Science, 63(7):3778–F0199.

Koch, G., Zemel, R., and Salakhutdinov, R. (2015). Siamese neural networks for one-shot image recognition. In ICML Deep Learning Workshop, volume 2, Lille.

Liu, S., Hu, W., Xu, F., Chen, W., Liu, J., Yu, X., and Li, J. (2023). Prediction of OCT images of short-term response to anti-VEGF treatment for diabetic macular edema using different generative adversarial networks. Photodiagnosis and Photodynamic Therapy, 41:103272.

Meng, Z., Chen, Y., Li, H., Zhang, Y., Yao, X., Meng, Y., and Luo, J. (2024). Machine learning and optical coherence tomography-derived radiomics analysis to predict persistent diabetic macular edema in patients undergoing anti-VEGF intravitreal therapy. Journal of Translational Medicine, 22(1):358.

Schlegl, T., Waldstein, S. M., Bogunovic, H., Endstraßer, F., Sadeghipour, A., Philip, A. M., and Schmidt-Erfurth, U. (2018). Fully automated detection and quantification of macular fluid in OCT using deep learning. Ophthalmology, 125(4):549–558.

Taigman, Y., Yang, M., Ranzato, M., and Wolf, L. (2014). DeepFace: Closing the gap to human-level performance in face verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1701–1708.

Ting, D. S. W., Pasquale, L. R., Peng, L., Campbell, J. P., Lee, A. Y., Raman, R., and Wong, T. Y. (2019). Artificial intelligence and deep learning in ophthalmology. British Journal of Ophthalmology, 103(2):167–175.

Yau, J. W., Rogers, S. L., Kawasaki, R., Lamoureux, E. L., Kowalski, J. W., Bek, T., and Wong, T. Y. (2012). Global prevalence and major risk factors of diabetic retinopathy. Diabetes Care, 35(3):556–564.

Zhang, Y., Yang, M., Zhao, S. X., Shen, L. J., and Han, W. (2022). Hyperosmolarity disrupts tight junction via TNF-α/MMP pathway in primary human corneal epithelial cells. International Journal of Ophthalmology, 15(5):683.
