Cost-Effective Strabismus Measurement with Deep Learning
Luis Felipe Araujo de Oliveira, João Dallyson Sousa de Almeida, Thales Levi Azevedo Valente, Jorge Antonio Meireles Teixeira and Geraldo Braz Junior
Núcleo de Computação Aplicada, Universidade Federal do Maranhão (UFMA), São Luís, MA, Brazil
jorgemeireles1@gmail.com, {jdallyson, geraldo}@nca.ufma.br
Keywords: Strabismus, Convolutional Neural Network, Deep Learning, YOLO.
Abstract: This article presents a new methodology for detecting and measuring strabismus. Traditional diagnostic methods in the medical field often require patients to visit a specialist, which can present challenges in regions with limited access to strabismus experts. An accessible and automated approach can, therefore, support ophthalmologists in making diagnoses. The proposed methods use images from Hirschberg Test exams and employ techniques based on Convolutional Neural Networks (CNNs) and image processing to detect the limbus region and the brightness reflected in patients' eyes by the camera's flash. The method calculates the distance between the limbus's center and the reflected brightness's center, converting this distance from pixels to prism diopters. The results show the potential of these approaches, achieving significant effectiveness.
1 INTRODUCTION
Strabismus is an eye condition in which the eyes are not correctly aligned and point in different directions. This condition affects about 2% to 4% of the global population (Hashemi et al., 2019) and is often caused by abnormalities in binocular vision or issues with the neuromuscular control of eye movements. Strabismus can result in various complications, including permanent vision loss, visual field defects, and impaired binocular vision, among other problems (Buffenn, 2021).
The current diagnosis of strabismus primarily relies on two tests: the Prismatic Cover Test (PCT), also known as the Cover Test, and the Hirschberg Test. During the PCT, the examiner alternately covers one eye while observing the other. They measure the deviation in prismatic diopters (PD) by adjusting the strength of the prism to restrict eye movement. In the Hirschberg Test, the examiner shines a small light into the patient's eyes. They determine the angle of the strabismus by measuring the distance between the corneal light reflection (CR) and the center
of the pupil. Both tests rely on the examiner's expertise and can introduce subjectivity.
There have been efforts to enhance the accuracy of strabismus measurement and detection, as seen in studies (Miao et al., 2020) and (Durajczyk et al., 2023). However, these methods often require costly virtual reality or specialized equipment, which can be impractical for small clinics in rural areas.
In recent years, researchers have widely used convolutional neural networks (CNNs) for various classification and detection tasks (Li et al., 2022) due to their strong ability to generalize across different data types. This capability enables them to handle complex object detection scenarios, such as identifying cars or human faces. Additionally, studies demonstrate that CNNs outperform traditional computer vision methods in classification and detection tasks (O'Mahony et al., 2020).
Therefore, the present work aims to address the challenges of strabismus diagnosis by proposing a methodology for detecting this condition. Building upon the Hirschberg Test, it leverages advancements in Convolutional Neural Networks (CNNs) and image processing techniques to improve accuracy, accessibility, and cost-effectiveness. By integrating automated detection capabilities, these proposed approaches reduce the reliance on specialized equipment and the need for expert examiners, making them particularly suitable for areas with limited
access to ophthalmological resources, with the hope of making strabismus diagnosis widely available and cheaper.
This study utilizes a comprehensive dataset of Hirschberg Test images collected under controlled and real-world conditions to evaluate the proposed methodologies effectively. The evaluation metrics include precision, recall, and computational efficiency, providing a thorough assessment of the performance of the proposed methods. This work aims to contribute to the field of ophthalmology by offering scalable and practical solutions for strabismus detection. These advancements can potentially improve early diagnosis and treatment outcomes, especially in underserved regions, thus addressing a critical gap in global eye care.
2 RELATED WORK
In recent years, there has been a growing effort to automate strabismus measurement, given the seriousness of the problem and the negative consequences of untreated strabismus.
In (Almeida, 2015), researchers propose a pioneering method for preliminary screening and assisting in diagnosing strabismus in digital images, utilizing a dataset of 40 strabismic patients and 5 non-strabismic individuals. They first segment the facial region based on the color range of human skin. Within the area defined by the filter, they employ Support Vector Machine (SVM) algorithms, geostatistical functions, histogram equalization, and the circular Hough Transform to locate both eyes accurately. Using an additional Hough Transform, they identify the limbus and the CR within the eye region. The detected types of strabismus were esotropias (ET), exotropias (XT), hypertropias (HT), and hypotropias (HoT), with the method achieving an accuracy of 88% and 100% for ET and XT (horizontal deviations), respectively, and 80.33% and 83.33% for HT and HoT (vertical deviations), respectively. The total mean error in prism diopters was 5.6 and 3.83 for horizontal and vertical deviations, respectively.
In (Cheng et al., 2021), a school nurse conducted vision screening in an elementary school using specialized equipment and the EyeTurn app, referenced in (Pundlik et al., 2019), which outputs information about eye location and strabismus measurement (in diopters). Following a predefined protocol, an expert evaluated whether the software correctly identified image features such as the iris and corneal reflection. Initially, the researchers used a threshold of 2.4 PD (strabismus detected by the app) to decide which patients required in-person cover testing with a specialized eye doctor. They successfully obtained at least one measurement from the app for 93% of the children. Of these, 40 children were flagged for a prism alternate cover test, and 6 were confirmed to have strabismus. The study established the app's optimum threshold for detecting strabismus to be 3.0 PD, achieving a sensitivity of 83% and a specificity of 76.5%.
In (Şükrü Karaaslan et al., 2023), a pre-trained key-point detection model, Mediapipe, a framework implementation of the model developed in (Bazarevsky et al., 2019), determines the eye iris region. Next, the CR is segmented using binary segmentation with Otsu's algorithm to determine the threshold; for the pupil, the image is converted from grayscale to HSV, zooming in on the iris for better detection of the pupil, which is segmented using an arbitrary threshold determined by the authors and by selecting the minimum enclosing circle of the region. Strabismus is measured based on the distance between the pupil and CR centers. The method considers a correct detection of the CR and the cornea to be a successful measurement; under that hypothesis, the authors achieved 90% accuracy for right and 91% accuracy for left iris detection, 98% accuracy for CR detection, and a maximum error of ± in measurements. The average measurement error and standard deviation were not specified.
(Dericioğlu and Çerman, 2019) analyzed 1,022 photographs with a resolution of 3264×2448 from 15 orthophoric subjects with known gaze angle and imaging distance. They developed a smartphone application allowing users to mark the limbus and the corneal light reflex manually. The application then calculated the distance from the geometric center of the cornea to the light reflex (RD), the corneal diameter (CD), and the interpupillary distance (IPD). The study presented two parts. In the first part, the researchers measured each patient's RD/CD ratio and used it in a regression model to develop a non-linear equation for estimating gaze angles. They incorporated the IPD to calculate a trigonometric function suitable for converting pixels to millimeters based on imaging distance. In the second part, they applied these equations to 572 photographs of strabismus patients to calculate deviations based on their RD/CD ratio and IPD. They then compared these results with measurements taken by a double-masked strabismus specialist using the Krimsky or Prism Cover Test. The paper achieved a high correlation between estimated measurements and the specialist, with r = 0.966 and P < 0.001, a mean error of the estimated gaze angle of 0.03 ± 4.60, and an average error of 0.68 ± 6.1; the
reliability of the method was high, given Cronbach’s
α = 0.983.
Given an extensive search of the literature, it is clear that precise object detection is one of the main bottlenecks for automatic strabismus measurement methods and can be a decisive factor for the correct calculation. In (Almeida, 2015), feature-based image processing algorithms are used to detect the limbus and the CR for measurement; the method achieves reliable results but is unviable for resource-restricted situations due to the cost of all of the image processing required to detect the limbus and the CR. In (Şükrü Karaaslan et al., 2023), after the initial iris detection using a deep learning model, the rest of the method also relies on image processing to detect the pupil and the CR; the pupil is much more difficult to detect than the limbus due to its variable but much smaller size, a fact that can extensively increase the error rate of such a method when the patient wears corrective lenses (the lenses would generally distort the color of, or partially occlude, the pupil).
Alternative methods for automatic detection of strabismus involve smartphone apps, due to the wide availability of smartphones and the ease of use of such apps, as in (Cheng et al., 2021), where a nurse was only required to align the patient's eyes with rectangles displayed in the app's interface. In (Dericioğlu and Çerman, 2019), the app was used to obtain the RD, CD, and IPD, while the strabismus calculation was inferred from this data. Both methods required human input, even if not requiring knowledge of strabismus measurements, but achieved significant practical results, either with high correlation with specialist measurements or high specificity and sensitivity.
Even though (Almeida, 2015) evaluates five distinct gaze positions, it employs an extensive image-processing pipeline and parameterized feature extraction, increasing both parametrization and computational costs. In contrast, our proposed solution harnesses convolutional neural networks (CNNs) to automatically detect the corneal reflex and the limbus, eliminating the need for additional human input or specialized knowledge of strabismus measurement. Moreover, our approach also evaluates five gaze positions yet does so with greater efficiency and scalability, making it more suitable for real-world screening scenarios where time and resources are limited.
In summary, this work aims to create a scalable and practical solution for strabismus measurement, one that requires no human input or knowledge of strabismus measurement and that is cost-effective in both the speed and the precision of its measurements.
3 MATERIALS AND METHODS
In this section, we discuss the dataset, the network used, and the details of this work's method, along with the specifics of the strabismus calculation.
3.1 Dataset
The dataset is the same as used in (Almeida, 2015) and comprises 277 images with a resolution of 1536×2048 from 45 patients of various sexes and ages. These images capture five distinct gaze positions: PP (Primary Position of Gaze), LEVO (eyes looking to the left), DEXTRO (eyes looking to the right), SUPRA (eyes looking upward), and INFRA (eyes looking downward). Figure 1 illustrates these gaze positions. As noted in Section 4, only the images in these 5 standard gaze positions were used for testing, despite some patients having images in variations of the 5 standard positions (e.g., patients looking both upward and to the left).
Figure 1: (a) SUPRA, (b) LEVO, (c) PP, (d) DEXTRO, (e) INFRA.
Researchers manually annotated all images and used them for training the network. For the testing phase, we excluded four patients who lacked complete annotations from the physician.
3.2 YOLO (You Only Look Once)
YOLO, which stands for “You Only Look Once,” is a convolutional neural network designed for real-time object detection. It introduces a novel approach to detection initially proposed by (Redmon et al., 2015). Unlike traditional methods that treat object detection as a classification problem, YOLO treats it as a regression problem.
YOLO consists of a unified model in which a single network predicts the location of objects and their corresponding classes. This design allows YOLO to operate at remarkable speeds; for instance, YOLOv8 can process images at 150 frames per second (FPS)
with an Average Precision Across Scales (APAS) of 52.7. These results were achieved using the COCO (Common Objects in Context) challenge dataset (Lin et al., 2014) and executed on an NVIDIA RTX 3090 GPU.
Although researchers initially designed YOLO for real-time detection in videos, they have demonstrated its effectiveness for object detection in images as well (Varghese and M., 2024). In its eighth version, the network produces three outputs: the coordinates of the detected bounding box (bbox), its confidence score, and the class of the detected object within that region. The model calculates the confidence score of the bbox based on the estimated probability that the region belongs to the specified class. This work employs YOLOv8, as outlined in (Varghese and M., 2024).
The YOLOv8 network has two main components: the backbone and the head. The backbone, a convolutional neural network (CNN), extracts a feature map from the input and provides context at multiple levels of representation for the rest of the network. In YOLOv8, the backbone is based on the EfficientNet-B4 architecture (Tan and Le, 2019). Meanwhile, the head utilizes the feature map produced by the backbone as input for additional convolutions and upsampling layers. Ultimately, it culminates in final convolutions that perform object classification and localization.
3.3 Method
This work employs the YOLO model for detecting both the corneal reflection and the limbus due to its robustness and effectiveness in object detection, as outlined in Section 4.1. We train the YOLOv8 network on the dataset described in Section 3.1 to accurately identify the limbus and the corneal reflection point in the patient's image. This identification enables us to calculate strabismus using the coordinates obtained from the bounding boxes. Figure 2 illustrates our method.
Initially, we divide the patient's image into two halves along the vertical axis (we blurred the image in the figure to anonymize the patient). We then perform inference on both halves of the image. If the network detects at least two bounding boxes for the limbus and two for the corneal reflection, we proceed with the method; otherwise, we consider it an error. Typically, the number of bounding boxes exceeds the minimum required, so we must filter which ones to use. For each limbus bounding box, we check whether it contains any CR bounding box; if it does not, the limbus bounding box is excluded, and the same logic is applied to CR bounding boxes not contained in any limbus bounding box. In most cases (except for specific exceptions), the corneal reflection is contained within the limbus.
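As an illustration of this step, the following minimal Python sketch runs YOLOv8 on each half of the image through the Ultralytics API; the weights path is hypothetical, and the class names follow the two training classes (limbo and flash) described in Section 4.1.

```python
import cv2
from ultralytics import YOLO

# Hypothetical path to the trained weights (classes: "limbo", "flash").
model = YOLO("yolov8_limbus_cr.pt")

def detect_halves(image_path):
    """Split the patient image vertically and run inference on each half."""
    img = cv2.imread(image_path)
    mid = img.shape[1] // 2

    detections = []  # (class_name, (x0, y0, x1, y1), confidence)
    for offset, half in ((0, img[:, :mid]), (mid, img[:, mid:])):
        result = model(half)[0]
        for box in result.boxes:
            x0, y0, x1, y1 = box.xyxy[0].tolist()
            # Shift x-coordinates back into the full-image frame.
            detections.append((result.names[int(box.cls)],
                               (x0 + offset, y0, x1 + offset, y1),
                               float(box.conf)))
    return detections
```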
Finally, the corneal reflection bounding boxes are sorted based on their distance to the limbus, because erroneously detected corneal reflection bounding boxes are very likely to be farther from any limbus bounding box. The two pairs of bounding boxes (corneal reflection and limbus) with the smallest mutual distances are selected, following the hypothesis that, if low-quality bounding boxes remain after the exclusion process, the model will still likely have detected the correct limbus bounding boxes close to the corneal reflection bounding boxes. A sketch of this filtering heuristic is shown below.
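The sketch below paraphrases the filtering heuristic under the assumption that a CR box "belongs to" a limbus box when its center falls inside it; this containment test and the pair-selection details are our interpretation of the text, not the authors' exact implementation.

```python
import math

def center(b):
    return ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)

def inside(inner, outer):
    """True if the center of `inner` lies within `outer`; boxes are (x0, y0, x1, y1)."""
    cx, cy = center(inner)
    return outer[0] <= cx <= outer[2] and outer[1] <= cy <= outer[3]

def select_pairs(limb_boxes, cr_boxes):
    """Drop limbus boxes containing no CR box (and CR boxes inside no limbus),
    then keep the two limbus/CR pairs with the smallest center distance."""
    limbs = [l for l in limb_boxes if any(inside(c, l) for c in cr_boxes)]
    crs = [c for c in cr_boxes if any(inside(c, l) for l in limbs)]
    if len(limbs) < 2 or len(crs) < 2:
        return None  # treated as a detection error by the method

    pairs = sorted(((math.dist(center(l), center(c)), l, c)
                    for l in limbs for c in crs), key=lambda p: p[0])
    # One pair per eye is implicit here because inference runs per half-image.
    return pairs[:2]
```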
3.3.1 Strabismus Calculation
After identifying the bounding boxes for both eyes
and corneal reflection using the procedure above, we
compute the strabismus angles through the following
steps:
1. Identify the Fixating Eye: calculate the Euclidean distance between the center of the limbus and the corneal reflection for each eye. The eye with the smaller distance is considered the fixating eye.
2. Compute the Deviated Eye Displacement: for the other (deviated) eye, compute the horizontal ($HD_{pixel}$) and vertical ($VD_{pixel}$) components of the distance between the center of the limbus and the corneal reflection.

3. Convert Pixels to Millimeters: use the following equation to convert pixel distances to millimeters: $pixel_{MM} = Limb_{adult} / Diam_{fix}$, where $Diam_{fix}$ is the diameter of the fixating eye in pixels (the bounding box's width for horizontal calculations and height for vertical calculations) and $Limb_{adult}$ represents the average adult limbus size (Khng and Osher, 2008). Consequently:

$$HD_{mm} = HD_{pixel} \cdot pixel_{MM}$$
$$VD_{mm} = VD_{pixel} \cdot pixel_{MM}$$

4. Convert Millimeters to Diopters: finally, convert the distances in millimeters to prism diopters using the conversion constant $\delta = 15$, as established in (Schwartz, 2006) and (Almeida, 2015):

$$HD_{diop} = HD_{mm} \cdot \delta$$
$$VD_{diop} = VD_{mm} \cdot \delta$$
By integrating the CNN-based detection (Section 3.3) with the strabismus calculation steps outlined here, our approach delivers an automated and efficient framework for reliably measuring strabismus in clinical or field settings.
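To make the arithmetic above concrete, a minimal sketch of steps 1-4 follows; the 11.7 mm limbus diameter is our assumption standing in for the (Khng and Osher, 2008) average, and the box conventions are hypothetical.

```python
import math

LIMB_ADULT_MM = 11.7  # assumed average adult limbus diameter (Khng and Osher, 2008)
DELTA = 15            # prism diopters per millimeter (Schwartz, 2006)

def center(b):
    return ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)

def strabismus_diopters(limb_boxes, cr_boxes):
    """limb_boxes/cr_boxes: one (x0, y0, x1, y1) box per eye, index-aligned."""
    # Step 1: the fixating eye has the smallest limbus-center-to-CR distance.
    dists = [math.dist(center(l), center(c))
             for l, c in zip(limb_boxes, cr_boxes)]
    fix = dists.index(min(dists))
    dev = 1 - fix

    # Step 2: horizontal and vertical displacement of the deviated eye (px).
    lx, ly = center(limb_boxes[dev])
    cx, cy = center(cr_boxes[dev])
    hd_px, vd_px = abs(lx - cx), abs(ly - cy)

    # Step 3: mm-per-pixel scale from the fixating eye's limbus box
    # (width for horizontal calculations, height for vertical ones).
    x0, y0, x1, y1 = limb_boxes[fix]
    mm_px_h = LIMB_ADULT_MM / (x1 - x0)
    mm_px_v = LIMB_ADULT_MM / (y1 - y0)

    # Step 4: millimeters to prism diopters.
    return hd_px * mm_px_h * DELTA, vd_px * mm_px_v * DELTA
```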
Figure 2: CNN-based strabismus calculation.
4 RESULTS AND DISCUSSION
We divided the experimental outcomes into training and testing to provide a comprehensive analysis of both phases.
4.1 Training
We evaluated the training performance of the model using the Precision, Recall, and F1-Score metrics. The F1-Score represents the harmonic mean of Precision (the ratio of true positives to all predicted positives) and Recall (the ratio of true positives to all actual positives). Mathematically, the F1-Score is defined as:
$$F_1 = 2 \cdot \frac{precision \cdot recall}{precision + recall} = \frac{2\,tp}{2\,tp + fp + fn} \quad (1)$$

where tp denotes true positives (bounding boxes with an Intersection over Union (IoU) ≥ 0.5), fp denotes false positives (IoU < 0.5), and fn denotes false negatives (the background being mislabeled as a valid bounding box). The F1-Score ranges from 0 to 1, with higher values indicating better model performance.
We conducted training on Google Colab (Google, 2023) (version 2024-11-11), a cloud-computing platform for Machine Learning and Data Analysis, using an Ubuntu 22.04 LTS virtual machine equipped with an Nvidia T4 15 GB GPU. The model ran under the Ultralytics (Jocher et al., 2023) framework (version 8.1.17) and used 2 training classes, limbo and flash, for the limbus and CR, respectively.
In the first training stage, we split the dataset (Section 3.1) into 70% for training, 10% for validation, and 20% for testing. We froze 15 of the 21 network layers and trained for 39 of the planned 50 epochs, with early stopping after 10 epochs without improvement. We used a batch size of 3, an image resolution of 1600×1600, and an optimizer with an initial learning rate of $lr_0 = 0.01$. The final learning rate was set to $lr_f = 0.01 \cdot lr_0$, following the Ultralytics scheduling strategy, which starts with a higher learning rate and progressively decreases it after each epoch. We applied data augmentation online through the Ultralytics Python framework, employing transformations such as saturation and hue adjustments, translation, scaling, horizontal flipping, and partial image erasure.

Figure 3: F1-Score training curve.
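For reference, the first-stage setup could be reproduced with the Ultralytics Python API roughly as follows; the dataset YAML and base checkpoint are hypothetical, and only the hyperparameters stated above are taken from the text.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")      # hypothetical base checkpoint
model.train(
    data="hirschberg.yaml",     # hypothetical dataset config (classes: limbo, flash)
    epochs=50,                  # early stopping halted training at epoch 39
    patience=10,                # early-stopping patience
    batch=3,
    imgsz=1600,
    lr0=0.01,                   # initial learning rate
    lrf=0.01,                   # final LR fraction: lr_f = 0.01 * lr_0
    freeze=15,                  # freeze 15 of the 21 layers
)
```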
On the validation set, this training process achieved a Precision of 0.972, a Recall of 0.474, and an F1-Score of 0.637. Figure 3 exemplifies the performance differences between flash and limbus detection: the limbus class curve has a higher amplitude in the graph, while the flash class performed worse, as denoted by its lower amplitude in the plot, possibly due to other bright spots present in some images, as discussed in Section 4.3. Figure 4 shows precision and recall at varying IoU thresholds, with a satisfying performance for the limbus class and a lower performance for the flash class. These results prompted the authors to further train the model with 5-fold cross-validation to enhance its detection capabilities, as discussed next.
In the second stage of training, we employed 5-fold cross-validation (Refaeilzadeh et al., 2009), with a 70%-30% split for training and testing, respectively. We applied transfer learning at each training session, using the weights from the previous session (Zhuang et al., 2019); the first session in this stage initialized its weights from the first training stage. Each session was trained for 20 epochs, with early stopping after 5 epochs, retaining the other hyperparameters from the initial training. Table 1 presents the results for each test fold.

Figure 4: Precision-recall training curve.
Table 1: Results for each test fold and average metrics.
Test Fold | Precision | Recall | F1-Score
FOLD 1 | 0.954 | 0.983 | 0.968
FOLD 2 | 0.949 | 0.980 | 0.964
FOLD 3 | 0.948 | 0.956 | 0.952
FOLD 4 | 0.963 | 0.965 | 0.964
FOLD 5 | 0.944 | 0.967 | 0.955
AVERAGE | 0.952 | 0.970 | 0.961
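The warm-start chaining between sessions described above might look like the sketch below; the fold configuration files and weight paths are hypothetical.

```python
from ultralytics import YOLO

weights = "stage1_best.pt"  # weights from the first training stage
for k in range(1, 6):
    model = YOLO(weights)   # transfer learning from the previous session
    model.train(data=f"fold_{k}.yaml", epochs=20, patience=5,
                batch=3, imgsz=1600, lr0=0.01, lrf=0.01)
    weights = str(model.trainer.best)  # best checkpoint feeds the next fold
```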
Results were satisfactory both for classification and bounding box localization, given the high average Precision, Recall, and F1-Score. The final training results showed good promise for precise CR and limbus detection.
4.2 Test Results
We evaluated the model using the respective test fold from each 5-fold cross-validation session, thereby minimizing potential training bias. We excluded patient images that lacked strabismus annotations, since these images could not be evaluated for measurement error, as well as patient images in non-standard gaze positions, as the specialist had to ask some patients to look to the right and to the left while in the INFRA and SUPRA positions; to keep the number of images per position balanced, such images were excluded. Results are evaluated with the following metrics: Mean Absolute Error (MAE) in prismatic diopters, Pearson's correlation coefficient (CORR), and the number of images that passed to the measurement stage (QUANT).
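Assuming the method's outputs and the specialist's annotations are aligned arrays of prism-diopter values, these metrics reduce to:

```python
import numpy as np
from scipy.stats import pearsonr

def evaluate(pred, truth):
    """MAE (with SD) in prism diopters and Pearson's r against the specialist."""
    pred, truth = np.asarray(pred, float), np.asarray(truth, float)
    errors = np.abs(pred - truth)
    corr, _ = pearsonr(pred, truth)
    return errors.mean(), errors.std(), corr
```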
Relative to the 213 valid images (i.e., images with specialist annotations in all five standard gaze positions), the method yielded 213 valid detections, corresponding to a 100% detection rate. Inference on the test images took around 100 ms to 500 ms. Table 2 shows the method's results for all positions.
Table 2: Average results per position (MAE in prism diopters).
Position | MAE TOTAL | MAE H | MAE V | CORR H | CORR V | QUANT
PP | 9.18 ± 8.09 | 12.81 ± 8.99 | 5.55 ± 3.65 | 0.65 | 0.2 | 42
DEXTRO | 17.14 ± 29.19 | 27.32 ± 37.2 | 6.96 ± 6.94 | 0.25 | 0.47 | 43
LEVO | 11.1 ± 15.41 | 16.29 ± 19.64 | 5.91 ± 4.26 | 0.57 | 0.35 | 43
SUPRA | 14.76 ± 18.27 | 19.38 ± 22.33 | 10.13 ± 9.32 | 0.46 | 0.33 | 43
INFRA | 32.91 ± 69.78 | 28.91 ± 44.53 | 36.92 ± 80.35 | 0.5 | 0.34 | 42
AVERAGE | 17.02 ± 28.15 | 20.94 ± 26.54 | 13.09 ± 20.9 | 0.48 | 0.34 | 213
Given the high standard deviation in the overall results from Table 2, also evidenced in the Bland-Altman plot in Figure 5, we filtered out selected patients to achieve a more accurate analysis. Specifically, we removed cases where errors exceeded 2 standard deviations from the mean, because such high errors usually denoted cases where the CNN detected the CR very far away in the image or where the limbus was laterally or vertically occluded, as will be discussed in Section 4.3.
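A minimal sketch of this 2-standard-deviation filter:

```python
import numpy as np

def filter_outliers(errors, n_sd=2.0):
    """Keep cases whose error lies within n_sd standard deviations of the mean."""
    errors = np.asarray(errors, float)
    keep = np.abs(errors - errors.mean()) <= n_sd * errors.std()
    return errors[keep], keep
```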
Figure 5: Bland-Altman plot for the whole test set and all positions, with horizontal and vertical strabismus. Mean denoted by the red line; 95% confidence interval denoted by dashed lines.
It can be observed from the results in Table 3 that the LEVO position had the best result, with an average MAE of 7.18 ± 6.09 prismatic diopters. A reason for this result is that most patients in the dataset have some form of horizontal strabismus, and when asked to look to the right or to the left, the deviated eye becomes more visually perceptible to the CNN; in some patients, the CR in the deviated eye also becomes more displaced. Figure 6 illustrates a good measurement in the LEVO position, with 3 PD and 3 PD of error in horizontal and vertical measurements, respectively.
Table 3: Filtered results per position (MAE in prism diopters).
Position | MAE TOTAL | MAE H | MAE V | CORR H | CORR V | QUANT
PP | 7.48 ± 5.45 | 10.13 ± 5.15 | 4.84 ± 3.14 | 0.76 | 0.42 | 33
DEXTRO | 9.76 ± 9.97 | 14.84 ± 11.14 | 4.68 ± 3.3 | 0.46 | 0.53 | 35
LEVO | 7.18 ± 6.09 | 9.92 ± 6.8 | 4.43 ± 2.61 | 0.79 | 0.36 | 35
SUPRA | 9.75 ± 7.67 | 12.16 ± 8.06 | 7.35 ± 4.54 | 0.8 | 0.63 | 34
INFRA | 8.22 ± 6.95 | 11.46 ± 7.52 | 4.98 ± 3.34 | 0.58 | 0.47 | 35
AVERAGE | 8.48 ± 7.23 | 11.7 ± 7.73 | 5.26 ± 3.39 | 0.68 | 0.48 | 172
Figure 6: Patient 016 with a good measurement in the LEVO position.

The PP position had the second-best result on average, trailing the LEVO position in the total average by less than 1 PD. The PP position is often considered the easiest for measurement, since the eyelids are more open and the patient is attempting to fixate directly on an object in front of them. Figure 7 illustrates a good measurement in the PP position, with 3 PD and 5 PD of error in horizontal and vertical measurements, respectively.
Figure 7: Patient 014 with a good measurement in the PP
position.
The SUPRA and INFRA positions had results with average errors of 9.75 ± 7.67 PD and 8.22 ± 6.95 PD, respectively. These positions can have worse results since the limbus might be less visible in certain patients, as discussed in Section 4.3. Figures 8 and 9 illustrate measurements in these positions.
Figure 8: Patient 001 in the SUPRA position.
Figure 9: Patient 006 in the INFRA position, with an error of 0 PD in horizontal and 3 PD in vertical measurements.
The DEXTRO position had the worst results of all positions, with an average error 2.58 PD higher than the LEVO position, a discrepancy similar to the one between the INFRA and SUPRA positions (the SUPRA position being 1.53 PD worse on average than the INFRA position). Such differences can be explained by the acquisition of images not following a specific protocol, in order to simulate real-world acquisitions, leading to differences between opposite positions that can diminish or increase measurement error. Figure 10 illustrates patient 033 with an error of 4 PD in horizontal and 4 PD in vertical measurements.
Figure 10: Patient 033 in the DEXTRO position.
As shown in Table 3 and in Figure 11, filtering out these problematic cases significantly reduced the standard deviation and improved overall results. The vertical and horizontal MAE for most positions decreased below 10 PD, aligning with the maximum error tolerance for specialist measurements reported in (Choi and Kushner, 1998). The number of detected images dropped from 213 to 172 after filtering, corresponding to 62.1% of the total dataset, or 80.1% of valid images. Despite this reduction, the lower variance in the metrics underscores the enhanced reliability of the final analysis.
Figure 11: Bland-Altman plot with filtered results. Mean denoted by the red line; 95% confidence interval denoted by dashed lines.
To assess the main method's measurement performance, the authors implemented two other strabismus measurement methods. Both methods use the detections from the YOLO model as a basis.
The first method consists of several image processing algorithms applied in sequence to estimate a circle that is considered the limbus. For this method, the limbus bounding boxes obtained by YOLO were used as input, extending the bounding boxes' dimensions by 10% and then converting the image to grayscale according to the equation:

$$Y = 0.299 \cdot R + 0.587 \cdot G + 0.114 \cdot B$$

where Y is the final pixel value in grayscale, and R, G, and B are the red, green, and blue values in the RGB scale, respectively. After converting to grayscale, several image processing techniques are applied sequentially. A Difference of Gaussians (DoG) is applied to emphasize the contours of the limbus. Next, the unsharp masking algorithm, as described in (Petrou and Bosdogianni, 1999), is used to enhance the sharpness of the image. The Canny edge detection algorithm (Canny, 1986) is then executed, with contours smaller than 20 px in perimeter being excluded. A custom mask is applied to the result of the edge detection: two circular binary masks (matrices) are applied to the image, and only the pixels located between the two circles are retained, thereby excluding unwanted artifacts and preserving the approximate region where the limbus is located. The mask follows the formula below:
$$Mask(x,y) = E(x,y) \land \lnot I(x,y)$$
$$Result(x,y) = Image(x,y) \land Mask(x,y)$$

where E(x,y) is the larger-radius (outer) circular mask, I(x,y) is the smaller-radius (inner) circular mask, and Mask(x,y) is the mask applied to the image, leaving only the contours between the two masks. After this step, the Hough Transform is applied to detect the limbus, as implemented in (Valente et al., 2017), limiting angles to between 60° and 120° for the top part of the circle and 240° to 300° for the bottom. This restriction accounts for cases where the patient's eyelid or glasses obscure these regions of the limbus, potentially causing incorrect detections. The algorithm returns the most voted circle in the image, which is then converted into a square bounding box (height and width equal to the circle's radius), with the center corresponding to the detected circle's center. Strabismus is then calculated based on this bounding box, as noted in Section 3.3.1.
It is worth noting that the parameters for the perimeter filter, the unsharp masking algorithm, and the Hough algorithm were chosen using an evolutionary algorithm (SHADE) implemented in the Python optimization library Mealpy 3.0.1 (Van Thieu and Mirjalili, 2023), with an objective function of $F = 0.5 \cdot MAE + 0.5 \cdot SD$, where MAE is the Mean Absolute Error and SD is the standard deviation, optimized over a subset of 10 patients taken from the training split of the first training phase of the YOLO model. The optimization algorithm is based on the article by (Tanabe and Fukunaga, 2014). Table 4 shows the results obtained by this method, and Figure 12 shows its main steps.
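The parameter search could be expressed with Mealpy roughly as below; only the objective F = 0.5·MAE + 0.5·SD comes from the text, while the bounds, the SHADE configuration, and the run_pipeline placeholder are assumptions.

```python
import numpy as np
from mealpy import FloatVar, SHADE

rng = np.random.default_rng(0)

def run_pipeline(params):
    # Placeholder for evaluating the Hough pipeline with `params` on the
    # 10-patient tuning subset; returns synthetic per-image errors here.
    return rng.normal(10, 5, size=50) + 0.01 * np.abs(params).sum()

def fitness(solution):
    errors = np.asarray(run_pipeline(solution))
    return 0.5 * errors.mean() + 0.5 * errors.std()  # F = 0.5*MAE + 0.5*SD

problem = {
    "obj_func": fitness,
    "bounds": FloatVar(lb=[1, 0.1, 10], ub=[31, 5.0, 100]),  # assumed bounds
    "minmax": "min",
}
best = SHADE(epoch=100, pop_size=40).solve(problem)
print(best.solution, best.target.fitness)
```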
Figure 12: Hough Transform-based method for strabismus
calculation.
Table 4: Metrics for the Hough method (MAE in prism diopters).
Position | MAE TOTAL | MAE H | MAE V | CORR H | CORR V | QUANT
PP | 11.91 ± 6.77 | 12.69 ± 7.33 | 11.14 ± 5.27 | 0.514 | 0.216 | 33
DEXTRO | 12.49 ± 9.86 | 13.58 ± 10.74 | 11.41 ± 7.45 | 0.626 | 0.286 | 36
LEVO | 10.34 ± 7.25 | 12.23 ± 8.31 | 8.44 ± 4.76 | 0.502 | 0.356 | 33
SUPRA | 12.15 ± 6.87 | 10.94 ± 6.6 | 13.37 ± 6.19 | 0.732 | 0.512 | 34
INFRA | 13.96 ± 10.13 | 14.2 ± 9.66 | 13.71 ± 8.42 | 0.416 | 0.462 | 34
AVERAGE | 12.17 ± 8.18 | 12.73 ± 8.53 | 11.62 ± 6.42 | 0.558 | 0.3664 | 170
Another method created for comparison consists of using the bounding boxes provided by YOLO as input and passing them to the Segment Anything Model (SAM), as described in (Kirillov et al., 2023). In the referenced article, the model is trained on a variety of segmentation masks obtained and validated by the authors, creating the SA-1B dataset with 1 billion segmentation masks across various semantic levels. The large volume of data improves the network's generalization, making it applicable to real-world contexts rather than just benchmarks. In this method, SAM was used to predict segmentation masks for each limbus Region of Interest (ROI): starting from the ROI detected by YOLO, SAM performs segmentation within it, and the bounding box of the largest detected area is considered to represent the limbus. The calculation of strabismus then continues using the limbus detected by SAM. Table 5 shows the SAM method's results.
Table 5: Metrics for the SAM method (MAE in prism diopters).
Position | MAE TOTAL | MAE H | MAE V | CORR H | CORR V | QUANT
PP | 88.84 ± 54.99 | 73.34 ± 18.03 | 104.34 ± 71.29 | 0.356 | 0.228 | 33
DEXTRO | 107.13 ± 83.25 | 69.71 ± 25.54 | 144.55 ± 96.33 | 0.206 | 0.44 | 35
LEVO | 87.57 ± 35.59 | 85.37 ± 38.77 | 89.77 ± 27.29 | 0.474 | 0.392 | 35
SUPRA | 74.01 ± 22.43 | 64.85 ± 18.45 | 83.16 ± 19.7 | 0.372 | 0.49 | 35
INFRA | 73.74 ± 23.64 | 65.34 ± 15.67 | 82.15 ± 24.06 | 0.262 | 0.51 | 33
AVERAGE | 86.26 ± 43.98 | 71.72 ± 23.29 | 100.79 ± 47.74 | 0.334 | 0.412 | 171
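A minimal sketch of this SAM-based variant, using the segment-anything API released with (Kirillov et al., 2023); the checkpoint path and model size are hypothetical.

```python
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # hypothetical path
predictor = SamPredictor(sam)

def limbus_box_from_sam(image_rgb, yolo_box_xyxy):
    """Prompt SAM with the YOLO limbus box; return the bounding box of the
    largest predicted mask region, taken to represent the limbus."""
    predictor.set_image(image_rgb)
    masks, _, _ = predictor.predict(box=np.asarray(yolo_box_xyxy),
                                    multimask_output=True)
    mask = masks[int(np.argmax([m.sum() for m in masks]))]  # largest area
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```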
It can be observed from the results that the main YOLO-based method had lower measurement error and higher correlation on average than both alternatives, especially the SAM method, for which the MAE was higher than 100 PD. It is also worth comparing the present work's main method with the measurement results of (Almeida, 2015), which achieved average errors of 5.6 PD for horizontal measurements and 3.83 PD for vertical measurements; although the number of images that reached the strabismus measurement stage was approximately the same, the percentage of images in our method is higher, at 80% of the valid images.
4.3 Case Study
In this section, we provide a more detailed analysis of the method's detections and its error cases.
One of the most common causes of high error for the method is the incorrect detection of the CR. This occurs due to bright spots in the patient's corrective lenses caused by the camera's flash. For instance, in Figure 13, patient 036 in the DEXTRO position experienced significant errors, with 308 PD in horizontal measurements and 30 PD in vertical measurements. These errors resulted from misidentifying the CR in the patient's glasses. Similarly, Figure 14 shows patient 003, who had the highest detection error in the entire dataset, with 482 PD in horizontal strabismus and 1104 PD in vertical strabismus. This high error occurred because the method wrongly detected the CR in the patient's hair.
Another cause of high error is the lateral occlusion of the limbus, most often in the DEXTRO and LEVO positions, but not exclusively. Figure 15 shows a case of high error where the patient's right eye is laterally occluded, which led to an error of 60 PD in horizontal measurements and 1 PD in vertical measurements. Figure 16 shows patient 025 in the LEVO position with a lateral occlusion of the limbus; although the network precisely located the CR in the image, it did not detect the limbus precisely, which led to an error of 19 PD in horizontal measurements and 11 PD in vertical measurements.

Figure 13: Patient 036 with a wrong bright spot detection, denoted by the blue arrow.

Figure 14: Patient 003 in the INFRA position, where the blue arrow denotes where the model detected the CR.
Figure 15: Patient 009 in the LEVO position, with the limbus laterally occluded.
The partial vertical occlusion of the limbus by the eyelids also contributed to some error cases. Figure 17 illustrates such a case, where patient 013 in the LEVO position had an error of 43 PD in horizontal measurements and 11 PD in vertical measurements. Figure 18 showcases patient 040 in the DEXTRO position, with the eyelids vertically occluding the limbus, yielding 26 PD of error in horizontal measurements and 2 PD of error in vertical measurements. Such errors occur most often in the SUPRA and INFRA positions, but not exclusively.

Figure 16: Patient 025 in the LEVO position, with the limbus laterally occluded.

Figure 17: Patient 013 in the INFRA position, with the limbus vertically occluded.

Figure 18: Patient 040 in the DEXTRO position, with the limbus vertically occluded.

4.4 Discussion

In this work, we conducted tests to evaluate the CNN's capability to detect custom objects, specifically the limbus and CR, and to measure strabismus based on these detections. Both training phases achieved significant results, particularly the second phase, which utilized 5-fold cross-validation and achieved an average F1-Score of 0.961. These results
highlight the potential for precise object detection.
When we compare our method to related works, we observe significant improvements. In (Almeida, 2015) and (de Almeida et al., 2012), the researchers report that their methods are computationally expensive due to the application of several image processing algorithms. They also struggle with inaccuracies, such as a doctor's finger inadvertently appearing on the patient's face or other artifacts in the surroundings, complicating the precise location of the face or eyes. These issues make their methodologies less suitable for real-world scenarios despite reporting average errors of 5.6 PD for horizontal measurements and 3.83 PD for vertical measurements. In contrast, our proposed method demonstrates excellent resistance to environmental changes, blurriness, and other artifacts on the patient's face, as shown in Section 4.2.
Alternatively, (Dericioğlu and Çerman, 2019) aimed to increase measurement precision by using a mobile application in which the user delimits the limbus and CR through an interface. Such a method is susceptible to human error and considerably slower than the present method, since it requires a manual delimitation of the CR and the limbus, whereas our method requires about 200 ms to 500 ms for each half of the image to be processed by the YOLOv8 model.
(Cheng et al., 2021) used undisclosed image processing algorithms to detect the limbus and the CR, but such algorithms suffer from the same problems as the work of (Almeida, 2015), that is, inferior speed and much heavier parametrization when compared to a CNN-based approach.
The method implemented by (Şükrü Karaaslan et al., 2023) utilizes the Mediapipe model, as explained in Section 2, to detect the pupil rather than the limbus, aiming for more precise detection. However, detecting the pupil proves challenging due to its small size, which limits the method's applicability in real-world scenarios where patients may be at varying distances from the camera. In contrast, the current work remains robust since it effectively accommodates non-standardized imaging distances, allowing the model to perform well under these conditions. Figures 19 and 20 illustrate challenging measurements where the model achieves low error rates.
The tests demonstrated the model's potential for real-world application of strabismus measurement, given the high agreement between the method's and the strabologist's measurements (see Figure 11 and Table 3). Despite the promising results, the model did not perform well in certain adverse situations, as noted in Section 4.3. The CR was the main obstacle to better results, usually due to unwanted bright spots in the corrective lenses of the patients or wrong detections of the CR in distant locations of the image. Overall, the results show good potential for using the method in real-world scenarios.

Figure 19: Patient 033 in the INFRA position, with 6 PD and 3 PD of error in horizontal and vertical measurements, respectively. The vertically occluded limbus did not affect the final measurement.

Figure 20: Patient 020 in the LEVO position, with 0 PD and 5 PD of error in horizontal and vertical measurements, respectively. The bright spots in the patient's corrective lenses did not affect the final detection.
5 CONCLUSION
This work tackles the challenges of automated strabismus measurement using the Hirschberg Test by employing a cost-effective methodology, mainly designed for resource-constrained environments and utilizing deep learning techniques.
Through statistical analysis, we validated our CNN-based strabismus measurement technique and established meaningful metrics, including Mean Absolute Error, Standard Deviation, and Pearson's correlation coefficient. Our results show that this methodology could serve as a preliminary tool for measuring strabismus in resource-limited settings, such as rural clinics or facilities lacking specialized equipment.
When analyzing the filtered results, the method demonstrated error rates below the accepted threshold of 10 PD for specialist measurements in the Hirschberg Test. However, it faced challenges with certain blurred images and patients wearing corrective lenses. These issues are expected, as the CNN identifies the most likely locations for the CR point, and the filtering process may not always correct inaccurate detections. Additionally, there were difficulties related to horizontally or vertically occluded limbi, which are common error cases; the CNN struggles to accurately estimate the location and size of the limbus bounding box when relevant information is missing from the image. Such limitations could be addressed by utilizing a larger dataset and training for more epochs, which is expected to yield lower error rates in these situations.
A possible use case for this method involves capturing patients' photos with a camera's flash from different positions and then analyzing the images with the model for a preliminary strabismus measurement. If the detection exceeds a threshold, the model could prompt the patient to see an expert for in-person measurements. This technique could help prevent many cases of early strabismus, amblyopia, and other vision problems related to strabismus by providing ophthalmologists with a preliminary screening tool that does not require specialized equipment.
In future work, the bounding box filtering could be improved, since it is a heuristic that does not always produce precise results; this is evident from some detections of the CR that are very distant (Euclidean distance > 200 px) in the image. A possible improvement would be to segment the patient's face and only consider bounding boxes within that area.
Additionally, the training process could be optimized by training for more epochs, fine-tuning hyperparameters (Tuba et al., 2021), or using a larger dataset, which could enhance the CNN's generalization and object detection performance. Moreover, case studies similar to (Cheng et al., 2021) could be undertaken to evaluate the method's effectiveness and performance in clinics, particularly in challenging cases with real personnel.
ACKNOWLEDGEMENTS
The authors acknowledge the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Brazil - Finance Code 001, the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Brazil, the Fundação de Amparo à Pesquisa e ao Desenvolvimento Científico e Tecnológico do Maranhão (FAPEMA), Brazil, and the Empresa Brasileira de Serviços Hospitalares (Ebserh), Brazil (Grant number 409593/2021-4) for the financial support.
REFERENCES
Almeida, J. D. S., Silva, A. C., and Teixeira, J. A. M. (2015). Computer-aided methodology for syndromic strabismus diagnosis. Journal of Digital Imaging, 28:462–473.
Bazarevsky, V., Kartynnik, Y., Vakunov, A., Raveendran,
K., and Grundmann, M. (2019). Blazeface: Sub-
millisecond neural face detection on mobile gpus.
Buffenn, A. N. (2021). The impact of strabismus on psy-
chosocial heath and quality of life: a systematic re-
view. Survey of Ophthalmology, 66(6):1051–1064.
Canny, J. (1986). A computational approach to edge de-
tection. IEEE Transactions on pattern analysis and
machine intelligence, (6):679–698.
Cheng, W., Lynn, M. H., Pundlik, S., Almeida, C., Luo, G.,
and Houston, K. (2021). A smartphone ocular align-
ment measurement app in school screening for stra-
bismus. BMC ophthalmology, 21:1–10.
Choi, R. Y. and Kushner, B. J. (1998). The accuracy of
experienced strabismologists using the hirschberg and
krimsky tests. Ophthalmology, 105 7:1301–6.
de Almeida, J. D. S., Silva, A. C., de Paiva, A. C., and
Teixeira, J. A. M. (2012). Computational methodol-
ogy for automatic detection of strabismus in digital
images through hirschberg test. Computers in biology
and medicine, 42(1):135–146.
Dericioğlu, V. and Çerman, E. (2019). Quantitative measurement of horizontal strabismus with digital photography. Journal of American Association for Pediatric Ophthalmology and Strabismus, 23(1):18.e1–18.e6.
Durajczyk, M., Grudzińska, E., and Modrzejewska, M. (2023). Present knowledge of modern technology and virtual computer reality to assess the angle of strabismus. Klinika Oczna / Acta Ophthalmologica Polonica, 125(1):13–16.
Google (2023). Google colaboratory. https://colab.research.
google.com/. Accessed: 2024-12-30.
Hashemi, H., Pakzad, R., Heydarian, S., Yekta, A.,
Aghamirsalim, M., Shokrollahzadeh, F., Khoshhal, F.,
Pakbin, M., Ramin, S., and Khabazkhoob, M. (2019).
Global and regional prevalence of strabismus: a com-
prehensive systematic review and meta-analysis. Stra-
bismus, 27:54 – 65.
Jocher, G., Chaurasia, A., and Qiu, J. (2023). Ultralytics
yolov8. https://github.com/ultralytics/ultralytics.
Khng, C. and Osher, R. H. (2008). Evaluation of the rela-
tionship between corneal diameter and lens diameter.
Journal of cataract and refractive surgery, 34.
Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A. C., Lo, W.-Y., Dollár, P., and Girshick, R. (2023). Segment anything.
Li, Z., Liu, F., Yang, W., Peng, S., and Zhou, J. (2022). A survey of convolutional neural networks: Analysis, applications, and prospects. IEEE Transactions on Neural Networks and Learning Systems, 33(12):6999–7019.
Lin, T., Maire, M., Belongie, S. J., Bourdev, L. D., Girshick, R. B., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. CoRR, abs/1405.0312.
Miao, Y., Jeon, J. Y., Park, G., Park, S. W., and Heo, H.
(2020). Virtual reality-based measurement of ocular
deviation in strabismus. Computer Methods and Pro-
grams in Biomedicine, 185:105132.
O’Mahony, N., Campbell, S., Carvalho, A., Harapanahalli,
S., Hernandez, G. V., Krpalkova, L., Riordan, D.,
and Walsh, J. (2020). Deep learning vs. traditional
computer vision. In Advances in Computer Vision:
Proceedings of the 2019 Computer Vision Conference
(CVC), Volume 1 1, pages 128–144. Springer.
Petrou, M. and Bosdogianni, P. (1999). Image Processing:
The Fundamentals. John Wiley & Sons, Inc., USA,
1st edition.
Pundlik, S., Tomasi, M., Liu, R., Houston, K., and Luo, G.
(2019). Development and preliminary evaluation of a
smartphone app for measuring eye alignment. Trans-
lational Vision Science & Technology, 8(1):19–19.
Redmon, J., Divvala, S. K., Girshick, R. B., and Farhadi, A.
(2015). You only look once: Unified, real-time object
detection. CoRR, abs/1506.02640.
Refaeilzadeh, P., Tang, L., and Liu, H. (2009). Cross-
Validation, pages 532–538. Springer US, Boston,
MA.
Schwartz, G. (2006). The Eye Exam: A Complete Guide.
SLACK.
Tan, M. and Le, Q. V. (2019). Efficientnet: Rethink-
ing model scaling for convolutional neural networks.
CoRR, abs/1905.11946.
Tanabe, R. and Fukunaga, A. S. (2014). Improving the
search performance of shade using linear population
size reduction. In 2014 IEEE Congress on Evolution-
ary Computation (CEC), pages 1658–1665.
Tuba, E., Bacanin, N., Strumberger, I., and Tuba, M.
(2021). Convolutional Neural Networks Hyperparam-
eters Tuning, pages 65–84.
Valente, T. L. A., de Almeida, J. D. S., Silva, A. C., Teixeira,
J. A. M., and Gattass, M. (2017). Automatic diagno-
sis of strabismus in digital videos through cover test.
Computer Methods and Programs in Biomedicine,
140:295–305.
Van Thieu, N. and Mirjalili, S. (2023). Mealpy: An open-
source library for latest meta-heuristic algorithms in
python. Journal of Systems Architecture.
Varghese, R. and M., S. (2024). Yolov8: A novel object
detection algorithm with enhanced performance and
robustness. In 2024 International Conference on Ad-
vances in Data Engineering and Intelligent Comput-
ing Systems (ADICS), pages 1–6.
Zhuang, F., Qi, Z., Duan, K., Xi, D., Zhu, Y., Zhu, H.,
Xiong, H., and He, Q. (2019). A comprehensive sur-
vey on transfer learning. CoRR, abs/1911.02685.
Şükrü Karaaslan, Kobat, S. G., and Gedikpınar, M. (2023). A new method based on deep learning and image processing for detection of strabismus with the Hirschberg test. Photodiagnosis and Photodynamic Therapy, 44:103805.