and classes without multiple sequential steps.
We choose YOLO for its speed and its lower likeli-
hood of producing false positives in the image back-
ground compared with alternative models (Horak and
Sablatnig, 2019; Jiang et al., 2022). These qualities
make YOLO one of the most effective convolutional
neural network models for object detection.
The YOLO network consists of a backbone, responsible
for collecting and organizing image features, and a
head, which uses these features for box and class
prediction. Between the backbone and the head lies
the neck, which integrates image features before
forwarding them for prediction. In this paper, we
experiment with the YOLOv8s model, the small variant
of YOLOv8 (Yan et al., 2022).
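The division of labor above can be made concrete with a deliberately toy sketch (plain Python lists, not the real YOLOv8 layers; all names are ours): the backbone extracts features at several scales, the neck integrates them, and the head turns the integrated features into box-and-class predictions.

```python
def backbone(image):
    """Toy stand-in: collect 'features' at three scales (real YOLO uses conv stages)."""
    return {"p3": image[::2], "p4": image[::4], "p5": image[::8]}

def neck(features):
    """Toy stand-in: integrate the multi-scale features into one sequence."""
    return features["p3"] + features["p4"] + features["p5"]

def head(fused):
    """Toy stand-in: emit one (box, class_score) pair per fused feature."""
    return [((f, f, f + 1, f + 1), 1.0) for f in fused]

predictions = head(neck(backbone(list(range(16)))))
print(len(predictions))
```

The point of the sketch is only the data flow backbone -> neck -> head; in the actual model each stage is a stack of convolutional layers.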
3 THE EXPERIMENT
In this section, we present the results of our
experimental investigation, conducted to demonstrate
the efficacy of the YOLOv8 model for detecting and
localizing Egyptian Hieroglyphs within images.
We compiled images from the COTA COCO anks
Image Dataset, a collection specifically curated for
building models for Egyptian Hieroglyph detection
in images. This dataset is openly accessible for
research purposes.1
The utilized dataset comprises 1729 distinct real-
world images featuring Egyptian Hieroglyphs. Each
image is annotated with a single label, "ank," along
with its corresponding localization, represented by a
bounding box indicating the position of
the Egyptian Hieroglyph within the image. To ensure
the reproducibility of our results, we opted to utilize a
publicly available dataset.
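Assuming the annotations use the YOLO text format that Roboflow datasets commonly export (one "class x_center y_center width height" line per object, normalized to [0, 1]), converting such a line to a pixel-space box at the dataset's 640 x 640 resolution can be sketched as follows (the function name is ours):

```python
def yolo_line_to_pixel_box(line, img_w=640, img_h=640):
    """Convert 'class cx cy w h' (normalized) to (class_id, x0, y0, x1, y1) in pixels."""
    cls, cx, cy, w, h = line.split()
    # Scale the normalized center and size to pixel coordinates.
    cx, cy = float(cx) * img_w, float(cy) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    # The box corners are half a width/height away from the center.
    x0, y0 = cx - w / 2, cy - h / 2
    return int(cls), x0, y0, x0 + w, y0 + h

# A centered box covering a quarter of each dimension:
print(yolo_line_to_pixel_box("0 0.5 0.5 0.25 0.25"))  # (0, 240.0, 240.0, 400.0, 400.0)
```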
The images of Egyptian Hieroglyphs are stored in
JPEG format with a resolution of 640 x 640 pixels.
We divided the images as follows: 1230 images for
training, 325 for validation, and the remaining 174
for the test set, representing an approximate split per-
centage of 70:20:10, respectively. The dataset we ac-
quired comes with annotations, meaning that each im-
age includes detailed information about the bounding
box surrounding each Egyptian Hieroglyph. Image
augmentation was performed by generating additional
images for each original image through horizontal and
vertical flips. After implementing data augmentation,
we obtained the final dataset. For model training,
we selected a batch size of 16 and set the number of
epochs to 10, with an initial learning rate of 0.01.
For model training and testing, we utilized a machine
equipped with an NVIDIA Tesla T4 GPU with 16 GB of
memory.

1 https://universe.roboflow.com/matthew-custer-bclqa/cota coco anks/dataset/3
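The horizontal and vertical flips used for augmentation require flipping each bounding-box annotation as well. In normalized YOLO coordinates only the box center moves, which a short sketch makes explicit (helper names are ours):

```python
def hflip_box(cx, cy, w, h):
    """Box after a horizontal image flip: the x-center mirrors, sizes are unchanged."""
    return 1.0 - cx, cy, w, h

def vflip_box(cx, cy, w, h):
    """Box after a vertical image flip: the y-center mirrors, sizes are unchanged."""
    return cx, 1.0 - cy, w, h

# A box centered at (0.3, 0.4) mirrors to (0.7, 0.4) horizontally
# and to (0.3, 0.6) vertically.
print(hflip_box(0.3, 0.4, 0.2, 0.1))
print(vflip_box(0.3, 0.4, 0.2, 0.1))
```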
Figure 2 presents the experimental results of the
proposed method through various plots. The top row
of Figure 2 depicts the following metrics:
"train/box_loss" (the trend of the box loss during
training, which measures how closely the predicted
bounding boxes fit the ground-truth objects),
"train/obj_loss" (the trend of the objectness loss
during training, where objectness measures the
confidence that an object is present at a given
anchor), "train/cls_loss" (the trend of the
classification loss during training, which gauges the
accuracy of object classification within each
predicted bounding box, where each box may contain an
object class or "background"; this is commonly
computed as a cross-entropy loss), the precision
trend, and the recall trend.
The bottom row of Figure 2 shows the following
metrics: "val/box_loss" (the trend of the box loss on
the validation set), "val/obj_loss" (the trend of the
objectness loss on the validation set), the mean
Average Precision at an Intersection over Union
threshold of 0.5 (mAP 0.5), and the mean Average
Precision averaged over Intersection over Union
thresholds from 0.5 to 0.95 (mAP 0.5:0.95).
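Since both mAP metrics hinge on Intersection over Union, a minimal IoU computation for two axis-aligned boxes in (x0, y0, x1, y1) pixel form can be sketched as follows (a generic sketch, not the evaluation code used in the experiment):

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x0, y0, x1, y1) boxes."""
    # Intersection rectangle; max(0, ...) handles non-overlapping boxes.
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # two 2x2 boxes overlapping in a 1x1 patch
```

A detection counts as correct at the 0.5 threshold when its IoU with a ground-truth box is at least 0.5.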
All the metrics exhibit the anticipated patterns:
precision, recall, mAP 0.5, and mAP 0.5:0.95 increase
as the number of epochs progresses, indicating that
the model is effectively learning to detect objects
in Egyptian hieroglyph images. Conversely, the loss
metrics decrease as the number of epochs increases,
providing further evidence of effective learning. The
loss metrics quantify instances where the model
misidentifies or mislocalizes an object; the loss
values therefore start high in the initial epochs and
gradually decrease as the model learns to accurately
detect the objects of interest.
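The classification term discussed above is a cross-entropy loss; for a single predicted box it reduces to the negative log-probability that the model assigns to the true class. A minimal sketch (function name ours):

```python
import math

def cross_entropy(probs, true_class):
    """Cross-entropy for one prediction: -log of the probability of the true class."""
    return -math.log(probs[true_class])

# As the model assigns more probability to the correct class, the loss falls,
# which is why the loss curves decline over the epochs.
confident = cross_entropy([0.05, 0.9, 0.05], true_class=1)
unsure = cross_entropy([0.4, 0.3, 0.3], true_class=1)
print(confident, unsure)
```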
In Figure 2, the plots for metrics/mAP 0.5 and
metrics/mAP 0.5:0.95 show the mAP at IoU = 0.5 and
the mAP averaged over IoU thresholds ranging from 0.5
to 0.95, in steps of 0.05.
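The averaging behind mAP 0.5:0.95 can be sketched directly: evaluate the average precision at each of the ten thresholds and take the mean (a generic sketch with names of ours; `ap_at` stands for any function returning the AP at a given IoU threshold):

```python
def map_50_95(ap_at):
    """Average the AP over the IoU thresholds 0.50, 0.55, ..., 0.95 (ten values)."""
    thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]
    return sum(ap_at(t) for t in thresholds) / len(thresholds)

# With a toy AP curve that degrades linearly as the IoU threshold tightens:
print(map_50_95(lambda t: 1.0 - t))
```

Because the high-threshold terms demand very precise boxes, mAP 0.5:0.95 is always at most mAP 0.5 and is the stricter of the two summaries.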
It is noteworthy that both the metrics/mAP 0.5
and metrics/mAP 0.5:0.95 plots in Figure 2 exhibit
an increasing trend. This observation indicates that
the model is effectively learning the spatial locations
within images to accurately identify the objects of in-
ICSOFT 2024 - 19th International Conference on Software Technologies