Egyptian Hieroglyphs Localisation Through Object Detection
Lucia Lombardi, Francesco Mercaldo and Antonella Santone
Department of Medicine and Health Sciences “Vincenzo Tiberio”, University of Molise, Campobasso, Italy
Keywords:
Archeology, Hieroglyphs, Deep Learning, Object Detection, YOLO.
Abstract:
Ancient Egyptians used hieroglyphic writing to record their findings in medicine, engineering, and science, their achievements and religious views, as well as facts from their daily life. Understanding and digitally preserving these scripts is therefore fundamentally important for anyone who wants to study Egyptian history and learn more about this great civilization. The interpretation of Egyptian hieroglyphs is a reasonably broad and highly complex problem, but hieroglyphs have always fascinated with their stories and their ability to be read in several ways rather than one, which is in itself a challenge for translation into modern languages. In this paper, we adopt the YOLOv8 model, which revolutionized object detection with its one-stage deep learning approach. YOLO is designed to classify images and accurately determine the positions of detected objects within them. Using this deep learning approach, we were able to significantly reduce the time required to investigate the interpretation of hieroglyphs. To ensure the reproducibility of our results, we opted to utilize a publicly available dataset. All the metrics demonstrate the anticipated patterns: precision, recall, mAP 0.5, and mAP 0.5:0.95 increase as the number of epochs progresses, indicating that the model effectively learns to detect objects in images of Egyptian hieroglyphs.
1 INTRODUCTION
Recent advances in automated sensing enabled by the
proliferation of drones, robotics and Light Detection
and Ranging (LiDAR), Big Data, and artificial in-
telligence may fuel the next wave of archaeological
discovery (Zhou et al., 2023; Mercaldo et al., 2022;
Huang et al., 2024). Despite the concerns related to its use (Barucci and Neri, 2020), AI carries the remarkable power, and the perceived hope, not only to automate human tasks but also to improve human understanding. Fields such as archaeology, philology and the human sciences are now beginning to be permeated by AI, even though its actual role has still to be fully understood. The interpretation of Egyptian hieroglyphs is a reasonably broad and highly complex problem. Hieroglyphs are usually found on different materials, such as stone, ceramics, wood, and papyrus, among others. Visual identification differs significantly depending on the material, because each material exhibits distinct colors and textures. Moreover, many hieroglyphs can change their horizontal orientation to convey a different meaning. This work focuses on this specific aspect, integrating the Egyptological perspective
with the application of the latest information tech-
nologies. The obtained results explore the data pro-
cessing methodology and the creation of a functional
and usable AI model for Egyptologists. The results
represent a contribution to the research on automatic
translation of ancient Egyptian. The AI model is able
to generate reasonably accurate translations and can
be used to facilitate the interpretation of ancient Egyp-
tian texts. However, it also shows that the research
that can be done in this direction is vast and requires
further study.
2 THE METHOD
This section introduces the proposed method de-
signed to identify and pinpoint Egyptian hieroglyphs
within images. Specifically, we present an ap-
proach intended to autonomously detect Egyptian hi-
eroglyphs directly from images, such as those cap-
tured by robots. Additionally, our method can precisely locate these hieroglyphs within the image and provide a prediction percentage for each detected hieroglyph.
Figure 1 depicts the proposed method.
To develop a proficient deep learning model for
434
Lombardi, L., Mercaldo, F. and Santone, A.
Egyptian Hieroglyphs Localisation Through Object Detection.
DOI: 10.5220/0012787900003753
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 19th International Conference on Software Technologies (ICSOFT 2024), pages 434-441
ISBN: 978-989-758-706-1; ISSN: 2184-2833
Proceedings Copyright © 2024 by SCITEPRESS Science and Technology Publications, Lda.
Figure 1: The proposed method.
detecting Egyptian hieroglyphs in images, it is crucial
to possess a dataset containing images sourced from
expert archeologists as well as robots or drones (refer
to ”archeologist” in Figure 1). Robots or drones are
especially valuable for capturing images in remote or
difficult-to-access locations. In order to construct a
model capable of not only identifying Egyptian hi-
eroglyphs within an image but also accurately locat-
ing them, a dataset containing images with detailed
hieroglyph positions is indispensable. These images
are meticulously labeled and annotated by domain ex-
perts (i.e., ”archeologist” in Figure 1) to mark the ar-
eas in the images where the Egyptian hieroglyphs are
present (referred to as ”label and bounding boxes” in
Figure 1). The detection class for the bounding box is
singular, focusing solely on the ankh symbol (indicat-
ing the proposed model’s emphasis on detecting frag-
ments of the ankh Egyptian hieroglyph). The choice
to prioritize the detection of this specific hieroglyph is
due to its historical significance as a widely used dec-
orative motif in ancient Egypt and neighboring cul-
tures, symbolizing ”life” itself.
Furthermore, to ensure the development of an ef-
fective model capable of accurately predicting unseen
images, a diverse set of images captured from vari-
ous angles and under different conditions is neces-
sary. While these images may vary in size initially,
a preprocessing step is required to standardize their
dimensions.
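As an illustration of this preprocessing step, the following sketch (our own, not part of the original pipeline) computes the scale factor and padding needed to letterbox an arbitrarily sized image into a square target canvas while preserving its aspect ratio:

```python
def letterbox_params(width, height, target=640):
    """Compute the resize scale and the padding needed to fit an
    image of size (width, height) into a target x target square
    while preserving its aspect ratio."""
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x = (target - new_w) // 2  # left padding
    pad_y = (target - new_h) // 2  # top padding
    return scale, new_w, new_h, pad_x, pad_y

# A 1280x720 photograph scaled into a 640x640 canvas:
scale, new_w, new_h, pad_x, pad_y = letterbox_params(1280, 720)
# scale = 0.5, resized to 640x360, padded by 140 pixels top and bottom
```

The target size of 640 matches the resolution of the dataset used later in the paper; the padding keeps hieroglyph proportions undistorted.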
The subsequent phase involves augmenting the
dataset, as depicted in Figure 1 (”image augmenta-
tion”). This augmentation process utilizes a range of
techniques to expand the dataset without the need for
additional data collection. By introducing controlled
random alterations to existing images, such as flips,
augmented duplicates are generated. This technique
enhances the precision of artificial neural networks
during the training process by exposing them to a
broader range of data.
Specifically, data augmentation techniques are
employed to create images with controlled random
changes, such as flips (see (Shorten and Khosh-
goftaar, 2019)). This approach aims to ensure the
model’s effectiveness in recognizing Egyptian hiero-
glyphs regardless of their position within the image.
Additionally, augmented data helps mitigate the risk
of overfitting by preventing the model from becom-
ing overly tailored to specific instances encountered
during training.
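When an image is flipped, its bounding-box annotations must be mirrored as well. In normalized YOLO coordinates this amounts to reflecting the box centre; a minimal sketch of such an adjustment (our illustration, not the authors' code):

```python
def flip_annotation(box, horizontal=True):
    """Mirror a YOLO-format annotation (x, y, w, h), all values
    normalized to [0, 1], for a horizontally or vertically
    flipped image. Width and height are unchanged."""
    x, y, w, h = box
    if horizontal:
        x = 1.0 - x  # mirror the centre across the vertical axis
    else:
        y = 1.0 - y  # mirror the centre across the horizontal axis
    return (x, y, w, h)

# A hieroglyph centred at (0.25, 0.40) moves to (0.75, 0.40)
# after a horizontal flip:
flipped = flip_annotation((0.25, 0.40, 0.10, 0.20))
```

Applying the same transformation to image and label keeps the augmented dataset consistent.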
Once the (augmented) images, along with associ-
ated details regarding the hieroglyph class and bound-
ing boxes, are obtained, the next step is to implement
a deep learning model (referred to as the ”Object De-
tection model” in Figure 1).
In this paper, we adopt the YOLOv8 model, based on the YOLO architecture (Redmon et al., 2016), which revolutionized object detection with its one-stage deep learning approach.
YOLO is designed to classify images and accurately
determine the positions of detected objects within
them.
The key feature distinguishing YOLO from other
models is its ability to perform the entire detection
process in a single step. The YOLO process takes an image as input and produces, for each cell of a grid laid over the image, a bounding box vector together with the predicted class of the detected object.
Each image is divided into an S × S grid of cells, and a cell is responsible for an object if the object’s center falls within it. Each bounding box prediction includes five components, (x, y, w, h, confidence), where (x, y) represent the box’s center relative to the cell’s position and (w, h) represent the box’s dimensions normalized to the image size. Consequently, the predictions consist of S × S × (B × 5) outputs for the bounding boxes, where B is the number of boxes predicted per cell (Jiang et al., 2022).
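For concreteness, this output count can be computed directly; the values S = 7 and B = 2 below are the settings of the original YOLO paper and are used purely as an example:

```python
def yolo_box_outputs(S, B):
    """Number of bounding-box outputs: each of the S*S grid cells
    predicts B boxes, each described by (x, y, w, h, confidence)."""
    return S * S * B * 5

# With the original YOLO settings S = 7, B = 2:
n = yolo_box_outputs(7, 2)  # 7 * 7 * 2 * 5 = 490
```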
Compared to existing models, YOLO offers significantly faster performance (Sanchez et al., 2020; Sah et al., 2017), thanks to its single-phase approach,
which predicts bounding boxes, object probabilities,
and classes without multiple sequential steps.
We choose YOLO for its speed and lower likeli-
hood of identifying false positives in the image back-
ground compared to alternative models (Horak and Sablatnig, 2019; Jiang et al., 2022). These qualities
make YOLO one of the most effective convolutional
neural network models for object detection.
The YOLO network consists of a backbone, responsible for collecting and organizing image features, and a head, which utilizes these features for box and class prediction. Between the backbone and the head lies the neck, which integrates image features before forwarding them for prediction. In this paper, we experiment with the YOLOv8s model, the small variant of YOLOv8 (Yan et al., 2022).
3 THE EXPERIMENT
In this section, we showcase the outcomes of our
experimental investigation, which was conducted to
demonstrate the efficacy of employing the YOLOv8 model for detecting and localizing Egyptian hieroglyphs within images.
We compiled images from the COTA COCO anks Image Dataset, a collection specifically curated for building models for Egyptian hieroglyph detection in images. This dataset is openly accessible for research endeavors¹.
The utilized dataset comprises 1729 distinct real-
world images featuring Egyptian Hieroglyphs. Each
image is annotated with a single label, specifically
”ank,” along with its corresponding localization, rep-
resented by a bounding box indicating the position of
the Egyptian Hieroglyph within the image. To ensure
the reproducibility of our results, we opted to utilize a
publicly available dataset.
The images of Egyptian Hieroglyphs are stored in
JPEG format with a resolution of 640 x 640 pixels.
We divided the images as follows: 1230 images for
training, 325 for validation, and the remaining 174
for the test set, representing an approximate split per-
centage of 70:20:10, respectively. The dataset we ac-
quired comes with annotations, meaning that each im-
age includes detailed information about the bounding
box surrounding each Egyptian Hieroglyph. Image
augmentation was performed by generating additional
images for each original image through horizontal and
vertical flips. After implementing data augmentation,
we obtained the final dataset. For model training,
we selected a batch size of 16 and set the number
¹ https://universe.roboflow.com/matthew-custer-bclqa/cota coco anks/dataset/3
of epochs to 10, with an initial learning rate of 0.01.
For model training and testing, we utilized a machine
equipped with an NVIDIA Tesla T4 GPU card with 16 GB of memory.
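The training configuration described above could be reproduced with the Ultralytics YOLOv8 API roughly as follows; this is a sketch under the assumption that the dataset annotations are described by a local `data.yaml` file (the file name is illustrative):

```python
from ultralytics import YOLO

# Load the small YOLOv8 variant with pretrained weights.
model = YOLO("yolov8s.pt")

# Train with the settings reported in the paper: batch size 16,
# 10 epochs, initial learning rate 0.01, 640x640 input images.
model.train(data="data.yaml", epochs=10, batch=16, lr0=0.01, imgsz=640)

# Evaluate on the validation split defined in data.yaml.
metrics = model.val()
```

Running this requires a GPU and the dataset on disk; it is shown here only to make the reported hyperparameters concrete.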
Figure 2 presents the experimental results obtained with the proposed method through multiple plots. In the top row of plots in Figure 2, the following metrics are depicted: "train/box loss" (the trend of the box loss during training, measuring the fidelity of predicted bounding boxes to the ground-truth objects), "train/obj loss" (the trend of the objectness loss during training, where objectness determines the presence of an object at an anchor), "train/cls loss" (the trend of the classification loss during training, which gauges the accuracy of object classification within each predicted bounding box, with each box potentially containing an object class or "background"; commonly referred to as cross-entropy loss), the precision trend, and the recall trend. In the bottom row of plots in Figure 2, the following metrics are shown: "val/box loss" (the trend of the box loss in validation), "val/obj loss" (the trend of the objectness loss in validation), the mean Average Precision at an Intersection over Union threshold of 0.5 (mAP 0.5), and the mean Average Precision averaged over Intersection over Union thresholds between 0.5 and 0.95 (mAP 0.5:0.95).
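The Intersection over Union underlying both mAP variants compares a predicted box with a ground-truth box; a minimal sketch of the computation (boxes written as (x1, y1, x2, y2) corner coordinates, a convention chosen here for illustration):

```python
def iou(a, b):
    """Intersection over Union of two boxes given as
    (x1, y1, x2, y2) corner coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# mAP 0.5:0.95 averages AP over these ten IoU thresholds:
thresholds = [0.5 + 0.05 * i for i in range(10)]
```

A detection counts as correct when its IoU with a ground-truth box exceeds the chosen threshold.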
All the metrics demonstrate the anticipated pat-
terns: precision, recall, mAP 0.5, and mAP 0.5:0.95
are expected to increase as the number of epochs pro-
gresses, indicative of the model effectively learning
to detect objects in images of Egyptian hieroglyphs.
Conversely, the other metrics display a decreasing
trend as the number of epochs increases, providing
further evidence that the model is effectively learn-
ing from Egyptian hieroglyph images. Specifically,
the loss metrics generally signify instances where the
model misidentifies a particular object, hence the loss
values typically start high in the initial epochs and
gradually decrease as the model learns to accurately
detect objects of interest.
In Figure 2, the plots for metrics/mAP 0.5 and metrics/mAP 0.5:0.95 illustrate the mAP value at IoU = 0.5 and the mAP averaged over IoU thresholds ranging from 0.5 to 0.95, with a step size of 0.05.
It is noteworthy that both the metrics/mAP 0.5
and metrics/mAP 0.5:0.95 plots in Figure 2 exhibit
an increasing trend. This observation indicates that
the model is effectively learning the spatial locations
within images to accurately identify the objects of interest, namely the ank Egyptian hieroglyphs.
Figure 2: Experimental analysis results.
Figure 3: The Precision-Recall graph.
Table 1 presents the results for the Precision, Recall, mAP 0.5, and mAP 0.5:0.95 metrics, provided individually for the ank Egyptian hieroglyph class.

Table 1: Classification results.

Precision   Recall    mAP 0.5   mAP 0.5:0.95
0.87074     0.73876   0.83606   0.61249

Observing Table 1, the Precision and Recall values are 0.871 and 0.739, respectively, for the ank Egyptian hieroglyph class.
Additionally, for a more comprehensive assess-
ment of the proposed method, precision and recall
values are visualized on the Precision-Recall graph in
Figure 3.
The expected trend of this plot is a monotonically
decreasing one, as there is typically a trade-off be-
tween precision and recall: increasing one usually re-
sults in a decrease in the other. While there can be ex-
ceptions or data limitations that prevent the precision-
recall graph from strictly following this trend, the plot
in Figure 3 demonstrates a decreasing trend for the
relevant labels.
Moreover, the precision-recall plot displays the Area Under the Curve (AUC) value associated with the analysed class (i.e., the ank Egyptian hieroglyph) at mAP 0.5. As previously mentioned, the precision-recall trend is anticipated to be monotonically decreasing. This behavior is evident from the precision-recall plot, which yields an AUC of 0.836 at mAP 0.5. Given that this metric ranges from 0 to 1, this value serves as indicative evidence that the proposed model is capable of effectively detecting ank Egyptian hieroglyphs in images.
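Precision and recall themselves derive from the detection counts; a small sketch of the definitions behind Table 1 and Figure 3 (the counts below are hypothetical, chosen only to illustrate the formulas):

```python
def precision_recall(tp, fp, fn):
    """Precision: fraction of predicted boxes that are correct.
    Recall: fraction of ground-truth objects that were found."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical counts: 87 true positives, 13 false positives,
# 29 missed anks.
p, r = precision_recall(87, 13, 29)  # p = 0.87, r = 0.75
```

Sweeping the detection confidence threshold and recomputing these two quantities traces out the precision-recall curve whose area gives the AUC.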
To visually verify the proposed method and affirm
its efficacy in real-world scenarios, we present exam-
ples of images with manually performed annotations
(depicted in Figure 4) and the same images with de-
tections and bounding boxes (illustrated in Figure 5).
This enables a direct comparison between the man-
ual annotations and those generated by the proposed
model.
As depicted in Figure 4, we incorporate images
captured from various angles and with subjects po-
sitioned at varying distances. This approach aims to
maximize the model’s generalizability. Notably, the
proposed model demonstrates the capability to detect
multiple ank Egyptian hieroglyphs within the same image, unaffected by background color. In Figure 4,
each image includes detailed bounding box annota-
tions for the classes involved in the experiment (highlighted in red) along with their respective label, i.e., ank.
Figure 4: Example of images with the related bounding box around the ank Egyptian hieroglyphs, manually added for model
building.
Figure 5 displays the predictions and bounding
boxes generated by the proposed model during the
testing phase. It is important to note that during testing, the images are fed to the model without any pre-existing bounding boxes.
The observations from Figure 5 reveal that the
proposed model demonstrates proficient localization
of areas containing anks. In fact, for the majority
of the images, the bounding boxes closely resemble
those depicted in Figure 4.
As can be observed from the images displaying
both ank manual annotations and ank annotations pre-
dicted by the developed model (i.e., Figure 4 and Fig-
ure 5), the background does not appear to be a dis-
rupting factor for the model. In fact, the ankhs are
correctly identified even when present alongside other
hieroglyphs on materials of the same color. Moreover,
even under low-light conditions and varying angles,
the ankhs are accurately identified. The size of the
ankh also does not pose an issue; as seen from the ex-
amples, regardless of size, the hieroglyph is correctly
identified and notably not confused with other hiero-
glyphs typically present in the images.
Figure 5: Example of ank predictions automatically performed by the proposed method.
4 RELATED WORK
The problem of ancient Egyptian language retrieval
and classification has been addressed, with different
purposes, in several works and several examples of
applications of the new technologies to the classifi-
cation of ideograms belonging to ancientor no more
used languages can be found in the literature. In this
section, we review the current state-of-the-art litera-
ture about the application of deep learning models on
the DR detection. Below we discuss these papers.
The authors in (Barucci et al., 2021) used ResNet-50 to develop a method to classify glyphs. Since they could not find a sufficiently large dataset to train their approach, they resorted to a different dataset together with transfer learning. They also implemented a novel architecture called Glyphnet, designed for the specific task of hieroglyph classification, and trained it on a small hieroglyphic dataset. The results showed that Glyphnet achieved an accuracy of 96%, the highest accuracy found in the literature. However, the data used in their work are unfortunately not available for other researchers to validate the results.
Egyptian Hieroglyphs Localisation Through Object Detection
439
The researchers in (Moustafa et al., 2022) apply deep learning techniques such as EfficientNet, MobileNet, and ShuffleNet, evaluated on two hieroglyph datasets. The paper describes a Flutter-based mobile application named Scriba, which offers as its main advantage an exact translation of hieroglyphs.
The authors in (Hamdany et al., 2021) present a method, published in 2021, for translating cuneiform writing using artificial neural networks. A Multi-Layer Perceptron (MLP) neural network was adapted to translate images of Sumerian cuneiform symbols into the corresponding English letters. The central concept behind the suggested method is to acquire an image of any cuneiform symbol and then generate an indicator for the corresponding English letter; in this way, letters of the English alphabet can be intelligently generated from cuneiform symbol images. The proposed method uses neural networks, but it could also be implemented with other information technology and artificial intelligence models. This work achieved a score of 100%; however, it faced challenges such as the inability to recognize chipped circuit boards, the lack of public access to the dataset, the difficulty in determining the network structure, and the absence of pre-processing. Consequently, the proposed network is unable to deal with variations that can be found in images, such as rotation, noise, and so on.
In 2023, the authors in (Mohsen et al., 2023) developed Aegyptos, a mobile application for hieroglyph detection, translation and pronunciation, conceived to help users learn how to read hieroglyphs, and how to pronounce them, using only their phones' live cameras. Segmentation of the symbols is performed with techniques such as Otsu thresholding, while a lightweight CNN known as SqueezeNet is used for classification, with the help of an API to translate the script into a language understood by the user.
5 CONCLUSION AND FUTURE
WORK
In this work, we have explored the capability of deep learning techniques to address the problem of ancient Egyptian hieroglyph classification, and an analysis of the results was conducted to determine the champion models and the best data settings. Our approach allows
DL to learn representations of images with better gen-
eralization performance, enabling the discovery of
targets that have been difficult to find in the past.
Moreover, by accelerating the research process, our
method contributes to archaeology by establishing a
new paradigm that combines field surveys and AI,
leading to more efficient and effective investigations.
Even though in this paper we focused on the single
hieroglyph classification task, new and profitable per-
spectives are opened by the application of deep learn-
ing techniques in the Egyptologic field. In this view,
the proposed work can be seen as the starting point
for the implementation of much more complex goals.
In future work, the incorporation of other hieroglyphs is proposed to expand the algorithm’s database. The approach would be beneficial for the future of archaeology, within a new paradigm combining field survey and AI.
ACKNOWLEDGMENT
This work has been partially supported by EU DUCA,
EU CyberSecPro, SYNAPSE, PTR 22-24 P2.01 (Cy-
bersecurity) and SERICS (PE00000014) under the
MUR National Recovery, Resilience Plan funded by
the EU - NextGenerationEU projects, by MUR -
REASONING: foRmal mEthods for computAtional
analySis for diagnOsis and progNosis in imagING -
PRIN, e-DAI (Digital ecosystem for integrated anal-
ysis of heterogeneous health data related to high-
impact diseases: innovative model of care and re-
search), Health Operational Plan, FSC 2014-2020,
PRIN-MUR-Ministry of Health and the National Plan for NRRP Complementary Investments D3 4 Health: Digital Driven Diagnostics, prognostics and
therapeutics for sustainable Health care and Progetto
MolisCTe, Ministero delle Imprese e del Made in
Italy, Italy, CUP: D33B22000060001, and by Fon-
dazione Intesa SanPaolo Onlus in the ”Doctorates
in Humanities Disciplines” for the ”Artificial Intel-
ligence for the Analysis of Archaeological Finds”
topic.
REFERENCES
Barucci, A., Cucci, C., Franci, M., Loschiavo, M., and Argenti, F. (2021). A deep learning approach to ancient Egyptian hieroglyphs classification. IEEE Access, 9:123438–123447.
Barucci, A. and Neri, E. (2020). Adversarial radiomics:
the rising of potential risks in medical imaging from
adversarial learning. European Journal of Nuclear
Medicine and Molecular Imaging, 47(13):2941–2943.
Hamdany, A. H. S., Omar-Nima, R. R., and Albak, L. H.
(2021). Translating cuneiform symbols using artificial
neural network. TELKOMNIKA (Telecommunication
Computing Electronics and Control), 19(2):438–443.
Horak, K. and Sablatnig, R. (2019). Deep learning concepts
and datasets for image recognition: overview 2019.
In Eleventh international conference on digital image
processing (ICDIP 2019), volume 11179, pages 484–
491. SPIE.
Huang, P., Xiao, H., He, P., Li, C., Guo, X., Tian, S.,
Feng, P., Chen, H., Sun, Y., Mercaldo, F., et al.
(2024). La-vit: A network with transformers con-
strained by learned-parameter-free attention for in-
terpretable grading in a new laryngeal histopathol-
ogy image dataset. IEEE Journal of Biomedical and
Health Informatics.
Jiang, P., Ergu, D., Liu, F., Cai, Y., and Ma, B. (2022). A re-
view of yolo algorithm developments. Procedia Com-
puter Science, 199:1066–1073.
Mercaldo, F., Zhou, X., Huang, P., Martinelli, F., and San-
tone, A. (2022). Machine learning for uterine cervix
screening. In 2022 IEEE 22nd International Confer-
ence on Bioinformatics and Bioengineering (BIBE),
pages 71–74. IEEE.
Mohsen, S. E., Mansour, R., Bassem, A., Dessouky, B., Re-
faat, S., and Ghanim, T. M. (2023). Aegyptos: Mo-
bile application for hieroglyphs detection, translation
and pronunciation. In 2023 International Mobile, In-
telligent, and Ubiquitous Computing Conference (MI-
UCC), pages 1–8.
Moustafa, R., Hesham, F., Hussein, S., Amr, B., Refaat,
S., Shorim, N., and Ghanim, T. M. (2022). Hiero-
glyphs language translator using deep learning tech-
niques (scriba). In 2022 2nd International Mobile, In-
telligent, and Ubiquitous Computing Conference (MI-
UCC), pages 125–132. IEEE.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A.
(2016). You only look once: Unified, real-time object
detection. In Proceedings of the IEEE conference on
computer vision and pattern recognition, pages 779–
788.
Sah, S., Shringi, A., Ptucha, R., Burry, A. M., and Loce,
R. P. (2017). Video redaction: a survey and compar-
ison of enabling technologies. Journal of Electronic
Imaging, 26(5):051406.
Sanchez, S., Romero, H., and Morales, A. (2020). A re-
view: Comparison of performance metrics of pre-
trained models for object detection using the tensor-
flow framework. In IOP Conference Series: Materials
Science and Engineering, volume 844, page 012024.
IOP Publishing.
Shorten, C. and Khoshgoftaar, T. M. (2019). A survey on
image data augmentation for deep learning. Journal
of big data, 6(1):1–48.
Yan, T., Sun, W., and Cui, K. (2022). Real-time ship object
detection with yolor. In Proceedings of the 2022 5th
International Conference on Signal Processing and
Machine Learning, pages 203–210.
Zhou, X., Tang, C., Huang, P., Tian, S., Mercaldo, F., and
Santone, A. (2023). Asi-dbnet: an adaptive sparse
interactive resnet-vision transformer dual-branch net-
work for the grading of brain cancer histopathological
images. Interdisciplinary Sciences: Computational
Life Sciences, 15(1):15–31.