and classes without multiple sequential steps.
We choose YOLO for its speed and its lower likeli-
hood of producing false positives in the image back-
ground compared with alternative models (Horak and
Sablatnig, 2019; Jiang et al., 2022). These qualities
make YOLO one of the most effective convolutional
neural network models for object detection.
The YOLO network consists of a backbone, responsible
for collecting and organizing image features, and a
head, which uses these features for box and class
prediction. Between the backbone and the head lies
the neck, which integrates image features before
forwarding them for prediction. In this paper, we
experiment with the YOLOv8s model, the small variant
of YOLOv8 (Yan et al., 2022).
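The division of labor above can be made concrete with a deliberately toy sketch (plain Python lists, not the real YOLOv8 layers; all names are ours): the backbone extracts features at several scales, the neck integrates them, and the head turns the integrated features into box-and-class predictions.

```python
def backbone(image):
    """Toy stand-in: collect 'features' at three scales (real YOLO uses conv stages)."""
    return {"p3": image[::2], "p4": image[::4], "p5": image[::8]}

def neck(features):
    """Toy stand-in: integrate the multi-scale features into one sequence."""
    return features["p3"] + features["p4"] + features["p5"]

def head(fused):
    """Toy stand-in: emit one (box, class_score) pair per fused feature."""
    return [((f, f, f + 1, f + 1), 1.0) for f in fused]

predictions = head(neck(backbone(list(range(16)))))
print(len(predictions))
```

The point of the sketch is only the data flow backbone -> neck -> head; in the actual model each stage is a stack of convolutional layers.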
3 THE EXPERIMENT
In this section, we present the results of our
experimental investigation, conducted to demonstrate
the efficacy of the YOLOv8 model for detecting and
localizing Egyptian Hieroglyphs within images.
We compiled images from the COTA COCO anks
Image Dataset, a collection specifically curated for
building models for Egyptian Hieroglyph detection
in images. This dataset is openly accessible for
research purposes.1
The utilized dataset comprises 1729 distinct real-
world images featuring Egyptian Hieroglyphs. Each
image is annotated with a single label, "ank," along
with its corresponding localization, represented by a
bounding box indicating the position of
the Egyptian Hieroglyph within the image. To ensure
the reproducibility of our results, we opted to utilize a
publicly available dataset.
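Assuming the annotations use the YOLO text format that Roboflow datasets commonly export (one "class x_center y_center width height" line per object, normalized to [0, 1]), converting such a line to a pixel-space box at the dataset's 640 x 640 resolution can be sketched as follows (the function name is ours):

```python
def yolo_line_to_pixel_box(line, img_w=640, img_h=640):
    """Convert 'class cx cy w h' (normalized) to (class_id, x0, y0, x1, y1) in pixels."""
    cls, cx, cy, w, h = line.split()
    # Scale the normalized center and size to pixel coordinates.
    cx, cy = float(cx) * img_w, float(cy) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    # The box corners are half a width/height away from the center.
    x0, y0 = cx - w / 2, cy - h / 2
    return int(cls), x0, y0, x0 + w, y0 + h

# A centered box covering a quarter of each dimension:
print(yolo_line_to_pixel_box("0 0.5 0.5 0.25 0.25"))  # (0, 240.0, 240.0, 400.0, 400.0)
```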
The images of Egyptian Hieroglyphs are stored in
JPEG format with a resolution of 640 x 640 pixels.
We divided the images as follows: 1230 images for
training, 325 for validation, and the remaining 174
for the test set, representing an approximate split per-
centage of 70:20:10, respectively. The dataset we ac-
quired comes with annotations, meaning that each im-
age includes detailed information about the bounding
box surrounding each Egyptian Hieroglyph. Image
augmentation was performed by generating additional
images for each original image through horizontal and
vertical flips. After implementing data augmentation,
we obtained the final dataset. For model training,
we selected a batch size of 16 and set the number of
epochs to 10, with an initial learning rate of 0.01.
For model training and testing, we utilized a machine
equipped with an NVIDIA Tesla T4 GPU with 16 GB of
memory.

1 https://universe.roboflow.com/matthew-custer-bclqa/cota coco anks/dataset/3
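The horizontal and vertical flips used for augmentation require flipping each bounding-box annotation as well. In normalized YOLO coordinates only the box center moves, which a short sketch makes explicit (helper names are ours):

```python
def hflip_box(cx, cy, w, h):
    """Box after a horizontal image flip: the x-center mirrors, sizes are unchanged."""
    return 1.0 - cx, cy, w, h

def vflip_box(cx, cy, w, h):
    """Box after a vertical image flip: the y-center mirrors, sizes are unchanged."""
    return cx, 1.0 - cy, w, h

# A box centered at (0.3, 0.4) mirrors to (0.7, 0.4) horizontally
# and to (0.3, 0.6) vertically.
print(hflip_box(0.3, 0.4, 0.2, 0.1))
print(vflip_box(0.3, 0.4, 0.2, 0.1))
```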
Figure 2 presents the experimental results of the
proposed method through various plots. The top row
of Figure 2 depicts the following metrics:
"train/box_loss" (the trend of the box loss during
training, which measures how closely the predicted
bounding boxes fit the ground-truth objects),
"train/obj_loss" (the trend of the objectness loss
during training, where objectness measures the
confidence that an object is present at a given
anchor), "train/cls_loss" (the trend of the
classification loss during training, which gauges the
accuracy of object classification within each
predicted bounding box, where each box may contain an
object class or "background"; this is commonly
computed as a cross-entropy loss), the precision
trend, and the recall trend.
The bottom row of Figure 2 shows the following
metrics: "val/box_loss" (the trend of the box loss on
the validation set), "val/obj_loss" (the trend of the
objectness loss on the validation set), the mean
Average Precision at an Intersection over Union
threshold of 0.5 (mAP 0.5), and the mean Average
Precision averaged over Intersection over Union
thresholds from 0.5 to 0.95 (mAP 0.5:0.95).
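Since both mAP metrics hinge on Intersection over Union, a minimal IoU computation for two axis-aligned boxes in (x0, y0, x1, y1) pixel form can be sketched as follows (a generic sketch, not the evaluation code used in the experiment):

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x0, y0, x1, y1) boxes."""
    # Intersection rectangle; max(0, ...) handles non-overlapping boxes.
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # two 2x2 boxes overlapping in a 1x1 patch
```

A detection counts as correct at the 0.5 threshold when its IoU with a ground-truth box is at least 0.5.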
All the metrics exhibit the anticipated patterns:
precision, recall, mAP 0.5, and mAP 0.5:0.95 increase
as the number of epochs progresses, indicating that
the model is effectively learning to detect objects
in Egyptian hieroglyph images. Conversely, the loss
metrics decrease as the number of epochs increases,
providing further evidence of effective learning. The
loss metrics quantify instances where the model
misidentifies or mislocalizes an object; the loss
values therefore start high in the initial epochs and
gradually decrease as the model learns to accurately
detect the objects of interest.
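The classification term discussed above is a cross-entropy loss; for a single predicted box it reduces to the negative log-probability that the model assigns to the true class. A minimal sketch (function name ours):

```python
import math

def cross_entropy(probs, true_class):
    """Cross-entropy for one prediction: -log of the probability of the true class."""
    return -math.log(probs[true_class])

# As the model assigns more probability to the correct class, the loss falls,
# which is why the loss curves decline over the epochs.
confident = cross_entropy([0.05, 0.9, 0.05], true_class=1)
unsure = cross_entropy([0.4, 0.3, 0.3], true_class=1)
print(confident, unsure)
```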
In Figure 2, the plots for metrics/mAP 0.5 and
metrics/mAP 0.5:0.95 show the mAP at IoU = 0.5 and
the mAP averaged over IoU thresholds ranging from 0.5
to 0.95, in steps of 0.05.
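The averaging behind mAP 0.5:0.95 can be sketched directly: evaluate the average precision at each of the ten thresholds and take the mean (a generic sketch with names of ours; `ap_at` stands for any function returning the AP at a given IoU threshold):

```python
def map_50_95(ap_at):
    """Average the AP over the IoU thresholds 0.50, 0.55, ..., 0.95 (ten values)."""
    thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]
    return sum(ap_at(t) for t in thresholds) / len(thresholds)

# With a toy AP curve that degrades linearly as the IoU threshold tightens:
print(map_50_95(lambda t: 1.0 - t))
```

Because the high-threshold terms demand very precise boxes, mAP 0.5:0.95 is always at most mAP 0.5 and is the stricter of the two summaries.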
It is noteworthy that both the metrics/mAP 0.5
and metrics/mAP 0.5:0.95 plots in Figure 2 exhibit
an increasing trend. This observation indicates that
the model is effectively learning the spatial locations
within images to accurately identify the objects of in-
ICSOFT 2024 - 19th International Conference on Software Technologies