discusses the obtained results. We end with a general
conclusion and some prospects.
2 RELATED WORKS
Owing to recent advances in deep learning, numerous deep learning-based solutions have been proposed for the challenge of Arabic text recognition. In 2008, Graves and Schmidhuber (Graves and Schmidhuber, 2008) published the first deep learning-based method for AHTR from document images. It combined a Multi-Dimensional Long Short-Term Memory (MDLSTM) network with the Connectionist Temporal Classification (CTC) loss. The proposed model reached an accuracy of 91.4% on the IFN/ENIT dataset.
Continuing our review of deep-learning-based techniques, we discuss the method presented in (Abandah et al., 2014), which is based on the graphemic segmentation of cursive words. A feature vector is extracted and fed to a BLSTM, which exploits the grapheme sequence to produce the transcript.
A segmentation-free RNN strategy using a four-layer bidirectional Gated Recurrent Unit (GRU) network with a CTC output layer and dropout was described by Chen et al. in 2017 (Chen et al., 2017). The authors used the "abcd-e" scenario to assess the system performance on the IFN/ENIT database and reached an accuracy rate of 86.4%.
In 2019, a Convolutional Deep Belief Network (CDBN) framework was proposed for handwritten Arabic text recognition in (Elleuch and Kherallah, 2019). The authors employed data augmentation and dropout regularization to improve the model's performance and prevent over-fitting. The model reached an accuracy of 98.86% when first tested on the HACDB character database; it was then tested on the IFN/ENIT database and attained an accuracy of 92.9%.
In 2020, the authors of (Ahmad et al., 2020) suggested a deep learning-based strategy for Arabic text recognition. They applied preprocessing, which included de-skewing the skewed text lines and pruning extra white spaces. They also used data augmentation to train the proposed MDLSTM-CTC model on the KHATT database and achieved a character recognition rate of 80.02%. In the same year, Eltay et al. (Eltay et al., 2020) explored an approach consisting of a CNN for feature extraction followed by a recurrent neural network connected to a CTC layer for learning and transcribing Arabic handwritten words. The model was trained and tested on the IFN/ENIT and AHDB databases, reaching recognition rates of 98.10% and 93.57%, respectively.
Finally, we cite the recent work presented in (Albattah and Albahli, 2022), in which several deep learning and hybrid models were developed; the hybrid models used deep learning for feature extraction and machine learning for classification. Among the standalone deep-learning models trained on the two datasets used in the experiments, the transfer-learning model produced the best results on the MNIST dataset, with an accuracy of 99.67%. While the accuracy of all hybrid models on the MNIST dataset was above 0.9, the results of the hybrid models on the Arabic character dataset were inferior.
3 PROPOSED SYSTEM
The proposed deep neural network for solving the AHTR problem is presented in this section. It consists of three main end-to-end components: a CNN, an RNN, and a CTC layer (the architecture used is shown in Figure 2(a)). This combination is the most promising alternative, as it outperforms the other strategies. The CNN extracts a sequence of features from the input images. The RNN then propagates contextual information along this sequence and produces a matrix of character scores for each of its elements. The proposed model is trained with the CTC loss function, which is also used at inference time: the CTC decodes the output matrix of the RNN to infer the text recognized in the input image. These two coupled networks, combined with the CTC, enable word-level recognition without the need for character-level segmentation.
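To make the pipeline concrete, the snippet below gives a minimal sketch of such a CNN+RNN+CTC model in PyTorch. The layer sizes, number of timesteps, alphabet size, and the `CRNN` name are illustrative assumptions, not the exact configuration of the proposed model.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Sketch of a CNN -> RNN -> per-timestep character scores model."""
    def __init__(self, num_classes: int):
        super().__init__()
        # CNN: extracts a feature sequence from a 32x128 grayscale image.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),    # 16x64
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),  # 8x32
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1), (2, 1)),                                     # 4x32
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((4, 1), (4, 1)),                                     # 1x32
        )
        # RNN: propagates context along the width (sequence) axis.
        self.rnn = nn.LSTM(256, 256, num_layers=2,
                           bidirectional=True, batch_first=True)
        # Per-timestep character scores (+1 for the CTC blank symbol).
        self.fc = nn.Linear(512, num_classes + 1)

    def forward(self, x):                      # x: (B, 1, 32, 128)
        f = self.cnn(x)                        # (B, 256, 1, 32)
        f = f.squeeze(2).permute(0, 2, 1)      # (B, 32, 256): 32 timesteps
        h, _ = self.rnn(f)                     # (B, 32, 512)
        return self.fc(h).log_softmax(-1)      # (B, 32, num_classes + 1)

# Training uses the CTC loss on the per-timestep score matrix.
model = CRNN(num_classes=100)                  # assumed alphabet size
logits = model(torch.randn(4, 1, 32, 128)).permute(1, 0, 2)  # (T, B, C)
targets = torch.randint(1, 101, (4, 10))       # dummy label sequences
loss = nn.CTCLoss(blank=0)(
    logits, targets,
    input_lengths=torch.full((4,), 32, dtype=torch.long),
    target_lengths=torch.full((4,), 10, dtype=torch.long))
```

At inference, the same score matrix is decoded (e.g., by best-path decoding) to obtain the recognized word without any character-level segmentation.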
3.1 Preprocessing
The first fundamental step in an OCR model is pre-
processing, which aims to improve the quality of the
images in the database by suppressing distortions or
enhancing features to get better results. Even though
the IFN/ENIT images had already been extracted and
binarized, we applied a preprocessing step that resizes the input images to a 32 × 128 shape without distortion.
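As an illustration, the following minimal sketch shows one way to perform this distortion-free resizing with OpenCV: the image is scaled to fit 32 × 128 while keeping its aspect ratio, and the remaining area is padded with background pixels. The function name and the white padding value are assumptions for illustration, not the paper's exact implementation.

```python
import cv2
import numpy as np

def resize_without_distortion(img: np.ndarray,
                              target_h: int = 32,
                              target_w: int = 128) -> np.ndarray:
    h, w = img.shape[:2]
    scale = min(target_h / h, target_w / w)            # preserve aspect ratio
    new_h, new_w = max(1, int(h * scale)), max(1, int(w * scale))
    resized = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_AREA)
    canvas = np.full((target_h, target_w), 255, dtype=img.dtype)  # white background
    canvas[:new_h, :new_w] = resized                   # place image, pad the rest
    return canvas
```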
3.2 Basic Model: CNN+RNN
The hybrid CNN-RNN model has given excellent results in various domains, such as visual description and emotion recognition in videos. We performed