ments based on a sensor; the processing is then done by the Kplus algorithm for the classification and recognition of signs. In (Ibrahim et al., 2018) an automatic visual system was designed that translates isolated Arabic word signs into text. This translation system has four main steps: hand segmentation, hand tracking with a skin detector, feature extraction, and finally classification by Euclidean distance. Another
model for the recognition of Arabic sign language
alphabets was designed in (Al-Jarrah and Halawani, 2001), where a set of ANFIS models was trained, each dedicated to the recognition of a given gesture. Without the need for gloves,
an image of the gesture is acquired using a camera
connected to a computer. Recognition depends on the calculation of 30 vectors between the center of the gesture area and the useful part of the gesture edge. These vectors are then fed into ANFIS, which assigns them to a specific class (gesture). The proposed system is robust to changes in the position, size and/or direction of the gesture in the image, because the extracted features are assumed to be invariant to translation, scale and rotation. The simulation results showed the model was
able to achieve a recognition rate of 93.55%.
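The centroid-to-edge feature extraction described above can be illustrated with a minimal sketch; the contour representation, the sampling scheme, and the max-normalisation step are assumptions for illustration, not the authors' exact procedure (full rotation invariance would require an additional alignment step):

```python
import math

def radial_features(contour, n=30):
    """Distances from the gesture centroid to n evenly sampled
    edge points, normalised by the maximum distance so that the
    feature vector is invariant to translation and scale
    (illustrative sketch only)."""
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)
    step = len(contour) / n
    dists = [math.hypot(contour[int(i * step)][0] - cx,
                        contour[int(i * step)][1] - cy)
             for i in range(n)]
    m = max(dists)
    return [d / m for d in dists]
```

Translating or rescaling the contour leaves this vector unchanged, which is the invariance property the ANFIS classifier relies on.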
In (Shanableh et al., 2007) a system based on gesture classification with the KNN method was proposed; KNN proved its performance with an accuracy of 97%. The authors of (Maraqa and Abu-Zaiter, 2008) proposed an automatic Arabic sign language (ArSL) recognition system based on Hidden Markov models (HMMs). Experimental results
on using real ArSL data collected from deaf people
demonstrate that the proposed system has high recog-
nition rate for all modes.
For the signer-dependent case, the system obtains a
word recognition rate of 98.13%, 96.74%, and 93.8%,
on the training data in offline mode, on the test data in
offline mode, and on the test data in online mode, respectively. On the other hand, for the signer-independent case the system obtains a word recognition rate of
94.2% and 90.6% for offline and online modes respec-
tively. The system does not rely on the use of data
gloves or other means as input devices, and it allows
the deaf signers to perform gestures freely.
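HMM-based word recognition amounts to scoring an observation sequence under one model per word and choosing the most likely; the sketch below shows this with toy single-state models and discrete observations, which are illustrative assumptions rather than the system's actual models:

```python
def forward(obs, start, trans, emit):
    """Forward-algorithm likelihood of a discrete observation
    sequence under an HMM given as (start, trans, emit) lists."""
    alpha = [start[s] * emit[s][obs[0]] for s in range(len(start))]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(len(alpha))) * emit[j][o]
                 for j in range(len(alpha))]
    return sum(alpha)

def classify(obs, word_models):
    """Return the word whose HMM assigns the highest likelihood
    to the observed sequence."""
    return max(word_models, key=lambda w: forward(obs, *word_models[w]))
```

For example, with one model biased toward observation symbol 0 and another toward symbol 1, the sequence [0, 0, 0] is assigned to the first model.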
In (Alzohairi et al., 2018) the authors propose a system based on feed-forward and recurrent neural networks, with two recurrent architectures: partially and fully recurrent networks. They obtained an accuracy of 95% for static sign recognition. The authors of (Al-Rousan et al., 2009) introduced an Arabic alphabet recognition system. This system computes the HOG descriptor and feeds the resulting features to an SVM classifier.
The proposed system achieved an accuracy of 63.5%
for the Arabic alphabet signs. Recently, several researchers have developed deep CNNs that identify ArSL alphabets with a high level of accuracy. Table 1 presents a summary of the methods based on the CNN architecture.
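As an illustration of the HOG descriptor mentioned above, the sketch below computes a single gradient-orientation histogram over a whole grayscale image; the real descriptor additionally divides the image into cells and normalises over blocks, so this shows only the core idea:

```python
import math

def orientation_histogram(img, bins=9):
    """Simplified HOG-style descriptor: a magnitude-weighted
    histogram of unsigned gradient orientations over the image
    (img is a 2D list of intensities)."""
    hist = [0.0] * bins
    h, w = len(img), len(img[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180  # unsigned orientation
            hist[min(int(ang / (180 / bins)), bins - 1)] += mag
    return hist
```

A vertical edge produces purely horizontal gradients, so all of its mass falls in the 0-degree bin.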
3 DATASET
In this paper, we used the ArSL2018 (Latif et al., 2019) dataset, which is composed of 54,049 grayscale images of size 64x64. Variations of the images have been introduced with different lighting conditions and backgrounds. The dataset was randomly divided into an eighty percent training set and a twenty percent test set.
The total number of output classes is 32, ranging from
0 to 31, each representing an ArSL sign, as shown in
Figure 1.
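The random 80/20 split can be sketched as follows; the fixed seed and the in-memory list representation of the images are assumptions for illustration:

```python
import random

def split_dataset(samples, labels, test_frac=0.2, seed=0):
    """Shuffle indices and split into a training set (80%) and a
    test set (20%), as done for the ArSL2018 images."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_frac))
    pick = lambda ids: ([samples[i] for i in ids], [labels[i] for i in ids])
    return pick(idx[:cut]), pick(idx[cut:])
```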
4 PROPOSED MODEL FOR ASL
RECOGNITION
We propose a convolutional neural network (CNN) for Arabic sign letter recognition, inspired by the great success of CNNs in image analysis. A CNN is a class of artificial neural network (ANN) that applies machine learning (ML) algorithms to analyze data; it is mostly used in the field of computer vision, mainly for image classification and recognition. Our model
is based on centralized DL techniques (Boughorbel et al., 2019) and clean corpora (Boughorbel et al., 2018). Our proposed architecture, CNN-5, is composed of 5 convolution layers, each followed by a max-pooling layer. The convolution layers have different structures: the first and the second layer each contain 64 kernels of size 5x5. Each pair of convolution and pooling layers is regularized with a dropout rate of 50%. The fully connected layers use the ReLU and Softmax activation functions to decide whether a neuron fires or not. The system was trained with the RMSProp optimizer and a cost function based on categorical cross-entropy; it converged well before one hundred epochs, so the weights were stored with the system. We have several parameters to set for the model: the number of epochs during training and the
KEOD 2022 - 14th International Conference on Knowledge Engineering and Ontology Development
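Under the description in this section, the CNN-5 stack could be sketched in Keras as follows; the filter counts in layers three to five, the kernel sizes beyond the first two layers, and the dense-layer width are assumptions, since the text only specifies the first two convolution layers (64 kernels of 5x5 each), the max-pooling after each convolution, the 50% dropout, and the ReLU/Softmax stage:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn5(num_classes=32):
    """Sketch of CNN-5: 5 convolution layers, each followed by
    max-pooling and 50% dropout, then fully connected layers."""
    model = keras.Sequential([keras.Input(shape=(64, 64, 1))])
    # First two layers: 64 kernels of 5x5 (per the text);
    # the deeper filter counts are assumptions.
    for filters in (64, 64, 128, 128, 256):
        model.add(layers.Conv2D(filters, 5, padding="same", activation="relu"))
        model.add(layers.MaxPooling2D(2))
        model.add(layers.Dropout(0.5))
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu"))  # width assumed
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer="rmsprop",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

With this stack, the 64x64 input is halved by each of the five pooling layers down to a 2x2 feature map before flattening.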