As research in artificial neural networks has deepened, deep learning methods such as Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs) have matured and gained significant attention as promising research directions in the field of handwritten font recognition. Specifically, Artificial Neural Networks (ANNs) are well suited to pattern recognition for handwritten characters (Abiodun et al 2019) and are competent at handling non-linear tasks. CNNs, in turn, can capture even the finest detail in images (Vaidya et al 2018), facilitating feature extraction for handwritten characters. CNN models leverage
parameter sharing, reducing the number of learnable
parameters and streamlining deep network training.
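As a concrete illustration of parameter sharing (the layer sizes below are hypothetical examples, not this paper's architecture), the following sketch compares the learnable parameter counts of a fully connected layer and a convolutional layer applied to the same input image:

```python
# Illustration of CNN parameter sharing. A convolutional layer reuses one
# small kernel at every spatial position, so its parameter count does not
# grow with the image size; a fully connected layer's count does.

def dense_params(in_pixels: int, out_units: int) -> int:
    """Fully connected layer: one weight per input-output pair, plus biases."""
    return in_pixels * out_units + out_units

def conv_params(kernel_h: int, kernel_w: int, in_ch: int, out_ch: int) -> int:
    """Convolutional layer: a shared kernel per output channel, plus biases."""
    return kernel_h * kernel_w * in_ch * out_ch + out_ch

# A 28x28 grayscale handwriting image (MNIST-sized, assumed for illustration).
dense = dense_params(28 * 28, 32)   # 784 inputs -> 32 units
conv = conv_params(3, 3, 1, 32)     # 3x3 kernel, 1 -> 32 channels

print(dense)  # 25120 learnable parameters
print(conv)   # 320 learnable parameters
```

The convolutional layer needs roughly 80 times fewer parameters here, which is what makes deep convolutional stacks tractable to train.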
Moreover, their ability to capture spatial hierarchies of
features enables CNN models to excel in identifying
stroke patterns, connectivity, and overall font
structural characteristics. In addition, RNNs, especially those built on LSTM units as fundamental structures, further enhance recognition capabilities by modeling sequential information and dependencies within the handwritten text. Adopting an LSTM structure allows RNNs to effectively maintain and update internal states over long sequences in time-series machine-learning tasks (Abiodun et al 2019). This ability to capture temporal
dependencies is particularly beneficial in deciphering
the nuances of handwritten fonts. The advantages of deep learning, such as scalability, adaptability to diverse data, and improved recognition accuracy, make these methods pivotal in the ongoing progress of handwritten font recognition research. This paper
endeavors to enhance the model's performance
through an initial deskewing of handwritten images,
followed by the application of various techniques such
as cropping and scaling. These techniques effectively
adapt the images to a size and style conducive to
utilization as input to the model. Additionally, to
enhance the model’s precision in identifying
handwritten fonts, I adopted a CNN-RNN hybrid
model to extract image features and complete the
classification problem of word recognition. In addition, a standalone CNN model is used as a baseline for comparison, to highlight the recognition improvement achieved by the CNN-RNN hybrid model.
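To sketch how such a hybrid connects its two halves (the shapes below are illustrative assumptions, not the exact architecture used in this paper), a common CRNN-style bridge slices the CNN's final feature maps column by column, so that the RNN reads the word image left to right as a sequence:

```python
import numpy as np

# Sketch of the CNN-to-RNN bridge in a CRNN-style recognizer: the CNN's
# final feature maps are re-ordered so each image column becomes one RNN
# time step. Shapes are hypothetical, chosen only for illustration.

def feature_maps_to_sequence(fmap: np.ndarray) -> np.ndarray:
    """(channels, height, width) feature maps -> (width, channels*height)
    sequence: each image column becomes one time step for the RNN."""
    c, h, w = fmap.shape
    # Move width (the reading direction) to the front, flatten the rest.
    return fmap.transpose(2, 0, 1).reshape(w, c * h)

# Example: 64 feature maps of size 4x32, e.g. from a 32x128 word image.
fmap = np.zeros((64, 4, 32))
seq = feature_maps_to_sequence(fmap)
print(seq.shape)  # (32, 256): 32 time steps, 256 features per step
```

The LSTM then consumes this sequence step by step, which is how the hybrid combines the CNN's spatial features with the RNN's sequential modeling.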
The paper is structured in the following manner:
Section II highlights the contribution of related works
in the handwriting recognition field. Section III
demonstrates the fundamental architecture and
principles of the CNN-RNN hybrid model. Section IV
describes the entire experimental design and the outcomes of model performance. Section V pertains to
the overall conclusion and suggests feasible actions
for future research.
2 RELATED WORKS
Significant advancements in handwriting recognition
have been made through the emergence of innovative
approaches in recent research. Geetha et al. introduced
a hybrid model that employs a CNN to capture fine-grained image features and an RNN-LSTM to improve
recognition precision (Geetha et al 2021). More
recently, an effective method has been proposed to
generate similar image samples with arbitrary lengths
from original handwriting recognition samples (Kang
et al 2022). This alleviates the problem of manually
annotating handwritten data and enables training the
handwriting recognition model with smaller image
samples. Additionally, Gupta and Bag introduced a
polygonal approximation-based approach for
Devanagari character recognition, validated using
multiple neural networks (Gupta and Bag 2022).
Furthermore, Zouari et al. presented a fusion
model that utilized beta-ellipse parameters of
segmented handwritten characters, combining TDNN-
SVM for clustering and training, yielding impressive
outcomes on extensive multilingual datasets (Zouari et
al 2018). Carbune et al. described a multilingual
system for online handwriting recognition based on an LSTM architecture in conjunction with Bézier curves
(Carbune et al 2020). In another study, Alam et al.
developed a hybrid recognizer that combines LSTM
and CNN models for recognizing writing trajectories
in motion gesture digit and letter datasets (Alam et al
2020). Lastly, Zhang et al. explored a tree-BLSTM
architecture, a variant of the LSTM model, to
recognize two-dimensional mathematical expressions
(Zhang et al 2020).
3 METHODS
3.1 Handwriting Image Preprocessing
Inspired by Hu's seven moment invariants, a method that uses all pixel information to calculate central moments is adopted (Devi and Amitha 2014). The skew of each handwritten text image is estimated by dividing the mixed second-order central moment by the second-order central moment with regard to pixel positions
and intensities. The image's skew is then rectified
through the inverse mapping technique alongside
linear interpolation via an affine transformation; the correction matrix is computed from the estimated skew parameter. After the corrected image is