Recognition of Online Handwritten Gurmukhi Strokes

using Convolutional Neural Networks

Rishabh Budhouliya

, Rajendra Kumar Sharma

and Harjeet Singh

Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Punjab, India

Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India

Keywords:

Convolutional Neural Networks, Data Augmentation, Stroke Warping, Gurmukhi Strokes, Online

Handwritten Character Recognition.

Abstract:

In this paper, we attempt to explore and experiment multiple variations of Convolutional Neural Networks on

the basis of their distributions of trainable parameters between convolution and fully connected layers, so as

to achieve a state-of-the-art recognition accuracy on a primary dataset which contains isolated stroke samples

of Gurmukhi script characters produced by 190 native writers. Furthermore, we investigate the beneﬁt of data

augmentation with synthetically generated samples using an approach called stroke warping on the aforemen-

tioned dataset with three variants of a Convolutional Neural Network classiﬁer. It has been found that this

approach improves classiﬁcation performance and reduces over-ﬁtting. We extend this ﬁnding by suggesting

that stroke warping helps in estimating the inherent variances induced in the original data distribution due to

different writing styles and thus, increases the generalisation capacity of the classiﬁer.

1 INTRODUCTION

Research in Online Handwritten Recognition Sys-

tems seems to have peaked since the late 1990s, con-

sidering the fact that Google

s Multilingual Online

Handwritten Recognition System is able to support

22 scripts and 97 languages, which is being currently

used by certain commercial products like Google

Translate successfully (Keysers et al., 2017). Despite

this success in Google

s system, we want to bring cer-

tain factors into light which are limited to the scope of

Indo-Aryan Languages -

• Each written language has a strong variability as-

sociated with the writing style depending upon

certain demographic factors like region, age and

culture.

• In the case of such languages, major issues tackled

by any researcher include the shape complexity of

the characters, features for recognition of charac-

ters and stroke level recognition.

Motivated by these factors, our work aims to:

• Use a primary dataset called OHWR-Gurmukhi,

described in section IV, to sub sample a dataset

which includes 79 stroke classes with 100 sam-

ples each, referred as dataset 100. A Convolu-

tional Neural Network has been used to classify

the strokes to achieve a state-of-the-art accuracy.

• Explore and evaluate multiple CNN variants on

the basis of their distribution of trainable param-

eters between convolution layer and fully con-

nected layer on datasets with varying data samples

per class.

• Study the effect of stroke warping, a technique to

produce random variances within the stroke sam-

ples. The warped augmented dataset is created by

applying a combination of afﬁne transformation

(rotation) and elastic distortions to images of the

existing stroke samples.

• Experimentally evaluate the effect of data aug-

mentation on classiﬁcation accuracy of the clas-

siﬁer.

This paper is structured as follows. In section II, there

is a discussion about the development of character

recognition and the progression in Gurmukhi script

recognition. In section III, we introduce the Gur-

mukhi script, covering its features and the different

styles of writing the script. The generation of syn-

thetic data is covered in section IV, where we use

the technique called stroke warping, to augment the

dataset 100. After that, we delve into our main idea

in section V, an experiment which tests three classi-

ﬁers to validate the beneﬁt of data augmentation and

578

Budhouliya, R., Sharma, R. and Singh, H.

Recognition of Online Handwritten Gurmukhi Strokes using Convolutional Neural Networks.

DOI: 10.5220/0008960005780586

In Proceedings of the 12th International Conference on Agents and Artiﬁcial Intelligence (ICAART 2020) - Volume 2, pages 578-586

ISBN: 978-989-758-395-7; ISSN: 2184-433X

to achieve an optimal classiﬁcation accuracy. In sec-

tion VI, the framework of the experiment is laid out

and results of the performed experiments are shown.

We discuss the result and their consequences on the

objectives of the paper in section VII and we conclude

the ﬁndings in section VIII.

2 RELATED WORK

LeNet was the very ﬁrst Convolutional Neural Net-

work used for visual detection tasks including char-

acter recognition and document analysis (Jackel et al.,

1995). This neural network was used to extract local

geometric features from the input image in a way that

preserved approximate relative locations of these fea-

tures. By convolving the input image with a trainable

kernel, the network was able to produce high level

feature maps which were then fed to a linear classiﬁ-

cation layer. For the system they created, an overall

OCR accuracy exceeding 99.00% was achieved.

Owing to the continuous academic research in

the Online Handwritten Chinese Character Recogni-

tion, it has been demonstrated (Xiao et al., 2017) that

methods based on CNNs can learn more discrimi-

native features from source data, which may lead to

a better end-to-end solution for Online Handwritten

Recognition problems. The authors of this paper con-

tinued to design a compact CNN classiﬁer for On-

line Handwritten Chinese Character Recognition us-

ing DropWeight for pruning redundant connections

in a CNN architecture maintaining an accuracy of

96.88%. Handwritten Bangla Digit Recognition us-

ing a CNN with Gaussian and Gabor Filters, achieved

98.78% recognition accuracy (Alom et al., 2017).

Along with working on improving the recognition

accuracy of the classiﬁers, researchers also worked

upon writer adaptation for online handwritten recog-

nition where they used lexemes to identify the styles

present in a particular writer’s sample data which re-

sulted in the reduction of average error rate on hand-

written words (Connell and Jain, 2002).

In the present paper, we demonstrate a method

to capture the variability induced by different writ-

ing styles, thus enhancing the generalization accu-

racy of the classiﬁer. It is pertinent here to discuss

the progression in Gurmukhi script recognition. A

recognizer using pre-processing algorithms (Normal-

ization, Interpolation and Slant Correction) has been

proposed to recognize loops, headline, straight line

and dot features from online handwritten Gurmukhi

strokes collected on a pen-tablet interface by 60 writ-

ers (Sharma et al., 2007). After this, a post proces-

sor for improving the accuracy of character recog-

nition was built to detect and aggregate strokes us-

ing set theory to recognize characters with an accu-

racy of 95.60% for single character stroke sequencing

(Kumar and Sharma, 2013). This work was done on

a dataset of 27,231 samples categorized on the ba-

sis of the proﬁciency of the writers. A signiﬁcant

shift came after Hidden Markov Models(HMM) and

Support Vector Machines(SVM) were used for clas-

siﬁcation while employing the features extracted on

the basis of region and cursiveness. This experiment

resulted in a 96.70% recognition rate of Gurmukhi

characters (Verma and Sharma, 2016). Their experi-

ments consisted of the methods to extract features and

then classify them using SVM or HMM for classiﬁca-

tion. Our aim in this work is to use the Deep Learn-

ing concept of learning features and then performing

extensive experimentation using CNNs to obtain bet-

ter recognition accuracy. One of the main advantage

of using a CNN is that it is able to extract features

automatically and is invariant to shift and distortion

(Wong et al., 2016).

3 GURMUKHI SCRIPT

Punjabi language is spoken by about 130 million peo-

ple, mainly in West Punjab in Pakistan and in East

Punjab in India. Indian Punjabi is written using the

Gurmukhi script, which has a fairly complex system

of tonal variance.

Some notable features of Gurmukhi script are :

• Gurmukhi script is cursive and written in left to

right direction with top down approach.

• A horizontal line, called a “shirorekha” is found

on the upper part of almost all the characters.

• Any Gurmukhi word can be divided into three

sections viz. Upper Zone, Middle Zone and the

Lower Zone. All the strokes are classiﬁed into

one of the three zone. The upper zone consists of

the region above the head line where some of the

vowels reside. The middle zone is the most popu-

lated zone, consisting of consonants and some of

the vowels. The lower zone contains some vow-

els and half characters that lie below the foot of

consonants (Verma and Sharma, 2017).

3.1 Different Styles of Writing

Gurmukhi Script

The critical area of research in Character recognition

is to capture and detect the complex nature of any

script that results in the variation of writing styles.

The documented reasons for variation in handwriting

Recognition of Online Handwritten Gurmukhi Strokes using Convolutional Neural Networks

579

Figure 1: Stroke classes for Gurmukhi character set.

style can be attributed to a distinct way of writing for

each person, and the ways can be:

• Speed of writing

• Style of holding the pen

• Formation of a character can be inﬂuenced by

the amount of strokes used to create a character.

Some users use a single stroke while others may

use multiple strokes to write the same thing.

3.2 Data Collection

The source population of our dataset is created by 190

writers of different age group to bring maximum vari-

ability in the population. A touch based, Tablet PC

has been used as input interface. The collected data

was annotated at stroke level with respective stroke

classes characterized by three zones; Upper, Middle

and the Lower zone as shown in Figure 1.

4 DATA AUGMENTATION

The data for this experiment was extracted from the

dataset called OHWR-Gurmukhi, which contains x-

and y- coordinates of the strokes captured using a

tablet PC. The x- and y- coordinates from an XML

ﬁle were parsed and converted into binary images us-

ing Python. We have chosen to work for the Mid-

dle Zone stroke set, particularly for the reason that

it contains some of the most complex strokes in any

character. Hence, dataset 100 contains 79 classes,

each class representing a distinct stroke, having 100

labeled 60×60 pixel images. This dataset was used

for two purposes. Firstly, it was used to create the

synthetic data through the augmentation techniques.

After that it was divided into a 75 images per class

training set, and 25 images per class testing set. No

Figure 2: The distribution of dataset 100 into training and

test set.

validation set was created due to the small size of the

dataset 100.

An important question to be answered is why does

an increase in data size by augmentation of sample

images increases the generalization accuracy? Also,

if there is an increase, could it be attributed to the fact

that stroke warping is able to capture the variability in

the empirical population due to the writer’s personal

characteristics as discussed above. To answer these

questions, we have set-up an experiment where the

correlation between change in nature of the dataset

due to addition of synthetic data samples and increase

in recognition accuracy can be seen and veriﬁed by

comparing performances of three different CNN clas-

siﬁers on those different datasets.

4.1 Augmentation in Data-space

The debate on the correct usage of synthetic data for

training a model was provoked at the 5th ICDAR con-

ference (Baird, 1989), resulting in a list of conclu-

sions:

• Usage of synthetic data was deﬁnitely beneﬁcial

as the model that is trained on the most data wins.

• Although producing synthetic data was consid-

ered a good practice, it was not considered safe.

Training on a mixture of real data and synthetic

data was considered the safest.

• Testing on synthetic data to claim good perfor-

mance is conceptually wrong.

At present, it has been well established that data aug-

mentation is a powerful technique to regularize neural

ICAART 2020 - 12th International Conference on Agents and Artiﬁcial Intelligence

580

Figure 3: Stroke ID:169 and Stroke ID:223, former being

the ﬁrst picture position wise. Median Filter was used to

smoothen the stroke images ﬁrst, then stroke warping was

applied, with a combination of varying α withing the range

of 4-6 pixels and random rotation within the range of 5 to

30 degrees in both directions.

networks to prevent over ﬁtting and improve perfor-

mance in an imbalanced class situation. Moreover,

they have shown that there is a direct correlation be-

tween the improved object detection performance and

the increase in average number of training samples

per class. The techniques involved in creating artiﬁ-

cial training samples range from cropping, rotating to

ﬂipping input images. Our method of data augmen-

tation is a combination of such simple techniques and

elastic deformation, explained in the next sub-section.

4.2 Stroke Warping

Stroke warping is the technique used during the train-

ing process in order to produce random variations in

the data (Yaegar et al., 1996). This technique pro-

duces a series of characters which are consistent with

stylistic variations within the writers. In the experi-

ment, we try to ﬁnd if these stylistic variations cre-

ate a synthetic sample distribution which is close to

the real population distribution of the 190 writers. In

our experiment, the warped character stroke was cre-

ated using a combination of afﬁne transformation (ro-

tation) and elastic deformation to the existing dataset

as shown in Figure 3.

The elastic deformations on the images were cre-

ated by ﬁrst generating normalized random displace-

ment ﬁeld r(x, y) where (x, y) is the location of a pixel

such that

= R

+ α ∗ r(x, y) (1)

where R

and R

explain the location of the pixels in

the warped and original images respectively. Here, α

decides the magnitude of displacement. The displace-

ment ﬁelds are convolved with a Gaussian of standard

deviation of σ which is the smoothness factor of the

extent of elastic deformation.

Figure 4: Distribution of dataset 2000 into training and val-

idation set. Testing set was chosen to be the dataset 100.

4.3 Generation of Synthetic Data using

Stroke Warping

The binary images of size 60×60 pixel taken as bi-

nary images from the dataset 100 were ﬁrst convolved

with a median ﬁlter, with kernel size 3×3 pixels to

smoothen out the irregularities in the stroke images.

After this, the images were subjected to a random

rotational transformation varying from 5

◦

to 30

◦

both clockwise and anti-clockwise direction. The im-

ages were distorted using elastic distortion, with σ

as 37, and α randomly varied from 4-5 pixels with

each image. This whole process was used to extrapo-

late 2,000 images per stroke from the 100 image per

stroke dataset. Hence, the produced synthetic dataset

had 1,58,000 samples in total, with 1,800 samples per

stroke devoted for training data, rest 200 images per

stroke for the validation set as shown in Figure 4. This

dataset will be referred as dataset 2000 in this paper.

It is worth mentioning that the amounts of each dis-

tortion in an image to be applied was examined by

human eye to verify that they induce a natural range

of variation in the dataset.

4.4 Division of Synthetic Dataset into

500, 1,000 and 1,500 Samples per

Stroke

The synthetic dataset, dataset 2000, was randomly

divided into smaller subsets of sizes 500, 1,000

and 1,500 images per class as shown in Figure 5.

These smaller subsets will be referred as dataset 500,

dataset 1000 and dataset 1500, respectively. This was

done with a purpose to validate the hypothesis that

classiﬁcation accuracy can be increased by increasing

the data samples with the help of augmentation. All

the three classiﬁer models have been trained on these

Recognition of Online Handwritten Gurmukhi Strokes using Convolutional Neural Networks

581

Figure 5: Redistribution of dataset 2000 into dataset 500,

dataset 1000, dataset 1500.

subsets and further tested on the original images and it

has been expected that the testing accuracy increases

as the size of the training dataset increases.

5 METHODOLOGY

In accordance to the initial objectives, we pro-

pose an experiment which is designed to ﬁnd

the optimal recognition accuracy for Gurmukhi

strokes. To do so, the experiment starts with

training three different CNN classiﬁers on ﬁve

datasets (dataset 100, dataset 500, dataset 1000,

dataset 1500, dataset 2000). Now, 20.00% of the in-

put dataset is used for validation and testing is done

on the dataset

100, with 7,900 images. The reason

for choosing such a testing set was to ensure that our

testing accuracy depicts a good picture of the capac-

ity of the model to imitate the original distribution of

dataset 100. This process has furnished three graphs,

one for each model, depicting the performance of

each model with varying datasets. These graphs will

be used to validate the ﬁndings of the experiment.

The CNN classiﬁer has three variants, all of them

vary on the basis of their structure in terms of number

of convolution layers or fully connected layers used

in the model. In the next section, we introduce the

reason for deﬁning these three models and after that

we explain them in detail.

5.1 Reason behind Selection of Multiple

Classiﬁers for Experimentation

Our ﬁrst step in understanding the dataset 100 was

to empirically train and test it on a CNN model with

9,35,760 learnable parameters. The model is de-

scribed below:

• Convolutional Layers: The model had four con-

volutional layers, all of them having a kernel of

size 5×5 pixels with number of kernels doubling

from 16 to 128 in each layer. The kernels were

initialized with He normal initializer which draws

samples from a truncated normal distribution (He

Figure 6: Classiﬁcation of three CNN models on the basis of

the learnable parametric distribution between convolutional

layers and fully connected layers.

et al., 2015). ReLU was chosen as the activation

function.

• Pooling layers: Max Pooling was applied with

2×2 pixel sized ﬁlter at a sliding stride of 1.

• Fully Connected (FC) Layers: Two fully con-

nected layers were connected to the feature ex-

tracting layers, one with 512 neurons and the next

one with 128 neurons.

• Activation function for both of them was ReLU

function.

• Dropout Layers: After each FC layer, a dropout

layer with a dropout probability of 25.00% was

placed to reduce over ﬁtting.

• Classiﬁcation Layer: A fully connected layer with

79 neurons (each representing a distinct stroke

class) with Softmax activation function for ﬁnd-

ing the categorical distribution has been used.

This model was trained with 6,320 images (80 stroke

images per class) and tested on an unseen data of

1,580 images (20 stroke images per class). It is ev-

ident that this dataset is small, hence a validation set

could not be created. With a batch size of 246 samples

and 35 epochs in total, testing accuracy turned out to

be 93.65% with a training accuracy of 99.17%. There

ICAART 2020 - 12th International Conference on Agents and Artiﬁcial Intelligence

582

Figure 7: The gap between the training accuracy and vali-

dation accuracy depicts the high bias within the model.

is clear gap between the two accuracies, which can

be seen in Figure 7, guiding us to the fact that over-

ﬁtting is deteriorating the generalization capacity of

the model. After this experiment, repeated scenarios

were created, and after running our data through those

scenarios, we reached to the following conclusions:

• Regularization: Among L1 regularization, L2 reg-

ularization and Dropout regularization, Dropout

seems to work best to reduce over ﬁtting. Thus

for our experiment, we have chosen dropout lay-

ers only.

• Convolutional Layers vs Fully Connected Lay-

ers: Repeated experiments implied that fully con-

nected layers may have a greater percentage of

contribution towards the over-ﬁtting in the model.

The reason for such a phenomenon could be at-

tributed to the fact that larger percentage of train-

able parameters reside in these layers. For com-

parison, only 28.80% of the trainable parameter

are contributed by the convolutional layers, while

the rest of the 71.20% lies within the FC layers for

this model.

These conclusions were the foundation for creating

three variations of a CNN classiﬁer. The variations

have been referred as ConvNet One, ConvNet Two

and ConvNet Three. We assume that the distribution

of trainable parameters between convolutional layer

and fully connected layer, shown in Table 1 can be

used as a useful hyper parameter to control the over-

ﬁtting in the model, thus ﬁnding an optimal classiﬁer

in the process.

5.2 Proposed Models: ConvNet One,

ConvNet Two and ConvNet Three

In Figure 6, the structure of each model is explained

graphically. The fundamental difference between

these models is the percentage of trainable parameters

shared between convolutional layers and fully con-

nected layers. We have stacked convolutional layers

in the second and third model. We hypothesize that

with each model, the effects of over-ﬁtting should di-

minish due to balanced share of trainable parameters.

Stacking convolutional layers could help in extract-

ing enhanced features and we wish to test this in the

experiment.

Table 1: Distribution of trainable parameters between con-

volutional and fully connected layer.

• ConvNet One: This is the simplest model in terms

of number of layers and trainable parameters, the

conﬁguration of this model is as follows:

– Convolutional Layers: The model has three

convolutional layers, all of them having a ker-

nel size of 5×5 pixels with number of ker-

nels doubling from 16 to 64 for the third con-

volutional layer. The kernels were initialized

with He normal initializer which draws samples

from a truncated normal distribution. ReLU

was chosen as the activation function.

– Pooling Layers: Max Pooling was applied with

2 by 2 pixel sized ﬁlter at a sliding stride of 1.

– Fully connected layers: Two FC layers were

connected to the feature extracting layers, one

with 128 neurons and the next one with 64 neu-

rons. Activation function for both of them was

ReLU function.

– Dropout Layers: After each FC layer, a dropout

layer with a dropout probability of 25.00% was

placed to reduce over ﬁtting.

– Classiﬁcation Layer: A fully connected layer

with 79 neurons (each representing a distinct

stroke class) with Softmax activation function

for ﬁnding out the categorical distribution.

• ConvNet Two: This model has two convolutional

layers stacked together three times with dropout

layer after each stack of convolutional layers.

The dropout probability progress from 15.00% to

25.00% and then to 35.00% right after the FC

layer. Other than this, it shares the same conﬁgu-

ration with the ﬁrst model.

• ConvNet Three: In this network, three convolu-

tional layers are stacked together three times with

dropout layer after each stack of convolutional

layers. The dropout probability progresses in the

Recognition of Online Handwritten Gurmukhi Strokes using Convolutional Neural Networks

583

Table 2: Conﬁgurational parameters for the experiment.

same fashion as in ConvNet Two. It shares the

same conﬁgurational details for the rest of the pa-

rameters as ConvNet One or ConvNet Two.

6 EXPERIMENTS AND RESULTS

The experiment, as explained in the methodology sec-

tion, is structured as follows:

• The experiment is divided into 5 training sessions,

each session is associated with training on a par-

ticular dataset. Each session has the same models,

with the same conﬁguration.

• The conﬁgurational parameters for the whole ex-

periment are given in Table 2. The optimizer for

every session was chosen to be an algorithm based

on adaptive moment estimation, called Adam with

the default parameters as mentioned in the origi-

nal paper (Kingma and Ba, 2017).

• Metrics to evaluate the performance of each

model is the testing accuracy on the unseen data.

• The testing set was chosen to be dataset 100 with

7,900 images, which is the original dataset. As

each model was trained in each session, testing

accuracy was recorded for each dataset and we

have produced a graph for each model with ‘im-

age samples per class’ being the independent vari-

able and ‘testing accuracy’ being the dependent

variable.

• The difference between training accuracy and val-

idation accuracy (on the validation dataset) was

recorded for each observation as it is an indica-

tion of decreased over-ﬁtting in a model.

• Among all the recorded observations, the highest

testing accuracy achieved would be awarded the

state of the art Gurmukhi character recognition ac-

curacy for OHWR-Gurmukhi dataset.

This experiment was created and executed to achieve

the goals laid out in the introduction section, and we

discuss them in the next sections, where we present

the model wise results in conformance to the structure

of the experiment presented above.

Table 3: A tabular representation of the progress of training

ConvNet One.

94.15

93.7

94.56

95.12

94.53

500 1000 1500 2000

ConvNet_One: samples per class

testing accuracy

Figure 8: A graphical representation of the performance of

ConvNet One.

6.1 ConvNet One Results

Starting with the difference between the training ac-

curacy and validation accuracy, it is clear that over-

ﬁtting decreases in ConvNet One as the difference

decreases with an increase in data size except for

dataset 2000, shown in Table 3. Apart from that, a

look into Figure 8 reveals that testing accuracy in-

creases as samples per class increase but to a cer-

tain extent, as at dataset 2000, the accuracy has actu-

ally decreased from the previous observation, which

could be attributed due to the increase in over-ﬁtting,

validated from the table, where the difference has in-

creased as well.

6.2 ConvNet Two Results

Here, by looking at Figure 9, it can be observed

that performance of ConvNet Two is better than Con-

vNet One in terms of testing accuracy. Since the

datasets for every model are same, this observation

could be reasoned with the fact that ConvNet Two

has a greater percentage of trainable parameters in the

convolution layers which in turn might have reduced

the over-ﬁtting of the model. Also, the difference be-

tween the training and validation accuracy, as shown

in Table 4, has decreased in accordance to the claim

ICAART 2020 - 12th International Conference on Agents and Artiﬁcial Intelligence

584

that data augmentation can decrease over-ﬁtting.

Table 4: A tabular representation of the progress of training

ConvNet Two.

95.08

95.88

95.49

96.79

500 1000 1500 2000

ConvNet_Two: samples per class

testing accuracy

Figure 9: A graphical representation of the ConvNet Two’s

performance.

92.45

94.65

94.3

95.91

95.65

500 1000 1500 2000

ConvNet_Three: samples per class

testing accuracy

Figure 10: A graphical representation of the Con-

vNet Three’s performance.

6.3 ConvNet Three Results

ConvNet Three has the most equitable distribution of

trainable parameters. Despite this, it is not able to

outperform ConvNet Two, evident in Figure 10 and

Table 5. This shows that the relationship between dis-

tribution of trainable parameters and performance of

a model is non linear, and the experiment conducted

is inadequate to claim that a 50-50% share of train-

Table 5: A tabular representation of the progress of training

ConvNet Three.

able parameters between convolution layer and fully

connected layer produces an optimal classiﬁer.

7 DISCUSSION

It is imperative to note that all three models showcase

a common behaviour, an increase of data sample per

class improves their performance. This gives us two

results : First, we were able to achieve an optimal

Gurmukhi character recognition accuracy of 96.79%.

This is by far the highest accuracy achieved by a clas-

siﬁer on OHWR-Gurmukhi. Secondly, we were able

to prove that usage of data augmentation is beneﬁcial

to the performance of a classiﬁer but to a certain ex-

tent, as at dataset 2000, we saw a slight decline in

performance of ConvNet One and ConvNet Three.

Since data augmentation was done solely through

stroke warping, we believe that this technique is able

to produce alternate character forms that are consis-

tent with the stylistic variation within and between

writers of the collected dataset as claimed earlier by

the cited authors. This belief can be reinforced by our

results, as the generalization accuracy of each of our

model increases with the increase of synthetic sam-

ples generated through stroke warping.

8 CONCLUSION

This paper demonstrates the beneﬁts and limitations

of data augmentation on the Gurmukhi character

strokes dataset, OHWR-Gurmukhi. It also attempts

to ﬁnd a relation between distribution of trainable pa-

rameters in a Convolutional Neural Network and the

performance of a classiﬁer, where ConvNet Two, a

model with a pair of convolution layers stacked to-

gether, is able to achieve the highest testing accuracy

among the three proposed models. We also ﬁnd stroke

warping as an effective technique to augment the data

with. We would like to perform data augmentation

through Generative Adversarial Networks on OHWR-

Gurmukhi. Also, there is a scope of analyzing the

distribution of stylistic variations within and between

Recognition of Online Handwritten Gurmukhi Strokes using Convolutional Neural Networks

585

writers for a particular language which would help in

building more powerful classiﬁers in this classiﬁca-

tion ﬁeld.

REFERENCES

Alom, M. Z., Sidike, P., Taha, T. M., and Asari, V. K.

(2017). Handwritten Bangla Digit Recognition using

Deep Learning. https://arxiv.org/abs/1705.02680/.

Baird, H. S. (1989). The State of the Art of Document Im-

age Degradation Modeling. Xerox Palo Alto Research

Center.

Connell, S. D. and Jain, A. K. (2002). Writer Adaptation

for Online Handwriting Recognition. IEEE Transac-

tions on Pattern Analysis and Machine Intelligence,

24(3):329–346.

He, K., Zhang, X., Ren, S., and Sun, J. (2015). Delving

Deep into Rectiﬁers: Surpassing Human-Level Per-

formance on ImageNet Classiﬁcation. https://arxiv.

org/abs/1502.01852.

Jackel, L., Battista, M., Baird, H., Ben, J., Bromley, J.,

Burges, C., Cosatto, E., Denker, J., Graf, H., Kat-

seff, H., Lecun, Y., Nohl, C., Sackinger, E., Shamil-

ian, J., Shoemaker, T., Stenard, C., Strom, I., Ting, R.,

Wood, T., and Zuraw, C. (1995). Neural-net applica-

tions in Character Recognition and Document Analy-

sis. In Neural-net applications in telecommunications.

Kluwer Academic Publishers.

Keysers, D., Deselaers, T., Rowley, H., Wang, L., and Car-

bune, V. (2017). Multi-Language Online Handwriting

Recognition. IEEE Transactions on Pattern Analysis

and Machine Intelligence, 39(6):1180–1194.

Kingma, D. P. and Ba, J. L. (2017). ADAM: A METHOD

FOR STOCHASTIC OPTIMIZATION. https://arxiv.

org/abs/1412.6980.

Kumar, R. and Sharma, R. K. (2013). An Efﬁcient Post

Processing Alogrithm For Online Handwriting Gur-

mukhi Character Recognition Using Set Theory. In-

ternational Journal of Pattern Recognition and Artiﬁ-

cial Intelligence, 27(4).

Sharma, A., Sharma, R., and Kumar, R. (2007). Online

Handwritten Gurmukhi Strokes Preprocessing. In Ma-

chine GRAPHICS & VISION.

Verma, K. and Sharma, R. K. (2016). Comparison of

HMM- and SVM-based Stroke Classiﬁers for Gur-

mukhi Script. In The Natural Computing Applications

Forum 2016.

Verma, K. and Sharma, R. K. (2017). Recognition of Online

Handwritten Gurmukhi Characters Based on Zone and

Stroke identiﬁcation. Sadhana, 42:701–712.

Wong, S. C., Gatt, A., Stamatescu, V., and McDonnell,

M. D. (2016). Understanding data augmentation for

classiﬁcation: when to warp? https://arxiv.org/abs/

1609.08764.

Xiao, X., Yang, Y., Ahmad, T., Jin, L., and Chang, T.

(2017). Design of a Very Compact CNN Classiﬁer for

Online Handwritten Chinese Character Recognition

Using DropWeight and Global Pooling. 14th IAPR

International Conference on Document Analysis and

Recognition, pages 891–895.

Yaegar, L., Lyon, R., and Webb, B. (1996). Effective Train-

ing of a Neural Network Character Classiﬁer for Word

Recognition. In Advances in Neural Information Pro-

cessing Systems (NIPS).

ICAART 2020 - 12th International Conference on Agents and Artiﬁcial Intelligence

586