instrument is analyzed on a statistical basis. We take
the distance between the feature vectors of writings
made with the same writing instrument and call it the
within writing instrument distance, denoted by $d_w$.
The between writing instrument distance $d_b$ is
obtained by measuring the distance between the feature
vectors of writings made with two different writing
instruments.
$d_w = d(f_{ij} - f_{ik})$, where $i = 1$ to $n$ and $j, k = 1$ to $m$ with $j < k$.
$d_b = d(f_{ij} - f_{kl})$, where $i, k = 1$ to $n$ with $i < k$ and $j, l = 1$ to $m$.
Here $n$ is the number of writing instruments, $m$
is the number of sample images written with each
writing instrument, $f_{ij}$ etc. are the feature vectors
of the corresponding images, and $d$ is the distance
between the feature vectors of two images. Let $n_w$
and $n_b$ be the sizes of the within and between writing
instrument distance classes respectively. If $n$ writing
instruments each provide $m$ writings, there are
$n_w = {}^{m}C_{2} \times n$ within writing instrument
distances and $n_b = m \times m \times {}^{n}C_{2}$
between writing instrument distances.
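The construction of the two distance classes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `pair_distances`, the nested-list layout `features[i][j]`, and the use of the Euclidean distance for $d$ are all assumptions, since the text does not fix a particular metric.

```python
import math
from itertools import combinations, product


def pair_distances(features):
    """Split all pairwise distances into within and between classes.

    features[i][j] is the feature vector of image j written with pen i
    (hypothetical layout). Euclidean distance is an assumed choice for d.
    """
    within, between = [], []
    # Within: same pen, every unordered pair of its images (j < k).
    for pen in features:
        for f_a, f_b in combinations(pen, 2):
            within.append(math.dist(f_a, f_b))
    # Between: every unordered pair of pens (i < k), all image pairs (j, l).
    for pen_a, pen_b in combinations(features, 2):
        for f_a, f_b in product(pen_a, pen_b):
            between.append(math.dist(f_a, f_b))
    return within, between
```

With $n$ pens and $m$ images per pen, the two returned lists have the sizes $n_w$ and $n_b$ given above.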
In our data collection we have taken 15 ball pens and
15 gel and roller pens. For each pen 15 images are
taken.
$n_w = 30 \times {}^{15}C_{2} = 3150$ data.
$n_b = 15 \times 15 \times {}^{30}C_{2} = 97{,}875$ data.
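The two counts can be checked with a few lines of Python. The helper name `pair_counts` is hypothetical and only restates the formulas for $n_w$ and $n_b$.

```python
from math import comb


def pair_counts(n_pens, m_images):
    """Sizes of the within and between distance classes."""
    n_w = n_pens * comb(m_images, 2)             # n * mC2
    n_b = m_images * m_images * comb(n_pens, 2)  # m * m * nC2
    return n_w, n_b


print(pair_counts(30, 15))  # (3150, 97875)
```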
A good descriptive way to represent the relationship
between the two classes is to calculate the overlap
between their distributions. The overlap can be
expressed with two types of error. A Type I error occurs
when images from the same writing instrument are
identified as coming from different writing instruments.
A Type II error occurs when images from different
writing instruments are classified as coming from the
same writing instrument.
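One simple way to turn the two distributions into error rates is to pick a decision threshold on the distance. The sketch below assumes that a pair is declared "same instrument" when its distance is at or below the threshold; that decision rule, the threshold parameter, and the function name `error_rates` are illustrative assumptions rather than details given in the text.

```python
def error_rates(within, between, threshold):
    """Return (type_1, type_2) error rates for a distance threshold.

    Assumed rule: distance <= threshold means "same writing instrument".
    """
    # Type I: same-instrument pairs wrongly declared different.
    type_1 = sum(d > threshold for d in within) / len(within)
    # Type II: different-instrument pairs wrongly declared the same.
    type_2 = sum(d <= threshold for d in between) / len(between)
    return type_1, type_2


print(error_rates([1, 2, 3, 10], [2, 8, 9, 12], threshold=5))  # (0.25, 0.25)
```

Sweeping the threshold over the observed distances shows how one error type trades off against the other.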
4 METHODOLOGY
Twenty-five black pens, including ball, roller, gel and
fountain pens from different manufacturers, were taken.
A page containing 100 words was written with each of
these pens on A4 size white xerox paper, and from that
page 20 selected samples were scanned at a high optical
resolution, i.e., 1200 dpi. The algorithm for classifying
inks comprises the following steps.
1. Select suitable threshold values of R, G and B
from the RGB histograms to separate background
and foreground pixels.
2. Fit the foreground data to a plane in the RGB
cube.
3. Find the distance d from the fitted plane to the
pure black point (0,0,0). Find the coefficient of
determination $R^2$ and the MSE.
4. Classify liquid and viscous inks using the distance
from pure black (if d ≥ 15, it is a viscous ink;
otherwise it is a liquid ink).
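The four steps above can be sketched in plain Python. Several details here are assumptions, because the text does not specify them: foreground pixels are taken to be those darker than a per-channel threshold, the plane is fitted by ordinary least squares as B = a·R + b·G + c, and the names `foreground_pixels`, `fit_plane`, `classify_ink` and the default threshold of 128 are hypothetical. Only the cutoff d ≥ 15 comes from step 4.

```python
import math


def foreground_pixels(pixels, thresholds=(128, 128, 128)):
    """Keep (r, g, b) pixels darker than the thresholds (assumed rule)."""
    tr, tg, tb = thresholds
    return [(r, g, b) for r, g, b in pixels if r < tr and g < tg and b < tb]


def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    a = [row[:] + [v] for row, v in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, 3):
            factor = a[row][col] / a[col][col]
            for k in range(col, 4):
                a[row][k] -= factor * a[col][k]
    x = [0.0, 0.0, 0.0]
    for row in (2, 1, 0):
        tail = sum(a[row][k] * x[k] for k in range(row + 1, 3))
        x[row] = (a[row][3] - tail) / a[row][row]
    return x


def fit_plane(pixels):
    """Fit B = a*R + b*G + c; return (distance_to_black, r_squared, mse).

    Assumes the pixels are not collinear in (R, G) and that B varies.
    """
    rows = [(float(r), float(g), 1.0) for r, g, _ in pixels]
    ys = [float(blue) for _, _, blue in pixels]
    # Normal equations: (X^T X) coeffs = X^T y.
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(3)]
    a, b, c = solve3(xtx, xty)
    residuals = [y - (a * r + b * g + c) for (r, g, _), y in zip(rows, ys)]
    ss_res = sum(e * e for e in residuals)
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    mse = ss_res / len(ys)
    r_squared = 1.0 - ss_res / ss_tot
    # Distance from (0, 0, 0) to the plane a*R + b*G - B + c = 0.
    distance = abs(c) / math.sqrt(a * a + b * b + 1.0)
    return distance, r_squared, mse


def classify_ink(distance, cutoff=15.0):
    """Step 4: d >= 15 is viscous ink, otherwise liquid ink."""
    return "viscous" if distance >= cutoff else "liquid"
```

A typical call chain would be `classify_ink(fit_plane(foreground_pixels(pixels))[0])`, where `pixels` is a list of (r, g, b) tuples read from a scanned sample.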
Figure 1: Sample writings of the Cello ball pen and the
Add roller pen.
The algorithm for identifying inks comprises the
following steps.
1. Separate background and foreground pixels.
2. Find the feature vector (mean colour) (r,g,b) of the
foreground pixels. Find its equivalent h, s, v val-
ues. Find the distance d between the mean colour
vectors of two images.
3. Label the distance as within writing instrument
distance, if the two images belong to the same pen
or as between writing instrument distance, if two
images belong to different pens.
4. Repeat the above step for all the images of the
database. Find the Type I and II errors from the
distributions of within and between writing instru-
ment distances.
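Steps 2 and 3 of the identification algorithm can be illustrated as below. This is a sketch under stated assumptions: each sample is a hypothetical `(pen_id, foreground_pixels)` pair with 8-bit RGB tuples, the distance between mean colour vectors is taken to be Euclidean, and the function names are invented for illustration.

```python
import colorsys
import math


def mean_colour(pixels):
    """Mean (r, g, b) of a list of foreground pixels."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def to_hsv(rgb):
    """Convert an 8-bit (r, g, b) mean colour to (h, s, v) in [0, 1]."""
    r, g, b = (v / 255.0 for v in rgb)
    return colorsys.rgb_to_hsv(r, g, b)


def label_pair(sample_a, sample_b):
    """Return (label, distance) for two (pen_id, pixels) samples."""
    (pen_a, pixels_a), (pen_b, pixels_b) = sample_a, sample_b
    # Euclidean distance between mean colours is an assumed metric.
    distance = math.dist(mean_colour(pixels_a), mean_colour(pixels_b))
    label = "within" if pen_a == pen_b else "between"
    return label, distance
```

Applying `label_pair` to every pair of images in the database yields the two distance distributions from which the Type I and Type II errors are read off.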
5 RESULTS AND DISCUSSION
Figure 1 shows sample images written with the Cello
ball pen and the Add roller pen. The data from the
sample writings of each pen are fitted to a
corresponding plane in the RGB cube. Figure 2 shows
the actual data of the ball pen fitted to the
corresponding plane in the RGB cubic space. The green
plane indicates the fitted plane of the data. The red
pixels indicate the actual data of the scanned image.
The blue pixels indicate the estimated pixels.
The coefficient of determination ($R^2$) in fitting the
data of the Cello ball pen image using regression is
0.999268 and the MSE is 14.7248. The $R^2$ in fitting
the data of the Add roller pen image using regression
is 0.999586 and the MSE is 7.6987. We can observe
that $R^2$ is close to 1, indicating that the regression
is "good".
We have taken two datasets, one for training and
analysis and another for testing. The first dataset
comprises 15 ball pens and 10 roller/gel pens, each
with 20 sample images. The second dataset, the test
set, comprises 10 ball pens and 10 gel/roller pens,
each with 10 sample images. The results were
analysed using the False Acceptance Ratio (FAR) and
the False Rejection Ratio (FRR). The FAR and FRR are
calculated for classification of liquid inks and viscous