$$
S_i =
\begin{cases}
1, & \text{if } f_i = g_j = 0,\\[4pt]
\dfrac{\min(f_i, g_j)}{\max(f_i, g_j)}, & \text{otherwise}
\end{cases}
$$
Where:
$f_i$ is the feature vector of the $i$-th character in the mean set of the training sets;
$g_j$ is the feature vector of the $j$-th character in the document.
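The element-wise similarity degree defined above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes non-negative feature values and takes the special case that yields 1 to be the one where both values are zero, since that is where the min/max ratio is undefined. The function name `element_similarity` is hypothetical.

```python
def element_similarity(f: float, g: float) -> float:
    """Similarity degree between two non-negative feature values.

    Assumption: returns 1.0 when both values are zero (0/0 guard);
    otherwise returns min(f, g) / max(f, g), which lies in [0, 1].
    """
    if f < 0 or g < 0:
        raise ValueError("features are assumed to be non-negative")
    if f == 0 and g == 0:
        return 1.0
    return min(f, g) / max(f, g)


print(element_similarity(3.0, 4.0))  # 0.75
print(element_similarity(0.0, 0.0))  # 1.0
print(element_similarity(0.0, 5.0))  # 0.0
```

The ratio equals 1 only when the two values coincide, and it drops towards 0 as one value becomes small relative to the other.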
The similarity degree between the character images
f(x) and g(x) is the sum of the similarity degrees
between the corresponding n elements of the feature
vectors derived from the two images, and is defined as:
$$
S(f, g) = \sum_{i=1}^{n} S_i
$$
Where:
$n$ is the number of features ($n \le 66$).
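A minimal sketch of the summed similarity follows, together with a hypothetical nearest-mean matching step that assigns a document character to the training mean vector with the largest similarity. The matching rule, the labels `char_1`/`char_2`, the feature values and the function names are illustrative assumptions. The text above only defines the similarity measure itself.

```python
from typing import Dict, Sequence


def element_similarity(f: float, g: float) -> float:
    # Assumed 0/0 guard: 1 when both values are zero, otherwise min/max in [0, 1].
    if f == 0 and g == 0:
        return 1.0
    return min(f, g) / max(f, g)


def vector_similarity(f: Sequence[float], g: Sequence[float]) -> float:
    """Sum of the element-wise similarity degrees over n corresponding features."""
    if len(f) != len(g):
        raise ValueError("feature vectors must have the same length n")
    if len(f) > 66:
        raise ValueError("the text uses at most 66 features")
    return sum(element_similarity(a, b) for a, b in zip(f, g))


def best_match(g: Sequence[float], means: Dict[str, Sequence[float]]) -> str:
    """Hypothetical classifier: label of the mean vector most similar to g."""
    return max(means, key=lambda label: vector_similarity(means[label], g))


# Hypothetical mean vectors from a training set and one document character.
means = {"char_1": [4.0, 0.0, 2.0], "char_2": [1.0, 3.0, 0.0]}
doc_char = [4.0, 0.0, 1.0]
print(vector_similarity(means["char_1"], doc_char))  # 2.5
print(best_match(doc_char, means))  # char_1
```

Because each element contributes a value in [0, 1], the summed similarity of two n-element vectors lies in [0, n], and it reaches n only when the two vectors are identical.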
10 EXPERIMENTAL RESULTS AND DISCUSSION
The effectiveness and performance of our algorithm
have been tested on samples collected from images
of legal documents from a single city. For testing
our method, around 200 printed Tamil text documents
are scanned at 300 dpi and binarized using the
two-stage method described in (Dhanya, 2001). The
text documents are prepared with the Azhagi editor,
using six different fonts (Figure 1).
Text lines and word segments are determined from
the valley points of the horizontal and vertical
projection profiles. A one-pixel margin is kept while
detecting the zone boundaries of the characters, and
all characters in a text line are assumed to have the
same font size. The extracted characters are
normalized to 40×40 pixels. After the character
boxes are extracted, and before character recognition
and the recognition of touching characters begin, the
documents are checked for skew (Pilevar and
Ramakrishnan, 2006). In the final experiment,
ten separate sets of documents are given to the EDM
software system; the documents are segmented into
about 100,000 characters, and more than 97% of the
characters are recognized correctly. Although we
could not find any similar work to compare our
results with, we believe that the outcome of this
research is satisfactory and can serve as a basis for
practical Tamil character recognition systems.
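The preprocessing steps described above (valley-point segmentation from projection profiles, a one-pixel margin around each box, and normalization to 40×40 pixels) can be sketched in plain Python as follows. The sketch assumes a binary image stored as a list of rows (1 = ink) and treats every profile position with zero ink as a valley point; the paper does not state its valley criterion. Nearest-neighbour resampling is one possible normalization, not necessarily the authors' method, and all function names are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # binary image as rows of pixels: 1 = ink, 0 = background


def runs_above(profile: List[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Return [start, end) ranges where the profile exceeds `threshold`.

    Positions at or below the threshold are treated as valley points that
    separate consecutive segments (assumed criterion).
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments


def segment_lines(img: Image) -> List[Tuple[int, int]]:
    # Horizontal projection profile: ink count per row.
    return runs_above([sum(row) for row in img])


def segment_columns(img: Image, top: int, bottom: int) -> List[Tuple[int, int]]:
    # Vertical projection profile of one text line: ink count per column.
    profile = [sum(img[r][c] for r in range(top, bottom)) for c in range(len(img[0]))]
    return runs_above(profile)


def crop_with_margin(img: Image, top: int, bottom: int, left: int, right: int,
                     margin: int = 1) -> Image:
    # Keep a one-pixel margin around the box, clipped to the image borders.
    top, left = max(top - margin, 0), max(left - margin, 0)
    bottom, right = min(bottom + margin, len(img)), min(right + margin, len(img[0]))
    return [row[left:right] for row in img[top:bottom]]


def normalize(box: Image, size: int = 40) -> Image:
    # Nearest-neighbour resampling of the box to size x size pixels.
    h, w = len(box), len(box[0])
    return [[box[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


# Tiny example: one text line containing two separate ink segments.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
for top, bottom in segment_lines(page):
    for left, right in segment_columns(page, top, bottom):
        char = normalize(crop_with_margin(page, top, bottom, left, right))
        print((top, bottom, left, right), len(char), len(char[0]))
```

Because this sketch splits only at columns that contain no ink, touching characters stay in a single segment. That is exactly the case the touching-character recognition step mentioned above has to handle.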
REFERENCES
Bansal, V., Sinha, R., “Segmentation of touching and fused
Devanagari characters”, Pattern Recognition, vol. 35,
pp. 875–893, 2002.
Davessar, N. M., Madan, S., Singh, H., “A Hybrid
Approach to Character Segmentation of Gurmukhi
Script Characters”, Pattern Recognition, pp. 4–8, 2003.
Dhanya, D., “Bilingual OCR for Tamil and Roman scripts”,
Master’s thesis, Department of Electrical
Engineering, Indian Institute of Science, 2001.
Electronics, N., Center, C. T., Luang, K., “Using
Projection and Loop for Segmentation of Touching
Thai Typewritten”, Analysis, vol. 2004, pp. 504–508,
2004.
Faure, C., Vincent, N., “Simultaneous detection of vertical
and horizontal text lines based on perceptual
organization”, Proceedings of SPIE - The International
Society for Optical Engineering, vol. 7247, 2009.
Grailu, H., Lotfizad, M., Sadoghi-Yazdi, H., “A
lossy/lossless compression method for printed typeset
bi-level text images based on improved pattern
matching”, International Journal on Document
Analysis and Recognition, pp. 1–24, 2009.
Hotta, Y., Fujimoto, K., “Line-touching character
recognition based on dynamic reference feature
synthesis”, Proceedings of SPIE - The International
Society for Optical Engineering, vol. 6815, 2008.
Kumar, S., Ali, M. M., “An Efficient Object Scaling
Algorithm for raster device”, Graphics and Image
Processing, NCCIS, 1997.
Li, Y., Naoi, S., Cheriet, M., “A Segmentation Method
for Touching Italic Characters”, Pattern Recognition,
pp. 2–5, 2004.
Li, Y., S., Cheriet, M., Suen, C. Y., “A Segmentation
Method for Touching Italic Characters”, Proceedings
of the 17th International Conference on Pattern
Recognition (ICPR’04), 1051-4651, 2004.
Liang, S., Shridhar, M., Ahmadi, M., “Segmentation of
touching characters in printed document recognition”,
Pattern Recognition, vol. 27, no. 6, pp. 825–840,
1994.
Lu, X., Liu, X., Xiao, G., Song, E., Li, P., Luo, Q., “A
Segment Extraction Algorithm Based on Polygonal
Approximation for On-Line Chinese Character
Recognition”, Japan-China Joint Workshop on
Frontier of Computer Science and Technology, pp.
204–207, 2008.
Ode, Å., Tveit, M., Fry, G., “Capturing landscape visual
character using indicators: Touching base with
landscape aesthetic theory”, Landscape Research,
vol. 33, no. 1, pp. 89–117, February 2008.
Pilevar, A. H., Ramakrishnan, A. G., “Inversion detection in
text document images”, 9th Joint Conference on
Information Science, Taiwan, 2006.
Pilevar, A. H., “Retrieval of signal from Biomedical
Databases: some new approaches”, PhD thesis,
University of Mysore, 2005.
Sattar, Md. A., Mahmud, K., Arafat, H., Noor Uz Zaman,
A. F. M., “Segmenting Bangla text for optical
ICSOFT 2012 - 7th International Conference on Software Paradigm Trends