database. Also, the HLS features can be combined
with other features, guided by a detailed analysis of
errors and misclassifications, to improve the retrieval
results.
High Level Shape Representation in Printed Gujarati Character