INFORMATION RETRIEVAL FROM HISTORICAL DOCUMENT IMAGE BASE

Khurram Khurshid; Imran Siddiqi; Claudie Faure; Nicole Vincent

doi:10.5220/0003087401880193

2010

Abstract

This communication presents an effective method for information retrieval from a historical document image base. The proposed approach extracts words and characters from the text and attributes a feature vector to each character image. Words are matched by comparing their characters through a multi-stage Dynamic Time Warping (DTW) process applied to the extracted feature set. The approach exhibits promising results, reaching a retrieval/recognition rate of more than 96%.
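The matching idea in the abstract can be illustrated with a minimal sketch of classic single-stage DTW over two sequences of feature vectors. This is not the authors' multi-stage procedure, whose details are not given here; the function name `dtw_distance`, the Euclidean local cost, and the toy one-dimensional features are assumptions made purely for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Classic DTW cost between two non-empty sequences of feature vectors.

    Illustrative only: uses a Euclidean local cost and the standard
    three-neighbour recurrence, not the multi-stage scheme of the paper.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # cost[i][j] holds the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two feature vectors.
            local = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Toy sequences: the second repeats one element, which DTW absorbs by warping.
query = [[0.0], [1.0], [2.0]]
candidate = [[0.0], [1.0], [1.0], [2.0]]
print(dtw_distance(query, candidate))
```

In a word-spotting setting, a lower cost would indicate a closer match between a query and a candidate; any acceptance threshold would have to be chosen empirically and is not specified by the abstract.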

Paper Citation

in Harvard Style

Khurshid K., Siddiqi I., Faure C. and Vincent N. (2010). INFORMATION RETRIEVAL FROM HISTORICAL DOCUMENT IMAGE BASE. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010) ISBN 978-989-8425-28-7, pages 188-193. DOI: 10.5220/0003087401880193

in Bibtex Style

@conference{kdir10,
author={Khurram Khurshid and Imran Siddiqi and Claudie Faure and Nicole Vincent},
title={INFORMATION RETRIEVAL FROM HISTORICAL DOCUMENT IMAGE BASE},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)},
year={2010},
pages={188-193},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003087401880193},
isbn={978-989-8425-28-7},
}

in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)
TI - INFORMATION RETRIEVAL FROM HISTORICAL DOCUMENT IMAGE BASE
SN - 978-989-8425-28-7
AU - Khurshid K.
AU - Siddiqi I.
AU - Faure C.
AU - Vincent N.
PY - 2010
SP - 188
EP - 193
DO - 10.5220/0003087401880193