INFORMATION RETRIEVAL FROM HISTORICAL DOCUMENT IMAGE BASE

Khurram Khurshid, Imran Siddiqi, Claudie Faure, Nicole Vincent

Abstract

This communication presents an effective method for information retrieval from a historical document image base. The proposed approach is based on extracting words and characters from the text and attributing feature vectors to each character image. Words are matched by comparing their characters through multi-stage Dynamic Time Warping (DTW) on the extracted feature set. The approach exhibits extremely promising results, reaching a retrieval/recognition rate of more than 96%.
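
To make the matching step concrete, the following is a minimal sketch of plain, single-stage DTW between two sequences of feature vectors, followed by ranking candidates against a query. It is not the authors' multi-stage procedure, and the abstract does not specify the feature set: the Euclidean local cost, the function names `dtw_distance` and `rank_candidates`, and the toy one-dimensional features are all assumptions made for illustration.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Classic DTW cost between two sequences of equal-dimension feature vectors.

    Assumption: the local cost is the Euclidean distance between vectors;
    the paper's actual features and multi-stage scheme are not reproduced here.
    """
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = minimal accumulated cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in seq_a only
                cost[i][j - 1],      # advance in seq_b only
                cost[i - 1][j - 1],  # advance in both sequences
            )
    return cost[n][m]


def rank_candidates(query, candidates):
    """Return (label, distance) pairs sorted by increasing DTW distance.

    `candidates` is a hypothetical mapping from a label to a feature sequence.
    """
    scored = [(label, dtw_distance(query, seq)) for label, seq in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy one-dimensional features standing in for real character descriptors.
    query = [(0.0,), (1.0,)]
    candidates = {
        "stretched": [(0.0,), (0.0,), (1.0,)],
        "different": [(5.0,), (5.0,)],
    }
    for label, distance in rank_candidates(query, candidates):
        print(label, distance)
```

Because DTW allows one element to align with several, a sequence that is merely stretched in time scores as close to the query, which is the property that makes it attractive for comparing character or word images of varying width.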



Paper Citation


in Harvard Style

Khurshid K., Siddiqi I., Faure C. and Vincent N. (2010). INFORMATION RETRIEVAL FROM HISTORICAL DOCUMENT IMAGE BASE. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010) ISBN 978-989-8425-28-7, pages 188-193. DOI: 10.5220/0003087401880193


in Bibtex Style

@conference{kdir10,
author={Khurram Khurshid and Imran Siddiqi and Claudie Faure and Nicole Vincent},
title={INFORMATION RETRIEVAL FROM HISTORICAL DOCUMENT IMAGE BASE},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)},
year={2010},
pages={188-193},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003087401880193},
isbn={978-989-8425-28-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)
TI - INFORMATION RETRIEVAL FROM HISTORICAL DOCUMENT IMAGE BASE
SN - 978-989-8425-28-7
AU - Khurshid K.
AU - Siddiqi I.
AU - Faure C.
AU - Vincent N.
PY - 2010
SP - 188
EP - 193
DO - 10.5220/0003087401880193