Automatic Analysis of Historical Manuscripts

Costantino Grana, Daniele Borghesani, Rita Cucchiara

Abstract

This paper proposes a document analysis tool for historical manuscripts. The goal is to automatically segment the layout components of a page: text, pictures, and decorations. We focus specifically on pictures, proposing a set of visual features that identify significant pictures and separate them from floral and abstract decorations. The analysis is performed block by block using a limited set of color and texture features, including a new texture descriptor that is particularly effective for this task, the Gradient Spatial Dependency Matrix. The feature vectors are then processed by an embedding procedure that improves the performance of the subsequent SVM classification.
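
The block-wise texture analysis mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the Gradient Spatial Dependency Matrix, so the code below substitutes the classic gray-level co-occurrence matrix and its contrast statistic as a stand-in; it is not the authors' descriptor. The function names (`split_into_blocks`, `cooccurrence`, `contrast`, `block_features`), the block size, and the number of gray levels are all assumptions made for illustration.

```python
from collections import Counter


def split_into_blocks(image, block_size):
    """Yield (top, left, block) for each full block_size x block_size tile.

    `image` is a list of rows of integer gray levels (hypothetical input format).
    """
    height, width = len(image), len(image[0])
    for top in range(0, height - block_size + 1, block_size):
        for left in range(0, width - block_size + 1, block_size):
            block = [row[left:left + block_size]
                     for row in image[top:top + block_size]]
            yield top, left, block


def cooccurrence(block, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix of gray levels at offset (dx, dy).

    Stand-in texture descriptor; NOT the paper's Gradient Spatial
    Dependency Matrix, which the abstract does not specify.
    """
    counts = Counter()
    rows, cols = len(block), len(block[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                counts[(block[y][x], block[ny][nx])] += 1
    total = sum(counts.values())
    return [[counts[(i, j)] / total if total else 0.0
             for j in range(levels)]
            for i in range(levels)]


def contrast(matrix):
    """Contrast statistic: sum of (i - j)^2 weighted by pair probability."""
    return sum(((i - j) ** 2) * p
               for i, row in enumerate(matrix)
               for j, p in enumerate(row))


def block_features(image, block_size, levels):
    """Return one (top, left, contrast) tuple per block of the image."""
    return [(top, left, contrast(cooccurrence(block, levels)))
            for top, left, block in split_into_blocks(image, block_size)]
```

In a complete pipeline each block would carry a longer vector combining color and texture features, and those vectors would then be passed to the embedding step and the SVM classifier; both stages are omitted here because the abstract does not describe them in detail.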



Paper Citation


in Harvard Style

Grana C., Borghesani D. and Cucchiara R. (2009). Automatic Analysis of Historical Manuscripts. In Proceedings of the 9th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2009) ISBN 978-989-8111-89-0, pages 93-102. DOI: 10.5220/0002200500930102


in Bibtex Style

@conference{pris09,
author={Costantino Grana and Daniele Borghesani and Rita Cucchiara},
title={Automatic Analysis of Historical Manuscripts},
booktitle={Proceedings of the 9th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2009)},
year={2009},
pages={93-102},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002200500930102},
isbn={978-989-8111-89-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2009)
TI - Automatic Analysis of Historical Manuscripts
SN - 978-989-8111-89-0
AU - Grana C.
AU - Borghesani D.
AU - Cucchiara R.
PY - 2009
SP - 93
EP - 102
DO - 10.5220/0002200500930102