The GIDOC Prototype

N. Serrano, L. Tarazón, D. Pérez, O. Ramos Terrades, A. Juan


Transcribing handwritten text in (old) documents is an important but time-consuming task for digital libraries. This paper presents GIDOC (Gimp-based Interactive transcription of old text DOCuments), an efficient interactive-predictive transcription prototype. GIDOC is a first attempt to provide integrated support for interactive-predictive page layout analysis, text line detection and handwritten text transcription. It is built on GIMP and uses advanced techniques and tools for language and handwritten text modelling. Results are reported for a real transcription task on a 764-page Spanish manuscript from 1891.



Paper Citation

in Harvard Style

Serrano N., Tarazón L., Pérez D., Ramos Terrades O. and Juan A. (2010). The GIDOC Prototype. In Proceedings of the 10th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2010) ISBN 978-989-8425-14-0, pages 82-89. DOI: 10.5220/0003028300820089

in Bibtex Style

author={N. Serrano and L. Tarazón and D. Pérez and O. Ramos Terrades and A. Juan},
title={The GIDOC Prototype},
booktitle={Proceedings of the 10th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2010)},

in EndNote Style

JO - Proceedings of the 10th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2010)
TI - The GIDOC Prototype
SN - 978-989-8425-14-0
AU - Serrano N.
AU - Tarazón L.
AU - Pérez D.
AU - Ramos Terrades O.
AU - Juan A.
PY - 2010
SP - 82
EP - 89
DO - 10.5220/0003028300820089