UNSUPERVISED HANDWRITTEN GRAPHICAL SYMBOL LEARNING - Using Minimum Description Length Principle on Relational Graph

Jinpeng Li, Christian Viard-Gaudin, Harold Mouchere

Abstract

Approaches to handwriting recognition generally require prior knowledge of the symbol set and as many ground-truthed samples as possible, so that machine-learning methods can be applied. In this work, we propose to discover the symbol set used in a graphical language produced by on-line handwriting. We consider two-dimensional graphical languages, such as mathematical expressions, in which layouts other than left-to-right must be handled. First, we select relevant graphemes using hierarchical clustering. Second, we build a relational graph between the strokes that define a handwritten expression. Third, we extract the lexicon, a set of graph substructures, using the minimum description length principle. To assess the extracted lexicon, we introduce a hierarchical segmentation task. In our experiments, we obtain a recall rate of 84.2% on the test part of our database produced by 100 writers.
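To make the third step more concrete, the following is a minimal sketch of how the minimum description length principle can rank candidate graph substructures. It is not the encoding used in the paper: the bit-count formula, the stroke labels (`h`, `v`), the relation labels (`cross`, `right`), and all function names are assumptions chosen for illustration. A substructure is preferred when the bits needed to describe it, plus the bits needed to describe the graph after every instance is collapsed into one node, are fewest.

```python
import math

def graph_bits(nodes, edges):
    """Crude bit count for a labeled graph (assumed encoding, for illustration only)."""
    n = len(nodes)
    node_labels = len(set(nodes.values()))
    edge_labels = len({lab for _, _, lab in edges})
    node_bits = n * math.log2(max(node_labels, 2))
    # Each edge stores two node indices and one relation label.
    edge_bits = len(edges) * (2 * math.log2(max(n, 2)) + math.log2(max(edge_labels, 2)))
    return node_bits + edge_bits

def find_instances(nodes, edges, pattern):
    """Greedily collect non-overlapping edges matching (label_u, relation, label_v)."""
    used, instances = set(), []
    for u, v, lab in edges:
        if (nodes[u], lab, nodes[v]) == pattern and u not in used and v not in used:
            instances.append((u, v))
            used.update((u, v))
    return instances

def compress(nodes, edges, pattern):
    """Collapse every instance of the pattern into a single node labeled 'SUB'."""
    merged = {}
    for i, (u, v) in enumerate(find_instances(nodes, edges, pattern)):
        merged[u] = merged[v] = f"sub{i}"
    new_nodes = {merged.get(n, n): ("SUB" if n in merged else lab)
                 for n, lab in nodes.items()}
    new_edges = []
    for u, v, lab in edges:
        nu, nv = merged.get(u, u), merged.get(v, v)
        if nu != nv:  # edges internal to an instance disappear
            new_edges.append((nu, nv, lab))
    return new_nodes, new_edges

def total_bits(nodes, edges, pattern):
    """DL(substructure) + DL(graph compressed by that substructure)."""
    lu, rel, lv = pattern
    sub_bits = graph_bits({"a": lu, "b": lv}, [("a", "b", rel)])
    new_nodes, new_edges = compress(nodes, edges, pattern)
    return sub_bits + graph_bits(new_nodes, new_edges)

# Toy expression "+ + +": each plus sign is a horizontal stroke crossed by a vertical one.
nodes = {0: "h", 1: "v", 2: "h", 3: "v", 4: "h", 5: "v"}
edges = [(0, 1, "cross"), (2, 3, "cross"), (4, 5, "cross"),
         (1, 2, "right"), (3, 4, "right")]

candidates = {(nodes[u], lab, nodes[v]) for u, v, lab in edges}
best = min(candidates, key=lambda p: total_bits(nodes, edges, p))
print(best, round(total_bits(nodes, edges, best), 2), round(graph_bits(nodes, edges), 2))
```

In this toy graph the pair of strokes joined by `cross` compresses the description more than the pair joined by `right`, so it would be kept as a lexicon entry. A fuller treatment would search over larger substructures and repeat the compression hierarchically.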

Paper Citation


in Harvard Style

Li J., Viard-Gaudin C. and Mouchere H. (2011). UNSUPERVISED HANDWRITTEN GRAPHICAL SYMBOL LEARNING - Using Minimum Description Length Principle on Relational Graph. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011) ISBN 978-989-8425-79-9, pages 164-170. DOI: 10.5220/0003637901720178


in Bibtex Style

@conference{kdir11,
author={Jinpeng Li and Christian Viard-Gaudin and Harold Mouchere},
title={UNSUPERVISED HANDWRITTEN GRAPHICAL SYMBOL LEARNING - Using Minimum Description Length Principle on Relational Graph},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)},
year={2011},
pages={164-170},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003637901720178},
isbn={978-989-8425-79-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)
TI - UNSUPERVISED HANDWRITTEN GRAPHICAL SYMBOL LEARNING - Using Minimum Description Length Principle on Relational Graph
SN - 978-989-8425-79-9
AU - Li J.
AU - Viard-Gaudin C.
AU - Mouchere H.
PY - 2011
SP - 164
EP - 170
DO - 10.5220/0003637901720178