UNSUPERVISED HANDWRITTEN GRAPHICAL SYMBOL LEARNING - Using Minimum Description Length Principle on Relational Graph
Jinpeng Li, Christian Viard-Gaudin, Harold Mouchere
2011
Abstract
Approaches to handwriting recognition generally require prior knowledge of the symbol set and as many ground-truthed samples as possible, so that machine-learning-based methods can be applied. In this work, we propose to discover the symbol set used in a graphical language produced by on-line handwriting. We consider two-dimensional graphical languages, such as mathematical expressions, where layouts other than left-to-right must also be handled. First, we select relevant graphemes using hierarchical clustering. Second, we build a relational graph between the strokes that define a handwritten expression. Third, we extract the lexicon, which is a set of graph substructures, using the minimum description length principle. To assess the extracted lexicon, we introduce a hierarchical segmentation task. In our experiments, we obtain a recall rate of 84.2% on the test part of our database produced by 100 writers.
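The third step, choosing graph substructures by minimum description length, can be illustrated with a minimal sketch. It is not the authors' implementation: the toy graph, the grapheme and relation labels ("bar", "loop", "cross", "right", "above"), the crude bit-counting formula, and the restriction to single-edge candidate substructures are all assumptions made for illustration. The idea shown is only that a substructure is preferred when the cost of describing it plus the cost of the graph rewritten with it is smallest.

```python
import math
from collections import defaultdict

# Toy relational graph (hypothetical data): nodes are strokes labelled with a
# grapheme id, edges carry a spatial relation between two strokes.
nodes = {0: "bar", 1: "bar", 2: "bar", 3: "bar", 4: "loop", 5: "bar", 6: "bar"}
edges = [
    (0, 1, "cross"),  # two crossing bars, e.g. a '+'
    (1, 4, "right"),
    (2, 3, "cross"),  # a second pair of crossing bars
    (4, 2, "right"),
    (5, 6, "above"),  # two stacked bars, e.g. an '='
    (3, 5, "right"),
]

def bits(k):
    """Bits needed to pick one of k alternatives (at least 1 bit)."""
    return math.log2(max(k, 2))

def description_length(n_nodes, n_edges, n_node_labels, n_edge_labels):
    """Crude stand-in for a real graph encoding: one label per node,
    two endpoints plus one label per edge."""
    node_cost = n_nodes * bits(n_node_labels)
    edge_cost = n_edges * (2 * bits(n_nodes) + bits(n_edge_labels))
    return node_cost + edge_cost

def find_instances(nodes, edges):
    """Group edges by (label, relation, label) pattern, greedily keeping
    only instances that do not share a node."""
    instances = defaultdict(list)
    used = defaultdict(set)
    for u, v, rel in edges:
        pattern = (nodes[u], rel, nodes[v])
        if u not in used[pattern] and v not in used[pattern]:
            instances[pattern].append((u, v))
            used[pattern].update((u, v))
    return instances

def best_substructure(nodes, edges):
    """Return (pattern, total description length, instance count) for the
    single-edge pattern minimising DL(pattern) + DL(graph | pattern)."""
    n_node_labels = len(set(nodes.values()))
    n_edge_labels = len({rel for _, _, rel in edges})
    best = None
    for pattern, inst in find_instances(nodes, edges).items():
        k = len(inst)
        dl_sub = description_length(2, 1, n_node_labels, n_edge_labels)
        # Each instance collapses two nodes into one new node and removes
        # its internal edge; one new node label is added to the alphabet.
        dl_rest = description_length(
            len(nodes) - k, len(edges) - k, n_node_labels + 1, n_edge_labels
        )
        total = dl_sub + dl_rest
        if best is None or total < best[1]:
            best = (pattern, total, k)
    return best

baseline = description_length(len(nodes), len(edges), 2, 3)
best_pattern, best_total, count = best_substructure(nodes, edges)
print(best_pattern, count, round(best_total, 2), round(baseline, 2))
```

In this toy graph the repeated pair of crossing bars is selected because it is the only pattern occurring more than once, so collapsing it shortens the description the most. A fuller treatment would grow candidates beyond one edge and iterate on the compressed graph to obtain a hierarchy.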
Paper Citation
in Harvard Style
Li J., Viard-Gaudin C. and Mouchere H. (2011). UNSUPERVISED HANDWRITTEN GRAPHICAL SYMBOL LEARNING - Using Minimum Description Length Principle on Relational Graph. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011) ISBN 978-989-8425-79-9, pages 164-170. DOI: 10.5220/0003637901720178
in Bibtex Style
@conference{kdir11,
author={Jinpeng Li and Christian Viard-Gaudin and Harold Mouchere},
title={UNSUPERVISED HANDWRITTEN GRAPHICAL SYMBOL LEARNING - Using Minimum Description Length Principle on Relational Graph},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)},
year={2011},
pages={164-170},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003637901720178},
isbn={978-989-8425-79-9},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)
TI - UNSUPERVISED HANDWRITTEN GRAPHICAL SYMBOL LEARNING - Using Minimum Description Length Principle on Relational Graph
SN - 978-989-8425-79-9
AU - Li J.
AU - Viard-Gaudin C.
AU - Mouchere H.
PY - 2011
SP - 164
EP - 170
DO - 10.5220/0003637901720178