Lanna Dharma Printed Character Recognition using k-Nearest
Neighbor and Conditional Random Fields
Chutima Chueaphun¹, Atcharin Klomsae¹, Sanparith Marukatat² and Jeerayut Chaijaruwanich¹
¹Department of Computer Science, Faculty of Science, Chiang Mai University, Chiang Mai, 50200, Thailand
²National Electronics and Computer Technology Center, Pathumthani, 12120, Thailand
Keywords: Lanna Dharma, Character Recognition, k-Nearest Neighbor, Conditional Random Fields.
Abstract: For centuries, many books in Lanna Dharma characters have been printed in the north of Thailand. These
books are important sources of knowledge about ancient Lanna wisdom. Today, the surviving books are
old and damaged. Most of the characters are rough and unclear because of the early printing technology of
the time. Moreover, some sets of characters are very similar to one another, which makes them difficult to
recognize. This paper proposes a Lanna Dharma printed character recognition technique using k-Nearest
Neighbor and Conditional Random Fields. The recognition accuracy is about 82.61 percent.
1 INTRODUCTION
Currently, there is much research on optical
character recognition (OCR), which converts
the text in scanned images into
machine-encoded text. OCR systems are available for
many languages, such as English, Japanese, Chinese,
Arabic, and Thai. However, there is not yet one for
Lanna Dharma characters.
Hundreds of years ago, the Lanna language was
widely used in the northern part of Thailand during
the time of the Lanna kingdom, which was founded
in 1259. The Lanna Dharma script is a
descendant of the old Mon script, like the Lao and
Burmese scripts. Since 1892, typeset Lanna
Dharma characters have been printed in books
on history, medicine, literature and
Buddhism. The Lanna Dharma books are
important sources of Lanna regional knowledge.
However, after the invasion by the Ayutthaya kingdom
from central Thailand, the Central Thai
language became the official language taught in
schools. Nowadays, the Lanna Dharma script has
almost been forgotten. Only a few
people, usually elderly, can still read it. In
addition, most of the Lanna Dharma books have been lost
or destroyed. Therefore, Lanna Dharma printed
character recognition will help to preserve
ancient Lanna knowledge. Furthermore, the
knowledge can be delivered to the general public in
an electronically retrievable form.
The Lanna Dharma writing system has no white
space between words, but there is a white space
at the end of a clause or sentence. A Lanna Dharma
word consists of consonants, vowels, and tones at
different levels. In particular, many consonants take
an alternate form when they directly follow another
consonant. For this paper, we divide the characters
into eight types, as shown in
Table 1.
An example of a written Lanna Dharma word is
shown in Figure 1. A word is written by placing
the consonants or the middle vowels in level 1, the
upper vowels in level 2, the tones in level 2
or 3, the final consonants in level 4, and the lower
vowels in level 4 or 5.
In this paper, we mainly focus on the
problem of distinguishing characters that belong to
confusion sets. Lanna Dharma printed
characters have relatively similar patterns, which
cause recognition errors. Figure 2 shows
examples of confusion sets. We propose using
k-Nearest Neighbor (k-NN) to first classify each
character image. The sequence of character
classes output by k-NN is then reclassified
by Conditional Random Fields (CRFs). CRFs
are conditional undirected graphical models
that model the conditional probability of a
character sequence. They are therefore expected to
resolve the errors that k-NN makes on confusion sets and
improve the final character recognition.
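The two-stage idea can be illustrated with a minimal sketch. It is not the implementation used in this paper: the feature vectors, the function names `knn_scores` and `viterbi`, and the transition weights are assumptions, and the weights are set by hand here, whereas a real CRF would learn them from labeled sequences. The sketch only shows how sequence context can overrule a k-NN decision inside a confusion set.

```python
import math
from collections import Counter


def knn_scores(x, train, k=3):
    """Return {label: vote fraction} among the k nearest training vectors.

    train is a list of (feature_vector, label) pairs (toy data, assumed).
    """
    nearest = sorted(train, key=lambda item: math.dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return {label: count / k for label, count in votes.items()}


def viterbi(emissions, transition, labels, floor=1e-6):
    """Find the best-scoring label sequence in a linear chain.

    emissions:  one {label: k-NN vote fraction} dict per character image
    transition: {(previous_label, label): weight}; missing pairs count as 0.
                These weights are hand-set assumptions, not learned.
    """
    score = {y: math.log(emissions[0].get(y, floor)) for y in labels}
    back = []
    for em in emissions[1:]:
        new_score, pointers = {}, {}
        for y in labels:
            # Best predecessor for label y at this position.
            prev = max(labels, key=lambda p: score[p] + transition.get((p, y), 0.0))
            new_score[y] = (score[prev] + transition.get((prev, y), 0.0)
                            + math.log(em.get(y, floor)))
            pointers[y] = prev
        back.append(pointers)
        score = new_score
    # Trace the best path backwards from the best final label.
    path = [max(labels, key=lambda y: score[y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

For example, if k-NN gives a second image vote fractions of 2/3 for class "A" and 1/3 for a confusable class "B", a sufficiently large weight on the transition from the preceding class to "B" makes the decoded sequence choose "B" instead.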
Chueaphun C., Klomsae A., Marukatat S. and Chaijaruwanich J..
Lanna Dharma Printed Character Recognition using k-Nearest Neighbor and Conditional Random Fields.
DOI: 10.5220/0004112801690174
In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (KDIR-2012), pages 169-174
ISBN: 978-989-8565-29-7
Copyright © 2012 SCITEPRESS (Science and Technology Publications, Lda.)