# WORD SEGMENTATION BASED ON HIDDEN MARKOV MODEL USING MARKOV CHAIN MONTE CARLO METHOD

### Takuya Fukuda, Takao Miura

#### Abstract

It is well known that Japanese text has no explicit word boundaries, so each sentence must be separated into words by morphological analysis or some other word segmentation technique. The appropriate separation, however, is said to depend on domain-specific rules. The authors have previously proposed a sophisticated word separation method based on Conditional Random Fields (CRF). Unfortunately, that method requires a huge test corpus for each application domain as well as considerable computation time for learning. In this investigation, we propose a new approach to obtaining a test corpus based on the Markov Chain Monte Carlo (MCMC) method, by which we can obtain an efficient Markov model for segmentation.

#### References

- Abney, S.: Part of Speech Tagging and Partial Parsing, in Corpus-Based Methods in Language and Speech, Kluwer Academic Publishers, 1996
- Fukuda, T., Izumi, M. and Miura, T.: Word Segmentation using Domain Knowledge Based on Conditional Random Fields, Proc. Tools with Artificial Intelligence (ICTAI), pp.436-439, 2007
- Gelfand, A.E. and Smith, A.F.M.: Sampling-based Approaches to Calculating Marginal Densities, J. of the American Stat. Assoc., Vol.85, pp.398-409, 1990
- Igarashi, H. and Takaoka, Y.: Japanese into Braille Translating for the Internet with ChaSen, Proc. 18th JCMI, 2K6-2, 1998
- Kita, K.: Probabilistic Language Model, Univ. of Tokyo Press, 1999 (in Japanese)
- Kudo, T., Yamamoto, K. and Matsumoto, Y.: Applying Conditional Random Fields to Japanese Morphological Analysis, Proc. EMNLP, 2004
- Mitchell, T.: Machine Learning, McGraw Hill Companies, 1997
- Ohmori, Y.: Recent Trends in Markov Chain Monte Carlo Methods, J. of the Japan. Stat. Assoc., Vol.31, pp.305-344, 2001 (in Japanese)

#### Paper Citation

#### in Harvard Style

Fukuda T. and Miura T. (2009). **WORD SEGMENTATION BASED ON HIDDEN MARKOV MODEL USING MARKOV CHAIN MONTE CARLO METHOD**. In *Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART*, ISBN 978-989-8111-66-1, pages 123-129. DOI: 10.5220/0001666501230129

#### in Bibtex Style

@conference{icaart09,

author={Takuya Fukuda and Takao Miura},

title={WORD SEGMENTATION BASED ON HIDDEN MARKOV MODEL USING MARKOV CHAIN MONTE CARLO METHOD},

booktitle={Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},

year={2009},

pages={123-129},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0001666501230129},

isbn={978-989-8111-66-1},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART

TI - WORD SEGMENTATION BASED ON HIDDEN MARKOV MODEL USING MARKOV CHAIN MONTE CARLO METHOD

SN - 978-989-8111-66-1

AU - Fukuda T.

AU - Miura T.

PY - 2009

SP - 123

EP - 129

DO - 10.5220/0001666501230129