WORD SEGMENTATION BASED ON HIDDEN MARKOV MODEL USING MARKOV CHAIN MONTE CARLO METHOD

Takuya Fukuda, Takao Miura

Abstract

It is well known that written Japanese has no explicit word boundaries, so each sentence must be separated into words by morphological analysis or some other word segmentation technique. The correct separation, however, is said to depend on domain-specific rules. The authors have previously proposed a sophisticated word separation method based on Conditional Random Fields (CRF). Unfortunately, that method requires a huge test corpus for each application domain as well as considerable computation time for learning. In this investigation, we propose a new approach that obtains the test corpus by means of the Markov Chain Monte Carlo (MCMC) method, from which we can obtain an efficient Markov model for segmentation.



Paper Citation


in Harvard Style

Fukuda T. and Miura T. (2009). WORD SEGMENTATION BASED ON HIDDEN MARKOV MODEL USING MARKOV CHAIN MONTE CARLO METHOD. In Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-8111-66-1, pages 123-129. DOI: 10.5220/0001666501230129


in Bibtex Style

@conference{icaart09,
author={Takuya Fukuda and Takao Miura},
title={WORD SEGMENTATION BASED ON HIDDEN MARKOV MODEL USING MARKOV CHAIN MONTE CARLO METHOD},
booktitle={Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2009},
pages={123-129},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001666501230129},
isbn={978-989-8111-66-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - WORD SEGMENTATION BASED ON HIDDEN MARKOV MODEL USING MARKOV CHAIN MONTE CARLO METHOD
SN - 978-989-8111-66-1
AU - Fukuda T.
AU - Miura T.
PY - 2009
SP - 123
EP - 129
DO - 10.5220/0001666501230129