Authors:
Takuya Fukuda and Takao Miura
Affiliation:
Hosei University, Japan
Keyword(s):
Word Segmentation, Hidden Markov Models, Markov Chain Monte Carlo (MCMC) method.
Related Ontology Subjects/Areas/Topics:
Applications; Artificial Intelligence; Computational Intelligence; Evolutionary Computing; Knowledge Discovery and Information Retrieval; Knowledge Engineering and Ontology Development; Knowledge-Based Systems; Machine Learning; Natural Language Processing; Pattern Recognition; Soft Computing; Symbolic Systems
Abstract:
It is well known that Japanese text has no explicit word boundaries, so each sentence must be separated into words by morphological analysis or some other word segmentation technique. It is said, however, that the correct separation depends on domain-specific rules. The authors have proposed a sophisticated word separation method based on Conditional Random Fields (CRF). Unfortunately, that method requires a huge test corpus for each application domain, as well as considerable computation time for learning. In this investigation, we propose a new approach that obtains a test corpus by means of the Markov Chain Monte Carlo (MCMC) method, from which we can obtain an efficient Markov model for segmentation.
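To make the idea of a Markov model for segmentation concrete, the following is a minimal sketch and not the authors' method, since the abstract does not specify the model. It assumes a two-state hidden Markov model over characters, where state `B` marks a character that begins a word and `I` marks one that continues a word, and it decodes the most probable tag sequence with the Viterbi algorithm. The function name `viterbi_segment`, the `B`/`I` tagging scheme, the probability floor, and the `TOY_*` parameters are all hypothetical and chosen only for illustration. The MCMC step that would produce the corpus is not shown.

```python
import math

# Assumed tagging scheme: B = character begins a word, I = character continues a word.
STATES = ("B", "I")


def viterbi_segment(text, start_p, trans_p, emit_p, floor=1e-6):
    """Split `text` into words using the most probable B/I tag path."""
    if not text:
        return []

    def lp(p):
        # Log-probability with a small floor so unseen events do not give log(0).
        return math.log(max(p, floor))

    # score[s] = best log-probability of any tag path ending in state s.
    score = {
        s: lp(start_p.get(s, 0.0)) + lp(emit_p[s].get(text[0], 0.0))
        for s in STATES
    }
    back = []  # back[i][s] = best previous state for state s at position i + 1
    for ch in text[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: score[r] + lp(trans_p[r].get(s, 0.0)))
            new_score[s] = (
                score[prev]
                + lp(trans_p[prev].get(s, 0.0))
                + lp(emit_p[s].get(ch, 0.0))
            )
            pointers[s] = prev
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(STATES, key=lambda s: score[s])
    tags = [state]
    for pointers in reversed(back):
        state = pointers[state]
        tags.append(state)
    tags.reverse()

    # Start a new word at every B tag; append to the current word otherwise.
    words = []
    for ch, tag in zip(text, tags):
        if tag == "B" or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words


# Toy parameters (made up for illustration, not estimated from any corpus).
TOY_START = {"B": 1.0, "I": 0.0}
TOY_TRANS = {"B": {"B": 0.5, "I": 0.5}, "I": {"B": 0.5, "I": 0.5}}
TOY_EMIT = {"B": {"a": 0.9, "b": 0.1}, "I": {"a": 0.1, "b": 0.9}}

if __name__ == "__main__":
    print(viterbi_segment("abab", TOY_START, TOY_TRANS, TOY_EMIT))
```

In a real setting the start, transition, and emission probabilities would be estimated from a segmented corpus of the target domain, which is the resource the abstract says is expensive to obtain.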