Segmentation of Review Texts by using Thesaurus and Corpus-based Word Similarity

Yoshimi Suzuki, Fumiyo Fukumoto

Abstract

User reviews are now widely available on shopping and hotel reservation sites. However, with the exponential growth of information on the Internet, it is becoming increasingly difficult for a user to read and understand all of the potentially interesting material in a large collection of reviews. In this paper, we propose a method for segmenting review texts by guests' criteria, such as service, location, and facilities. Our system first extracts words that represent criteria from hotel review texts. To extract guests' criteria, we focus on topic markers such as "ha" in Japanese. The extracted words are then grouped into classes of similar words using Japanese WordNet. Next, for each hotel, the text of all guest reviews is segmented into word sequences according to the criteria classes. Segmenting review text is difficult because the texts are short. We therefore use Japanese WordNet, extracted similar word pairs, and Wikipedia indexes. We performed text segmentation on hotel reviews. The results showed the effectiveness of our method and indicated that it can be used for review summarization by guests' criteria.
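
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes sentences are already tokenized into (surface, part-of-speech) pairs, it replaces the Japanese WordNet and corpus-based similarity step with a small hand-written mapping (the hypothetical `WORD_TO_CLASS`), and the function names `extract_criterion_words` and `label_segments` are invented for illustration.

```python
TOPIC_MARKER = "は"  # the Japanese topic marker "ha"

def extract_criterion_words(tokens):
    """Return nouns that are immediately followed by the topic marker."""
    return [
        surface
        for (surface, pos), (next_surface, _) in zip(tokens, tokens[1:])
        if pos == "noun" and next_surface == TOPIC_MARKER
    ]

def label_segments(sentences, word_to_class):
    """Merge consecutive sentences that share a criterion class."""
    segments = []  # list of (class label, [sentences])
    for sentence in sentences:
        words = extract_criterion_words(sentence)
        classes = [word_to_class[w] for w in words if w in word_to_class]
        if classes:
            label = classes[0]
        elif segments:
            label = segments[-1][0]  # no criterion word: continue the segment
        else:
            label = "other"
        if segments and segments[-1][0] == label:
            segments[-1][1].append(sentence)
        else:
            segments.append((label, [sentence]))
    return segments

# Hypothetical stand-in for the classes that the paper derives from
# Japanese WordNet, similar word pairs, and Wikipedia indexes.
WORD_TO_CLASS = {"部屋": "facilities", "風呂": "facilities", "駅": "location"}

# Hand-tokenized toy review (an assumption; a real system would use a
# morphological analyzer).
s1 = [("部屋", "noun"), ("は", "particle"), ("広い", "adjective")]    # room
s2 = [("風呂", "noun"), ("は", "particle"), ("きれい", "adjective")]  # bath
s3 = [("駅", "noun"), ("は", "particle"), ("近い", "adjective")]      # station

segments = label_segments([s1, s2, s3], WORD_TO_CLASS)
for label, group in segments:
    print(label, len(group))
```

In this toy version, a sentence with no recognized criterion word is attached to the preceding segment, which is one simple way to cope with very short review sentences; the paper itself does not state how such cases are handled.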

Paper Citation


in Harvard Style

Suzuki Y. and Fukumoto F. (2012). Segmentation of Review Texts by using Thesaurus and Corpus-based Word Similarity. In Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2012) ISBN 978-989-8565-30-3, pages 381-384. DOI: 10.5220/0004140903810384


in Bibtex Style

@conference{keod12,
author={Yoshimi Suzuki and Fumiyo Fukumoto},
title={Segmentation of Review Texts by using Thesaurus and Corpus-based Word Similarity},
booktitle={Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2012)},
year={2012},
pages={381-384},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004140903810384},
isbn={978-989-8565-30-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2012)
TI - Segmentation of Review Texts by using Thesaurus and Corpus-based Word Similarity
SN - 978-989-8565-30-3
AU - Suzuki Y.
AU - Fukumoto F.
PY - 2012
SP - 381
EP - 384
DO - 10.5220/0004140903810384