words such as named entities which are not index
words of Japanese WordNet. We employed Lin’s method (Lin, 1998) for extracting similar word pairs from hotel review texts.
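Lin’s measure represents each word w by dependency features (r, w′), weights each feature by its mutual information I(w, r, w′) = log(‖w, r, w′‖ × ‖*, r, *‖ / (‖w, r, *‖ × ‖*, r, w′‖)), keeps only features with positive weight (the set T(w)), and scores two words by the share of weight carried by their common features. The following is a minimal Python sketch of that formula, assuming the (word, relation, word′) triples have already been extracted by a dependency parser. The relation label `obj`, the toy triples and the function name are hypothetical illustrations, not the data or implementation used in this paper.

```python
import math
from collections import Counter


def lin_similarity(triples, w1, w2):
    """Lin (1998) similarity between w1 and w2 from (word, relation, word') triples."""
    c_wrv = Counter(triples)                       # ||w, r, w'||
    c_wr = Counter((w, r) for w, r, _ in triples)  # ||w, r, *||
    c_rv = Counter((r, v) for _, r, v in triples)  # ||*, r, w'||
    c_r = Counter(r for _, r, _ in triples)        # ||*, r, *||

    def features(w):
        """T(w): (r, w') pairs with positive mutual information I(w, r, w')."""
        feats = {}
        for (x, r, v), n in c_wrv.items():
            if x != w:
                continue
            info = math.log(n * c_r[r] / (c_wr[(x, r)] * c_rv[(r, v)]))
            if info > 0:
                feats[(r, v)] = info
        return feats

    t1, t2 = features(w1), features(w2)
    shared = t1.keys() & t2.keys()
    # Information in shared features over total information in both words.
    num = sum(t1[f] + t2[f] for f in shared)
    den = sum(t1.values()) + sum(t2.values())
    return num / den if den else 0.0


# Hypothetical toy triples: (noun, "obj", verb taking the noun as object).
triples = [
    ("beer", "obj", "drink"), ("wine", "obj", "drink"),
    ("room", "obj", "book"), ("room", "obj", "clean"), ("hotel", "obj", "book"),
]
print(lin_similarity(triples, "beer", "wine"))  # 1.0
print(lin_similarity(triples, "beer", "room"))  # 0.0
```

Words sharing all of their informative contexts score 1.0, and words with no shared context score 0.0; in practice the similar word pairs would be those whose score exceeds some threshold, which this sketch leaves open.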
We conducted experiments on dividing reviews by criterion. We used the review texts of 5 hotels. The average number of review texts per hotel was 51.2, and the number of criteria was 256. Table 4 shows the results of text segmentation.
Table 4: Results of text segmentation.
             Our method           TextTiling
Precision    863/1024 = 0.842     742/1024 = 0.725
Recall       863/902 = 0.925      742/1119 = 0.663
F-measure    0.882                0.692
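The F-measure in Table 4 is the harmonic mean of precision and recall, F = 2PR/(P + R). The short Python sketch below recomputes the TextTiling column from the counts listed in the table; the function name `prf` and its argument names are illustrative, and the numerators and denominators are passed exactly as they appear in Table 4.

```python
def prf(matched: int, precision_den: int, recall_den: int):
    """Return precision, recall and balanced F-measure (harmonic mean of P and R)."""
    p = matched / precision_den
    r = matched / recall_den
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# TextTiling column of Table 4.
p, r, f = prf(742, 1024, 1119)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.725 R=0.663 F=0.692
```

F is computed here from the unrounded P and R; plugging in the rounded values 0.725 and 0.663 instead gives 0.693, so small last-digit differences can arise depending on when rounding is applied.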
We compared our method with TextTiling (Hearst, 1997), a well-known text segmentation technique. TextTiling subdivides texts into multi-paragraph units that represent passages, or subtopics, using patterns of lexical co-occurrence and distribution as the discourse cues for identifying major subtopic shifts. TextTiling performs well on documents with relatively long topics, such as magazine articles. However, it could not obtain good results here, because the review text for each criterion is very short. As Table 4 clearly shows, our method performed much better than TextTiling. This demonstrates that lexical information, such as similarity between words, hypernyms of words and named entities, was effective for text segmentation.
8 CONCLUSIONS
In this paper, we proposed a method for segmenting review texts according to guests’ criteria, such as service, location and facilities. The results showed the effectiveness of our method: it attained 0.84 precision, 0.92 recall and 0.88 F-measure, and it outperformed TextTiling, a well-known text segmentation technique. Future work will include: (i) applying the method to a large number of guest reviews for quantitative evaluation, and (ii) applying the method to other data, such as the grocery store data LeShop³ and TaFeng⁴ and the movie data MovieLens⁵, to evaluate the robustness of the method.
3 www.beshop.ch
4 aiia.iis.sinica.edu.tw/index.php?option=com_docman&task=cat_view&gid=34&Itemid=41
5 http://www.grouplens.org/node/73
ACKNOWLEDGEMENTS
The authors would like to thank the referees for their comments on the earlier version of this paper. This work was partially supported by The Telecommunications Advancement Foundation.
REFERENCES
Allan, J., Carbonell, J., Doddington, G., Yamron, J., and
Yang, Y. (1998). Topic detection and tracking pilot
study final report. In the DARPA Broadcast News
Transcription and Understanding Workshop.
Bond, F., Isahara, H., Uchimoto, K., Kuribayashi, T., and Kanzaki, K. (2009). Enhancing the Japanese WordNet. In The 7th Workshop on Asian Language Resources, in conjunction with ACL-IJCNLP.
Fukushima, T., Ehara, T., and Shirai, K. (1999). Partitioning long sentences for text summarization. Journal of Natural Language Processing (in Japanese), 6(6):131–147.
Hearst, M. A. (1997). TextTiling: Segmenting text into multi-paragraph subtopic passages. In Association for Computational Linguistics, pages 111–112.
Hirao, T., Kitauchi, A., and Kitani, T. (2000). Text segmentation based on lexical cohesion and word importance. Information Processing Society of Japan, 41(SIG3(TOD6)):24–36.
Kozima, H. (1993). Text segmentation based on similarity between words. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, pages 286–288.
Kudo, T. and Matsumoto, Y. (2002). Japanese dependency analysis using cascaded chunking. In CoNLL 2002: Proceedings of the 6th Conference on Natural Language Learning 2002, pages 63–69.
Lin, D. (1998). Automatic retrieval and clustering of similar words. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, pages 768–774.
Utiyama, M. and Isahara, H. (2001). A statistical model for domain-independent text segmentation. In Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, pages 499–506.
KEOD 2012 - International Conference on Knowledge Engineering and Ontology Development