AUTOMATIC SUMMARIZATION OF ONLINE CUSTOMER REVIEWS

Jiaming Zhan, Han Tong Loh, Ying Liu

Abstract

Online customer reviews offer valuable information for merchants and potential shoppers in e-Commerce and e-Business. However, even for a single product, the number of reviews often runs into the hundreds or thousands. Summarizing multiple reviews is therefore helpful for extracting the important issues that merchants and customers are concerned about. Existing methods of multi-document summarization first divide documents into non-overlapping clusters and then summarize each cluster individually, assuming that each cluster discusses a single topic. When these methods are applied to customer reviews, however, it is difficult to determine the number of clusters without prior domain knowledge; moreover, topics in a collection of customer reviews often overlap with each other. In this paper, we propose a summarization approach based on the topical structure of multiple customer reviews. Instead of clustering and then summarizing, our approach extracts topics from a collection of reviews and ranks the topics by frequency. The summary is then generated according to the ranked topics. The evaluation results showed that our approach outperformed the baseline summarization systems, namely the Copernic summarizer and clustering-summarization, in terms of users' responsiveness.
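To make the "extract topics, rank by frequency, then summarize" pipeline concrete, here is a minimal sketch under loudly stated assumptions. It is not the authors' topic-extraction method: here a "topic" is approximated as a content-word bigram, its frequency is the number of reviews that mention it, and the summary takes the first not-yet-chosen sentence that covers each top-ranked topic. The stopword list and the helpers `rank_topics` and `summarize` are hypothetical names introduced only for illustration.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "is", "it", "and", "of", "to", "this", "i",
             "was", "very", "but", "for", "with", "my", "are", "in"}


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?'."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def content_words(sentence):
    """Lowercased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def bigrams(words):
    """Adjacent word pairs; used here as a stand-in for 'topics'."""
    return set(zip(words, words[1:]))


def rank_topics(reviews, min_support=2):
    """Rank topics by the number of reviews that mention them (assumed frequency measure)."""
    support = Counter()
    for review in reviews:
        seen = set()
        for sentence in split_sentences(review):
            seen |= bigrams(content_words(sentence))
        support.update(seen)  # count each topic at most once per review
    ranked = [(t, n) for t, n in support.items() if n >= min_support]
    # Most frequent first; ties broken alphabetically so output is deterministic.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


def summarize(reviews, max_topics=3, min_support=2):
    """Pick one representative sentence for each of the top-ranked topics."""
    sentences = [s for review in reviews for s in split_sentences(review)]
    summary = []
    for topic, _ in rank_topics(reviews, min_support)[:max_topics]:
        for sentence in sentences:
            if topic in bigrams(content_words(sentence)) and sentence not in summary:
                summary.append(sentence)
                break
    return summary


reviews = [
    "The battery life is great. The screen is too dim.",
    "Battery life lasts two days! Screen resolution is sharp.",
    "Great camera. The battery life could be better.",
    "Screen resolution is fine but the camera app crashes.",
]
print(summarize(reviews))
```

On this toy input, "battery life" (3 reviews) outranks "screen resolution" (2 reviews), so the summary is `['The battery life is great', 'Screen resolution is sharp']`. Note that no cluster count is fixed in advance and a sentence may cover several topics, which mirrors the abstract's point that review topics overlap rather than partitioning cleanly.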



Paper Citation


in Harvard Style

Zhan J., Loh H. T. and Liu Y. (2007). AUTOMATIC SUMMARIZATION OF ONLINE CUSTOMER REVIEWS. In Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 3: WEBIST, ISBN 978-972-8865-79-5, pages 5-12. DOI: 10.5220/0001266400050012


in Bibtex Style

@conference{webist07,
author={Jiaming Zhan and Han Tong Loh and Ying Liu},
title={AUTOMATIC SUMMARIZATION OF ONLINE CUSTOMER REVIEWS},
booktitle={Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 3: WEBIST},
year={2007},
pages={5-12},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001266400050012},
isbn={978-972-8865-79-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 3: WEBIST
TI - AUTOMATIC SUMMARIZATION OF ONLINE CUSTOMER REVIEWS
SN - 978-972-8865-79-5
AU - Zhan J.
AU - Loh H. T.
AU - Liu Y.
PY - 2007
SP - 5
EP - 12
DO - 10.5220/0001266400050012
ER - 