Generation of Non-redundant Summary based on Sentence Clustering Algorithms of NSGA-II and SPEA2

Jung Song Lee, Han Hee Hahm, Seong Soo Chang, Soon Cheol Park

Abstract

This paper proposes automatic document summarization based on sentence clustering with two multi-objective genetic algorithms (MOGAs), NSGA-II and SPEA2. These algorithms are effective at extracting the most important, non-redundant sentences from a document. We use them to cluster similar sentences into as many groups as needed and extract the most important sentence from each group. After clustering, we rearrange the extracted sentences into their original document order to generate a readable summary. We tested this technique on two open benchmark datasets, DUC01 and DUC02, and evaluated performance with F-measure and ROUGE. The experimental results show that NSGA-II and SPEA2 outperform the existing algorithms.
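The extract-and-reorder step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cluster labels are assumed to be given (in the paper they would come from NSGA-II or SPEA2), and the `jaccard` similarity and the "most similar to its cluster" importance score are hypothetical stand-ins, since the abstract does not specify either.

```python
from collections import defaultdict


def jaccard(a, b):
    """Word-overlap similarity between two token sets (assumed stand-in measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def summarize(sentences, labels):
    """Pick one representative sentence per cluster, then restore document order.

    sentences: list of str, in original document order
    labels: list of int cluster ids, one per sentence (assumed precomputed)
    """
    tokens = [set(s.lower().split()) for s in sentences]

    # Group sentence indices by cluster id.
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    chosen = []
    for members in clusters.values():
        # Assumed importance score: total similarity to the other cluster members.
        best = max(
            members,
            key=lambda i: sum(
                jaccard(tokens[i], tokens[j]) for j in members if j != i
            ),
        )
        chosen.append(best)

    # Sorting by original index keeps the summary in document order.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    doc = [
        "the cat sat on the mat",
        "a cat sat on a mat",
        "the cat sat on a mat",
        "stocks fell sharply today",
        "stocks fell today",
    ]
    print(summarize(doc, [1, 1, 1, 0, 0]))
```

Because one sentence is taken from each cluster, near-duplicate sentences that fall into the same cluster contribute at most one sentence to the summary, which is the source of the non-redundancy claimed above.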



Paper Citation


in Harvard Style

Lee J., Hahm H., Chang S. and Park S. (2012). Generation of Non-redundant Summary based on Sentence Clustering Algorithms of NSGA-II and SPEA2. In Proceedings of the 4th International Joint Conference on Computational Intelligence - Volume 1: ECTA, (IJCCI 2012) ISBN 978-989-8565-33-4, pages 176-182. DOI: 10.5220/0004134501760182


in Bibtex Style

@conference{ecta12,
author={Jung Song Lee and Han Hee Hahm and Seong Soo Chang and Soon Cheol Park},
title={Generation of Non-redundant Summary based on Sentence Clustering Algorithms of NSGA-II and SPEA2},
booktitle={Proceedings of the 4th International Joint Conference on Computational Intelligence - Volume 1: ECTA, (IJCCI 2012)},
year={2012},
pages={176-182},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004134501760182},
isbn={978-989-8565-33-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 4th International Joint Conference on Computational Intelligence - Volume 1: ECTA, (IJCCI 2012)
TI - Generation of Non-redundant Summary based on Sentence Clustering Algorithms of NSGA-II and SPEA2
SN - 978-989-8565-33-4
AU - Lee J.
AU - Hahm H.
AU - Chang S.
AU - Park S.
PY - 2012
SP - 176
EP - 182
DO - 10.5220/0004134501760182