Polytope Model for Extractive Summarization

Marina Litvak, Natalia Vanetik

Abstract

The problem of text summarization for a collection of documents is defined as the problem of selecting a small subset of sentences so that the contents and meaning of the original document set are preserved in the best possible way. In this paper we present a linear model for the problem of text summarization, where we strive to obtain a summary that preserves as much of the information coverage of the original document set as possible. We construct a system of linear inequalities that describes the given document set and its possible summaries, and translate the problem of finding the best summary into the problem of finding the point on a convex polytope closest to a given hyperplane. This reformulated problem can be solved efficiently with the help of quadratic programming.
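The geometric problem named in the abstract, finding the point of a convex polytope closest to a hyperplane, can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the polytope is simply the unit box 0 <= x_i <= 1 (each coordinate could be read as a relaxed "how much of sentence i to keep" variable), the hyperplane is c.x = d, and the names `closest_point_on_box_to_hyperplane` and `distance_to_hyperplane` are hypothetical. The quadratic objective (c.x - d)^2 is minimised with a Frank-Wolfe iteration and exact line search rather than a general quadratic programming solver.

```python
import math


def distance_to_hyperplane(x, c, d):
    """Euclidean distance from point x to the hyperplane c.x = d."""
    dot = sum(ci * xi for ci, xi in zip(c, x))
    return abs(dot - d) / math.sqrt(sum(ci * ci for ci in c))


def closest_point_on_box_to_hyperplane(c, d, max_iter=100, tol=1e-12):
    """Minimise (c.x - d)^2 over the unit box with Frank-Wolfe steps.

    Illustrative assumption: the polytope is the box 0 <= x_i <= 1,
    not the constraint system constructed in the paper.
    """
    x = [0.0] * len(c)
    for _ in range(max_iter):
        # Signed residual; the gradient of the objective is 2 * r * c.
        r = sum(ci * xi for ci, xi in zip(c, x)) - d
        if abs(r) <= tol:
            break
        # Linear minimisation over the box: pick the vertex that most
        # decreases the linearised objective (coordinates with c_i == 0 stay put).
        s = [xi if ci == 0 else (0.0 if r * ci > 0 else 1.0)
             for ci, xi in zip(c, x)]
        direction = [si - xi for si, xi in zip(s, x)]
        slope = sum(ci * di for ci, di in zip(c, direction))
        if slope == 0:
            break  # no feasible direction changes c.x, so x is optimal
        # Exact line search for the one-dimensional quadratic, clipped to [0, 1].
        step = min(1.0, max(0.0, -r / slope))
        if step == 0:
            break
        x = [xi + step * di for xi, di in zip(x, direction)]
    return x


if __name__ == "__main__":
    c, d = [2.0, 1.0, 3.0], 4.0
    x = closest_point_on_box_to_hyperplane(c, d)
    print(x, distance_to_hyperplane(x, c, d))
```

When the hyperplane cuts through the box, the returned point lies on it and the distance is zero up to rounding; when it does not, the iteration stops at the box vertex nearest to the hyperplane. A realistic constraint system would need a proper quadratic programming solver in place of this box-specific shortcut.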



Paper Citation


in Harvard Style

Litvak M. and Vanetik N. (2012). Polytope Model for Extractive Summarization. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012) ISBN 978-989-8565-29-7, pages 281-286. DOI: 10.5220/0004170902810286


in Bibtex Style

@conference{kdir12,
author={Marina Litvak and Natalia Vanetik},
title={Polytope Model for Extractive Summarization},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012)},
year={2012},
pages={281-286},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004170902810286},
isbn={978-989-8565-29-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012)
TI - Polytope Model for Extractive Summarization
SN - 978-989-8565-29-7
AU - Litvak M.
AU - Vanetik N.
PY - 2012
SP - 281
EP - 286
DO - 10.5220/0004170902810286