Compressing Multi-document Summaries through Sentence Simplification

Sara Botelho Silveira, Antonio Branco

Abstract

Multi-document summarization aims to create a single summary from the information conveyed by a collection of texts. Once the candidate sentences have been identified and ordered, the system must select which of them to include in the summary. In this paper, we propose an approach that uses sentence simplification, both lexical and syntactic, to improve the compression step of the summarization process. Simplification is performed by removing specific sentential constructions that convey information considered less relevant to the general message of the summary. The rationale is that sentence simplification not only removes expendable information, but also makes room for further relevant content in the summary.


Paper Citation


in Harvard Style

Botelho Silveira S. and Branco A. (2013). Compressing Multi-document Summaries through Sentence Simplification. In Proceedings of the 5th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-8565-39-6, pages 111-119. DOI: 10.5220/0004251201110119


in Bibtex Style

@conference{icaart13,
author={Sara Botelho Silveira and Antonio Branco},
title={Compressing Multi-document Summaries through Sentence Simplification},
booktitle={Proceedings of the 5th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2013},
pages={111-119},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004251201110119},
isbn={978-989-8565-39-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 5th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Compressing Multi-document Summaries through Sentence Simplification
SN - 978-989-8565-39-6
AU - Botelho Silveira S.
AU - Branco A.
PY - 2013
SP - 111
EP - 119
DO - 10.5220/0004251201110119