AUTOLSA: AUTOMATIC DIMENSION REDUCTION OF LSA FOR SINGLE-DOCUMENT SUMMARIZATION

Haidi Badr, Nayer Wanas, Magda Fayek

Abstract

Text summarization algorithms play a growing role in many applications, especially in information retrieval. In this work, we propose a generic single-document summarizer based on Latent Semantic Analysis (LSA). In LSA, the dimension reduction ratio (DRr) is usually determined experimentally, which makes it dependent on the data and on the individual document. We propose a new approach that determines the DRr automatically, avoiding the problems of manual selection. The proposed approach is tested on two benchmark datasets, DUC02 and LDC2008T19. The experimental results indicate that the automatically obtained dimension reduction ratio improves the quality of the text summarization while providing a better value for the DRr.
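To make the role of the dimension reduction ratio concrete, the following is a minimal sketch of a generic LSA sentence-extraction summarizer in which the DRr is passed in by hand. It is not the automatic selection method proposed in the paper, which the abstract does not describe. The function name `lsa_summarize`, the parameter `dr_ratio`, the whitespace tokenizer, the raw term-frequency weighting, and the sentence-scoring rule (length of each sentence vector in the reduced, singular-value-weighted space) are all assumptions chosen for illustration.

```python
import numpy as np


def lsa_summarize(sentences, num_sentences=2, dr_ratio=0.5):
    """Illustrative LSA summarizer; dr_ratio is a hand-set stand-in for the DRr."""
    # Build a term-by-sentence matrix of raw term frequencies (assumed weighting).
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in s.lower().split():
            A[index[w], j] += 1.0

    # Singular value decomposition: A = U * diag(sigma) * Vt.
    _, sigma, Vt = np.linalg.svd(A, full_matrices=False)

    # Keep a fraction dr_ratio of the singular values (at least one).
    k = max(1, int(round(dr_ratio * len(sigma))))

    # Score each sentence by the length of its vector in the reduced space,
    # with each retained dimension weighted by its singular value.
    scores = np.sqrt(((sigma[:k, None] * Vt[:k, :]) ** 2).sum(axis=0))

    # Take the highest-scoring sentences and return them in document order.
    top = sorted(np.argsort(-scores)[:num_sentences])
    return [sentences[i] for i in top], k


if __name__ == "__main__":
    doc = [
        "latent semantic analysis maps terms to topics",
        "the cat sat",
        "singular value decomposition factors the term matrix",
        "dogs bark",
    ]
    summary, k = lsa_summarize(doc, num_sentences=2, dr_ratio=0.5)
    print("retained dimensions:", k)
    for sentence in summary:
        print(sentence)
```

In this sketch, changing `dr_ratio` changes how many latent dimensions contribute to the sentence scores, and therefore which sentences are selected. That sensitivity is the reason a manually tuned ratio is data and document dependent.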


Paper Citation


in Harvard Style

Badr H., Wanas N. and Fayek M. (2010). AUTOLSA: AUTOMATIC DIMENSION REDUCTION OF LSA FOR SINGLE-DOCUMENT SUMMARIZATION. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010), ISBN 978-989-8425-28-7, pages 444-448. DOI: 10.5220/0003091904440448


in Bibtex Style

@conference{kdir10,
author={Haidi Badr and Nayer Wanas and Magda Fayek},
title={AUTOLSA: AUTOMATIC DIMENSION REDUCTION OF LSA FOR SINGLE-DOCUMENT SUMMARIZATION},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)},
year={2010},
pages={444-448},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003091904440448},
isbn={978-989-8425-28-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)
TI - AUTOLSA: AUTOMATIC DIMENSION REDUCTION OF LSA FOR SINGLE-DOCUMENT SUMMARIZATION
SN - 978-989-8425-28-7
AU - Badr H.
AU - Wanas N.
AU - Fayek M.
PY - 2010
SP - 444
EP - 448
DO - 10.5220/0003091904440448