Authors:
Haidi Badr
1
;
Nayer Wanas
2
and
Magda Fayek
3
Affiliations:
1
Electronics Researches Institute, Egypt
;
2
Cairo Microsoft Innovation Lab, Egypt
;
3
Cairo University, Egypt
Keyword(s):
LSA, Automatic dimension reduction ratio, Document-summarization.
Related
Ontology
Subjects/Areas/Topics:
Artificial Intelligence
;
Information Extraction
;
Knowledge Discovery and Information Retrieval
;
Knowledge-Based Systems
;
Mining Text and Semi-Structured Data
;
Symbolic Systems
Abstract:
The role of text summarization algorithms is increasing in many applications; especially in the domain of information retrieval. In this work, we propose a generic single-document summarizer which is based on using the Latent Semantic Analysis (LSA). Generally in LSA, determining the dimension reduction ratio is usually performed experimentally which is data and document dependent. In this work, we propose a new approach to determine the dimension reduction ratio, DRr, automatically to overcome the manual determination problems. The proposed approach is tested using two benchmark datasets; namely DUC02 and LDC2008T19. The experimental results illustrate that the dimension reduction ratio obtained automatically improves the quality of the text summarization while providing a more optimal value for the DRr.