Genetic Algorithm and Latent Semantic Analysis based Documents Summarization Technique
Imen Tanfouri, Fethi Jarray
2022
Abstract
Automatic text summarization (ATS) is the process of generating or extracting a shorter version of an original document while preserving its relevant and important information. It is currently an active research topic in natural language processing, with applications that include social networks and the healthcare domain. Summarization approaches fall into two categories: extractive and abstractive. This paper addresses extractive summarization of a single Arabic document. We propose a combination of semantic and combinatorial methods: the document's content is first clustered by topic using topic modeling techniques, and an extractive summary is then generated for each identified topic using a genetic algorithm. This approach ensures that the final summary covers all important topics in the document. We achieve state-of-the-art performance on common Arabic summarization benchmark datasets. The results show the effectiveness of combining genetic algorithms (GA) and latent semantic analysis (LSA) for document summarization.
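The two-stage idea in the abstract (group sentences by LSA topic, then pick sentences per topic with a genetic algorithm) can be illustrated with a minimal sketch. This is not the authors' implementation. The whitespace tokenizer, the raw term-count matrix, the fitness function (sum of LSA weights under a word budget), and all names and parameters such as lsa_topics, ga_select, and budget_per_topic are assumptions made for illustration; real Arabic text would need proper tokenization and normalization.

```python
import random
import numpy as np

def lsa_topics(sentences, n_topics):
    """Assign each sentence to its dominant latent topic via SVD (assumed setup)."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-by-sentence count matrix: rows are terms, columns are sentences.
    A = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in s.lower().split():
            A[index[w], j] += 1.0
    # A = U S Vt; row k of Vt weights every sentence on latent topic k.
    _, S, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(n_topics, len(S))
    weights = np.abs(S[:k, None] * Vt[:k, :])  # shape (k, n_sentences)
    return weights.argmax(axis=0), weights.max(axis=0)

def ga_select(scores, lengths, budget, generations=50, pop_size=20, seed=0):
    """Pick a subset of sentences maximizing total score within a word budget."""
    rng = random.Random(seed)
    n = len(scores)

    def fitness(bits):
        total_len = sum(l for b, l in zip(bits, lengths) if b)
        if total_len > budget:
            return -1.0  # infeasible: ranks below the empty selection (0.0)
        return sum(s for b, s in zip(bits, scores) if b)

    # Random binary chromosomes plus the empty one, which is always feasible.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    pop.append([0] * n)
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # elitism: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(0, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:  # occasional bit-flip mutation
                child[rng.randrange(n)] ^= 1
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return [i for i, b in enumerate(best) if b]

def summarize(sentences, n_topics=2, budget_per_topic=12):
    """Run the GA once per LSA topic and return picks in document order."""
    labels, scores = lsa_topics(sentences, n_topics)
    chosen = []
    for t in sorted(set(labels.tolist())):
        idx = [i for i, lab in enumerate(labels) if lab == t]
        picked = ga_select([float(scores[i]) for i in idx],
                           [len(sentences[i].split()) for i in idx],
                           budget_per_topic)
        chosen.extend(idx[p] for p in picked)
    return [sentences[i] for i in sorted(chosen)]
```

Calling summarize(list_of_sentences, n_topics=2, budget_per_topic=12) returns a subset of the input sentences in their original order. Running the selection separately for each topic is what lets every topic that has at least one sentence within the budget contribute to the summary, which mirrors the coverage argument made in the abstract.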
Paper Citation
in Harvard Style
Tanfouri I. and Jarray F. (2022). Genetic Algorithm and Latent Semantic Analysis based Documents Summarization Technique. In Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2022) - Volume 2: KEOD; ISBN 978-989-758-614-9, SciTePress, pages 223-227. DOI: 10.5220/0011585700003335
in Bibtex Style
@conference{keod22,
author={Imen Tanfouri and Fethi Jarray},
title={Genetic Algorithm and Latent Semantic Analysis based Documents Summarization Technique},
booktitle={Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2022) - Volume 2: KEOD},
year={2022},
pages={223-227},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011585700003335},
isbn={978-989-758-614-9},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2022) - Volume 2: KEOD
TI - Genetic Algorithm and Latent Semantic Analysis based Documents Summarization Technique
SN - 978-989-758-614-9
AU - Tanfouri I.
AU - Jarray F.
PY - 2022
SP - 223
EP - 227
DO - 10.5220/0011585700003335
PB - SciTePress