Genetic Algorithm and Latent Semantic Analysis based Documents Summarization Technique

Imen Tanfouri, Fethi Jarray

2022

Abstract

Automatic text summarization (ATS) is the process of generating or extracting a shorter version of a document while preserving its relevant and important information. It is an active research topic in natural language processing, with applications that include social networks and the healthcare domain. Summarization approaches fall into two categories: extractive and abstractive. This paper addresses extractive summarization of a single Arabic document. We propose a combination of semantic and combinatorial methods: the document's content is first clustered with topic modeling techniques, and an extractive summary is then generated for each identified topic using genetic algorithms. This approach ensures that the final summary covers all important topics in the document. We achieve state-of-the-art performance on the common Arabic summarization benchmark datasets. The results show the effectiveness of combining genetic algorithms (GA) and latent semantic analysis (LSA) for document summarization.
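As an illustration only, the per-topic selection step described in the abstract can be sketched as a small genetic algorithm over 0/1 sentence masks. This is a minimal sketch under stated assumptions, not the authors' implementation: the topic clusters are taken as given (the LSA topic-modeling step is not shown), the fitness function (number of distinct words covered within a word budget) is a stand-in chosen for simplicity, and the names `fitness`, `ga_select`, `summarize`, and all parameter values are hypothetical.

```python
import random


def fitness(mask, sentences, max_words):
    """Assumed fitness: distinct words covered, or -1 if over the word budget."""
    chosen = [s for bit, s in zip(mask, sentences) if bit]
    if sum(len(s.split()) for s in chosen) > max_words:
        return -1  # infeasible candidates always rank below the empty summary
    covered = set()
    for s in chosen:
        covered.update(s.lower().split())
    return len(covered)


def ga_select(sentences, max_words, pop_size=30, generations=60,
              mutation_rate=0.1, seed=0):
    """Evolve a 0/1 mask over `sentences`; return the selected sentences in order."""
    n = len(sentences)
    if n == 0:
        return []
    rng = random.Random(seed)

    def score(mask):
        return fitness(mask, sentences, max_words)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    population[0] = [0] * n  # seed one always-feasible (empty) candidate
    for _ in range(generations):
        # Elitism: keep the better half unchanged, refill with offspring.
        population.sort(key=score, reverse=True)
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randint(1, n - 1) if n > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability `mutation_rate` per position.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = elite + children
    best = max(population, key=score)
    return [s for bit, s in zip(best, sentences) if bit]


def summarize(topic_clusters, max_words_per_topic):
    """Run the GA once per topic cluster so every topic contributes sentences."""
    return [ga_select(cluster, max_words_per_topic) for cluster in topic_clusters]
```

Calling `summarize(clusters, 8)` on a list of sentence lists returns one list of selected sentences per cluster, each within the 8-word budget. Whitespace tokenization is used here for brevity; Arabic text would typically need language-specific preprocessing before scoring.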


Paper Citation


in Harvard Style

Tanfouri I. and Jarray F. (2022). Genetic Algorithm and Latent Semantic Analysis based Documents Summarization Technique. In Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2022) - Volume 2: KEOD; ISBN 978-989-758-614-9, SciTePress, pages 223-227. DOI: 10.5220/0011585700003335


in Bibtex Style

@conference{keod22,
author={Imen Tanfouri and Fethi Jarray},
title={Genetic Algorithm and Latent Semantic Analysis based Documents Summarization Technique},
booktitle={Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2022) - Volume 2: KEOD},
year={2022},
pages={223-227},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011585700003335},
isbn={978-989-758-614-9},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2022) - Volume 2: KEOD
TI - Genetic Algorithm and Latent Semantic Analysis based Documents Summarization Technique
SN - 978-989-758-614-9
AU - Tanfouri I.
AU - Jarray F.
PY - 2022
SP - 223
EP - 227
DO - 10.5220/0011585700003335
PB - SciTePress