Authors:
Amina Merniz; Anja Habacha Chaibi and Henda Hajjami Ben Ghézala
Affiliation:
National School of Computer Science, University of Manouba, Tunisia
Keyword(s):
Text Summarization, Multi-document Summarization, PageRank Algorithm, Thematic Annotation.
Abstract:
Text summarization reduces one or more documents by keeping their key, significant sentences. It is a long-standing problem in natural language processing research, and it has improved over the years thanks to a considerable number of methods and studies in this area. This paper proposes an approach to Arabic multi-document text summarization. The originality of the approach is that the summary is based on thematic annotation: input documents are analyzed and segmented using LDA. The segments of each topic are then represented by a separate graph to address the redundancy problem in multi-document summarization. In the last step, the proposed approach applies a modified PageRank algorithm that uses the cosine similarity measure as the edge weight. Vertices with high scores are considered essential and therefore make up the final summary. To evaluate summarization systems, researchers have developed several metrics, divided into three categories: automatic, semi-automatic, and manual. This study uses automatic evaluation methods for text summarization, mainly the ROUGE measures (ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-SU4).
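
The graph-ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' modified PageRank: it assumes a standard weighted PageRank with a damping factor of 0.85, whitespace tokenization, and raw term counts as sentence vectors, and it ranks the sentences of a single topic segment. The function names (cosine, rank_sentences, summarize) are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_sentences(sentences, damping=0.85, iters=50):
    """Score sentences with weighted PageRank over a cosine-similarity graph.

    Assumptions (not from the paper): whitespace tokens, raw counts,
    damping of 0.85, a fixed number of power iterations.
    """
    n = len(sentences)
    if n == 0:
        return []
    vecs = [Counter(s.lower().split()) for s in sentences]
    # Edge weight between two distinct sentences is their cosine similarity.
    w = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]  # total outgoing weight of each vertex
    scores = [1.0 / n] * n
    for _ in range(iters):
        new_scores = []
        for i in range(n):
            incoming = sum(w[j][i] / out[j] * scores[j]
                           for j in range(n) if out[j] > 0)
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def summarize(sentences, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Under these assumptions, a sentence that shares no terms with the others has no edges and receives only the baseline score, so it is the last candidate for the summary. In the approach described above, one such graph would be built per LDA topic.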