Authors:
Mohamed Hédi Mâaloul (1); Iskandar Keskes (2); Lamia Hadrich Belguith (2) and Philippe Blache (2)
Affiliations:
(1) Laboratoire LPL, France; (2) MIRACL Laboratory, Tunisia
Keyword(s):
Rhetorical Structure Theory, Rhetorical relations, Linguistic markers, Automatic summarization of Arabic texts.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Artificial Intelligence and Decision Support Systems; Enterprise Information Systems; Human-Computer Interaction; Intelligent User Interfaces; Methodologies, Processes and Platforms; Model-Driven Software Development; Natural Language Interfaces to Intelligent Systems; Software Engineering; Systems Engineering; User Needs
Abstract:
This paper presents a technique for automatically summarizing Arabic texts based on Rhetorical Structure Theory (RST) (Mann, 1988). We first describe a corpus study that allowed us to specify, from empirical observation, a set of rhetorical relations and rhetorical frames. We then describe our summarization method and, finally, the architecture of the ARSTResume system. The method uses linguistic knowledge and relies on three pillars. The first is locating the rhetorical relations between the minimal units of the text by applying rhetorical rules. In each relation, one unit is the nucleus (the segment necessary to maintain coherence) and the other is either a nucleus or a satellite (an optional segment). The second pillar is the representation and simplification of the RST tree, which represents the source text in hierarchical form. The third pillar is the selection of sentences for the final summary, which takes into account the type of the rhetorical relations chosen for the extract.
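The three pillars can be illustrated with a minimal Python sketch. It is not the ARSTResume implementation. The `RULES` table, the English markers, the relation names, and the functions `link` and `extract` are all hypothetical and chosen only for illustration; the system itself works on Arabic linguistic markers and the rhetorical frames derived from the corpus study.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    role: str                       # "nucleus" or "satellite"
    text: Optional[str] = None      # set on leaves (minimal units)
    relation: Optional[str] = None  # set on internal nodes
    children: List["Node"] = field(default_factory=list)


# Hypothetical rules: marker -> (relation, role of the unit the marker opens).
RULES = {
    "because": ("explanation", "satellite"),
    "but": ("contrast", "nucleus"),
}


def link(left: str, right: str) -> Node:
    """Pillar 1: relate two adjacent units via the marker opening the right one."""
    marker = right.split()[0].lower()
    relation, right_role = RULES.get(marker, ("joint", "nucleus"))
    return Node(
        role="nucleus",
        relation=relation,
        children=[Node("nucleus", text=left), Node(right_role, text=right)],
    )


def extract(node: Node, keep_relations: Set[str]) -> List[str]:
    """Pillars 2 and 3: drop satellites of relations that are not retained,
    then collect the remaining minimal units in text order."""
    if node.text is not None:
        return [node.text]
    units: List[str] = []
    for child in node.children:
        if child.role == "satellite" and node.relation not in keep_relations:
            continue  # simplification: prune the optional segment
        units.extend(extract(child, keep_relations))
    return units


tree = link("The match was cancelled", "because it rained heavily")
summary = extract(tree, keep_relations=set())
```

Here `summary` keeps only the nucleus, while passing `keep_relations={"explanation"}` would also retain the satellite. This mirrors the idea that the extract depends on which relation types are chosen.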