criteria. These rhetorical rules are applied to build
the rhetorical tree thereafter.
4.3 Construction of RST Tree
In order to build the various hierarchical structures
(RST trees) describing the structural organization of
the original text, this stage calls for a certain number
of rules and rhetorical diagrams.
The rhetorical rules are used to prioritize and
refine the RST tree. They rely on heuristics adopted
after observing the results. Table 4 gives a
representative rhetorical rule.
Table 4: Example of a rhetorical rule.
IF
(index release is at the beginning of the sentence)
THEN
The sentence is annotated in connection with the
passage that precedes it.
End if
The rule in Table 4 does not link minimal units;
it links sentences and paragraphs of the text,
because the rhetorical diagrams alone are
insufficient to represent the full text.
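The rule in Table 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ARSTResume implementation: the marker list `TRIGGER_MARKERS`, the `Link` record, and the function names are hypothetical, English placeholders stand in for the Arabic markers, and the "passage that precedes" is approximated by the immediately preceding sentence.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Hypothetical trigger markers (English placeholders); the actual
# marker list used by the system is not given in this section.
TRIGGER_MARKERS = ("however", "therefore", "moreover")


@dataclass
class Link:
    sentence_index: int  # sentence that opens with a trigger marker
    attached_to: int     # preceding sentence, standing in for the preceding passage
    marker: str          # marker that fired the rule


def leading_marker(sentence: str, markers: Sequence[str]) -> Optional[str]:
    """Return the marker that opens the sentence, or None if there is none."""
    words = sentence.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(",;:")
    return first if first in markers else None


def apply_rule(sentences: List[str],
               markers: Sequence[str] = TRIGGER_MARKERS) -> List[Link]:
    """IF a marker is at the beginning of a sentence THEN link it to what precedes."""
    links = []
    for i, sentence in enumerate(sentences):
        if i == 0:
            continue  # nothing precedes the first sentence
        marker = leading_marker(sentence, markers)
        if marker is not None:
            links.append(Link(i, i - 1, marker))
    return links


example = [
    "The match was postponed.",
    "However, the fans stayed.",
    "They sang for an hour.",
]
links = apply_rule(example)
```

Here only the second sentence opens with a marker, so a single link is produced between it and the first sentence.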
The various structures of the text are thus defined
as compositions of applications of the
diagrams.
The rhetorical diagrams follow five diagram
models, which can be applied recursively
to describe texts of arbitrary size.
Generally, the most frequent diagram is the one
linking a single satellite to a single nucleus.
4.4 Selection of the Summary Sentences
The final extract presents the nucleus units retained
after the simplification of the RST tree.
Based on an analytical study that we
conducted on a hundred summaries produced by
three experts, we determined a list of rhetorical
relations for each summary type.
The RST tree is reduced by removing
all the descendants that come from a
rhetorical relation that has not been selected for the
given summary type.
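The reduction step can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: the `Node` structure, the choice to store each relation on the child it introduces, the relation names, and the set `kept` are all hypothetical, and the extract is assumed to be the nucleus leaves that survive the pruning, read left to right.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    text: str = ""                  # non-empty only for leaves (minimal units)
    role: str = "nucleus"           # "nucleus" or "satellite"
    relation: Optional[str] = None  # relation that attaches this node to its parent
    children: List["Node"] = field(default_factory=list)


def prune(node: Node, kept: Set[str]) -> Node:
    """Return a copy of the tree without subtrees introduced by unselected relations."""
    survivors = [
        prune(child, kept)
        for child in node.children
        if child.relation is None or child.relation in kept
    ]
    return Node(node.text, node.role, node.relation, survivors)


def nucleus_units(node: Node) -> List[str]:
    """Collect the text of the remaining nucleus leaves, left to right."""
    if not node.children:
        return [node.text] if node.role == "nucleus" and node.text else []
    units: List[str] = []
    for child in node.children:
        units.extend(nucleus_units(child))
    return units


# Hypothetical tree and relation names, for illustration only.
tree = Node(children=[
    Node("The team won the final.", "nucleus"),
    Node(role="satellite", relation="explanation", children=[
        Node("The coach changed tactics.", "nucleus"),
        Node("He had done so before.", "satellite", "elaboration"),
    ]),
    Node("It rained all day.", "satellite", "circumstance"),
])

kept = {"explanation"}  # assumed relation list for one summary type
extract = nucleus_units(prune(tree, kept))
```

With this input, the subtrees attached by "elaboration" and "circumstance" are dropped, and the extract contains the two remaining nucleus units. Returning a pruned copy rather than modifying the tree in place lets the same tree be reduced again for another summary type.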
5 CONCLUSIONS AND PERSPECTIVES
In this paper, we proposed a method for the automatic
summarization of Arabic texts. Our method is
implemented in the ARSTResume system and is
based on the RST technique (Mann, 1988), which
uses purely linguistic knowledge. Our proposal
represents the text in the form of a tree
in order to determine the nucleus sentences forming
the final summary. The summary is then generated
according to the types of rhetorical relations that
correspond to the extract type. The extract type is
either chosen by the user or determined from the
user's profile; otherwise, the indicative type is used
by default.
The evaluation of ARSTResume on 50 texts showed
encouraging results.
As future work, we plan to extend our
evaluation to a larger corpus and to study the effect
of other rhetorical rules that take into
consideration the morpho-syntactic features of the
words forming the minimal units.
REFERENCES
Alrahabi, M., 2006. Annotation Sémantique des
Énonciations en Arabe, XXIVème Congrès en
INFormatique des Organisations et Systèmes
d’Information et de décision, Hammamet-Tunisie.
Belguith, H., L., Baccour L., Mourad G., 2005.
Segmentation de textes arabes basée sur l'analyse
contextuelle des signes de ponctuations et de certaines
particules. Actes de la 12ème conférence sur le
Traitement Automatique des Langues Naturelles
TALN’2005, Vol. 1, p : 451–456, Dourdan-France.
Christophe, L., 2001. Une typologie des énumérations
basée sur les structures rhétoriques et architecturales
du texte. TALN – Tours, France.
Mâaloul, M.H., 2007. Al Lakas El’eli : Un
système de résumé automatique de documents arabes,
IBIMA.
Mann, W., C., Thompson, S., A., 1988. Rhetorical
structure theory: Toward a functional theory of text
organization. Text, 8(3): p: 243–281.
Mathkour, H., I., Touir A., Al-Sanie, W., 2008. Parsing
Arabic Texts Using Rhetorical Structure Theory,
Journal of Computer Science 4 (9): p:713–720.
Minel, J-L., 2002. Filtrage sémantique : du résumé
automatique à la fouille de textes, Paris : Hermès
Science Publications.
Teufel, S., Moens, M., 1997. Sentence extraction as a
classification task. In Proceedings of the
ACL'97/EACL'97 Workshop on Intelligent Scalable
Text Summarization, p: 58–65, Madrid-Spain.
Udo, H., and Holger, S., 2000. Phrases as carriers of
coherence relations, In Lila R. Gleitman and Aravind
K. Joshi, Proceedings of the 22nd Annual.