Authors:
Hussein Awdeh 1; Adelle Abdallah 1; Gilles Bernard 1 and Mohammad Hajjar 2
Affiliations:
1 LIASD Lab, Paris 8 University, 2 rue de la Liberté 93526 Saint-Denis, Cedex, France;
2 Faculty of Technology, Lebanese University, Hisbeh Street, Saida, Lebanon
Keyword(s):
Arabic Sentence Corpora, Arabic Language, Supervised Learning, Arabic Natural Language Processing, Information Retrieval, Standard Corpus.
Abstract:
Arabic corpora, and gold-standard corpora in particular, are an important part of Arabic Natural Language Processing (NLP). A corpus, described as a very large collection of texts stored on a computer, is considered the most important source for semantic and syntactic research; it can cover a single language (a monolingual corpus) or several languages (a multilingual corpus). Easy access to available corpora is therefore highly needed in the NLP research community, especially for languages such as Arabic. Currently, there is no easy way to access a comprehensive and up-to-date list of available Arabic corpora. This paper presents the results of a recent survey conducted to identify the available Arabic corpora, classified into categories, together with their resources.