Authors:
Karima Meftouh (1); Kamel Smaili (2) and Mohamed Tayeb Laskri (1)
Affiliations:
(1) Badji Mokhtar University, Algeria; (2) INRIA-LORIA, France
Keyword(s):
Statistical language modeling, Arabic, French, Smoothing technique, n-gram model, Vocabulary, Perplexity, Performance.
Related Ontology Subjects/Areas/Topics:
Applications; Artificial Intelligence; Knowledge Engineering and Ontology Development; Knowledge-Based Systems; Natural Language Processing; Pattern Recognition; Symbolic Systems
Abstract:
In this paper, we present a comparative study of statistical language models for Arabic and French. The objective is to understand how to better model each of the two languages. Several experiments using different smoothing techniques have been carried out. For French, trigram models are the most appropriate whatever the smoothing technique used. For Arabic, higher-order n-gram models smoothed with the Witten-Bell method are more efficient. The tests are carried out with corpora and vocabularies of comparable size.
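
As an illustration of the quantities the abstract refers to, the following is a minimal sketch of a bigram model with interpolated Witten-Bell smoothing and its perplexity on a test set. It is not the authors' implementation: the function names, the toy English corpus, the add-one smoothed unigram used as the back-off distribution, and the single extra vocabulary slot for unseen words are all assumptions made for the example. The paper itself compares higher-order n-gram models on Arabic and French corpora.

```python
import math
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Collect unigram and bigram counts from tokenized sentences."""
    unigrams = Counter()
    followers = defaultdict(Counter)  # history word -> counts of next words
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens[1:])  # <s> is never predicted, so skip it
        for prev, word in zip(tokens, tokens[1:]):
            followers[prev][word] += 1
    return unigrams, followers


def witten_bell_prob(word, prev, unigrams, followers, vocab_size):
    """Interpolated Witten-Bell estimate of P(word | prev).

    Assumption: the lower-order distribution is an add-one smoothed unigram.
    """
    total = sum(unigrams.values())
    p_uni = (unigrams[word] + 1) / (total + vocab_size)
    hist = followers.get(prev)
    if not hist:
        return p_uni  # unseen history: fall back to the unigram estimate
    c_h = sum(hist.values())  # number of tokens observed after prev
    t_h = len(hist)           # number of distinct word types after prev
    return (hist[word] + t_h * p_uni) / (c_h + t_h)


def perplexity(sentences, unigrams, followers, vocab_size):
    """Perplexity = 2 ** (-average log2 probability per predicted token)."""
    log_prob, n_tokens = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            p = witten_bell_prob(word, prev, unigrams, followers, vocab_size)
            log_prob += math.log2(p)
            n_tokens += 1
    return 2 ** (-log_prob / n_tokens)


# Toy usage (hypothetical data, for illustration only).
train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
unigrams, followers = train_bigram(train)
vocab_size = len(unigrams) + 1  # assumption: one extra slot for unseen words
print(perplexity([["the", "cat", "sat"]], unigrams, followers, vocab_size))
```

In this sketch, the weight given to the lower-order model grows with the number of distinct words seen after a history, which is the defining idea of Witten-Bell smoothing. Lower perplexity on held-out text indicates a better model, which is the criterion the comparison between smoothing techniques and n-gram orders relies on.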