Experimenting Machine-Learning Algorithms for Morphological Disambiguation of Arabic Texts

Bilel Elayeb, Mohamed Ettih, Raja Ayed

2022

Abstract

The Arabic language is characterized by its complexity and by morphological and orthographic variation, including the syntactic and semantic diversity of a single word. These properties can give rise to morphological ambiguity. In this paper we present a new architecture for the morphological disambiguation of Arabic texts. Disambiguation can be treated as a classification problem in which the values of the morphological features represent the classes, and a classification algorithm assigns a class to each occurrence of a word based on its context. The first step identifies the correct morphological analysis of a non-vocalized Arabic word using morphological dependencies extracted from a corpus of vocalized texts. We then propose a method for transforming imperfect training datasets into perfect data with precise attributes and certain classes. We evaluate this architecture with a set of machine-learning classifiers on a corpus of classical Arabic texts. The results show some statistically significant improvements in disambiguation rate for the SVM and Naïve Bayes classifiers.
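The classification framing described in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture: it assumes a tiny invented training set of loosely transliterated contexts for the unvocalized form "ktb" (which may be read as a verb or as a noun), uses only the neighbouring words as context features, and applies a hand-written Naïve Bayes with add-one smoothing. The names `TRAIN`, `features`, and `NaiveBayes` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (previous word, ambiguous word, next word) -> class.
# Words are loose Latin transliterations invented for illustration;
# none of this comes from the paper's corpus.
TRAIN = [
    (("qad", "ktb", "al-walad"), "verb"),
    (("thumma", "ktb", "al-rajul"), "verb"),
    (("qad", "ktb", "al-rajul"), "verb"),
    (("hadhihi", "ktb", "jadida"), "noun"),
    (("fi", "ktb", "al-tarikh"), "noun"),
    (("hadhihi", "ktb", "qadima"), "noun"),
]


def features(context):
    """Turn a three-word context window into categorical features."""
    prev_word, word, next_word = context
    return [f"prev={prev_word}", f"word={word}", f"next={next_word}"]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for context, label in samples:
            self.class_counts[label] += 1
            for feat in features(context):
                self.feature_counts[label][feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, context):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each feature.
            score = math.log(count / total)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for feat in features(context):
                score += math.log((self.feature_counts[label][feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = NaiveBayes().fit(TRAIN)
print(model.predict(("qad", "ktb", "al-bint")))
print(model.predict(("hadhihi", "ktb", "kabira")))
```

In this sketch the class is a single part-of-speech value; under the framing in the abstract, the same approach would be repeated for each morphological feature whose value has to be chosen.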


Paper Citation


in Harvard Style

Elayeb B., Ettih M. and Ayed R. (2022). Experimenting Machine-Learning Algorithms for Morphological Disambiguation of Arabic Texts. In Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART, ISBN 978-989-758-547-0, pages 851-862. DOI: 10.5220/0010917300003116


in Bibtex Style

@conference{icaart22,
author={Bilel Elayeb and Mohamed Ettih and Raja Ayed},
title={Experimenting Machine-Learning Algorithms for Morphological Disambiguation of Arabic Texts},
booktitle={Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2022},
pages={851-862},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010917300003116},
isbn={978-989-758-547-0},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - Experimenting Machine-Learning Algorithms for Morphological Disambiguation of Arabic Texts
SN - 978-989-758-547-0
AU - Elayeb B.
AU - Ettih M.
AU - Ayed R.
PY - 2022
SP - 851
EP - 862
DO - 10.5220/0010917300003116