Experimenting Machine-Learning Algorithms for Morphological Disambiguation of Arabic Texts
Bilel Elayeb, Mohamed Ettih, Raja Ayed
2022
Abstract
The Arabic language is characterized by its complexity and by its morphological and orthographic variation, including the syntactic and semantic diversity of a single word. These properties give rise to morphological ambiguity. In this paper, we present a new architecture for the morphological disambiguation of Arabic texts. Disambiguation can be treated as a classification problem in which the values of the morphological features are the classes, and a classification algorithm assigns a class to each occurrence of a word based on its context. The first step identifies the correct morphological analysis of a non-vocalized Arabic word using morphological dependencies extracted from a corpus of vocalized texts. We then propose a method for transforming imperfect training datasets into perfect data with precise attributes and certain classes. We evaluate this architecture with a set of machine-learning classifiers on a corpus of classical Arabic texts. The results show statistically significant improvements in disambiguation rate for the SVM and Naïve Bayes classifiers.
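To make the classification framing concrete, the following is a minimal sketch of a Naïve Bayes classifier that assigns a morphological class to an ambiguous word occurrence from its context. It is not the authors' implementation: the feature names (`prev_tag`, `next_tag`), the tag values, and the toy training samples are all invented for illustration, and only one feature (part of speech) is predicted.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: context features of an ambiguous word -> its class.
# Feature names and tag values are assumptions, not taken from the paper.
TRAIN = [
    ({"prev_tag": "PREP", "next_tag": "ADJ"}, "NOUN"),
    ({"prev_tag": "PREP", "next_tag": "PUNC"}, "NOUN"),
    ({"prev_tag": "DET", "next_tag": "ADJ"}, "NOUN"),
    ({"prev_tag": "PRON", "next_tag": "NOUN"}, "VERB"),
    ({"prev_tag": "PART", "next_tag": "NOUN"}, "VERB"),
    ({"prev_tag": "PRON", "next_tag": "PREP"}, "VERB"),
]


def train(samples):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter()
    feat_counts = defaultdict(Counter)  # (class, feature name) -> value counts
    vocab = defaultdict(set)            # feature name -> values seen in training
    for feats, label in samples:
        class_counts[label] += 1
        for name, value in feats.items():
            feat_counts[(label, name)][value] += 1
            vocab[name].add(value)
    return class_counts, feat_counts, vocab


def predict(model, feats):
    """Return the class with the highest smoothed log-probability."""
    class_counts, feat_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for name, value in feats.items():
            seen = feat_counts[(label, name)]
            # Add-one smoothing; the extra +1 reserves mass for unseen values.
            denom = sum(seen.values()) + len(vocab[name]) + 1
            score += math.log((seen[value] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict(model, {"prev_tag": "PREP", "next_tag": "ADJ"}))
print(predict(model, {"prev_tag": "PRON", "next_tag": "NOUN"}))
```

In a fuller setting, one such classifier could be trained per morphological feature, with the candidate classes restricted to the analyses a morphological analyzer proposes for the word. That design is an assumption here rather than something the abstract states.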
Paper Citation
in Harvard Style
Elayeb B., Ettih M. and Ayed R. (2022). Experimenting Machine-Learning Algorithms for Morphological Disambiguation of Arabic Texts. In Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART, ISBN 978-989-758-547-0, pages 851-862. DOI: 10.5220/0010917300003116
in Bibtex Style
@conference{icaart22,
author={Bilel Elayeb and Mohamed Ettih and Raja Ayed},
title={Experimenting Machine-Learning Algorithms for Morphological Disambiguation of Arabic Texts},
booktitle={Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2022},
pages={851-862},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010917300003116},
isbn={978-989-758-547-0},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - Experimenting Machine-Learning Algorithms for Morphological Disambiguation of Arabic Texts
SN - 978-989-758-547-0
AU - Elayeb B.
AU - Ettih M.
AU - Ayed R.
PY - 2022
SP - 851
EP - 862
DO - 10.5220/0010917300003116