A Machine Learning based Study on Classical Arabic Authorship Identification
Mohamed-Amine Boukhaled
2022
Abstract
Arabic is a widely spoken language with a long and rich written tradition spanning more than 14 centuries. Because of its very peculiar linguistic properties, it poses a difficult challenge for some natural language processing applications such as authorship identification, especially in its classical form. Work on Arabic authorship identification has mainly focused on style markers derived from either lexical or structural properties of the studied texts. Despite being effective to a certain degree, these types of style markers have been shown to be unreliable in addressing authorship problems for such a language. In this contribution, we present a machine learning-based study on using different types of style markers for classical Arabic. Our aim is to compare the effectiveness of machine learning authorship identification using style markers that do not rely primarily on the lexical or structural dimension of language. We used three types of style markers that rely mostly on syntactic information. By way of illustration, we conducted a study and report the results of experiments on a corpus of 700 books written by 20 eminent classical Arabic authors.
Paper Citation
in Harvard Style
Boukhaled M. (2022). A Machine Learning based Study on Classical Arabic Authorship Identification. In Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 1: NLPinAI, ISBN 978-989-758-547-0, pages 489-495. DOI: 10.5220/0010969100003116
in Bibtex Style
@conference{nlpinai22,
author={Mohamed-Amine Boukhaled},
title={A Machine Learning based Study on Classical Arabic Authorship Identification},
booktitle={Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 1: NLPinAI},
year={2022},
pages={489-495},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010969100003116},
isbn={978-989-758-547-0},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 1: NLPinAI
TI - A Machine Learning based Study on Classical Arabic Authorship Identification
SN - 978-989-758-547-0
AU - Boukhaled M.
PY - 2022
SP - 489
EP - 495
DO - 10.5220/0010969100003116