A Machine Learning based Study on Classical Arabic Authorship Identification

Mohamed-Amine Boukhaled

2022

Abstract

Arabic is a widely spoken language with a long and rich written tradition spanning more than 14 centuries. Because of its peculiar linguistic properties, it poses a difficult challenge for some natural language processing applications such as authorship identification, especially in its classical form. Work on authorship identification for Arabic has mainly focused on style markers derived from either the lexical or the structural properties of the studied texts. Although effective to a certain degree, these types of style markers have been shown to be unreliable in addressing authorship problems for such a language. In this contribution, we present a machine learning-based study on using different types of style markers for classical Arabic. Our aim is to compare the effectiveness of machine learning authorship identification using style markers that do not rely primarily on the lexical or structural dimension of language. We used three types of style markers that rely mostly on syntactic information. By way of illustration, we report the results of experiments conducted on a corpus of 700 books written by 20 eminent classical Arabic authors.


Paper Citation


in Harvard Style

Boukhaled M. (2022). A Machine Learning based Study on Classical Arabic Authorship Identification. In Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 1: NLPinAI, ISBN 978-989-758-547-0, pages 489-495. DOI: 10.5220/0010969100003116


in Bibtex Style

@conference{nlpinai22,
author={Mohamed-Amine Boukhaled},
title={A Machine Learning based Study on Classical Arabic Authorship Identification},
booktitle={Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 1: NLPinAI},
year={2022},
pages={489-495},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010969100003116},
isbn={978-989-758-547-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 14th International Conference on Agents and Artificial Intelligence - Volume 1: NLPinAI
TI - A Machine Learning based Study on Classical Arabic Authorship Identification
SN - 978-989-758-547-0
AU - Boukhaled M.
PY - 2022
SP - 489
EP - 495
DO - 10.5220/0010969100003116