ArabiaNer: A System to Extract Named Entities from Arabic Content
Mohammad Hudhud, Hamed Abdelhaq, Fadi Mohsen
2021
Abstract
The extraction of named entities from unstructured text is a crucial component of numerous Natural Language Processing (NLP) applications, such as information retrieval, question answering, and machine translation. Named-Entity Recognition (NER) aims to locate proper nouns in unstructured text and classify them into a predefined set of types, such as persons, locations, and organizations. There has been extensive research on improving the accuracy of NER for English text. For other languages, such as Arabic, extracting named entities is quite challenging due to the language's morphological structure. In this paper, we introduce ArabiaNer, a system that employs the Conditional Random Field (CRF) learning algorithm with extensive feature engineering to effectively extract Arabic named entities. ArabiaNer achieved state-of-the-art results, with an F1-score of 91.31% on the ANERcorp dataset.
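The abstract does not list ArabiaNer's actual features. As a minimal sketch, assuming a CRFsuite-style toolkit that takes one feature dictionary per token, the Python below shows the kind of token-level features a linear-chain CRF tagger for Arabic might use. Because Arabic has no capitalization, morphological cues such as the definite article and attached proclitics stand in for casing. The feature names, the proclitic list, and the functions `token_features` and `sentence_features` are hypothetical illustrations, not the authors' feature set.

```python
# Hypothetical token-level features for a linear-chain CRF NER tagger.
# Feature names and the proclitic list are illustrative assumptions,
# NOT the ArabiaNer feature set.
from typing import Dict, List, Union

Feature = Union[str, bool, int]

# Common single-letter Arabic proclitics (conjunctions/prepositions
# written attached to the following word). Assumed list for illustration.
PROCLITICS = ("و", "ف", "ب", "ل", "ك")
DEFINITE_ARTICLE = "ال"


def token_features(tokens: List[str], i: int) -> Dict[str, Feature]:
    """Build a feature dict for tokens[i] (one dict per token)."""
    word = tokens[i]
    feats: Dict[str, Feature] = {
        "bias": True,
        "word": word,
        "len": len(word),
        # Character affixes approximate Arabic morphology without a stemmer.
        "prefix2": word[:2],
        "suffix2": word[-2:],
        "suffix3": word[-3:],
        "is_digit": word.isdigit(),
        # Arabic has no capital letters, so morphological cues replace casing.
        "has_definite_article": word.startswith(DEFINITE_ARTICLE),
        # Length guard avoids flagging short words that merely begin
        # with one of the proclitic letters.
        "has_proclitic": len(word) > 3 and word.startswith(PROCLITICS),
    }
    # Context window of one token on each side, with sentence-boundary flags.
    if i > 0:
        feats["prev_word"] = tokens[i - 1]
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_word"] = tokens[i + 1]
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, Feature]]:
    """Return the feature dict for every token in a tokenized sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

For example, `sentence_features("زار محمد القدس".split())` returns three dicts, and the dict for `القدس` has `has_definite_article` set to `True`. A list of such per-sentence lists, paired with per-token label sequences (for instance BIO tags), is the input form that `sklearn_crfsuite.CRF.fit` accepts; a real system would add richer features such as gazetteer lookups or part-of-speech tags.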
Paper Citation
in Harvard Style
Hudhud M., Abdelhaq H. and Mohsen F. (2021). ArabiaNer: A System to Extract Named Entities from Arabic Content. In Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 1: NLPinAI, ISBN 978-989-758-484-8, pages 489-497. DOI: 10.5220/0010382404890497
in Bibtex Style
@conference{nlpinai21,
author={Mohammad Hudhud and Hamed Abdelhaq and Fadi Mohsen},
title={ArabiaNer: A System to Extract Named Entities from Arabic Content},
booktitle={Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 1: NLPinAI},
year={2021},
pages={489-497},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010382404890497},
isbn={978-989-758-484-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 1: NLPinAI
TI - ArabiaNer: A System to Extract Named Entities from Arabic Content
SN - 978-989-758-484-8
AU - Hudhud M.
AU - Abdelhaq H.
AU - Mohsen F.
PY - 2021
SP - 489
EP - 497
DO - 10.5220/0010382404890497