Sentence Transformers and DistilBERT for Arabic Word Sense Induction
Rakia Saidi, Fethi Jarray
2023
Abstract
Word sense induction (WSI) is a fundamental task in natural language processing (NLP) that consists of discovering the sense associated with each instance of a given ambiguous target word. In this paper, we propose a two-stage approach for solving Arabic WSI. In the first stage, we encode the input sentence into context representations using a Transformer-based encoder such as BERT or DistilBERT. In the second stage, we cluster the embedded corpus obtained in the first stage using K-Means and Hierarchical Agglomerative Clustering (HAC). We evaluate our proposed method on the Arabic WSI summarization task. Experimental results show that our model achieves new state-of-the-art results on both the Open Source Arabic Corpus (OSAC) (Saad and Ashour, 2010) and the SemEval Arabic (2017) dataset.
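The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Stage one is replaced by hand-written toy vectors standing in for the contextual embeddings a BERT or DistilBERT encoder would produce, and stage two is a small pure-Python K-Means with a deterministic farthest-first initialisation (an assumption made here for reproducibility). The names `context_embeddings`, `kmeans`, and the choice of k = 2 senses are hypothetical, and HAC is not shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(vectors, k, iterations=20):
    """Minimal K-Means; returns one cluster label per input vector."""
    # Deterministic farthest-first initialisation (assumption of this sketch).
    centroids = [vectors[0]]
    while len(centroids) < k:
        farthest = max(
            vectors, key=lambda v: min(euclidean(v, c) for c in centroids)
        )
        centroids.append(farthest)

    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: euclidean(v, centroids[j]))
            for v in vectors
        ]
        # Update step: recompute each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            new_centroids.append(mean(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels


# Stage 1 stand-in: toy 3-d vectors in place of real sentence embeddings.
# Each vector represents one context of the same ambiguous target word.
context_embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [1.0, 0.0, 0.1],
    [0.1, 0.9, 0.8],
    [0.0, 1.0, 0.9],
    [0.2, 0.8, 1.0],
]

# Stage 2: cluster the contexts; each cluster is read as one induced sense.
labels = kmeans(context_embeddings, k=2)
print(labels)
```

In this toy setting the first three contexts fall into one cluster and the last three into another, so each cluster can be read as one induced sense of the target word. With real data the vectors would come from the encoder, and the number of clusters would have to be chosen or estimated.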
Paper Citation
in Harvard Style
Saidi R. and Jarray F. (2023). Sentence Transformers and DistilBERT for Arabic Word Sense Induction. In Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART, ISBN 978-989-758-623-1, pages 1020-1027. DOI: 10.5220/0011891700003393
in Bibtex Style
@conference{icaart23,
author={Rakia Saidi and Fethi Jarray},
title={Sentence Transformers and DistilBERT for Arabic Word Sense Induction},
booktitle={Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2023},
pages={1020-1027},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011891700003393},
isbn={978-989-758-623-1},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - Sentence Transformers and DistilBERT for Arabic Word Sense Induction
SN - 978-989-758-623-1
AU - Saidi R.
AU - Jarray F.
PY - 2023
SP - 1020
EP - 1027
DO - 10.5220/0011891700003393