Towards the Enrichment of Arabic WordNet with Big Corpora

Georges Lebboss, Gilles Bernard, Noureddine Aliane, Mohammad Hajjar

Abstract

This paper presents a method for enriching Arabic WordNet with semantic clusters extracted from a large general corpus. Because Arabic is poor in open digital linguistic resources, we built such a corpus (more than 7.5 billion words) with ad hoc tools. We then applied GraPaVec, a new method for word vectorization using automatically generated frequency patterns, as well as the state-of-the-art Word2Vec and GloVe methods. The word vectors were fed to a Self-Organizing Map neural network model; the resulting clusterings were then evaluated against the existing Arabic WordNet synsets (sets of synonymous words). The evaluation yields an F-score of 82.1% for GraPaVec, 55.1% for Word2Vec's Skip-gram, 52.2% for CBOW and 56.6% for GloVe, which at least shows the relevance of the context that GraPaVec takes into account. We conclude by discussing parameters and possible biases.


Paper Citation


in Harvard Style

Lebboss G., Bernard G., Aliane N. and Hajjar M. (2017). Towards the Enrichment of Arabic WordNet with Big Corpora. In Proceedings of the 9th International Joint Conference on Computational Intelligence - Volume 1: IJCCI, ISBN 978-989-758-274-5, pages 101-109. DOI: 10.5220/0006505701010109


in Bibtex Style

@conference{ijcci17,
author={Georges Lebboss and Gilles Bernard and Noureddine Aliane and Mohammad Hajjar},
title={Towards the Enrichment of Arabic WordNet with Big Corpora},
booktitle={Proceedings of the 9th International Joint Conference on Computational Intelligence - Volume 1: IJCCI},
year={2017},
pages={101-109},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006505701010109},
isbn={978-989-758-274-5},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 9th International Joint Conference on Computational Intelligence - Volume 1: IJCCI
TI - Towards the Enrichment of Arabic WordNet with Big Corpora
SN - 978-989-758-274-5
AU - Lebboss G.
AU - Bernard G.
AU - Aliane N.
AU - Hajjar M.
PY - 2017
SP - 101
EP - 109
DO - 10.5220/0006505701010109