Unsupervised Word Sense Disambiguation based on Word Embedding and Collocation
Shangzhuang Han, Kiyoaki Shirai
2021
Abstract
This paper proposes a novel unsupervised word sense disambiguation (WSD) method that uses two features. The first is the contextual information of a target word: the similarity between the words in a context and each sense of the target word is measured using pre-trained word embeddings, and the sense most similar to the context is chosen. In addition, we introduce a procedure that excludes irrelevant context words from the similarity calculation. The second is a collocation, an idiomatic phrase that includes the target word. High-precision rules that determine a sense from a collocation are acquired automatically from a raw corpus. Finally, the two methods are integrated into our final WSD system. Experiments on the Senseval-3 English lexical sample task showed that the proposed method improves precision by 4.7 points over the baseline.
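The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not say how a sense is represented, how irrelevant words are detected, or how the two methods are combined, so everything here is an assumption: a sense is the average embedding of a few hypothetical sense words, a context word is dropped when its cosine similarity to every sense falls below a hypothetical threshold `min_sim`, and a matching collocation rule takes priority over the embedding-based choice. The embeddings, sense inventory, and rule are toy values, not data from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(words, emb):
    """Average the embeddings of the words found in emb; None if none are found."""
    vecs = [emb[w] for w in words if w in emb]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def disambiguate(context, sense_words, emb, rules=None, min_sim=0.2):
    """Pick a sense for the target word given its tokenized context."""
    # Assumed integration order: a matching collocation rule decides the sense.
    padded = " " + " ".join(context) + " "
    for phrase, sense in (rules or {}).items():
        if " " + phrase + " " in padded:
            return sense

    # Assumed sense representation: mean embedding of the sense's words.
    sense_vecs = {}
    for sense, words in sense_words.items():
        vec = mean_vector(words, emb)
        if vec is not None:
            sense_vecs[sense] = vec
    if not sense_vecs:
        return None

    # Assumed relevance filter: keep context words close to at least one sense.
    kept = [
        w for w in context
        if w in emb and max(cosine(emb[w], v) for v in sense_vecs.values()) >= min_sim
    ]
    context_vec = mean_vector(kept, emb)
    if context_vec is None:
        return None
    return max(sense_vecs, key=lambda s: cosine(context_vec, sense_vecs[s]))


# Toy 3-dimensional embeddings (hypothetical values for illustration only).
EMB = {
    "money": [1.0, 0.0, 0.0], "loan": [0.9, 0.1, 0.0],
    "river": [0.0, 1.0, 0.0], "water": [0.1, 0.9, 0.0],
    "yesterday": [0.0, 0.0, 1.0],
}
SENSES = {"finance": ["money", "loan"], "river": ["river", "water"]}
RULES = {"bank account": "finance"}

print(disambiguate(["she", "took", "a", "loan", "yesterday"], SENSES, EMB))
print(disambiguate(["river", "bank", "account"], SENSES, EMB, rules=RULES))
```

In this sketch "yesterday" is filtered out because it is dissimilar to both senses, and the rule for "bank account" overrides the embedding-based choice that the word "river" would otherwise produce. A real system would load pre-trained embeddings and derive the sense words from a dictionary or sense inventory.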
Paper Citation
in Harvard Style
Han S. and Shirai K. (2021). Unsupervised Word Sense Disambiguation based on Word Embedding and Collocation. In Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-484-8, pages 1218-1225. DOI: 10.5220/0010380112181225
in Bibtex Style
@conference{icaart21,
author={Shangzhuang Han and Kiyoaki Shirai},
title={Unsupervised Word Sense Disambiguation based on Word Embedding and Collocation},
booktitle={Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2021},
pages={1218-1225},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010380112181225},
isbn={978-989-758-484-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Unsupervised Word Sense Disambiguation based on Word Embedding and Collocation
SN - 978-989-758-484-8
AU - Han S.
AU - Shirai K.
PY - 2021
SP - 1218
EP - 1225
DO - 10.5220/0010380112181225