Authors:
Erick Velazquez Godinez; Zoltán Szlávik; Edeline Contempré and Robert-Jan Sips
Affiliation:
myTomorrows, Anthony Fokkerweg 61 1059CP, Amsterdam, The Netherlands
Keyword(s):
Medical Word Sense Disambiguation, Knowledge-based, Semantic Similarity, Word Embeddings, Data Understanding.
Abstract:
Word Sense Disambiguation (WSD) is an essential step for any NLP system; it can improve the performance of more complex tasks such as information extraction and named entity linking. Consequently, any error made while disambiguating a term propagates to later stages with a snowball effect. Knowledge-based strategies for WSD offer the advantage of wider coverage of medical terminology than supervised algorithms. In this research, we present a knowledge-based approach for word sense disambiguation that can use different semantic similarity measures to determine the correct sense of a term in a given context. Our experiments show that, with WordNet-based similarity measures, our approach achieved performance very close to that obtained with semantic measures based on word embeddings. We also constructed a small dataset from real-world data, where the feedback received from the annotators led us to distinguish between truly ambiguous terms and vague terms. This distinction needs to be considered in future research on WSD algorithms and dataset construction. Finally, we analyzed a state-of-the-art dataset with linguistic variables that helped to explain our approach's performance. Our analysis revealed that texts with high lexical richness and a high ratio of nouns and adjectives lead to better WSD performance.