Disambiguating Clinical Abbreviations using Pre-trained Word Embeddings

Areej Jaber, Paloma Martínez

2021

Abstract

Abbreviations are widely used in clinical texts, and most of them have more than one meaning, which makes them highly ambiguous. Determining the right sense of an abbreviation is treated as a Word Sense Disambiguation (WSD) task in clinical natural language processing (NLP). Many approaches have been applied to disambiguate abbreviations in clinical narrative; among them, supervised machine learning approaches have been studied extensively and have shown good performance on this problem. We investigated four strategies that integrate pre-trained word embeddings as features to train two supervised machine learning models: Support Vector Machines (SVM) and Naive Bayes (NB). Our training features capture information from the context of the target abbreviation and are built from 500 sentences for each of 13 abbreviations, extracted from public clinical notes data sets from the University of Minnesota-affiliated (UMN) Fairview Health Services in the Twin Cities. Our results show that SVM performs better than NB in all four strategies, with the highest accuracy, 97.08%, obtained using a pre-trained model trained on Wikipedia, PubMed and PMC (PubMed Central) texts.
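As an illustration of how context information around a target abbreviation might be turned into a feature vector, the following minimal sketch averages the embeddings of the words in a fixed window around the abbreviation. It is not the authors' implementation: the abstract does not describe the four strategies, so the averaging scheme, the window size, the function name `context_features`, the toy three-dimensional vectors in `TOY_EMBEDDINGS`, and the example sentence are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

# Toy 3-dimensional vectors standing in for a real pre-trained embedding
# model (hypothetical values, for illustration only).
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "patient": [1.0, 0.0, 0.0],
    "received": [0.0, 1.0, 0.0],
    "for": [0.0, 0.0, 1.0],
    "pain": [1.0, 1.0, 0.0],
}


def context_features(
    tokens: Sequence[str],
    target_index: int,
    embeddings: Dict[str, List[float]],
    window: int = 2,
    dim: int = 3,
) -> List[float]:
    """Average the embeddings of tokens within `window` positions of the target.

    The target abbreviation itself and out-of-vocabulary tokens are skipped.
    If no context token has an embedding, a zero vector of length `dim` is returned.
    """
    start = max(0, target_index - window)
    end = min(len(tokens), target_index + window + 1)
    vectors = [
        embeddings[tok.lower()]
        for i, tok in enumerate(tokens[start:end], start=start)
        if i != target_index and tok.lower() in embeddings
    ]
    if not vectors:
        return [0.0] * dim
    # zip(*vectors) groups the values of each dimension together.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical sentence containing an ambiguous abbreviation ("PT").
tokens = ["Patient", "received", "PT", "for", "pain"]
features = context_features(tokens, tokens.index("PT"), TOY_EMBEDDINGS)
```

A vector such as `features`, paired with the annotated sense of the abbreviation, could then serve as one training example for a supervised classifier such as an SVM or NB model.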


Paper Citation


in Harvard Style

Jaber A. and Martínez P. (2021). Disambiguating Clinical Abbreviations using Pre-trained Word Embeddings. In Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 5: HEALTHINF; ISBN 978-989-758-490-9, SciTePress, pages 501-508. DOI: 10.5220/0010256105010508


in Bibtex Style

@conference{healthinf21,
author={Areej Jaber and Paloma Martínez},
title={Disambiguating Clinical Abbreviations using Pre-trained Word Embeddings},
booktitle={Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 5: HEALTHINF},
year={2021},
pages={501-508},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010256105010508},
isbn={978-989-758-490-9},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 5: HEALTHINF
TI - Disambiguating Clinical Abbreviations using Pre-trained Word Embeddings
SN - 978-989-758-490-9
AU - Jaber A.
AU - Martínez P.
PY - 2021
SP - 501
EP - 508
DO - 10.5220/0010256105010508
PB - SciTePress