Terminology Expansion with Prototype Embeddings: Extracting Symptoms of Urinary Tract Infection from Clinical Text

Mahbub Ul Alam, Aron Henriksson, Hideyuki Tanushi, Emil Thiman, Pontus Naucler, Hercules Dalianis

2021

Abstract

Many natural language processing applications rely on domain-specific terminologies that contain synonyms. Semi-automatic methods for extracting additional synonyms of a given concept from corpora are therefore useful, especially in low-resource domains and in noisy genres such as clinical text, where nonstandard language use and misspellings are prevalent. In this study, prototype embeddings built from seed words were used to create representations of (i) specific urinary tract infection (UTI) symptoms and (ii) UTI symptoms in general. Four word embedding methods and two phrase detection methods were evaluated on clinical data from Karolinska University Hospital. The results show that prototype embeddings can effectively capture semantic information related to UTI symptoms. Prototype embeddings for specific UTI symptoms led to the extraction of more symptom terms than prototype embeddings for UTI symptoms in general. Overall, 142 additional UTI symptom terms were identified, more than doubling the initial seed set. The mean average precision across all UTI symptoms was 0.51, reaching 0.86 for one specific UTI symptom. This study offers an effective, low-cost approach to terminology expansion that requires only small amounts of labeled data.
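The core idea of the abstract, building a prototype from seed words and ranking candidate terms by similarity to it, can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: it assumes the prototype is the centroid of the unit-normalized seed-word vectors and that candidates are ranked by cosine similarity to that centroid. It also shows one common way to compute (mean) average precision over the ranked lists. The toy vectors, the words in them, and all function names (`prototype`, `expand`, `average_precision`, `mean_average_precision`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = List[float]

# Hypothetical toy embeddings (3 dimensions) used only to illustrate the idea;
# real vectors would come from a word embedding model trained on clinical text.
TOY_EMB: Dict[str, Vector] = {
    "fever": [0.9, 0.1, 0.0],
    "pyrexia": [0.8, 0.2, 0.0],
    "feverish": [0.85, 0.15, 0.05],
    "dysuria": [0.1, 0.9, 0.0],
    "burning": [0.2, 0.8, 0.1],
    "table": [0.0, 0.1, 0.9],
}


def normalize(vec: Sequence[float]) -> Vector:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def prototype(seeds: List[str], emb: Dict[str, Vector]) -> Vector:
    """Assumed prototype: the centroid of the unit-normalized seed vectors."""
    vecs = [normalize(emb[w]) for w in seeds if w in emb]
    if not vecs:
        raise ValueError("none of the seed words is in the vocabulary")
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def expand(seeds: List[str], emb: Dict[str, Vector], k: int = 10) -> List[str]:
    """Return the k non-seed terms most similar to the seed prototype."""
    proto = prototype(seeds, emb)
    seed_set = set(seeds)
    scored = [(w, cosine(proto, v)) for w, v in emb.items() if w not in seed_set]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in scored[:k]]


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of precision@i over the ranks i that hold a relevant term.

    Normalized by the number of relevant terms retrieved (an assumption;
    some definitions divide by the total number of relevant terms instead).
    """
    hits, precisions = 0, []
    for i, term in enumerate(ranked, start=1):
        if term in relevant:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(runs: List[Tuple[List[str], Set[str]]]) -> float:
    """Average of the per-concept (e.g. per-symptom) average precisions."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    candidates = expand(["fever", "pyrexia"], TOY_EMB, k=3)
    print(candidates)  # ['feverish', 'burning', 'dysuria']
    print(average_precision(candidates, {"feverish"}))  # 1.0
```

Normalizing each seed vector before averaging keeps a seed with an unusually large vector norm from dominating the prototype. In this sketch, a per-symptom prototype corresponds to passing only that symptom's seeds to `expand`, while a general prototype would pool the seeds of all symptoms.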



Paper Citation


in Harvard Style

Alam M., Henriksson A., Tanushi H., Thiman E., Naucler P. and Dalianis H. (2021). Terminology Expansion with Prototype Embeddings: Extracting Symptoms of Urinary Tract Infection from Clinical Text. In Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 5: HEALTHINF; ISBN 978-989-758-490-9, SciTePress, pages 47-57. DOI: 10.5220/0010190200470057


in Bibtex Style

@conference{healthinf21,
author={Mahbub Ul Alam and Aron Henriksson and Hideyuki Tanushi and Emil Thiman and Pontus Naucler and Hercules Dalianis},
title={Terminology Expansion with Prototype Embeddings: Extracting Symptoms of Urinary Tract Infection from Clinical Text},
booktitle={Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 5: HEALTHINF},
year={2021},
pages={47-57},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010190200470057},
isbn={978-989-758-490-9},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 5: HEALTHINF
TI  - Terminology Expansion with Prototype Embeddings: Extracting Symptoms of Urinary Tract Infection from Clinical Text
SN  - 978-989-758-490-9
AU  - Alam M.
AU  - Henriksson A.
AU  - Tanushi H.
AU  - Thiman E.
AU  - Naucler P.
AU  - Dalianis H.
PY  - 2021
SP  - 47
EP  - 57
DO  - 10.5220/0010190200470057
PB  - SciTePress