Authors: Mahbub Ul Alam 1 ; Aron Henriksson 1 ; Hideyuki Tanushi 2 ; Emil Thiman 3, 2 ; Pontus Naucler 3, 2 and Hercules Dalianis 1

Affiliations: 1 Department of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden ; 2 Division of Infectious Disease, Department of Medicine, Karolinska Institutet, Stockholm, Sweden ; 3 Department of Infectious Diseases, Karolinska University Hospital, Stockholm, Sweden

Keyword(s): Natural Language Processing, Terminologies, Synonym Extraction, Word Embeddings, Clinical Text.

Abstract: Many natural language processing applications rely on the availability of domain-specific terminologies containing synonyms. To that end, semi-automatic methods for extracting additional synonyms of a given concept from corpora are useful, especially in low-resource domains and noisy genres such as clinical text, where nonstandard language use and misspellings are prevalent. In this study, prototype embeddings based on seed words were used to create representations for (i) specific urinary tract infection (UTI) symptoms and (ii) UTI symptoms in general. Four word embedding methods and two phrase detection methods were evaluated using clinical data from Karolinska University Hospital. It is shown that prototype embeddings can effectively capture semantic information related to UTI symptoms. Using prototype embeddings for specific UTI symptoms led to the extraction of more symptom terms than using prototype embeddings for UTI symptoms in general. Overall, 142 additional UTI symptom terms were identified, an increase of more than 100% over the initial seed set. The mean average precision across all UTI symptoms was 0.51, and as high as 0.86 for one specific UTI symptom. This study provides an effective and cost-effective solution to terminology expansion with small amounts of labeled data.
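
The abstract does not spell out how a prototype embedding is built or scored, so the following is only a minimal sketch of the general idea, under stated assumptions: the prototype is taken to be the plain average of the seed-word vectors, candidates are ranked by cosine similarity to it, and average precision is normalized by the number of relevant terms retrieved. The toy vectors, the English example words, and the function names (`rank_candidates`, `average_precision`) are hypothetical and are not taken from the paper.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors (assumed prototype definition)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(embeddings, seeds, top_k=10):
    """Rank non-seed vocabulary terms by similarity to the seed prototype."""
    prototype = mean_vector([embeddings[s] for s in seeds])
    scored = [
        (word, cosine(prototype, vector))
        for word, vector in embeddings.items()
        if word not in seeds
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def average_precision(ranked_terms, relevant):
    """Average of precision@k over the ranks where a relevant term appears.

    Assumption: normalized by the number of relevant terms retrieved,
    not by the total number of relevant terms in the corpus.
    """
    hits = 0
    total = 0.0
    for rank, term in enumerate(ranked_terms, start=1):
        if term in relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy 3-dimensional embeddings, invented purely for illustration.
toy_embeddings = {
    "dysuria": [0.9, 0.1, 0.0],
    "burning": [0.8, 0.2, 0.1],
    "stinging": [0.85, 0.15, 0.05],
    "fever": [0.1, 0.9, 0.1],
    "table": [0.0, 0.1, 0.9],
}

ranking = rank_candidates(toy_embeddings, seeds=["dysuria", "burning"])
ranked_terms = [word for word, _ in ranking]
ap = average_precision(ranked_terms, relevant={"stinging"})
```

In a semi-automatic workflow of the kind the abstract describes, the ranked list would be reviewed by a domain expert, and the mean of the per-symptom average precision values would give a mean average precision figure comparable in kind to the one reported.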

CC BY-NC-ND 4.0

Paper citation in several formats:
Alam, M. ; Henriksson, A. ; Tanushi, H. ; Thiman, E. ; Naucler, P. and Dalianis, H. (2021). Terminology Expansion with Prototype Embeddings: Extracting Symptoms of Urinary Tract Infection from Clinical Text. In Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - HEALTHINF; ISBN 978-989-758-490-9; ISSN 2184-4305, SciTePress, pages 47-57. DOI: 10.5220/0010190200470057

@conference{healthinf21,
author={Mahbub Ul Alam and Aron Henriksson and Hideyuki Tanushi and Emil Thiman and Pontus Naucler and Hercules Dalianis},
title={Terminology Expansion with Prototype Embeddings: Extracting Symptoms of Urinary Tract Infection from Clinical Text},
booktitle={Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - HEALTHINF},
year={2021},
pages={47-57},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010190200470057},
isbn={978-989-758-490-9},
issn={2184-4305},
}

TY - CONF

JO - Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - HEALTHINF
TI - Terminology Expansion with Prototype Embeddings: Extracting Symptoms of Urinary Tract Infection from Clinical Text
SN - 978-989-758-490-9
IS - 2184-4305
AU - Alam, M.
AU - Henriksson, A.
AU - Tanushi, H.
AU - Thiman, E.
AU - Naucler, P.
AU - Dalianis, H.
PY - 2021
SP - 47
EP - 57
DO - 10.5220/0010190200470057
PB - SciTePress