
Vocabulary Modifications for Domain-adaptive Pretraining of Clinical Language Models

Authors: Anastasios Lamproudis, Aron Henriksson and Hercules Dalianis

Affiliation: Department of Computer and System Sciences, Stockholm University, Stockholm, Sweden

Keyword(s): Natural Language Processing, Language Models, Domain-adaptive Pretraining, Clinical Text, Swedish.

Abstract: Research has shown that using generic language models – specifically, BERT models – in specialized domains may be sub-optimal due to domain differences in language use and vocabulary. There are several techniques for developing domain-specific language models that leverage existing generic language models, including continued and domain-adaptive pretraining with in-domain data. Here, we investigate a strategy based on using a domain-specific vocabulary, while leveraging a generic language model for initialization. The results demonstrate that domain-adaptive pretraining, in combination with a domain-specific vocabulary – as opposed to a general-domain vocabulary – yields improvements on two downstream clinical NLP tasks for Swedish. The results highlight the value of domain-adaptive pretraining when developing specialized language models and indicate that it is beneficial to adapt the vocabulary of the language model to the target domain prior to continued, domain-adaptive pretraining of a generic language model.

CC BY-NC-ND 4.0

Paper citation in several formats:
Lamproudis, A.; Henriksson, A. and Dalianis, H. (2022). Vocabulary Modifications for Domain-adaptive Pretraining of Clinical Language Models. In Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2022) - HEALTHINF; ISBN 978-989-758-552-4; ISSN 2184-4305, SciTePress, pages 180-188. DOI: 10.5220/0010893800003123

@conference{healthinf22,
author={Anastasios Lamproudis and Aron Henriksson and Hercules Dalianis},
title={Vocabulary Modifications for Domain-adaptive Pretraining of Clinical Language Models},
booktitle={Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2022) - HEALTHINF},
year={2022},
pages={180-188},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010893800003123},
isbn={978-989-758-552-4},
issn={2184-4305},
}

TY  - CONF
JO  - Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2022) - HEALTHINF
TI  - Vocabulary Modifications for Domain-adaptive Pretraining of Clinical Language Models
SN  - 978-989-758-552-4
IS  - 2184-4305
AU  - Lamproudis, A.
AU  - Henriksson, A.
AU  - Dalianis, H.
PY  - 2022
SP  - 180
EP  - 188
DO  - 10.5220/0010893800003123
PB  - SciTePress
ER  - 