Paper

Authors: Corina Masanti (1, 2), Hans-Friedrich Witschel (2) and Kaspar Riesen (1)

Affiliations: (1) Institute of Computer Science, University of Bern, 3012 Bern, Switzerland; (2) Institute for Information Systems, University of Applied Sciences and Arts Northwestern Switzerland, 4600 Olten, Switzerland

Keyword(s): Boosting Techniques, Language Models, Synthetic Data, Real-Word Errors.

Abstract: With the introduction of transformer-based language models, research in error detection in text documents has significantly advanced. However, some significant research challenges remain. In the present paper, we aim to address the specific challenge of detecting real-word errors, i.e., words that are syntactically correct but semantically incorrect given the sentence context. In particular, we research three categories of frequent real-word errors in German, viz. verb conjugation errors, case errors, and capitalization errors. To address the scarcity of training data, especially for languages other than English, we propose to systematically incorporate synthetic data into the training process. To this end, we employ ensemble learning methods for language models. In particular, we propose to adapt the boosting technique to language model learning. Our experimental evaluation reveals that incorporating synthetic data in a non-systematic way enhances recall but lowers precision. In contrast, the proposed boosting approach improves the recall of the language model while maintaining its high precision.

CC BY-NC-ND 4.0

Paper citation in several formats:
Masanti, C., Witschel, H.-F. and Riesen, K. (2025). Boosting Language Models for Real-Word Error Detection. In Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - ICPRAM; ISBN 978-989-758-730-6; ISSN 2184-4313, SciTePress, pages 318-325. DOI: 10.5220/0013251500003905

@conference{icpram25,
author={Corina Masanti and Hans{-}Friedrich Witschel and Kaspar Riesen},
title={Boosting Language Models for Real-Word Error Detection},
booktitle={Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - ICPRAM},
year={2025},
pages={318-325},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013251500003905},
isbn={978-989-758-730-6},
issn={2184-4313},
}

TY - CONF
JO - Proceedings of the 14th International Conference on Pattern Recognition Applications and Methods - ICPRAM
TI - Boosting Language Models for Real-Word Error Detection
SN - 978-989-758-730-6
IS - 2184-4313
AU - Masanti, C.
AU - Witschel, H.
AU - Riesen, K.
PY - 2025
SP - 318
EP - 325
DO - 10.5220/0013251500003905
PB - SciTePress