Authors:
Philipp Kohl¹, Yoka Krämer¹, Claudia Fohry² and Bodo Kraft¹
Affiliations:
¹ FH Aachen, University of Applied Sciences, 52428 Jülich, Germany
² University of Kassel, 34121 Kassel, Germany
Keyword(s):
Active Learning, Selective Sampling, Named Entity Recognition, Span Labeling, Annotation Effort.
Abstract:
Supervised learning requires large amounts of annotated data, which makes the annotation process time-consuming and expensive. Active Learning (AL) offers a promising solution by reducing the amount of labeled data needed while maintaining model performance. This work focuses on applying supervised learning and AL to (named) entity recognition, a subdiscipline of Natural Language Processing (NLP). Despite the potential of AL in this area, understanding of how different approaches perform remains limited. We address this gap by conducting a comparative performance analysis with diverse, carefully selected corpora and AL strategies. In doing so, we establish a standardized evaluation setting to ensure reproducibility and consistency across experiments. Our analysis identifies scenarios in which AL improves performance and others in which its benefits are limited. In particular, we find that strategies that incorporate historical information from the learning process and that maximize entity information yield the largest improvements. Our findings can guide researchers and practitioners in optimizing their annotation efforts.