Authors:
Maximilian Meidinger
and
Matthias Aßenmacher
Affiliation:
Department of Statistics, Ludwig-Maximilians-Universität, Munich, Germany
Keyword(s):
Benchmark, Multi-label Classification, Open-ended Responses, Transfer Learning, Pre-trained Language Models.
Abstract:
To evaluate transfer learning models for Natural Language Processing on common ground, numerous general-domain benchmark data sets (and collections thereof) have been established in recent years. The proposed tasks are primarily classification (binary, multi-class), regression, or language generation. However, no benchmark data set for (extreme) multi-label classification relying on full-text inputs has been proposed in the area of social science survey research to date. This constitutes an important gap, as a common data set for algorithm development in this field could lead to more reproducible, sustainable research. Thus, we provide a transparent and fully reproducible preparation of the 2008 American National Election Study (ANES) data set, which can be used for benchmark comparisons of different NLP models on the task of multi-label classification. In contrast to other data sets, ours comprises full-text inputs instead of bag-of-words or similar representations. Furthermore, we provide baseline performances of simple logistic regression models as well as performance values for recently established transfer learning architectures, namely BERT (Devlin et al., 2018), RoBERTa (Liu et al., 2019), and XLNet (Yang et al., 2019).