ones. This suggests that smaller, more problem-specific approaches such as feature-based transfer learning may be a promising direction for advancing research on automatic coding.
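To make this suggestion concrete, the following is a minimal sketch of such a feature-based approach, assuming frozen pre-trained word vectors (e.g., fastText) averaged into fixed features for a one-vs-rest classifier. The vectors, texts, and labels below are toy stand-ins for illustration, not our implementation:

# Hedged sketch of feature-based transfer learning for multi-label coding:
# the pre-trained word vectors stay frozen and only a light classifier is
# fitted. All data below are toy stand-ins; in practice the vectors would
# come from pre-trained fastText embeddings and the texts from responses.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
vectors = {w: rng.normal(size=50) for w in ["economy", "war", "health", "taxes"]}

def embed(texts, dim=50):
    """Average the pre-trained vectors of known tokens: one fixed feature vector per text."""
    feats = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        hits = [vectors[t] for t in text.lower().split() if t in vectors]
        if hits:
            feats[i] = np.mean(hits, axis=0)
    return feats

texts = ["the economy and taxes", "war abroad", "health and economy", "taxes"]
labels = np.array([[1, 0, 1],   # multi-label: each response may carry
                   [0, 1, 0],   # several codes at once
                   [1, 0, 0],
                   [0, 0, 1]])

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(embed(texts), labels)
print(clf.predict(embed(["economy and war"])))

Because only the classifier head is trained, such a pipeline is cheap to fit on small, domain-specific data sets where fine-tuning large pre-trained models may overfit.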
7 CONCLUSION
In this work, we extended the collection of benchmark data sets commonly used for evaluating transfer learning models in NLP. The full-text data set covers a different task than most existing benchmarks and thus broadens the opportunities for carefully evaluating pre-trained models on a new kind of challenge. Furthermore, we propose a unified preprocessing of the data set together with a fixed train-test split, enabling a valid comparison against our baselines. We evaluated the performance of state-of-the-art transfer learning models on the ANES 2008 data set and compared them to a simple baseline model. Our comparison illustrates that, despite the strong performance of these models on binary, multi-class, and previous multi-label classification tasks, there is still considerable room for improvement on challenging multi-label classification tasks with small to mid-sized data sets.
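As an aside, the following is a minimal sketch of how such a fixed train-test split can be pinned down and persisted so that all models are compared on identical data, assuming pandas and scikit-learn; the file names are hypothetical, not our released artifacts:

# Hedged sketch of fixing and persisting a train-test split so every model
# is evaluated on identical data; file names here are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("anes2008_preprocessed.csv")  # hypothetical preprocessed export
train, test = train_test_split(df, test_size=0.2, random_state=42)  # fixed seed
train.to_csv("train.csv", index=False)  # persist once,
test.to_csv("test.csv", index=False)    # reuse for all baselines

Persisting the split once, rather than re-drawing it with a fixed seed in each experiment, guards against library version changes silently altering the partition.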
ACKNOWLEDGEMENTS
We want to express our sincere gratitude to Christian
Heumann for his guidance and support during the pro-
cess of this research project. We would like to thank
Jon Krosnick and Matt Berent for their insightful ex-
planations via e-mail regarding the Open Ended Cod-
ing Project. This helped us develop a better understanding of the initial data format. A special thanks
also goes to Dallas Card for his explanations regard-
ing the data splits from Card and Smith (2015).
REFERENCES
Adhikari, A., Ram, A., Tang, R., and Lin, J. (2019). DocBERT: BERT for document classification. arXiv preprint arXiv:1904.08398.
Bird, S., Klein, E., and Loper, E. (2009). Natural Lan-
guage Processing with Python. Safari Books Online.
O’Reilly Media Inc, Sebastopol.
Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T.
(2017). Enriching word vectors with subword infor-
mation. Transactions of the Association for Computa-
tional Linguistics, 5:135–146.
Card, D. and Smith, N. A. (2015). Automated coding of
open-ended survey responses.
CESSDA Training Team (2020). CESSDA data management expert guide.
Chang, W.-C., Yu, H.-F., Zhong, K., Yang, Y., and Dhillon, I. (2019). X-BERT: Extreme multi-label text classification using bidirectional encoder representations from transformers. arXiv preprint arXiv:1905.02331.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Gibaja, E. and Ventura, S. (2014). Multi-label learning: a
review of the state of the art and ongoing research.
Wiley Interdisciplinary Reviews: Data Mining and
Knowledge Discovery, 4(6):411–444.
Gibaja, E. and Ventura, S. (2015). A tutorial on multilabel
learning. ACM Computing Surveys (CSUR), 47(3):1–
38.
Herrera, F., Charte, F., Rivera, A. J., and Del Jesus, M. J.
(2016). Multilabel classification. In Multilabel Clas-
sification, pages 17–31. Springer.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8):1735–1780.
Hoyle, L., Vardigan, M., Greenfield, J., Hume, S., Ionescu, S., Iverson, J., Kunze, J., Radler, B., Thomas, W., Weibel, S., and Witt, M. (2016). DDI and enhanced data citation. IASSIST Quarterly, 39(3):30.
Inter-university Consortium for Political and Social Research (ICPSR) (2012). Guide to social science data preparation and archiving: Best practice throughout the data life cycle.
Krosnick, J. A., Lupia, A., and Berent, M. K. (2012). 2008
open ended coding project.
Lai, G., Xie, Q., Liu, H., Yang, Y., and Hovy, E. (2017). RACE: Large-scale reading comprehension dataset from examinations. arXiv preprint arXiv:1704.04683.
Lee, J.-S. and Hsiang, J. (2019). PatentBERT: Patent classification with fine-tuning a pre-trained BERT model. arXiv preprint arXiv:1906.02124.
Lewis, D. D., Yang, Y., Rose, T. G., and Li, F. (2004). RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5(Apr):361–397.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
Lupia, A. (2018a). Coding open responses. In Vannette,
D. L. and Krosnick, J. A., editors, The Palgrave Hand-
book of Survey Research, pages 473–487. Springer In-
ternational Publishing, Cham.
Lupia, A. (2018b). How to improve coding for open-ended survey data: Lessons from the ANES. In The Palgrave Handbook of Survey Research, pages 121–127. Springer.
Mencia, E. L. and Fürnkranz, J. (2008). Efficient pairwise multilabel classification for large-scale problems in the legal domain. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer.