Authors:
Stefanie Urchs (1); Jelena Mitrović (2) and Michael Granitzer (2)
Affiliations:
(1) Chair of Software Engineering for Business Information Systems, Technical University of Munich, Garching, Germany
(2) Chair of Data Science, University of Passau, Passau, Germany
Keyword(s):
Datasets, German Law, German Legal Writing Styles, Machine Learning, Natural Language Processing.
Abstract:
Law professionals are wordsmiths: their main tool is language. The field of law therefore produces a vast amount of written text. These texts have to be analysed, summarised, and used in the creation of new text, a task that reaches the limits of what is humanly possible. However, this analysis can be automated using Natural Language Processing techniques. Applying these techniques requires (annotated) text corpora. Unfortunately, publicly available (annotated) legal text corpora are rare, and (annotated) German legal text corpora are even scarcer. To meet this need, this paper presents two German legal text corpora. The first corpus contains 32,748 decisions from 131 German courts, enriched with metadata. The second corpus is a subset of the first and consists of 200 randomly chosen judgements. In these judgements, a legal expert annotated the components conclusion, definition and subsumption of the German legal writing style Urteilsstil. Furthermore, the paper presents experiments on these corpora.