main contributions of this paper include:
• Propose the TRE framework, a novel knowledge-injection pretraining framework for Transformer models;
• Introduce TREBERT, pretrained with special hierarchical logical structures to improve the performance of BERT in the legal domain;
• Conduct detailed experiments to demonstrate the effectiveness of the proposed approach and to analyze its characteristics.
2 RELATED WORK
Methods using pretrained models have proven effective in a variety of natural language processing problems (Qiu et al., 2020). In legal text processing, a subfield of NLP, studies that apply this approach have also achieved good results. There are two ways to pretrain these models. The first is training from scratch, in which both the tokenizer and the model weights are initialized anew. This approach often requires large amounts of data so that the model can abstract the patterns on its own. The second is further pretraining, where an already pretrained model continues to be trained to better capture the relationships between concepts in the domain. Pretraining studies in law have evolved alongside developments in deep learning architectures.
Law2Vec (Chalkidis and Kampas, 2019) is trained on a large vocabulary drawn from UK, EU, Canadian, Australian, US, and Japanese legal data. At the time, the authors noted that word-embedding resources for the legal field were limited, so researchers often had to use general-purpose word embeddings for legal problems, which prevented models from reaching their full potential. The authors demonstrate that using word vectors trained on a large legal corpus results in better performance for deep learning models. The idea behind approaches using pretrained word vectors is that the co-occurrence of terms in a corpus can be measured and exploited. For pretraining a model for legal text processing, a broad corpus is better than a narrow one, and a corpus rich in legal terms is better than a corpus from the general domain.
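To make the co-occurrence idea concrete, the sketch below trains legal-domain word vectors with the gensim Word2Vec implementation. This is only an illustration, not the Law2Vec setup itself; the corpus file name and hyperparameters are placeholder assumptions.

from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

# Read a plain-text legal corpus, one sentence per line (hypothetical file).
with open("legal_corpus.txt", encoding="utf-8") as f:
    sentences = [simple_preprocess(line) for line in f]

# The vectors are learned purely from term co-occurrence within a context
# window, so a legal corpus yields representations tuned to legal terminology.
model = Word2Vec(sentences, vector_size=200, window=5, min_count=5, workers=4)

# Terms that appear in similar legal contexts end up with similar vectors.
print(model.wv.most_similar("plaintiff", topn=5))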
Besides methods using word vectors, methods using pretrained contextual embeddings with the Transformer architecture are also competitive approaches in the field of law. The authors of Legal BERT (Chalkidis et al., 2020) create legal-domain variants of BERT (Devlin et al., 2018), examining both pretraining from scratch and further pretraining. Through experiments, they show that both approaches perform better on legal text processing than the original model pretrained on the general domain. In the task of named entity recognition on contracts, the version pretrained from scratch outperformed the further pretrained version. Likewise, the JNLP Team (Nguyen et al., 2020b) propose two systems using BERT pretrained from scratch and further pretrained, which became the best systems in the case law entailment and statute law question answering tasks of COLIEE 2020 (Rabelo et al., 2020).
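As an illustration of the further pretraining route discussed above (not the exact recipe used by Legal BERT or the JNLP systems), the sketch below continues masked language model training of a general-domain BERT checkpoint on a legal corpus using the HuggingFace transformers library; the corpus file and hyperparameters are placeholder assumptions.

from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Start from the general-domain checkpoint; pretraining from scratch would
# instead train a new tokenizer on the legal corpus and initialize the
# weights from a fresh configuration.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Hypothetical plain-text legal corpus, one passage per line.
corpus = load_dataset("text", data_files={"train": "legal_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="legal-bert-further", num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()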
Analyzing the previous pretraining methods in the legal domain, we find that they share one thing in common: they are pretrained in an unsupervised manner on a large corpus. By doing so, we can create language models that accurately describe the relationships between the concepts, terms, and syntax used in legal documents. Such models can also find hidden rules expressed in words and use extrapolation to make decisions. However, it is impossible for a model to find all the latent rules just by identifying co-occurring terms, and the attempt requires much time, considerable computational power, and a huge amount of data. As with mathematical problems, it is difficult for a model to find logical rules through unsupervised training alone. Our approach is therefore a further pretraining method based on a supervised paradigm.
3 METHOD
3.1 Legal Logical Structures
Serving a different purpose from everyday sentences, legal sentences require rigor and logic. As the product of thousands of years of human civilization, the logic of existing laws reaches a very high level. From a syntactic point of view, the two important components of a law sentence that form an equivalent logical proposition are the requisite and the effectuation. Requisite and effectuation can in turn be formed from smaller logical parts such as antecedence, consequence, and topic. Bach et al. (2010) identify four different cases of these structures.
Consider the classic example from logic, “If it rains, the road is wet.” We can easily see that this sentence has a requisite segment (“if it rains”) and an effectuation segment (“the road is wet”). In practice, the requisite and effectuation segments are often complex; they can be nested and even interleaved. A minimal illustration of the basic segmentation is sketched below.
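The following is a purely illustrative sketch of how such a requisite/effectuation split could be represented in code; the label names follow the logical parts described above, while the data structure itself is an assumption rather than the paper's actual annotation format.

from dataclasses import dataclass

@dataclass
class LogicalSegment:
    label: str  # "requisite", "effectuation", or "unless"
    text: str

sentence = "If it rains, the road is wet."
segments = [
    LogicalSegment("requisite", "If it rains"),
    LogicalSegment("effectuation", "the road is wet"),
]

# Nested or interleaved structures would need character spans (start, end)
# rather than plain substrings.
for seg in segments:
    print(f"{seg.label:>12}: {seg.text}")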
In legal sentences, besides requisite and effectuation,
another common logical structure is unless, which in-
dicates exceptions where the main requisite and ef-
fectuation do not apply. Let’s consider the following
example: “Gifts not in writing may be revoked by ei-