The ELFE System
Verifying Mathematical Proofs of Undergraduate Students
Maximilian Doré¹ and Krysia Broda²
¹ Department of Computing, RWTH Aachen University, Germany
² Department of Computing, Imperial College London, 180 Queen’s Gate, London SW7 2BZ, U.K.
Keywords:
Didactics of Mathematics, Mathematical Reasoning, Proof Checking, Formal Mathematics.
Abstract:
ELFE is an interactive system for teaching basic proof methods in discrete mathematics. The user inputs a mathematical text written in fairly natural English, which is converted to a special data structure of first-order formulas. Certain proof obligations implied by this intermediate representation are checked by automated theorem provers, which try to either prove the obligations or find countermodels if an obligation is wrong. The result of the verification process is then returned to the user. ELFE is implemented in HASKELL and can be accessed via a reactive web interface or from the command line. Background libraries for sets, relations and functions have been developed. The system has been tested by students at the beginning of their mathematical studies.
1 INTRODUCTION
The Soviet researcher Victor Glushkov wrote in 1971 that "to understand a proof means to be able to explain it to a machine that is operating with a relatively unsophisticated algorithm" (Glushkov, 1971, p. 111). Remarkably, teaching mathematics at university is still a mostly analogue endeavour. In order to understand mathematical reasoning, students practice writing proofs on paper and wait for the feedback of instructors to improve their understanding. Immediate feedback would greatly speed up learning, since it is often difficult to see when a proof is complete or what steps are missing.
Such feedback could be provided by machines. And indeed, many attempts have been made to formalize mathematics. Most prominently, the interactive theorem provers ISABELLE and COQ are advanced systems; for instance, COQ was used in proving the four color theorem (Gonthier, 2008). However, mathematical beginners are overwhelmed by the capabilities of such systems, since using them requires a deep understanding of the workings of automated theorem provers (ATP).
The goal of this work is to provide users with a system that gives feedback on proofs entered in a fairly natural mathematical language. Thereby the users are detached from the technicalities of automated theorem provers. The ELFE system provides a proof of concept that this is feasible and sensible. In the past years, several attempts have been made to create a proof verifier which accepts mathematical texts written in fair English, of which the SYSTEM FOR AUTOMATED DEDUCTION (SAD) (Verchinine et al., 2007) was most influential for our work. The SAD provides an intuitive input language, called FORTHEL. However, the user still has to dig into the automated verification process to understand why a proof does not work. The ELFE system, in contrast, processes the output of background provers and tries to give countermodels to wrong proofs.
Include functions.
Let A,B,C be set.
Let f: A → B.
Let g: B → C.
Lemma: g∘f is injective implies f is injective.
Proof:
    Assume g∘f is injective.
    Assume x ∈ A and x’ ∈ A and (f{x}) = (f{x’}).
    Then ((g∘f){x}) = ((g∘f){x’}).
    Hence x = x’.
    Hence f is injective.
qed.
Figure 1: Exemplary ELFE text.
Consider the exemplary proof in Figure 1, which is in fact a valid ELFE text. After including a background library and introducing specific sets A, B and C and functions f and g, a lemma is proposed: if the composition of f and g is injective, then f, which is applied first, must be injective as well. This lemma is proven by the reasoning that if f maps two elements x and x’ to the same element, the composition of f and g must map them to the same element as well. Since this composition is injective, it follows that x and x’ are the same element and f is thus injective. Note that (g∘f){x} denotes the function application of g∘f, which is put in brackets to specify the precedence of the symbols. We will learn in the following how the text is verified.
Doré, M. and Broda, K. The ELFE System. DOI: 10.5220/0006681000150026. In Proceedings of the 10th International Conference on Computer Supported Education (CSEDU 2018), pages 15-26. ISBN: 978-989-758-291-2. Copyright © 2019 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved.
The remainder of the paper is structured as follows. We first give a brief overview of the implementation in Section 2 and the web interface in Section 3. Next we introduce the ELFE language and proof structures and justify the correctness of the formalisation in Section 4. Finally we evaluate our work in Section 5 and compare it with popular current theorem provers in Section 6 before concluding with a short discussion in Section 7.
An instance of the system can be found online¹.
2 IMPLEMENTATION
The ELFE system can be accessed through a web interface or a command-line interface (CLI) as shown in Figure 2. The web interface provides an intuitive way of accessing the system’s output, while the CLI offers more debugging functionality. We will take a closer look at the web interface in Section 3.
After the text is entered via one of the interfaces, it is parsed into an intermediary representation in first-order logic. This proof representation is presented in Section 4.2. The Verifier takes the intermediary proof representation and checks it for correctness by calling several ATPs in parallel. If a proof obligation is wrong, the Verifier tries to extract a countermodel from the background provers. The result of this verification process is then returned to the user via the chosen interface.
[Diagram components: CLI, Web server, Parser, Verifier, ATP]
Figure 2: Architecture of the ELFE system.
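The parallel prover calls described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the HASKELL code of ELFE; the `Prover` interface, the verdict strings and the two stub provers are assumptions made for this illustration, and no real ATP binary is invoked.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Optional, Tuple

# Hypothetical prover interface: takes a TPTP problem, returns a verdict string.
Prover = Callable[[str], str]


def run_provers(problem: str, provers: Dict[str, Prover]) -> Tuple[Optional[str], str]:
    """Run all provers concurrently and return the first conclusive (name, verdict)."""
    with ThreadPoolExecutor(max_workers=len(provers)) as pool:
        futures = {pool.submit(prove, problem): name for name, prove in provers.items()}
        for future in as_completed(futures):
            verdict = future.result()
            if verdict in ("proved", "countermodel"):
                return futures[future], verdict
    # No prover reached a conclusion.
    return None, "unknown"


# Stand-ins for real ATP calls (assumptions; no external binaries are run).
def stub_equality_prover(problem: str) -> str:
    return "proved" if "=" in problem else "unknown"


def stub_model_finder(problem: str) -> str:
    return "countermodel" if "false_claim" in problem else "unknown"


PROVERS = {"equality": stub_equality_prover, "models": stub_model_finder}
```

Calling `run_provers(problem, PROVERS)` returns the name of whichever stub answers conclusively, which mirrors the idea that different provers are strong on different obligations.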
¹ https://elfe-prover.org
The system is implemented in HASKELL; its source code can be found online². In order to parse a text, a parser is constructed with the parser combinator library PARSEC³. The framework SCOTTY⁴ is used to provide a backend for the web interface. The reactive frontend is implemented with the JAVASCRIPT framework VUEJS⁵.
In order to send proof obligations to the background provers, the syntax standard TPTP (Sutcliffe, 2009) is used. Since the ATPs in use can be easily configured, nearly all current systems can be interfaced.
So far, we have used the provers E PROVER (Schulz, 2002), SPASS (Weidenbach et al., 2002) and VAMPIRE (Riazanov and Voronkov, 2002) due to their performance at the CADE System Competitions (Sutcliffe, 2016). Additionally, we used the provers Z3 (De Moura and Bjørner, 2008) and BEAGLE (Baumgartner et al., 2015), which do theorem proving modulo background theories. Even though we did not fully utilize, for instance, their arithmetic proving facilities, it proved efficient to call several provers in parallel. For example, E PROVER turned out to be fast in proving lemmas with equality, while BEAGLE gave useful countermodels for wrong proof obligations.
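To make the TPTP step concrete, the following Python sketch renders a list of premises and a goal in TPTP's first-order form (`fof`). The function name `to_tptp`, the formula names `premise1`, `goal` and the constant names `cg`, `cf`, `cx1`, `cx2` are assumptions for illustration; the exact output produced by ELFE may differ.

```python
from typing import List


def to_tptp(premises: List[str], conjecture: str) -> str:
    """Render premises and a goal as a TPTP problem in first-order form (fof)."""
    lines = [f"fof(premise{i}, axiom, {p})." for i, p in enumerate(premises, start=1)]
    lines.append(f"fof(goal, conjecture, {conjecture}).")
    return "\n".join(lines)


# Hypothetical obligation taken from the injectivity example; lower-case
# symbols are constants in TPTP, upper-case symbols would be variables.
problem = to_tptp(
    [
        "funApp(composition(cg, cf), cx1) = funApp(composition(cg, cf), cx2)",
        "injective(composition(cg, cf))",
    ],
    "cx1 = cx2",
)
```

Each premise becomes a formula with role `axiom` and the obligation itself gets the role `conjecture`, which is the input shape that TPTP-compliant provers accept.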
3 WEB INTERFACE
The front-end of the web interface, shown in Figure 3, consists of a simple text field in which the user can enter a proof. Above the input field, several special characters can be entered by mouse click, next to a button that initiates the verification process.
Figure 3: The web interface of ELFE.
² https://github.com/maxdore/elfe
³ https://hackage.haskell.org/package/parsec
⁴ https://hackage.haskell.org/package/scotty
⁵ https://vuejs.org/
After the verification process has finished, colours indicate the status of each text line as depicted in Figure 4. Since all text is green, the text was considered correct. The user can inspect the verification process by clicking on a specific line; more information about the verification is then given in the box below the text field. In our example, we learn the TPTP representation of the proof obligation x1 = x2 and that it was proved by E PROVER. Note that the variables are prefixed with c in the raw version since they are considered constants at this point in the proof. The reason for this will be explained in Section 4.3.
Figure 4: Verified correct ELFE text.
If the user enters an incorrect proof, as in Figure 5, red colours indicate that the verification process failed. In the example, in line 9 we wrongly concluded that g must have mapped x and x’ to the same element, which does not always hold. The background provers could not prove this, but also did not find a countermodel to the obligation.
Figure 5: An unsound ELFE text.
In the proof in Figure 6, a countermodel could be found for a wrong conclusion. The lemma states that if a relation R is included in S and S is symmetric, the inverse of R must be included in S as well. While the statement is in general correct, the proof is too imprecise and misses a case distinction. The countermodel now tells us that if x and y are in the union of R and its inverse, they might be in the inverse of R but not in R itself. Thus, the conclusion in line 8 does not hold in general. The correct version of this proof can be found in the Appendix.
Figure 6: Countermodel for a wrong ELFE text.
4 ELFE LANGUAGE
The input language for ELFE consists of mathematical texts written in a subset of natural mathematical language. We will not introduce the whole feature set in this paper and only examine the exemplary proof of Figure 1 in the following. Other language constructs like case distinctions or sub proofs, which make a text less monolithic, are presented in (Doré, 2017).
In order to verify an ELFE text, we transform it into a special data structure which implies certain proof obligations. Since this internal proof representation uses first-order logic, we will first introduce how to transform the ELFE language into first-order logic. This preprocessing will be presented in Section 4.1. Keywords like Then and Hence have special meanings in an ELFE proof and are used to structure a mathematical proof. This structure is captured in an intermediate proof representation which is introduced in Section 4.2. The intermediate proof representation implies certain obligations which need to be checked by the background provers. What these are will be explained in Section 4.3.
4.1 From ELFE to First-order Logic
First-order logic is used to encode mathematical statements. Most transformations from ELFE to first-order logic are straightforward, e.g., P implies f is injective is transformed to $P \to injective(f)$. In order to make an ELFE text more legible, three commands introduce meta-language features.
Include sets, relations.
Let A,B,C be set.
Notation function: f: A → B.
Definition function: for all f.
    f: A → B iff for all x ∈ A. exists y ∈ B.
        f[x,y] and
        (for all y’ ∈ B. y = y’ or not f[x,y’]).
Let f: A → B.
Definition injective: f is injective iff
    for all x ∈ A, x’ ∈ A, y ∈ B. f[x,y] and f[x’,y] implies x = x’.
Let g: B → C.
Notation composition: g∘f.
Definition composition: (g∘f): A → C and
    (for all x ∈ A. for all y ∈ B. for all z ∈ C.
        ((f[x,y] and g[y,z]) implies (g∘f)[x,z])).
Figure 7: Excerpt of the functions library.
The command Include can be used to include the axioms of a background theory. For example, in Figure 1 we include the functions library with Include functions. Users can easily create their own background theories since these are written in the ELFE language as well. An excerpt of the functions library is shown in Figure 7.
The command Notation is used to introduce syntactic sugar. One can write an arbitrary pattern of Unicode characters to define such a notation, e.g., Notation function: f: A → B. The alphabetical parts of the pattern, i.e., f, A and B, are treated as placeholders for arbitrary terms. Thus, all terms of the form
*: * → *
with * being arbitrary terms are subsequently considered instances of the predicate function. For example, g: B → C will be transformed internally to the first-order formula function(g, B, C). Similarly, the notation for composition is defined as g∘f. Consider the version of our proof in raw first-order logic in Figure 8, where the first line of our exemplary ELFE proof, Assume g∘f is injective, is transformed into Assume injective(composition(g, f)). Note that notations can be used both for term and predicate symbols.
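The placeholder mechanism of Notation can be sketched as a small pattern matcher. This is an illustrative Python approximation, not the PARSEC-based parser of ELFE: it assumes the arrow and composition symbols shown in the example calls, turns every alphabetical run of the pattern into a capture group, and handles only flat terms. The helper names `compile_notation` and `apply_notation` are hypothetical.

```python
import re
from typing import Optional


def compile_notation(pattern: str) -> "re.Pattern[str]":
    """Turn a notation pattern into a regex; alphabetical runs become placeholders."""
    pieces = []
    for index, part in enumerate(re.split(r"([A-Za-z]+)", pattern)):
        if index % 2 == 1:
            # Captured alphabetical run, e.g. f, A or B: placeholder for a term.
            pieces.append(r"(.+?)")
        elif part.strip():
            # Literal symbols such as ":" are matched with flexible spacing.
            pieces.append(r"\s*" + re.escape(part.strip()) + r"\s*")
        elif part:
            # Pure whitespace between two placeholders.
            pieces.append(r"\s+")
    return re.compile("".join(pieces))


def apply_notation(name: str, pattern: str, term: str) -> Optional[str]:
    """Rewrite a term matching the pattern into name(arg1, ..., argn), else None."""
    match = compile_notation(pattern).fullmatch(term.strip())
    if match is None:
        return None
    return f"{name}({', '.join(group.strip() for group in match.groups())})"
```

For instance, `apply_notation("function", "f: A → B", "g: B → C")` yields `function(g, B, C)`, the translation described in the text. A real implementation would also need to deal with nested terms and operator precedence, which this sketch ignores.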
Lemma: ∀ set(A), set(B), set(C), function(f, A, B),
    function(g, B, C). injective(composition(g, f)) →
    injective(f).
Proof:
    Assume injective(composition(g, f)).
    Assume funApp(f, x) = funApp(f, x′) ∧ in(x, A) ∧ in(x′, A).
    Then funApp(composition(g, f), x) = funApp(composition(g, f), x′).
    Hence x = x′.
    Hence injective(f).
qed.
Figure 8: The injectivity proof without syntactic sugar.
The command Let binds a predicate symbol to a variable, effectively assigning a type to a symbol. By writing Let A,B,C be set, we ensure that in all following statements A, B and C satisfy the predicate set. Consider Figure 8, which shows the injectivity proof after removing meta-level language features. A, B and C are introduced as universally quantified sets in the lemma.
4.2 Statement Sequences
So far, we have only seen how single mathematical statements are transformed into first-order formulas. In order to capture the structure of a proof, we propose a special kind of data structure, so-called statement sequences. Intuitively, a statement holds a first-order formula with an identifier and a proof. A proof can consist of other statements in order to represent complex proof objects.
Definition 1. Statement Sequences.
A statement S is a tuple ID × GOAL × PROOF where
- ID is an alphanumeric string which is unique for each statement
- GOAL is a formula in first-order logic
- PROOF is either
    - ASSUMED or
    - BYCONTEXT or
    - BYSUBCONTEXT $Id_1, ..., Id_n$ or
    - BYSEQUENCE $S_1, ..., S_n$ or
    - BYSPLIT $S_1, ..., S_n$
A statement sequence is a finite list of statements $S_1, ..., S_n$.
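Definition 1 can be mirrored by a small recursive data type. The sketch below uses Python dataclasses rather than the HASKELL types of the actual system; the field names `kind`, `context_ids` and `children`, the helper `all_ids` and the example formulas are assumptions for illustration, and goals are kept as plain text.

```python
from dataclasses import dataclass, field
from typing import List

PROOF_KINDS = {"ASSUMED", "BYCONTEXT", "BYSUBCONTEXT", "BYSEQUENCE", "BYSPLIT"}


@dataclass
class Statement:
    id: str    # unique alphanumeric identifier
    goal: str  # first-order formula, kept as text in this sketch
    kind: str  # one of PROOF_KINDS
    context_ids: List[str] = field(default_factory=list)       # only for BYSUBCONTEXT
    children: List["Statement"] = field(default_factory=list)  # only for BYSEQUENCE / BYSPLIT

    def __post_init__(self) -> None:
        if self.kind not in PROOF_KINDS:
            raise ValueError(f"unknown proof kind: {self.kind}")


def all_ids(sequence: List[Statement]) -> List[str]:
    """Collect the identifiers of all statements, including nested ones."""
    ids: List[str] = []
    for stmt in sequence:
        ids.append(stmt.id)
        ids.extend(all_ids(stmt.children))
    return ids


# A tiny statement sequence: one axiom followed by a goal proved in two steps.
sequence = [
    Statement("S1", "p(a)", "ASSUMED"),
    Statement("S2", "q(a)", "BYSEQUENCE", children=[
        Statement("S3", "p(a) => q(a)", "BYCONTEXT"),
        Statement("S4", "q(a)", "BYCONTEXT"),
    ]),
]
```

Checking `len(set(all_ids(sequence))) == len(all_ids(sequence))` is one way to enforce the uniqueness requirement on identifiers.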
If a statement S is proved BYSEQUENCE $S_1, ..., S_n$ or BYSPLIT $S_1, ..., S_n$, we call $S_1, ..., S_n$ the children of S. If we want to access S from a child $S_i$, we write $S_i$.PARENT. On the top level, a statement has no parent, thus S.PARENT = EMPTY.
Consider the example in Figure 9. We will depict a statement visually in the following as a box with its ID in the upper-left corner. The GOAL of a statement is written in the header of a statement, the PROOF below. A PROOF can take different forms to capture complex proof structures. The axioms of a text, however, are simply annotated by ASSUMED. For example, the statements $S_{fun}$ and $S_{inj}$ depict the statements resulting from the definitions in Figure 7. In the functions library, numerous additional definitions are made which are omitted here. Statements annotated with ASSUMED will be depicted in green in the following. Below the axioms, the statement S of the lemma of our text in Figure 1 follows. In order to prove this statement, we need more advanced proof structures, which will be introduced in Section 4.3. The statement is depicted in red and with a dashed border to indicate that its proof is not complete.
$S_{fun}$: $\forall\, set(A), set(B), f.\ function(f, A, B) \leftrightarrow \forall x \in A.\ \exists y \in B.\ relapp(f, x, y) \land (\forall y' \in B.\ y = y' \lor \neg relapp(f, x, y'))$
    ASSUMED
$S_{inj}$: $\forall\, set(A), set(B), function(f, A, B).\ injective(f) \leftrightarrow \forall x \in A, x' \in A, y \in B.\ relapp(f, x, y) \land relapp(f, x', y) \to x = x'$
    ASSUMED
$S$: $\forall\, set(A), set(B), set(C), function(f, A, B), function(g, B, C).\ injective(composition(g, f)) \to injective(f)$
Figure 9: Exemplary statement sequence.
To give an overview of the other types of PROOF: proofs BYSEQUENCE and BYSPLIT make it possible to nest more complex derivation sequences. A statement annotated with BYCONTEXT will be checked by the background provers. BYSUBCONTEXT is a special case of this proof type which allows for restricting the context of the statement.
4.3 Proved Statements
Since we want to verify that a text is sound, we need to introduce a soundness criterion for statements. Axioms of a text are considered correct, but the lemma needs a more subtle criterion.
First we will define which axioms are considered relevant to a statement. Intuitively, the context of a statement in a statement sequence consists of all statements "above" it.
Definition 2. Context of a Statement.
Let $S_1, ..., S_n$ be a statement sequence. The context of a statement $S_k$ is inductively defined as
$\Gamma(\text{EMPTY}) = \emptyset$,
$\Gamma(S_k) = \{S_1.\text{GOAL}, ..., S_{k-1}.\text{GOAL}\} \cup \Gamma(S_k.\text{PARENT})$.
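Definition 2 translates almost literally into a recursive function. The following Python sketch is an illustration under assumptions: the class layout, the `parent` back-pointer set in `__post_init__`, and the placeholder goal strings are not taken from the ELFE source, and context restrictions such as BYSUBCONTEXT are not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Statement:
    id: str
    goal: str
    children: List["Statement"] = field(default_factory=list)
    parent: Optional["Statement"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        # Give every child a back-pointer, mirroring S_i.PARENT.
        for child in self.children:
            child.parent = self


def context(stmt: Optional[Statement], top_level: List[Statement]) -> List[str]:
    """Goals of all earlier siblings plus the context of the parent (Definition 2)."""
    if stmt is None:  # corresponds to the context of EMPTY
        return []
    siblings = top_level if stmt.parent is None else stmt.parent.children
    position = next(i for i, s in enumerate(siblings) if s is stmt)
    return context(stmt.parent, top_level) + [s.goal for s in siblings[:position]]


# Hypothetical sequence shaped like Figure 9, with one nested child statement.
inner = Statement("S1", "goal_of_S1")
lemma = Statement("S", "lemma_goal", [inner])
top = [Statement("Sfun", "def_function"), Statement("Sinj", "def_injective"), lemma]
```

Here `context(lemma, top)` returns the two definition goals, and `context(inner, top)` returns the same list because a first child inherits exactly the context of its parent.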
For example, in Figure 9, the context of statement S consists of the respective goals of $S_{fun}$ and $S_{inj}$ (as well as other definitions of the library which are omitted here). With that, we can define an appropriate soundness criterion for statements.
Definition 3. Proved Statement.
Let S be a statement with S.GOAL = $\varphi$.
We call S proved iff $\Gamma(S) \vdash \varphi$.
In other words, a statement is considered proved if it already follows from the theory created by its context. The statements $S_{fun}$ and $S_{inj}$ in Figure 9 are not proved since they build up the axioms of our theory. The statement S, however, should follow from these axioms, i.e., should be a proved statement. In order to show that S is proved, we will create a more complex proof object in the following such that correctness of the proof object implies that S follows from its context.
We start by unfolding the outer implication of the lemma $\forall\, set(A), set(B), set(C), function(f, A, B), function(g, B, C).\ injective(composition(g, f)) \to injective(f)$. More specifically, we fix specific sets A, B and C and functions f, g. As we see in Figure 10, this is captured in our data structure by introducing another statement $S_1$ such that the proof of S is BYSEQUENCE $S_1$. We represent proofs BYSEQUENCE by putting the proof inside the statement to prove. The difference between S and $S_1$ is that we removed the quantifiers and replaced the variables with constants (depicted in blue and with an overline, e.g. $\bar{A}$, in case the color does not show up).
In order to prove the new goal of $S_1$, we do a so-called unfolding of the implication. The left-hand side is put in the statement $S_2$ and annotated with ASSUMED such that it is in the context of $S_3$, which holds the right-hand side of the implication.
The whole reduction from S to $S_3$ is done automatically by the system. It detects if meta-variables are contained in the goal and injects the proof automatically.
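The automatic reduction can be pictured as a loop that strips universal quantifiers and implications from a goal. The Python sketch below is a simplified illustration, not the algorithm of the ELFE implementation: the tuple-based formula encoding, the function names `substitute` and `unfold`, and the convention of prefixing fixed variables with `c` are assumptions, and quantifiers nested below an implication are not supported.

```python
from typing import Dict, List, Tuple


def substitute(node, mapping: Dict[str, str]):
    """Replace variables by constants inside atoms, function terms and implications."""
    if isinstance(node, str):  # a variable or constant symbol
        return mapping.get(node, node)
    tag = node[0]
    if tag in ("atom", "fn"):  # keep the symbol name, rewrite the arguments
        return (tag, node[1], [substitute(arg, mapping) for arg in node[2]])
    if tag == "implies":
        return (tag, substitute(node[1], mapping), substitute(node[2], mapping))
    raise ValueError(f"unsupported formula node: {tag}")


def unfold(formula) -> Tuple[List[tuple], tuple]:
    """Fix quantified variables, assume antecedents, return (assumptions, remaining goal)."""
    assumptions: List[tuple] = []
    while True:
        if formula[0] == "forall":
            _, variables, body = formula
            formula = substitute(body, {v: "c" + v for v in variables})
        elif formula[0] == "implies":
            assumptions.append(formula[1])  # becomes a statement annotated ASSUMED
            formula = formula[2]
        else:
            return assumptions, formula


# forall f, g. injective(composition(g, f)) -> injective(f)
lemma = ("forall", ["f", "g"],
         ("implies",
          ("atom", "injective", [("fn", "composition", ["g", "f"])]),
          ("atom", "injective", ["f"])))
```

Applied to `lemma`, `unfold` yields the assumption `injective(composition(cg, cf))` and the remaining goal `injective(cf)`, which corresponds to the split into an assumed statement and a statement that still has to be proved.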
$S$: $\forall\, set(A), set(B), set(C), function(f, A, B), function(g, B, C).\ injective(composition(g, f)) \to injective(f)$
    $S_1$: $(set(\bar{A}) \land set(\bar{B}) \land set(\bar{C}) \land function(\bar{f}, \bar{A}, \bar{B}) \land function(\bar{g}, \bar{B}, \bar{C})) \to (injective(composition(\bar{g}, \bar{f})) \to injective(\bar{f}))$
        $S_2$: $set(\bar{A}) \land set(\bar{B}) \land set(\bar{C}) \land function(\bar{f}, \bar{A}, \bar{B}) \land function(\bar{g}, \bar{B}, \bar{C})$
            ASSUMED
        $S_3$: $injective(composition(\bar{g}, \bar{f})) \to injective(\bar{f})$
Figure 10: Unfolding meta-variables.
With this, we have reduced the problem of showing that S is proved to showing that $S_3$ is proved. In order to convince ourselves that $S_3$ being proved indeed implies that S is proved, we will first see that it is sound to fix a universally quantified variable to a constant. This can be done by natural deduction, which has been shown to be sound (Fitting, 1990). Concretely, our construction is analogous to the following deduction rule:
$(\forall I):\ \dfrac{P(a)}{\forall x.\, P(x)}$ with a not occurring in P(x)
We use this deduction rule in showing the soundness of our construction in Lemma 1.
Lemma 1. $\forall$ Introduction.
Let S be a statement such that S.GOAL = $\forall x.\, P(x)$ and a not occurring in S.GOAL, S.PROOF = BYSEQUENCE $S_1$, $S_1$.GOAL = $P(a)$ and $S_1$ is proved:
$S$: $\forall x.\, P(x)$ (a not occurring)
    $S_1$: $P(a)$
Then S is proved.
Proof. Since $S_1$ is proved and $\Gamma(S) = \Gamma(S_1)$, we have $\Gamma(S) \vdash P(a)$. With $(\forall I)$ it follows that $\Gamma(S) \vdash \forall x.\, P(x)$ since a does not occur in P(x).
Next we have to show that it is sound to assume the left-hand side of an implication and deduce the right-hand side. Again, this is analogous to a natural deduction rule:
$(\to I):\ \dfrac{P \vdash Q}{P \to Q}$
This rule is used in the soundness proof in Lemma 2.
Lemma 2. $\to$ Introduction.
Let S be a statement such that S.GOAL = $P \to Q$, S.PROOF = BYSEQUENCE $S_1, S_2$, $S_1$.GOAL = $P$, $S_2$.GOAL = $Q$ and $S_2$ is proved:
$S$: $P \to Q$
    $S_1$: $P$
    $S_2$: $Q$
Then S is proved.
Proof. We have $\Gamma(S_2) = \Gamma(S) \cup \{P\}$. Since $S_2$ is proved, $\Gamma(S) \cup \{P\} \vdash Q$. With $(\to I)$ it follows $\Gamma(S) \vdash P \to Q$.
Now we will construct the proof of $S_3$ as shown in Figure 11. In the proof text in Figure 1, we explicitly wrote Assume injective(composition(g, f)). [...] Hence injective(f). Analogous to the unfolding of the implication of $S_1$ in Figure 10, we assume the left-hand side and now have to prove the right-hand side. Again, this is sound as proved in Lemma 2.
$S_3$: $injective(composition(\bar{g}, \bar{f})) \to injective(\bar{f})$
    $S_4$: $injective(composition(\bar{g}, \bar{f}))$
        ASSUMED
    $S_5$: $injective(\bar{f})$
Figure 11: Unfolding an implication.
Now, we have to prove that injective(f) holds. In order to do that, the proof in Figure 1 uses the definition of injectivity: Assume $funApp(f, x) = funApp(f, x') \land in(x, A) \land in(x', A)$. [...] Hence $x = x'$. In other words, we prove an alternative goal. In order to retain a sound construction, we have to show two things: first, that the alternative goal indeed implies the original goal and second, that the alternative goal holds. This is represented in Figure 12 by putting two statements $S_6$ and $S_7$ below the goal of $S_5$. This depicts that the PROOF of $S_5$ is BYSPLIT $S_6, S_7$. Note that a proof BYSPLIT leads to a division of contexts, i.e., the derived goal of $S_6$ will not be put into the context of $S_7$. Thus, the proof BYSPLIT allows for a finer scoping of statements.
$S_5$: $injective(\bar{f})$
    $S_6$: $(\forall x, x'.\ funApp(\bar{f}, x) = funApp(\bar{f}, x') \land in(x, \bar{A}) \land in(x', \bar{A}) \to x = x') \to injective(\bar{f})$
        BYCONTEXT
    $S_7$: $\forall x, x'.\ funApp(\bar{f}, x) = funApp(\bar{f}, x') \land in(x, \bar{A}) \land in(x', \bar{A}) \to x = x'$
        $S_8$: $funApp(\bar{f}, \bar{x}) = funApp(\bar{f}, \bar{x}') \land in(\bar{x}, \bar{A}) \land in(\bar{x}', \bar{A}) \to \bar{x} = \bar{x}'$
            $S_9$: $funApp(\bar{f}, \bar{x}) = funApp(\bar{f}, \bar{x}') \land in(\bar{x}, \bar{A}) \land in(\bar{x}', \bar{A})$
                ASSUMED
            $S_{10}$: $\bar{x} = \bar{x}'$
Figure 12: Proving an alternative goal.
The statement $S_6$ contains the soundness check. Its proof is BYCONTEXT, which means that it will be sent to the background provers. If some ATP finds a proof, a statement annotated with BYCONTEXT is considered proved. This is the case here if our definition of injectivity indeed allows us to prove this alternative goal. We will depict statements proved BYCONTEXT in orange in the following.
Statement $S_7$ contains the proof of the alternative goal. Again, the universally quantified variables $x$ and $x'$ are fixed to constants. Afterwards, the implication is unfolded. As seen in Lemma 1 and Lemma 2, this is sound.
To convince ourselves that this construction is sound, we have to use two additional natural deduction rules:
$(\land I):\ \dfrac{P \quad Q}{P \land Q}$
$(\to E):\ \dfrac{P \to Q \quad P}{Q}$
These rules will be used in the proof of Lemma 3, which is an abstract case of our approach in Figure 12.
Lemma 3. Splitting a Goal.
Let S be a statement such that S.GOAL = $P$, S.PROOF = BYSPLIT $S_0, S_1, ..., S_n$, $S_0$.GOAL = $Q_1 \land ... \land Q_n \to P$, $S_i$.GOAL = $Q_i$ and $S_i$ is proved for $i = 1, ..., n$:
$S$: $P$
    $S_0$: $Q_1 \land ... \land Q_n \to P$
    $S_1$: $Q_1$
    ...
    $S_n$: $Q_n$
Then S is proved.
Proof. We have $\Gamma(S) = \Gamma(S_i)$ for $i = 0, ..., n$. With $S_i$ proved for $i = 1, ..., n$ we have $\Gamma(S) \vdash Q_i$ for $i = 1, ..., n$. With $(\land I)$ it follows $\Gamma(S) \vdash Q_1 \land ... \land Q_n$. With $S_0$ proved we also have $\Gamma(S) \vdash Q_1 \land ... \land Q_n \to P$. Thus, we can deduce with $(\to E)$ that $\Gamma(S) \vdash P$.
The remaining bit to prove is the goal of $S_{10}$, i.e., that $x = x'$ follows from the context. However, in the text in Figure 1 the next derivation step is Then $funApp(composition(g, f), x) = funApp(composition(g, f), x')$. This statement does not change the overall goal we want to prove, but gives a cornerstone for how one can derive the goal. As depicted in Figure 13, this additional finding will first be verified by annotating statement $S_{11}$ with BYCONTEXT. Afterwards, the actual goal is proved. Since the user gave no additional proving methods, we send the final goal $x = x'$ to the background provers as well.
$S_{10}$: $\bar{x} = \bar{x}'$
    $S_{11}$: $funApp(composition(\bar{g}, \bar{f}), \bar{x}) = funApp(composition(\bar{g}, \bar{f}), \bar{x}')$
        BYCONTEXT
    $S_{12}$: $\bar{x} = \bar{x}'$
        BYCONTEXT
Figure 13: Giving a cornerstone to a proof.
If $S_{11}$ can already be derived by the background provers, the theory created by the context of $S_{12}$ is not extended by adding $S_{11}$. This is formally reflected in Lemma 4.
Lemma 4. Deriving a Cornerstone.
Let S be a statement such that S.GOAL = $P$, S.PROOF = BYSEQUENCE $S_1, S_2$, $S_1$.GOAL = $Q$, $S_2$.GOAL = $P$ and $S_1$ is proved:
$S$: $P$
    $S_1$: $Q$
    $S_2$: $P$
Then S is proved.
Proof. Since $S_1$ is proved, we have $\Gamma(S_1) \vdash Q$. Because of $\Gamma(S) = \Gamma(S_1)$, already $\Gamma(S) \vdash Q$. Hence, with $S_2$ proved we have $\Gamma(S_2) \vdash P$ and it follows that already $\Gamma(S) \vdash P$.
This completes our construction of the internal proof representation of the lemma in Figure 1. Three statements, $S_6$, $S_{11}$ and $S_{12}$, are annotated BYCONTEXT and will be sent to the background provers. If each of these three statements can be derived from its respective context, we can conclude that the original goal of S already followed from its context. The proof of the lemma is then considered sound.
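How the three obligations arise from the nested structure can be traced with a short traversal. The Python sketch below is an approximation under assumptions: the class layout, the function name `obligations` and the abbreviated goal strings are invented for illustration, library axioms are left out, BYSUBCONTEXT is not handled, and the division of contexts under BYSPLIT is modelled by not passing sibling goals on.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Statement:
    id: str
    goal: str
    kind: str  # "ASSUMED", "BYCONTEXT", "BYSEQUENCE" or "BYSPLIT"
    children: List["Statement"] = field(default_factory=list)


def obligations(sequence: Sequence[Statement], inherited: Sequence[str] = (),
                split: bool = False) -> List[Tuple[str, List[str], str]]:
    """Return (id, premises, goal) for every statement annotated BYCONTEXT."""
    result: List[Tuple[str, List[str], str]] = []
    context = list(inherited)
    for stmt in sequence:
        if stmt.kind == "BYCONTEXT":
            result.append((stmt.id, list(context), stmt.goal))
        elif stmt.kind in ("BYSEQUENCE", "BYSPLIT"):
            result += obligations(stmt.children, context, split=stmt.kind == "BYSPLIT")
        if not split:
            # Later siblings may use this goal, except under BYSPLIT.
            context.append(stmt.goal)
    return result


# The proof tree of Figures 11 to 13 with abbreviated goals.
s10 = Statement("S10", "x = x'", "BYSEQUENCE", [
    Statement("S11", "gf(x) = gf(x')", "BYCONTEXT"),
    Statement("S12", "x = x'", "BYCONTEXT"),
])
s8 = Statement("S8", "f(x) = f(x') -> x = x'", "BYSEQUENCE",
               [Statement("S9", "f(x) = f(x')", "ASSUMED"), s10])
s5 = Statement("S5", "inj(f)", "BYSPLIT", [
    Statement("S6", "(forall x, x'. f(x) = f(x') -> x = x') -> inj(f)", "BYCONTEXT"),
    Statement("S7", "forall x, x'. f(x) = f(x') -> x = x'", "BYSEQUENCE", [s8]),
])
s3 = Statement("S3", "inj(gf) -> inj(f)", "BYSEQUENCE",
               [Statement("S4", "inj(gf)", "ASSUMED"), s5])
```

Running `obligations([s3])` lists the statements S6, S11 and S12 together with their premises; the cornerstone of S11 appears as the last premise of S12.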
5 EVALUATION
The tool was tested by students at the beginning of their mathematical studies. In Section 5.1, we will take a look at their evaluation and suggestions. We also formalized some more advanced theorems in the system, e.g., Cantor’s theorem and the Knaster-Tarski theorem, and will discuss our experiences as well as the system’s inherent limitations in Section 5.2.
5.1 User Feedback
The system was tested with 12 undergraduates of
Computing, Mathematics and Electrical Engineering
at Imperial College London, of which none had prior
experience with interactive theorem provers. Due to
the limited time frame we were not able to evaluate
the system further.
At first, the students were given the proof sketch shown in Figure 14. An intuition about the proof was given in natural language, i.e., it was explained that we want to prove that the complement of the complement of a set is the set itself. All students were able to identify the proof pattern, i.e., that we show set inclusion in both directions. This is a very common proof procedure to show equality of two sets. When writing the remaining bit of the proof, the students successfully resolved syntactic errors by inspecting the parsing errors, and all completed the proof. The syntactic characteristics of ELFE, e.g., that Then and Hence have distinct meanings, did not pose an obstacle since the students only had to copy the structure of the first sub proof. However, only two students were able to figure out the meanings of these language features, i.e., that Hence closes an implication whereas Then is for giving cornerstones to a proof. This suggests that using ELFE requires an introduction to the different language features and that users cannot start writing proofs right away.
Later on, the testers were given more complex proof sketches. Students who were in general comfortable with mathematical reasoning were able to complete the proofs. The other students had problems grasping the idea of the proof and did not start to write a proof in the system.
Include sets.
Let A be set.
Let x be element.
Lemma: ((A^C)^C) = A.
Proof:
    Proof ((A^C)^C) ⊆ A:
        Assume x ∈ ((A^C)^C).
        Then not x ∈ (A^C).
        Hence x ∈ A.
    qed.
    Proof A ⊆ ((A^C)^C):
        ...
    qed.
qed.
Figure 14: Proof to be completed in the evaluation.
After the students tried the system, they were given the following statements and had to indicate their agreement with each on a scale from 1 (strongly agree) to 6 (strongly disagree).
- I enjoy writing mathematical proofs. Mean: 3.3, Median: 3.5
- I find writing mathematical proofs difficult. Mean: 2.6, Median: 2
- I think computers can be of use in learning how to write mathematical proofs. Mean: 2.3, Median: 2
- I enjoyed writing mathematical proofs in the ELFE system. Mean: 2.5, Median: 2
- I found the feedback of the system helpful. Mean: 2.6, Median: 2
- I would like to know how ELFE and interactive theorem proving works. Mean: 1.8, Median: 1
In text form, they could also write down what they liked about the system and what should be improved. It was highlighted that the language was "simple and clear" and did not "get in the way of the proof". They liked the "very understandable and simple UI" and its reactiveness. As improvements for the user interface they proposed autocompletion features for proofs and syntax highlighting. The given raw translations of the mathematical text were not easy to understand. One user also pointed out that the background provers are sometimes too clever; thus, a text is accepted even if crucial cornerstones of a proof are missing. He would like to have a criterion for when a proof is "complete" for humans and not only for a computer.
As we see, the testers were in general not especially keen on writing mathematical proofs. Writing proofs in ELFE made it a bit more enjoyable. The system seems to have succeeded in awakening an interest in interactive theorem proving.
5.2 Limits of the Current System
Since first-order logic is an intuitive way to write down proofs about sets and relations, proofs in these domains could be written down easily. Working with the functions library was more complex. Some additional lemmas and function symbols which were introduced to make a proof more readable for humans increase the difficulty for the background provers. If the background provers take too long in proof search, it is hard to assess if a proof itself is wrong or only takes a long time to prove. Debugging a failing proof is still difficult with the user interface provided by ELFE. In most cases, the raw proof obligations given to the background provers were more helpful in finding bugs, by manually deleting and changing the given premises. This is due to constructions like Let which shorten a proof, but also hide what is going on inside the system.
The Notation command has turned out to be a very
powerful construct to ease the readability of proofs.
New notations can be introduced easily and make a
proof look quite intuitive.
BEAGLE was able to provide countermodels to a wrong proof only if the number of premises was limited. Restricting the context of a derivation step increased the success rate significantly. However, for new users it is certainly difficult to relate a countermodel to the entered text since it is given in the raw TPTP format.
Another problem that occurred was that the background provers were too clever. They sometimes find intermediate steps that are not at all obvious for a human reader. This cleverness is particularly problematic with proofs by contradiction. If the background provers find the inconsistency caused by the assumption, all derivations a user may make are trivially also true, even though they do not make sense in the proof.
Writing larger proof texts in straightforward domains such as set theory can be easily done in ELFE. However, some properties like well-foundedness are not expressible at all in first-order logic, so it might be expedient for future versions to use higher-order logic at the core of statement sequences.
6 RELATED WORK
In Section 6.1, we will take a look at mathematical text verifiers like the SYSTEM FOR AUTOMATED DEDUCTION, which heavily influenced this project. In Section 6.2, we will compare ELFE to the popular interactive theorem provers ISABELLE and COQ.
6.1 Mathematical Text Verifiers
In the following, we will present two projects aimed at verifying mathematical texts: the SYSTEM FOR AUTOMATED DEDUCTION (SAD) in Section 6.1.1 and NAPROCHE in Section 6.1.2.
6.1.1 SYSTEM FOR AUTOMATED DEDUCTION
The SAD was developed at the University Paris and
the Taras Shevchenko National University of Kyiv. It
continues the project "Algoritm Ochevidnosti"
(algorithm of obviousness), which was initiated by the
Soviet researcher Victor Glushkov in the 1960s. His goal
was to develop a tool that lets users omit long but
"obvious" parts of proofs. These omitted parts should be
verified by automated theorem provers. (Verchinine and
Paskevich, 2000)
SAD uses the input language FORTHEL which al-
lows for expressing mathematical statements intuiti-
vely. FORTHEL texts are converted to an ordered set
of first-order formulas. The structure of the initial text
is preserved such that necessary proof tasks can be de-
fined. These tasks are then given to an ATP. The inter-
nal reasoner may simplify tasks and omit trivial state-
ments. Afterwards, the verification status of the text
is given to the user. For each proof task, the result of
the used ATP is returned. This allows users to inspect
possible sources of failing tasks, but requires knowledge
of how the background provers work. (Verchinine et al.,
2007)
Currently, it is not possible to work with functions
in SAD due to the lack of background libraries. Thus,
we could not implement the injectivity proof of Figure
1 in SAD.
6.1.2 NAPROCHE
The NAPROCHE system was a joint project between
mathematicians at the University of Bonn and linguists
at the University of Duisburg-Essen. Its central goal
was to develop a controlled natural language (CNL) in
which semi-formal mathematical texts can be written and
checked. The input consists of texts in a LaTeX-style
language, made up of mathematical formulas embedded in a
controlled natural language. (Cramer et al., 2009)
To extract the semantics of a CNL text, NAPROCHE
adapts a concept from computational linguistics:
Proof Representation Structures (PRS) enrich the
linguistic concept of Discourse Representation
Structures in such a way that they can represent
mathematical statements and their relations. The
semantics of PRS have been researched extensively;
however, the project has been discontinued and no
working version is available.
6.2 Interactive Theorem Prover
The classical approach to interactive theorem proving
integrates a human user strongly in the technical
verification process. We will briefly introduce the
popular provers ISABELLE in Section 6.2.1 and COQ
in Section 6.2.2 with their respective formalization of
the injectivity proof in Figure 1.
6.2.1 ISABELLE
ISABELLE is a joint project of Cambridge University
and the Technical University of Munich. It supports
polymorphic higher-order logic, augmented with ax-
iomatic type classes. At present it provides useful
proof procedures for Constructive Type Theory, va-
rious first-order logics, Zermelo-Fraenkel set theory
and higher-order logic. (Nipkow et al., 2002)
Consider the injectivity proof written in ISA-
BELLE in Figure 15. The predicate inj_on f A ex-
presses that function f is injective on the domain A.
The proof structure is close to the one used in ELFE:
We introduce arbitrary x and x’ which f maps to the
same element and conclude that they must have been
the same. One has to specify the automated proof
tactics and used premises: In our example, the de-
rivations are made by term rewriting using definiti-
ons comp_def and inj_on_def from the background li-
brary.
In comparison to ELFE, the user is therefore more
involved in the automated verification process. Since
2007, ISABELLE has offered the extension SLEDGEHAMMER.
By calling several ATPs, SLEDGEHAMMER tries to
determine which premises are important to a goal.
It then tries to reconstruct the automated proofs with
methods implemented in ISABELLE. In fact, the
mechanical proof methods needed in Figure 15 can be
found by invoking SLEDGEHAMMER.
theory InjectiveComposition
  imports Fun
begin
lemma
  assumes "inj_on (g ∘ f) A"
  shows "inj_on f A"
proof
  fix x x'
  assume "x ∈ A" and "x' ∈ A"
  moreover assume "f x = f x'"
  then have "(g ∘ f) x = (g ∘ f) x'"
    by (auto simp: comp_def)
  ultimately show "x = x'" using assms
    by (auto simp: inj_on_def)
qed
Figure 15: Proof in ISABELLE.
In a recent study, 34% of nontrivial goals contained
in representative ISABELLE texts could be proved by
SLEDGEHAMMER. With this extension, ISABELLE allows
beginners to prove challenging theorems. The creators
note that SLEDGEHAMMER was not designed as a tool to
teach ISABELLE since it focused primarily on
experienced users. However, it changed the way
ISABELLE is taught. Beginners do not have to learn
about low-level proving tactics and how they work but
can focus on the proof from a higher level. (Paulson
and Blanchette, 2010)
Require Import Basics.
Definition injective {A B} (f : A -> B) :=
  forall x y : A, f x = f y -> x = y.
Theorem c_inj (A B C:Type) (f:A->B) (g:B->C):
  (injective (compose g f)) -> injective f.
Proof.
  intuition.
  intros x x'.
  pose (f x = f x').
  intuition.
  assert (g (f x) = g (f x')).
  { elim H0. rewrite H0. trivial. }
  auto.
Qed.
Figure 16: Proof in COQ.
6.2.2 COQ
COQ is an interactive theorem prover initially developed
in 1984 at INRIA. It is based on the Curry–Howard
correspondence, which relates propositions to types and proofs to programs.
In order to prove a proposition, one has to construct a
term with the type corresponding to the proposition.
Consider the injectivity proof implemented in Figure
16. Again, the idea of the proof is to show that f
x = f x’ implies x = x’. However, we have to explicitly
apply rewrite techniques to make the derivation steps.
The tactic intuition says that we can assume the
left-hand side of an implication and then prove the
right-hand side. Afterwards, we want to make sure that
we can just apply g on both sides. We have to rewrite
both sides of H0, which stands for f x = f x’, in order
to get to our assertion. The final goal x = x’ is then
derived by applying the tactic auto.
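For comparison, the mathematical argument that the tactic script encodes can be written as a single chain of implications. This is a sketch of the informal reasoning only, assuming x and x' are arbitrary elements of the domain of f and that the amsmath package is available for the align* environment.

```latex
\begin{align*}
f(x) = f(x')
  &\implies g(f(x)) = g(f(x'))
     && \text{apply } g \text{ to both sides} \\
  &\iff (g \circ f)(x) = (g \circ f)(x')
     && \text{definition of composition} \\
  &\implies x = x'
     && \text{injectivity of } g \circ f
\end{align*}
```

Each line corresponds to one step of the script: the assertion about g, the unfolding of compose, and the final use of the injectivity hypothesis.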
As we see, the translation process of mathematical
texts to functional programs requires a good
understanding of type theory and is not suitable for
beginners in mathematics.
Consequently, the most prominent current interactive
theorem provers are of a deeply technical nature.
They are thought of as programming languages
that happen to prove theorems, and not digitisations
of mathematical language.
7 DISCUSSION
This paper presented ELFE, a system that checks
proofs in discrete mathematics. Entered texts are
transformed to statement sequences, a special
data-structure of first-order formulas. Remaining proof
obligations are then checked by background provers.
Statement sequences are a powerful intermediate
proof representation which can hold a variety of
proof techniques. The clear soundness criteria allow
for extending the proof techniques easily.
Students who tested the system especially liked
that they got immediate feedback on their proof work.
The implemented background libraries allow for an
easy start. Once users become familiar with the tool,
they can easily construct their own background
libraries. Certainly, more evaluation of the system in
pedagogical environments is necessary. It will be
particularly interesting to examine how teachers can
incorporate the system in their courses.
The language constructs presented here were the
result of formalizing several exemplary proofs.
Formalizing more proofs will probably reveal the need
for additional proving methods. If these proving
methods can be mapped soundly into statement sequences,
they should be easy to implement.
In addition to giving countermodels for wrong
proofs, one could utilize more features of the back-
ground provers. Many provers return in-depth infor-
mation about the proof of a conjecture. This informa-
tion could be useful for users in order to understand
why a proof works or fails. The challenge is to present
the technical output of the background provers via an
intuitive interface. In order to do proofs with arithme-
tic, it might be useful to utilize already implemented
arithmetic capabilities of background provers such as
Z3 and BEAGLE. Expert users presumably prefer sy-
stems with deep insight into the technical verification
process, but an abstraction is necessary if we want to
use computers in teaching mathematics.
The biggest structural limitation of ELFE is that
it internally uses first-order logic. For example, with
the current capabilities it is not straightforward to
implement proofs by induction. Recent years have seen
interesting advances in automated theorem proving for
typed higher-order logic. A new standard for typed
higher-order logic has been added to TPTP which
is used by several provers like LEO-II (Benzmüller
et al., 2015) and SATALLAX (Brown, 2012). A next
version of ELFE could use this development in order
to provide a more powerful way of expressing
mathematics. This would require introducing a
meaningful type system for ELFE.
REFERENCES
Parsec: Monadic parser combinators.
https://hackage.haskell.org/package/parsec.
Accessed: 2017-01-03.
Scotty: Haskell web framework.
https://hackage.haskell.org/package/scotty. Accessed:
2017-01-03.
Vuejs: The progressive javascript framework.
https://vuejs.org/. Accessed: 2017-01-03.
Baumgartner, P., Bax, J., and Waldmann, U. (2015).
BEAGLE – a hierarchic superposition theorem prover. In
Proc. CADE-25, pages 367–377.
Benzmüller, C., Sultana, N., Paulson, L. C., and Theiß, F.
(2015). The higher-order prover LEO-II. Journal of
Automated Reasoning, 55(4):389–404.
Brown, C. E. (2012). SATALLAX: An automatic higher-
order prover. In Proc. IJCAR 2012, pages 111–117.
Cramer, M., Fisseni, B., Koepke, P., Kühlwein, D., Schrö-
der, B., and Veldman, J. (2009). The NAPROCHE
project–controlled natural language proof checking of
mathematical texts. In Proc. CNL 2009, volume 5972,
pages 170–186.
De Moura, L. and Bjørner, N. (2008). Z3: An efficient SMT
solver. Tools and Algorithms for the Construction and
Analysis of Systems, pages 337–340.
Doré, M. (2017). ELFE – An interactive theorem prover for
undergraduate students. Bachelor thesis.
Fitting, M. (1990). First-order Logic and Automated Theo-
rem Proving. Springer, 2nd edition.
Glushkov, V. M. (1971). Problems in the theory of auto-
mata and artificial intelligence. Journal of Cyberne-
tics, 1(1):97–113.
Gonthier, G. (2008). Formal proof–the four-color theorem.
Notices of the AMS, 55(11):1382–1393.
Nipkow, T., Wenzel, M., and Paulson, L. C. (2002). Isa-
belle/HOL: A Proof Assistant for Higher-order Logic.
Springer.
Paulson, L. C. and Blanchette, J. C. (2010). Three years
of experience with SLEDGEHAMMER, a practical link
between automatic and interactive theorem provers. In
Proc. IJCAR 2010, pages 1–10.
Riazanov, A. and Voronkov, A. (2002). The design and im-
plementation of VAMPIRE. AI Commun., 15(2, 3):91–
110.
Schulz, S. (2002). E - a brainiac theorem prover. AI Com-
mun., 15(2-3):111–126.
Sutcliffe, G. (2009). The TPTP Problem Library and Asso-
ciated Infrastructure: The FOF and CNF Parts, v3.5.0.
Journal of Automated Reasoning, 43(4):337–362.
Sutcliffe, G. (2016). The CADE ATP System Competition.
AI Magazine, 37(2):99–101.
Verchinine, K., Lyaletski, A., and Paskevich, A. (2007).
SYSTEM FOR AUTOMATED DEDUCTION (SAD): a tool
for proof verification. In Proc. CADE-21, pages
398–403.
Verchinine, K. and Paskevich, A. (2000). FORTHEL–the
language of formal theories. International Journal
of Information Theories and Applications, 7(3):120–
126.
Weidenbach, C., Brahm, U., Hillenbrand, T., Keen, E.,
Theobald, C., and Topić, D. (2002). SPASS version 2.0. In
Proc. CADE-18, pages 45–79.
APPENDIX
Include relations.
Let R,S be relation.
Lemma: R ⊆ S and S is symmetric implies
  (R ∪ (R⁻¹)) ⊆ S.
Proof:
  Assume R ⊆ S and S is symmetric.
  Assume (R ∪ (R⁻¹))[x,y].
  Then R[x,y] or (R⁻¹)[x,y].
  Case R[x,y]:
    Then S[x,y] by subrelation.
  qed.
  Case (R⁻¹)[x,y]:
    Then R[y,x] by relationInverse.
    Then S[y,x] by subrelation.
    Then S[x,y] by symmetry.
  qed.
  Hence S[x,y].
  Hence (R ∪ (R⁻¹)) ⊆ S.
qed.
Figure 17: Correct ELFE proof about relations.
CSEDU 2018 - 10th International Conference on Computer Supported Education