The ELFE System

Verifying Mathematical Proofs of Undergraduate Students

Maximilian Doré

and Krysia Broda

Department of Computing, RWTH Aachen University, Germany

Department of Computing, Imperial College London, 180 Queen’s Gate, London SW7 2BZ, U.K.

Keywords:

Didactics of Mathematics, Mathematical Reasoning, Proof Checking, Formal Mathematics.

Abstract:

ELFE is an interactive system for teaching basic proof methods in discrete mathematics. The user inputs a mathematical text written in fair English, which is converted to a special data structure of first-order formulas. Certain proof obligations implied by this intermediate representation are checked by automated theorem provers, which try either to prove the obligations or to find countermodels if an obligation is wrong. The result of the verification process is then returned to the user. ELFE is implemented in HASKELL and can be accessed via a reactive web interface or from the command line. Background libraries for sets, relations and functions have been developed. The system has been tested by students at the beginning of their mathematical studies.

1 INTRODUCTION

The Soviet researcher Victor Glushkov stated in 1971 that "to understand a proof means to be able to explain it to a machine that is operating with a relatively unsophisticated algorithm" (Glushkov, 1971, p. 111). Remarkably, teaching mathematics at university is still a mostly analogue endeavour. In order to understand mathematical reasoning, students practise writing proofs on paper and wait for feedback from instructors to improve their understanding. Immediate feedback would greatly speed up learning, since it is often difficult to see when a proof is complete or which steps are missing.

Such feedback could be provided by machines. Indeed, many attempts have been made to formalize mathematics. Most prominently, the interactive theorem provers ISABELLE and COQ are advanced systems; for instance, COQ was used in proving the four colour theorem (Gonthier, 2008). However, mathematical beginners are overwhelmed by the capabilities of such systems, since using them requires a deep understanding of the workings of automated theorem provers (ATP).

The goal of this work is to provide users with a system that gives feedback on proofs entered in a fairly natural mathematical language. The users are thereby detached from the technicalities of automated theorem provers. The ELFE system provides a proof of concept that this is feasible and sensible. In the past years, several attempts have been made to create a proof verifier which accepts mathematical texts written in fair English. Of these, the SYSTEM FOR AUTOMATED DEDUCTION (SAD) (Verchinine et al., 2007) was the most influential for our work. The SAD provides an intuitive input language called FORTHEL. However, the user still has to dig into the automated verification process to understand why a proof does not work. The ELFE system, in contrast, processes the output of the background provers and tries to give countermodels to wrong proofs.

Include functions.

Let A,B,C be set.

Let f: A → B.

Let g: B → C.

Lemma: g◦f is injective implies f is injective.

Proof:

Assume g◦f is injective.

Assume x ∈ A and x’ ∈ A and (f{x}) = (f{x’}).

Then ((g◦f){x}) = ((g◦f){x’}).

Hence x = x’.

Hence f is injective.

qed.

Figure 1: Exemplary ELFE text.

Doré, M. and Broda, K.
The ELFE System.
DOI: 10.5220/0006681000150026
In Proceedings of the 10th International Conference on Computer Supported Education (CSEDU 2018), pages 15-26
ISBN: 978-989-758-291-2

Consider the exemplary proof in Figure 1, which is in fact a valid ELFE text. After including a background library and introducing specific sets A, B and C and functions f and g, a lemma is proposed: if the composition of f and g is injective, then f, which is applied first, must be injective. This lemma is proven by the reasoning that if f maps two elements x and x’ to the same element, the composition of f and g must map them to the same element as well. Since this composition is injective, it follows that x and x’ are the same element, and f is thus injective. Note that (g◦f){x} denotes the function application of g◦f, which is put in brackets to specify the precedence of the symbols. We will see in the following how the text is verified.
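The lemma of Figure 1 can also be checked by brute force on small finite sets. The following is only an illustrative sketch and not part of ELFE; the helper names (`all_functions`, `is_injective`, `compose`, `lemma_holds`) are assumptions made for this example, and functions are modelled as Python dictionaries.

```python
from itertools import product

def all_functions(domain, codomain):
    """Yield every total function domain -> codomain as a dict."""
    for images in product(codomain, repeat=len(domain)):
        yield dict(zip(domain, images))

def is_injective(func):
    """A finite function is injective iff no two arguments share an image."""
    return len(set(func.values())) == len(func)

def compose(g, f):
    """Return g∘f, i.e. the function x -> g(f(x))."""
    return {x: g[f[x]] for x in f}

def lemma_holds(A, B, C):
    """Check 'g∘f injective implies f injective' for all f: A->B, g: B->C."""
    return all(
        is_injective(f)
        for f in all_functions(A, B)
        for g in all_functions(B, C)
        if is_injective(compose(g, f))
    )
```

Such an exhaustive check only covers the chosen finite sets, whereas the ELFE proof establishes the statement for arbitrary sets and functions.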

The remainder of the paper is structured as follows. We first give a brief overview of the implementation in Section 2 and of the web interface in Section 3. Next, we introduce the ELFE language and proof structures and justify the correctness of the formalisation in Section 4. Finally, we evaluate our work in Section 5 and compare it with popular current theorem provers in Section 6 before concluding with a short discussion in Section 7.

An instance of the system can be found online at https://elfe-prover.org.

2 IMPLEMENTATION

The ELFE system can be accessed through a web interface or a command-line interface (CLI), as shown in Figure 2. The web interface provides an intuitive way of accessing the system's output, while the CLI offers more debugging functionality. We will take a closer look at the web interface in Section 3.

After the text is entered via one of the interfaces, it is parsed into an intermediary representation in first-order logic. This proof representation is presented in Section 4.2. The Verifier takes the intermediary proof representation and checks it for correctness by calling several ATPs in parallel. If a proof obligation is wrong, the Verifier tries to extract a countermodel from the background provers. The result of this verification process is then returned to the user via the chosen interface.
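Calling several provers in parallel and accepting the first conclusive answer can be sketched as follows. This is a minimal illustration in Python rather than the HASKELL implementation; the function `verify_obligation`, the verdict strings and the stub provers are assumptions made for the example, and no real ATP is invoked.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def verify_obligation(problem, provers):
    """Run all provers concurrently and return (name, verdict) of the
    first one that reports 'proved' or 'countermodel'.

    `provers` maps a prover name to a callable taking the problem text.
    Leaving the `with` block waits for the remaining workers to finish.
    """
    with ThreadPoolExecutor(max_workers=len(provers)) as pool:
        futures = {pool.submit(run, problem): name
                   for name, run in provers.items()}
        for future in as_completed(futures):
            verdict = future.result()
            if verdict in ("proved", "countermodel"):
                return futures[future], verdict
    return None, "unknown"

# Hypothetical stand-ins for real background provers.
stub_provers = {
    "stub_a": lambda problem: "unknown",
    "stub_b": lambda problem: "proved",
}
```

In a real setting each callable would start an external prover process with a timeout; that part is deliberately left out here.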

[Figure 2 diagram: the CLI and the Web server feed the Parser, which passes its output to the Verifier, which calls the ATP.]

Figure 2: Architecture of the ELFE system.

https://elfe-prover.org

The system is implemented in HASKELL; its source code can be found online. In order to parse a text, a parser combinator is constructed with the library PARSEC. The framework SCOTTY is used to provide a backend for the web interface. The reactive frontend is implemented with the JAVASCRIPT framework VUEJS.

In order to send proof obligations to the background provers, the syntax standard TPTP (Sutcliffe, 2009) is used. Since the ATP to be used can be configured easily, nearly all current systems can be interfaced. So far, we have used the provers E PROVER (Schulz, 2002), SPASS (Weidenbach et al., 2002) and VAMPIRE (Riazanov and Voronkov, 2002) due to their performance at the CADE System Competitions (Sutcliffe, 2016). Additionally, we used the provers Z3 (De Moura and Bjørner, 2008) and BEAGLE (Baumgartner et al., 2015), which do theorem proving modulo background theories. Even though we did not fully utilize, for instance, their arithmetic proving facilities, calling several provers in parallel proved efficient. For example, E PROVER turned out to be fast in proving lemmas with equality, while BEAGLE gave useful countermodels for wrong proof obligations.
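To illustrate what a proof obligation in TPTP syntax looks like, the following sketch renders a context and a goal as `fof` formulas. The function name `tptp_problem`, the formula names and the example formulas are hypothetical; the exact output produced by ELFE may differ.

```python
def tptp_problem(axioms, conjecture):
    """Render named axioms and one conjecture as a TPTP problem in fof syntax.

    `axioms` is a list of (name, formula) pairs; formulas are expected to be
    valid TPTP first-order formulas already.
    """
    lines = [f"fof({name}, axiom, {formula})." for name, formula in axioms]
    lines.append(f"fof(goal, conjecture, {conjecture}).")
    return "\n".join(lines)

# Hypothetical obligation: constants carry a 'c' prefix, as in the raw output.
example = tptp_problem(
    [("assumption", "funapp(cf, cx) = funapp(cf, cy)")],
    "cx = cy",
)
```

The resulting text can be handed to any prover that reads TPTP, which is what makes the background provers interchangeable.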

3 WEB INTERFACE

The front-end of the web interface, shown in Figure 3, consists of a simple text field in which users can enter their proof. Above the input, buttons allow special characters to be entered by mouse click, next to a button that initiates the verification process.

Figure 3: The web interface of ELFE.

https://github.com/maxdore/elfe
https://hackage.haskell.org/package/parsec
https://hackage.haskell.org/package/scotty
https://vuejs.org/

After the verification process has finished, colours indicate the status of each text line, as depicted in Figure 4. Since all text is green, the text was considered correct. The user can inspect the verification process by clicking on specific lines; more information about the verification is then given in the box below the text field. In our example, we learn the TPTP representation of the proof obligation x1 = x2 and that it was proved by E PROVER. Note that the variables are prefixed with c in the raw version since they are considered constants at this point in the proof. The reason for this will be explained in Section 4.3.

Figure 4: Veriﬁed correct ELFE text.

If the user enters an incorrect proof, as in Figure 5, red colours indicate that the verification process failed. In line 9 of the example, we wrongly concluded that g must have mapped x and x’ to the same element, which does not always hold. The background provers could not prove this, but also did not find a countermodel to the obligation.

Figure 5: An unsound ELFE text.

In the proof in Figure 6, a countermodel could be found for a wrong conclusion. The lemma states that if a relation R is included in S and S is symmetric, the inverse of R must be included in S as well. While the statement is correct in general, the proof is too imprecise and misses a case distinction. The countermodel tells us that if x and y are in the union of R and its inverse, they might be in the inverse of R but not in R itself. Thus, the conclusion in line 8 does not hold in general. The correct version of this proof can be found in the Appendix.
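The kind of countermodel described above can be reproduced by hand with a one-pair relation. The concrete relation R below is a hypothetical example chosen for illustration, not the countermodel returned by the background provers.

```python
def inverse(relation):
    """Return the inverse relation {(y, x) | (x, y) in relation}."""
    return {(y, x) for (x, y) in relation}

# Hypothetical relation with a single pair.
R = {(1, 2)}
union = R | inverse(R)

# Pairs that lie in R ∪ R⁻¹ but not in R refute the step
# "(x, y) ∈ R ∪ R⁻¹ implies (x, y) ∈ R".
witnesses = [pair for pair in union if pair not in R]
```

Here the pair (2, 1) belongs to the inverse of R only, which is exactly the case the imprecise proof fails to distinguish.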

Figure 6: Countermodel for a wrong ELFE text.

4 ELFE LANGUAGE

The input to ELFE is a mathematical text written in a subset of natural mathematical language. We will not introduce the whole feature set in this paper and only examine the exemplary proof of Figure 1 in the following. Other language constructs like case distinctions or sub-proofs, which make a text less monolithic, are presented in (Doré, 2017).

In order to verify an ELFE text, we transform it into a special data structure which implies certain proof obligations. Since this internal proof representation uses first-order logic, we will first show how to transform the ELFE language into first-order logic. This preprocessing is presented in Section 4.1. Keywords like Then and Hence have special meanings in an ELFE proof and are used to structure a mathematical proof. This structure is captured in an intermediate proof representation, which is introduced in Section 4.2. The intermediate proof representation implies certain obligations which need to be checked by the background provers. These are explained in Section 4.3.

4.1 From ELFE to First-order Logic

First-order logic is used to encode mathematical statements. Most transformations from ELFE to first-order logic are straightforward, e.g., P implies f is injective is transformed to P → injective(f). In order to make an ELFE text more legible, three commands introduce meta-language features.

Include sets, relations.

Let A,B,C be set.

Notation function: f: A → B.

Deﬁnition function: for all f.

f: A → B iff for all x ∈ A. exists y ∈ B.

f[x,y] and

(for all y’ ∈ B. y = y’ or not f[x,y’]).

Let f: A → B.

Deﬁnition injective: f is injective iff

for all x ∈ A, x’ ∈ A, y ∈ B. f[x,y] and f[x’,y] implies

x = x’.

Let g: B → C.

Notation composition: g◦f.

Deﬁnition composition: (g◦f): A → C and

(for all x ∈ A. for all y ∈ B. for all z ∈ C.

((f[x,y] and g[y,z]) implies (g◦f)[x,z])).

Figure 7: Excerpt of the functions library.

The command Include can be used to include the axioms of a background theory. For example, in Figure 1 we include the functions library with Include functions. Users can easily create their own background theories, since these are written in the ELFE language as well. An excerpt of the functions library is given in Figure 7.

The command Notation is used to introduce syntactic sugar. One can write an arbitrary pattern of Unicode characters to define a notation, e.g., Notation function: f: A → B. The alphabetical parts of the pattern, i.e., f, A and B, are treated as placeholders for arbitrary terms. Thus, all terms of the form

*: * → *

with * being arbitrary terms are subsequently considered instances of the predicate function. For example, g: B → C will be transformed internally to the first-order formula function(g, B, C). Similarly, the notation for composition is defined as g◦f. Consider the version of our proof in raw first-order logic in Figure 8, where the first line of our exemplary ELFE proof, Assume g◦f is injective, is transformed into Assume injective(composition(g, f)). Note that notations can be used both for term and predicate symbols.
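The idea of treating the alphabetical parts of a notation as placeholders can be sketched with regular expressions. This is an illustrative approximation only: ELFE itself uses a PARSEC parser, the names `compile_notation` and `translate` are assumptions, and nested notations are not handled.

```python
import re

def compile_notation(pattern, placeholders):
    """Turn a notation pattern into a regex in which every placeholder
    matches an arbitrary term and all other characters match literally."""
    regex = ""
    for token in re.split(r"(\w+)", pattern):
        regex += r"(.+?)" if token in placeholders else re.escape(token)
    return re.compile(regex)

def translate(text, name, pattern, placeholders):
    """Rewrite `text` to name(arg1, ..., argn) if it matches the notation,
    otherwise return it unchanged."""
    match = compile_notation(pattern, placeholders).fullmatch(text)
    if match is None:
        return text
    return f"{name}({', '.join(match.groups())})"
```

For instance, `translate("g: B → C", "function", "f: A → B", {"f", "A", "B"})` yields `function(g, B, C)`, mirroring the transformation described above.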

Lemma: ∀set(A), set(B), set(C), function(f, A, B), function(g, B, C). injective(composition(g, f)) → injective(f).
Proof:
Assume injective(composition(g, f)).
Assume funApp(f, x) = funApp(f, x') ∧ in(x, A) ∧ in(x', A).
Then funApp(composition(g, f), x) = funApp(composition(g, f), x').
Hence x = x'.
Hence injective(f).
qed.

Figure 8: The injectivity proof without syntactic sugar.

The command Let binds a predicate symbol to a variable, effectively assigning a type to a symbol. By writing Let A,B,C be set, we ensure that in all following statements A, B and C satisfy the predicate set. Consider Figure 8, which shows the injectivity proof after removing meta-level language features: A, B and C are introduced as universally quantified sets in the lemma.
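One way to read the effect of Let is as guarded universal quantification: the declared predicates become premises of the statement. The sketch below assumes this reading, which matches the unfolded goal in Figure 10; the function `close_over` and its string-based formula representation are hypothetical.

```python
def close_over(declarations, body):
    """Wrap `body` in a universal quantifier over all declared variables,
    guarded by the predicates the Let commands attached to them.

    `declarations` is a list of (variable, guard) pairs.
    """
    variables = ", ".join(name for name, _ in declarations)
    guards = " ∧ ".join(guard for _, guard in declarations)
    return f"∀{variables}. ({guards}) → ({body})"

# The declarations of Figure 1, written as (variable, guard) pairs.
lets = [
    ("A", "set(A)"), ("B", "set(B)"), ("C", "set(C)"),
    ("f", "function(f, A, B)"), ("g", "function(g, B, C)"),
]
lemma = close_over(lets, "injective(composition(g, f)) → injective(f)")
```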

4.2 Statement Sequences

So far, we have only seen how single mathematical statements are transformed into first-order formulas. In order to capture the structure of a proof, we propose a special kind of data structure, so-called statement sequences. Intuitively, a statement holds a first-order formula with an identifier and a proof. A proof can consist of other statements in order to represent complex proof objects.

Definition 1. Statement Sequences.
A statement S is a tuple ID × GOAL × PROOF where
• ID is an alphanumeric string which is unique for each statement
• GOAL is a formula in first-order logic
• PROOF is either
  ASSUMED or
  BYCONTEXT or
  BYSUBCONTEXT $Id_1, \dots, Id_n$ or
  BYSEQUENCE $S_1, \dots, S_n$ or
  BYSPLIT $S_1, \dots, S_n$
A statement sequence is a finite list of statements $S_1, \dots, S_n$.

If a statement S is proved BYSEQUENCE $S_1, \dots, S_n$ or BYSPLIT $S_1, \dots, S_n$, we call $S_1, \dots, S_n$ the children of S. If we want to access S from a child $S_i$, we write $S_i$.PARENT. On the top level, a statement has no parent, thus S.PARENT = EMPTY.

Consider the example in Figure 9. In the following, we depict a statement visually as a box with its ID in the upper-left corner. The GOAL of a statement is written in the header of the box, the PROOF below. A PROOF can take different forms to capture complex proof structures. The axioms of a text, however, are simply annotated by ASSUMED. For example, the statements $S_{fun}$ and $S_{inj}$ depict the statements resulting from the definitions in Figure 7. In the functions library, numerous additional definitions are made which are omitted here. Statements annotated with ASSUMED will be depicted in green in the following. Below the axioms, the statement S of the lemma of our text in Figure 1 follows. In order to prove this statement, we need more advanced proof structures, which will be introduced in Section 4.3. The statement is depicted in red and with a dashed border to indicate that its proof is not complete.

fun: ∀set(A), set(B), f. function(f, A, B) ↔ ∀x ∈ A. ∃y ∈ B. relapp(f, x, y) ∧ (∀y' ∈ B. y = y' ∨ ¬relapp(f, x, y'))
    ASSUMED
inj: ∀set(A), set(B), function(f, A, B). injective(f) ↔ ∀x ∈ A, x' ∈ A, y ∈ B. relapp(f, x, y) ∧ relapp(f, x', y) → x = x'
    ASSUMED
∀set(A), set(B), set(C), function(f, A, B), function(g, B, C). injective(composition(g, f)) → injective(f)

Figure 9: Exemplary statement sequence.

To give an overview of the other types of PROOF: proofs BYSEQUENCE and BYSPLIT make it possible to nest more complex derivation sequences. A statement annotated with BYCONTEXT will be checked by the background provers. BYSUBCONTEXT is a special case of this proof type which allows for restricting the context of the statement.

4.3 Proved Statements

Since we want to verify that a text is sound, we need to introduce a soundness criterion for statements. Axioms of a text are considered correct, but the lemma needs a more subtle criterion.

First we define which axioms are considered relevant to a statement. Intuitively, the context of a statement in a statement sequence consists of all statements "above" it.

Definition 2. Context of a Statement.
Let $S_1, \dots, S_n$ be a statement sequence. The context of a statement $S_k$ is inductively defined as
• $\Gamma(\text{EMPTY}) = \emptyset$
• $\Gamma(S_k) = \{S_1.\text{GOAL}, \dots, S_{k-1}.\text{GOAL}\} \cup \Gamma(S_k.\text{PARENT})$.
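Definitions 1 and 2 can be mirrored in a small data structure. The following sketch is in Python rather than HASKELL and is not the ELFE source; the class `Statement`, the function `context` and the treatment of BYSPLIT children (whose sibling goals are not shared, following the later remark on the division of contexts) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Statement:
    id: str
    goal: str
    proof: str = "ASSUMED"  # ASSUMED, BYCONTEXT, BYSEQUENCE or BYSPLIT
    children: List["Statement"] = field(default_factory=list)
    parent: Optional["Statement"] = field(default=None, repr=False, compare=False)

    def __post_init__(self):
        # Link every child back to this statement (the PARENT accessor).
        for child in self.children:
            child.parent = self

def context(stmt, top_level):
    """Return the context of `stmt` as a list of goals: the context of its
    parent followed by the goals of the statements preceding it."""
    if stmt is None:  # context of EMPTY is empty
        return []
    if stmt.parent is not None and stmt.parent.proof == "BYSPLIT":
        earlier = []  # assumption: split children do not see each other
    else:
        siblings = top_level if stmt.parent is None else stmt.parent.children
        position = next(i for i, s in enumerate(siblings) if s is stmt)
        earlier = [s.goal for s in siblings[:position]]
    return context(stmt.parent, top_level) + earlier
```

A statement nested inside a BYSEQUENCE proof thus sees the library axioms above its parent as well as the goals of its earlier siblings.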

For example, in Figure 9, the context of statement S consists of the respective goals of $S_{fun}$ and $S_{inj}$ (as well as other definitions of the library which are omitted here). With that, we can define an appropriate soundness criterion for statements.

Definition 3. Proved Statement.
Let S be a statement with S.GOAL = φ.
We call S proved iff Γ(S) ⊨ φ.

In other words, a statement is considered proved if it already follows from the theory created by its context. The statements $S_{fun}$ and $S_{inj}$ in Figure 9 are not proved, since they build up the axioms of our theory. The statement S, however, should follow from these axioms, i.e., should be a proved statement. In order to show that S is proved, we will create a more complex proof object in the following, such that correctness of the proof object implies that S follows from its context.

We start by unfolding the outer implication of the lemma ∀set(A), set(B), set(C), function(f, A, B), function(g, B, C). injective(composition(g, f)) → injective(f). More specifically, we fix specific sets A, B and C and functions f, g. As we see in Figure 10, this is captured in our data structure by introducing a child statement such that S is proved BYSEQUENCE through this child. We represent proofs BYSEQUENCE by putting the proof inside the statement to prove. The difference between S and its child is that we removed the quantifiers and replaced the variables with constants (depicted in blue and with an overline, e.g. Ā, in case the colour does not show up).

In order to prove the new goal of the child statement, we do a so-called unfolding of the implication. The left-hand side is put in a statement annotated with ASSUMED, such that it is in the context of the subsequent statement, which holds the right-hand side of the implication.

The whole reduction from S to the innermost statement is done automatically by the system. It detects if meta-variables are contained in the goal and injects the proof automatically.

∀set(A), set(B), set(C), function(f, A, B), function(g, B, C). injective(composition(g, f)) → injective(f)
    (set(Ā) ∧ set(B̄) ∧ set(C̄) ∧ function(f̄, Ā, B̄) ∧ function(ḡ, B̄, C̄)) → (injective(composition(ḡ, f̄)) → injective(f̄))
        set(Ā) ∧ set(B̄) ∧ set(C̄) ∧ function(f̄, Ā, B̄) ∧ function(ḡ, B̄, C̄)
            ASSUMED
        injective(composition(ḡ, f̄)) → injective(f̄)

Figure 10: Unfolding meta-variables.

With this, we have reduced the problem of showing that S is proved to showing that the innermost statement of Figure 10 is proved. To convince ourselves that the latter indeed implies that S is proved, we first show that it is sound to fix a universally quantified variable to a constant. This can be done by natural deduction, which has been shown to be sound (Fitting, 1990). Concretely, our construction is analogous to the following deduction rule:

$$(\forall I):\quad \frac{P(a)}{\forall x.\,P(x)} \quad \text{with } a \text{ not occurring in } P(x)$$

We use this deduction rule in showing the soundness of our construction in Lemma 1.

Lemma 1. ∀ Introduction.
Let S be a statement such that S.GOAL = ∀x.P(x) with a not occurring in S.GOAL, S.PROOF = BYSEQUENCE $S_1$, $S_1$.GOAL = P(a) and $S_1$ is proved.
[Diagram: statement S with goal ∀x.P(x) (a not occurring), containing the child statement with goal P(a).]
Then S is proved.

Proof. Since $S_1$ is proved and $\Gamma(S) = \Gamma(S_1)$, we have $\Gamma(S) \models P(a)$. With (∀I) it follows that $\Gamma(S) \models \forall x.\,P(x)$, since a does not occur in P(x).

Next we have to show that it is sound to assume the left-hand side of an implication and deduce the right-hand side. Again, this is analogous to a natural deduction rule:

$$(\to I):\quad \frac{P \vdash Q}{P \to Q}$$

This rule is used in the soundness proof in Lemma 2.

Lemma 2. → Introduction.
Let S be a statement such that S.GOAL = P → Q, S.PROOF = BYSEQUENCE $S_1, S_2$, $S_1$.GOAL = P, $S_2$.GOAL = Q and $S_2$ is proved.
[Diagram: statement S with goal P → Q.]
Then S is proved.

Proof. We have $\Gamma(S_2) = \Gamma(S) \cup \{P\}$. Since $S_2$ is proved, $\Gamma(S) \cup \{P\} \models Q$. With (→I) it follows that $\Gamma(S) \models P \to Q$.

Now we construct the proof of the innermost statement of Figure 10, as shown in Figure 11. In the proof text in Figure 1, we explicitly wrote Assume injective(composition(g, f)). [...] Hence injective(f). Analogous to the unfolding of the implication in Figure 10, we assume the left-hand side and now have to prove the right-hand side. Again, this is sound, as proved in Lemma 2.

injective(composition(ḡ, f̄)) → injective(f̄)
    injective(composition(ḡ, f̄))
        ASSUMED
    injective(f̄)

Figure 11: Unfolding an implication.

Now we have to prove that injective(f) holds. In order to do that, the proof in Figure 1 uses the definition of injectivity: Assume funApp(f, x) = funApp(f, x') ∧ in(x, A) ∧ in(x', A). [...] Hence x = x'. In other words, we prove an alternative goal. In order to retain a sound construction, we have to show two things: first, that the alternative goal indeed implies the original goal, and second, that the alternative goal holds. This is represented in Figure 12 by putting two statements below the goal injective(f̄), which depicts that the PROOF of this statement is BYSPLIT over the two. Note that a proof BYSPLIT leads to a division of contexts, i.e., the derived goal of the first statement will not be put into the context of the second. Thus, the proof BYSPLIT allows for a finer scoping of statements.

injective(f̄)
    (∀x, x'. funApp(f̄, x) = funApp(f̄, x') ∧ in(x, Ā) ∧ in(x', Ā) → x = x') → injective(f̄)
        BYCONTEXT
    ∀x, x'. funApp(f̄, x) = funApp(f̄, x') ∧ in(x, Ā) ∧ in(x', Ā) → x = x'
        funApp(f̄, x̄) = funApp(f̄, x̄') ∧ in(x̄, Ā) ∧ in(x̄', Ā) → x̄ = x̄'
            funApp(f̄, x̄) = funApp(f̄, x̄') ∧ in(x̄, Ā) ∧ in(x̄', Ā)
                ASSUMED
            x̄ = x̄'

Figure 12: Proving an alternative goal.

The first of the two statements contains the soundness check. Its proof is BYCONTEXT, which means that it will be sent to the background provers. If some ATP finds a proof, a statement annotated with BYCONTEXT is considered proved. Here, this is the case if our definition of injectivity indeed allows us to prove this alternative goal. We will depict statements proved BYCONTEXT in orange in the following.

The second statement contains the proof of the alternative goal. Again, the universally quantified variables x and x' are fixed to constants. Afterwards, the implication is unfolded. As seen in Lemma 1 and Lemma 2, this is sound.

To convince ourselves that this construction is sound, we have to use two additional natural deduction rules:

$$(\land I):\quad \frac{P \quad Q}{P \land Q} \qquad (\to E):\quad \frac{P \to Q \quad P}{Q}$$

These rules will be used in the proof of Lemma 3, which is an abstract case of our approach in Figure 12.

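Lemma 3. Splitting a Goal.
Let S be a statement such that S.GOAL = P, S.PROOF = BYSPLIT $S_0, S_1, \dots, S_n$, $S_0$.GOAL = $Q_1 \land \dots \land Q_n \to P$, $S_i$.GOAL = $Q_i$ for i = 1, ..., n, and $S_i$ is proved for i = 0, ..., n.
[Diagram: statement S with goal P, containing the statements with goals $Q_1 \land \dots \land Q_n \to P$, $Q_1$, ..., $Q_n$.]
Then S is proved.

Proof. We have $\Gamma(S) = \Gamma(S_i)$ for i = 0, ..., n. With $S_i$ proved for i = 1, ..., n we have $\Gamma(S) \models Q_i$ for i = 1, ..., n. With (∧I) it follows that $\Gamma(S) \models Q_1 \land \dots \land Q_n$. With $S_0$ proved we also have $\Gamma(S) \models Q_1 \land \dots \land Q_n \to P$. Thus, we can deduce with (→E) that $\Gamma(S) \models P$.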

The remaining bit to prove is the innermost goal of Figure 12, i.e., that x = x' follows from the context. However, in the text in Figure 1 the next derivation step is Then funApp(composition(g, f), x) = funApp(composition(g, f), x'). This statement does not change the overall goal we want to prove, but gives a cornerstone on the way to deriving the goal. As depicted in Figure 13, this additional finding is first verified by annotating its statement with BYCONTEXT. Afterwards, the actual goal is proved. Since the user gave no additional proving methods, we send the final goal x = x' to the background provers as well.

x̄ = x̄'
    funApp(composition(ḡ, f̄), x̄) = funApp(composition(ḡ, f̄), x̄')
        BYCONTEXT
    x̄ = x̄'
        BYCONTEXT

Figure 13: Giving a cornerstone to a proof.

If the cornerstone can already be derived by the background provers, the theory created by the context of the final goal is not extended by adding the cornerstone. This is formally reflected in Lemma 4.

Lemma 4. Deriving a Cornerstone.
Let S be a statement such that S.GOAL = P, S.PROOF = BYSEQUENCE $S_1, S_2$, $S_1$.GOAL = Q, $S_2$.GOAL = P and both $S_1$ and $S_2$ are proved.
Then S is proved.

Proof. Since $S_1$ is proved, we have $\Gamma(S_1) \models Q$. Because $\Gamma(S) = \Gamma(S_1)$, already $\Gamma(S) \models Q$. Hence, with $S_2$ proved we have $\Gamma(S_2) \models P$, and it follows that already $\Gamma(S) \models P$.

This completes our construction of the internal proof representation of the lemma in Figure 1. Three statements are annotated BYCONTEXT and will be sent to the background provers: the soundness check in Figure 12 and the cornerstone and final goal in Figure 13. If each of these three statements can be derived from its respective context, we can conclude that the original goal of S already follows from its context. The proof of the lemma is then considered sound.
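How the proof obligations fall out of a nested statement sequence can be sketched with a top-down traversal that carries the context along. The dictionary layout, the function `obligations` and the abbreviated goal strings are assumptions for this illustration; library axioms and the outer quantifier unfolding are omitted.

```python
def obligations(statements, inherited=(), split=False):
    """Yield (context, goal) for every statement annotated BYCONTEXT.

    In a sequence, each statement sees the goals of its predecessors;
    in a split, siblings only see the inherited context.
    """
    local = list(inherited)
    for stmt in statements:
        ctx = list(inherited) if split else list(local)
        if stmt["proof"] == "BYCONTEXT":
            yield ctx, stmt["goal"]
        elif stmt["proof"] in ("BYSEQUENCE", "BYSPLIT"):
            yield from obligations(stmt["children"], ctx,
                                   split=stmt["proof"] == "BYSPLIT")
        local.append(stmt["goal"])

# Abbreviated version of the structure in Figures 11 to 13.
proof = [
    {"goal": "inj(g.f)", "proof": "ASSUMED"},
    {"goal": "inj(f)", "proof": "BYSPLIT", "children": [
        {"goal": "alt -> inj(f)", "proof": "BYCONTEXT"},
        {"goal": "alt", "proof": "BYSEQUENCE", "children": [
            {"goal": "f(x) = f(x')", "proof": "ASSUMED"},
            {"goal": "x = x'", "proof": "BYSEQUENCE", "children": [
                {"goal": "g(f(x)) = g(f(x'))", "proof": "BYCONTEXT"},
                {"goal": "x = x'", "proof": "BYCONTEXT"},
            ]},
        ]},
    ]},
]
```

Traversing this structure produces three obligations, and the cornerstone appears in the context of the final goal x = x', in line with Lemma 4.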

5 EVALUATION

The tool was tested by students at the beginning of their mathematical studies. In Section 5.1, we take a look at their evaluation and suggestions. We also formalized some more advanced theorems in the system, e.g., Cantor’s theorem and the Knaster-Tarski theorem, and discuss our experiences as well as the system’s inherent limitations in Section 5.2.

5.1 User Feedback

The system was tested with 12 undergraduates of

Computing, Mathematics and Electrical Engineering

at Imperial College London, of which none had prior

experience with interactive theorem provers. Due to

the limited time frame we were not able to evaluate

the system further.

At ﬁrst, the students were given the proof sketch

shown in Figure 14. An intuition about the proof was

given in natural language, i.e., it was explained we

want to prove that the complement of the complement

of a set is the set itself. All students were able to iden-

tify the proof pattern, i.e., that we show set inclusion

in both directions. This is a very common proof pro-

cedure to show equality of two sets. When writing the

remaining bit of the proof, the students successfully

resolved syntactic errors by inspecting the parsing er-

rors and all completed the proof. The syntactic cha-

racteristics of ELFE, e.g., that Then and Hence have

distinct meanings, did not pose an obstacle since they

only had to copy the structure of the ﬁrst sub proof.

However, only two students were able to ﬁgure out the

meanings of these language features, i.e., that Hence

closes an implication whereas Then is for giving cor-

nerstones to a proof. This suggests that using ELFE

requires an introduction to the different language fea-

tures and users cannot start writing proofs right away.

Later on, the testers were given more complex

proof sketches. Students who were in general comfor-

table with mathematical reasoning were able to com-

plete the proofs. The other students had problems

grasping the idea of the proof and did not start to write

a proof in the system.

Include sets.
Let A be set.
Let x be element.
Lemma: ((Aᶜ)ᶜ) = A.
Proof:
Proof ((Aᶜ)ᶜ) ⊆ A:
Assume x ∈ ((Aᶜ)ᶜ).
Then not x ∈ (Aᶜ).
Hence x ∈ A.
qed.
Proof A ⊆ ((Aᶜ)ᶜ):
...
qed.

Figure 14: Proof to be completed in the evaluation.

After the students tried the system, they were

given the following statements and had to indicate

with 1 (strongly agree) to 6 (strongly disagree) their

agreement with the statements.

• I enjoy writing mathematical proofs.

Mean: 3.3 – Median: 3.5

• I ﬁnd writing mathematical proofs difﬁcult.

Mean: 2.6 – Median: 2

• I think computers can be of use in learning how to

write mathematical proofs.

Mean: 2.3 – Median: 2

• I enjoyed writing mathematical proofs in the

ELFE system.

Mean: 2.5 – Median: 2

• I found the feedback of the system helpful.

Mean: 2.6 – Median: 2

• I would like to know how ELFE and interactive

theorem proving works.

Mean: 1.8 – Median: 1

In text form, they could also write down what they liked about the system and what should be improved. It was highlighted that the language was "simple and clear" and did not "get in the way of the proof". They liked the "very understandable and simple UI" and its reactiveness. As improvements for the user interface, they proposed autocompletion of proofs and syntax highlighting. The given raw translations of the mathematical text were not easy to understand. One user also pointed out that the background provers are sometimes too clever: a text is accepted even if crucial cornerstones of a proof are missing. This user would like to have a criterion for when a proof is "complete" for humans and not only for a computer.

As we see, the testers were in general not especially keen on writing mathematical proofs. Writing proofs in ELFE made it a bit more enjoyable. The system seems to have succeeded in sparking an interest in interactive theorem proving.

5.2 Limits of the Current System

Since ﬁrst-order logic is an intuitive way to write

down proofs in set theory and relations, proofs in

these domains could be written down easily. Working

with the functions library was more complex. Some

additional lemmas and function symbols which were

introduced to make a proof more readable for humans

increase the difﬁculty for the background provers. If

the background provers take too long in proof search,

it is hard to assess if a proof itself is wrong or only ta-

kes a long time to prove. Debugging a failing proof is

still difﬁcult with the user interface provided by ELFE.

In most cases, the raw proof obligations given to the

background provers were more helpful in ﬁnding bugs

by manually deleting and changing the given premi-

ses. This is due to constructions like Let which shor-

ten a proof, but also hide what is going on inside the

system.

The Notation command has turned out to be a very

powerful construct to ease the readability of proofs.

New notations can be introduced easily and make a

proof look quite intuitive.

BEAGLE was able to provide countermodels to a

wrong proof only if the number of premises was limi-

ted. Restricting the context of a derivation step incre-

ased the success rate signiﬁcantly. However, for new

users it is certainly difﬁcult to relate a countermodel

to the entered text since it is given in the raw TPTP

format.

Another problem was that the background provers were sometimes too clever. They sometimes find intermediate steps that are not at all obvious to a human reader. This cleverness is particularly problematic with proofs by contradiction: if the background provers find the inconsistency caused by the assumption, all derivations a user may make are trivially true as well, even though they do not make sense in the proof.
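The effect of an inconsistent context can be seen in a propositional toy example: once the premises contradict each other, every conclusion is entailed. The truth-table checker below is only a sketch; the function `entails` and the encoding of formulas as Python callables are assumptions for illustration.

```python
from itertools import product

def entails(premises, conclusion, atoms):
    """Semantic entailment by truth table: every assignment that satisfies
    all premises must also satisfy the conclusion."""
    for values in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        if all(p(model) for p in premises) and not conclusion(model):
            return False
    return True

# The assumption p together with a derived ¬p is inconsistent ...
inconsistent = [lambda m: m["p"], lambda m: not m["p"]]
# ... so both q and ¬q are entailed, although neither is meaningful.
q = lambda m: m["q"]
not_q = lambda m: not m["q"]
```

With the consistent premise p alone, neither q nor ¬q is entailed, which is why such vacuous steps only slip through inside proofs by contradiction.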

Writing larger proof texts in straightforward domains such as set theory can be done easily in ELFE. However, some properties like well-foundedness are not expressible at all in first-order logic, so it might be expedient for future versions to use higher-order logic at the core of statement sequences.

6 RELATED WORK

In Section 6.1, we will take a look at mathematical text verifiers like the SYSTEM FOR AUTOMATED DEDUCTION, which heavily influenced this project. In Section 6.2, we will compare ELFE to the popular interactive theorem provers ISABELLE and COQ.

6.1 Mathematical Text Verifier

In the following, we will present two projects aimed at verifying mathematical texts: the SYSTEM FOR AUTOMATED DEDUCTION (SAD) in Section 6.1.1 and NAPROCHE in Section 6.1.2.

6.1.1 SYSTEM FOR AUTOMATED DEDUCTION

The SAD was developed at the University Paris and the Taras Shevchenko National University of Kyiv. It continues the project "Algoritm Ochevidnosti" (algorithm of obviousness), which was initiated by the Soviet researcher Victor Glushkov in the 1960s. His goal was to develop a tool that lets users shorten proofs by omitting long but "obvious" parts. These omitted parts should be verified by automated theorem provers. (Verchinine and Paskevich, 2000)

SAD uses the input language FORTHEL, which allows for expressing mathematical statements intuitively. FORTHEL texts are converted to an ordered set of first-order formulas. The structure of the initial text is preserved such that the necessary proof tasks can be defined. These tasks are then given to an automated theorem prover (ATP). The internal reasoner may simplify tasks and omit trivial statements. Afterwards, the verification status of the text is given to the user. For each proof task, the result of the ATP used is returned. This makes it possible to inspect possible sources of failing tasks, but requires knowledge of how the background provers work. (Verchinine et al., 2007)
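The general idea of deriving proof tasks from an ordered set of formulas can be sketched as follows. This is a rough illustration under a simplifying assumption, namely that every statement that is not assumed must follow from all statements before it. It is not SAD's actual algorithm, which additionally uses an internal reasoner, and the names `Statement` and `proof_tasks` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Statement(NamedTuple):
    name: str
    formula: str
    assumed: bool  # True for axioms and assumptions, which need no proof


def proof_tasks(text: List[Statement]) -> List[Tuple[List[str], str]]:
    """For every statement that is not assumed, emit a task consisting of
    the names of all earlier statements (premises) and the goal's name."""
    tasks = []
    for i, st in enumerate(text):
        if not st.assumed:
            tasks.append(([earlier.name for earlier in text[:i]], st.name))
    return tasks


text = [
    Statement("subrelation", "R ⊆ S", True),
    Statement("symmetric", "S is symmetric", True),
    Statement("step1", "R[y,x] implies S[y,x]", False),
    Statement("step2", "R[y,x] implies S[x,y]", False),
]

for premises, goal in proof_tasks(text):
    print(goal, "from", premises)
```

Because the order of the text is preserved, a statement that has already been verified becomes available as a premise for later tasks, as `step1` does for `step2` here.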

Currently, it is not possible to work with functions

in SAD due to the lack of background libraries. Thus,

we could not implement the injectivity proof of Figure

1 in SAD.

6.1.2 NAPROCHE

The NAPROCHE system was a joint project between mathematicians at the University of Bonn and linguists at the University of Duisburg-Essen. Its central goal was to develop a controlled natural language (CNL) in which semi-formal mathematical texts can be written and checked. The input consists of texts in a LaTeX-style language, made up of mathematical formulas embedded in a controlled natural language. (Cramer et al., 2009)

To extract the semantics of a CNL text, NAPROCHE adapts a concept from computational linguistics: Proof Representation Structures (PRS) enrich the linguistic concept of Discourse Representation Structures in such a way that they can represent mathematical statements and their relations. The semantics of PRS have been researched extensively; however, the project has been discontinued and no working version is available.

6.2 Interactive Theorem Prover

The classical approach to interactive theorem proving closely involves the human user in the technical verification process. We will briefly introduce the popular provers ISABELLE in Section 6.2.1 and COQ in Section 6.2.2, together with their respective formalizations of the injectivity proof in Figure 1.

6.2.1 ISABELLE

ISABELLE is a joint project of Cambridge University and the Technical University Munich. It supports polymorphic higher-order logic, augmented with axiomatic type classes. At present it provides useful proof procedures for Constructive Type Theory, various first-order logics, Zermelo-Fraenkel set theory and higher-order logic. (Nipkow et al., 2002)

Consider the injectivity proof written in ISABELLE in Figure 15. The predicate inj_on f A expresses that the function f is injective on the domain A. The proof structure is close to the one used in ELFE: we introduce arbitrary x and x' which f maps to the same element and conclude that they must be equal. One has to specify the automated proof tactics and the premises used: in our example, the derivations are made by term rewriting using the definitions comp_def and inj_on_def from the background library.

In comparison to ELFE, the user is therefore more involved in the automated verification process. Since 2007, ISABELLE has offered the extension SLEDGEHAMMER. By calling several ATPs, SLEDGEHAMMER tries to determine which premises are important to a goal. It then tries to reconstruct the automated proofs with methods implemented in ISABELLE. In fact, the mechanical proof methods needed in Figure 15 can be found by invoking SLEDGEHAMMER.

theory InjectiveComposition
  imports Fun
begin

lemma
  assumes "inj_on (g ∘ f) A"
  shows "inj_on f A"
proof
  fix x x'
  assume "x ∈ A" and "x' ∈ A"
  moreover assume "f x = f x'"
  then have "(g ∘ f) x = (g ∘ f) x'"
    by (auto simp: comp_def)
  ultimately show "x = x'" using assms
    by (auto simp: inj_on_def)
qed

end

Figure 15: Proof in ISABELLE.

In a recent study, 34% of nontrivial goals contained in representative ISABELLE texts could be proved by SLEDGEHAMMER. With this extension, ISABELLE allows beginners to prove challenging theorems. The creators note that SLEDGEHAMMER was not designed as a tool to teach ISABELLE since it focused primarily on experienced users. However, it changed the way ISABELLE is taught. Beginners do not have to learn about low-level proving tactics and how they work, but can focus on the proof from a higher level. (Paulson and Blanchette, 2010)

Require Import Basics.

Definition injective {A B} (f : A -> B) :=
  forall x y : A, f x = f y -> x = y.

Theorem c_inj (A B C:Type) (f:A->B) (g:B->C):
  (injective (compose g f)) -> injective f.
Proof.
  intuition.
  intros x x'.
  pose (f x = f x').
  intuition.
  assert (g (f x) = g (f x')).
  { elim H0. rewrite H0. trivial. }
  auto.
Qed.

Figure 16: Proof in COQ.

6.2.2 COQ

COQ is an interactive theorem prover first developed in 1984 at INRIA. It is based on the Curry–Howard

correspondence, which relates types to propositions of constructive logic.

In order to prove a proposition, one has to construct a

term with the type corresponding to the proposition.
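The propositions-as-types reading can be made concrete with an explicit proof term. The sketch below is written in Lean 4 rather than COQ, purely for illustration; the names `Injective` and `inj_of_comp` are hypothetical and defined locally. The theorem statement is a type, and the proof is a function that inhabits it.

```lean
-- Injectivity stated as a proposition, i.e. as a type.
def Injective {α β : Type} (f : α → β) : Prop :=
  ∀ x y : α, f x = f y → x = y

-- The proof is a term of that type: given x, y and a proof of f x = f y,
-- it applies g to both sides and then uses injectivity of the composition.
theorem inj_of_comp {α β γ : Type} (f : α → β) (g : β → γ)
    (h : Injective (fun x => g (f x))) : Injective f :=
  fun x y hxy => h x y (congrArg g hxy)
```

Tactic scripts such as the one in Figure 16 construct a term of this kind step by step instead of writing it out directly.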

Consider the injectivity proof implemented in Figure 16. Again, the idea of the proof is to show that f x = f x' implies x = x'. However, we have to explicitly apply rewrite techniques to make the derivation steps. The tactic intuition says that we can assume the left-hand side of an implication and then prove the right-hand side. Afterwards, we want to make sure that we can just apply g on both sides. We have to rewrite both sides of H0, which stands for f x = f x', in order to get to our assertion. The final goal x = x' is then derived by applying the tactic auto.

As we see, the translation process of mathematical texts to functional programs requires a good understanding of type theory and is not suitable for mathematical beginners.

Consequently, the most prominent current interactive theorem provers are of a deeply technical nature. They are thought of as programming languages that happen to prove theorems, and not as digitisations of mathematical language.

7 DISCUSSION

This paper presented ELFE, a system that checks proofs in discrete mathematics. Entered texts are transformed to statement sequences, a special data structure of first-order formulas. Remaining proof obligations are then checked by background provers. Statement sequences are a powerful intermediate proof representation which can hold manifold proof techniques. The clear soundness criteria allow the proof techniques to be extended easily.

Students who tested the system especially liked that they got immediate feedback on their proof work. The implemented background libraries allow for an easy start. Once users become familiar with the tool, they can easily construct their own background libraries. Certainly, more evaluation of the system in pedagogical environments is necessary. It will be particularly interesting to examine how teachers can incorporate the system in their courses.

The language constructs presented here were the result of formalizing several exemplary proofs. Anyone who formalizes more proofs will probably feel the need for additional proof methods. Provided a proof method can be mapped soundly into statement sequences, it should be easy to implement.

In addition to giving countermodels for wrong proofs, one could utilize more features of the background provers. Many provers return in-depth information about the proof of a conjecture. This information could be useful for users in order to understand why a proof works or fails. The challenge is to present the technical output of the background provers via an intuitive interface. In order to do proofs with arithmetic, it might be useful to utilize already implemented arithmetic capabilities of background provers such as Z3 and BEAGLE. Expert users presumably prefer systems with deep insight into the technical verification process, but an abstraction is necessary if we want to use computers in teaching mathematics.

The biggest structural limitation of ELFE is that it internally uses first-order logic. For example, with the current capabilities it is not straightforward to implement proofs by induction. Recent years have seen interesting advances in automated theorem proving for typed higher-order logic. A new standard for typed higher-order logic has been added to TPTP, which is used by several provers like LEO-II (Benzmüller et al., 2015) and SATALLAX (Brown, 2012). A next version of ELFE could use this development in order to provide a more powerful way of expressing mathematics. This would require introducing a meaningful type system for ELFE.
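The difficulty with induction mentioned above can be made concrete. The usual induction principle for the natural numbers quantifies over all predicates $P$, which makes it a second-order statement; in first-order logic it can only be approximated by an axiom schema with one instance per formula. Stated for illustration:

```latex
\[
  \forall P \;\bigl(\, P(0) \,\land\, \forall n \,\bigl( P(n) \rightarrow P(n+1) \bigr)
    \;\rightarrow\; \forall n \, P(n) \,\bigr)
\]
```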

REFERENCES

Parsec: Monadic parser combinators. https://hackage.haskell.org/package/parsec. Accessed: 2017-01-03.

Scotty: Haskell web framework. https://hackage.haskell.org/package/scotty. Accessed: 2017-01-03.

Vuejs: The progressive javascript framework. https://vuejs.org/. Accessed: 2017-01-03.

Baumgartner, P., Bax, J., and Waldmann, U. (2015). BEAGLE – a hierarchic superposition theorem prover. In Proc. CADE-25, pages 367–377.

Benzmüller, C., Sultana, N., Paulson, L. C., and Theiß, F. (2015). The higher-order prover LEO-II. Journal of Automated Reasoning, 55(4):389–404.

Brown, C. E. (2012). SATALLAX: An automatic higher-order prover. In Proc. IJCAR 2012, pages 111–117.

Cramer, M., Fisseni, B., Koepke, P., Kühlwein, D., Schröder, B., and Veldman, J. (2009). The NAPROCHE project – controlled natural language proof checking of mathematical texts. In Proc. CNL 2009, volume 5972, pages 170–186.

De Moura, L. and Bjørner, N. (2008). Z3: An efficient SMT solver. Tools and Algorithms for the Construction and Analysis of Systems, pages 337–340.

Doré, M. (2017). ELFE – An interactive theorem prover for undergraduate students. Bachelor thesis.

Fitting, M. (1990). First-order Logic and Automated Theorem Proving. Springer, 2nd edition.

Glushkov, V. M. (1971). Problems in the theory of automata and artificial intelligence. Journal of Cybernetics, 1(1):97–113.

Gonthier, G. (2008). Formal proof – the four-color theorem. Notices of the AMS, 55(11):1382–1393.

Nipkow, T., Wenzel, M., and Paulson, L. C. (2002). Isabelle/HOL: A Proof Assistant for Higher-order Logic. Springer.

Paulson, L. C. and Blanchette, J. C. (2010). Three years of experience with SLEDGEHAMMER, a practical link between automatic and interactive theorem provers. In Proc. IJCAR 2010, pages 1–10.

Riazanov, A. and Voronkov, A. (2002). The design and implementation of VAMPIRE. AI Commun., 15(2-3):91–110.

Schulz, S. (2002). E - a brainiac theorem prover. AI Commun., 15(2-3):111–126.

Sutcliffe, G. (2009). The TPTP Problem Library and Associated Infrastructure: The FOF and CNF Parts, v3.5.0. Journal of Automated Reasoning, 43(4):337–362.

Sutcliffe, G. (2016). The CADE ATP System Competition. AI Magazine, 37(2):99–101.

Verchinine, K., Lyaletski, A., and Paskevich, A. (2007). SYSTEM FOR AUTOMATED DEDUCTION (SAD): a tool for proof verification. In Proc. CADE-21, pages 398–403.

Verchinine, K. and Paskevich, A. (2000). FORTHEL – the language of formal theories. International Journal of Information Theories and Applications, 7(3):120–126.

Weidenbach, C., Brahm, U., Hillenbrand, T., Keen, E., Theobald, C., and Topić, D. (2002). SPASS version 2.0. In Proc. CADE-18, pages 45–79.

APPENDIX

Include relations.
Let R,S be relation.
Lemma: R ⊆ S and S is symmetric implies (R ∪ (R⁻¹)) ⊆ S.
Proof:
  Assume R ⊆ S and S is symmetric.
  Assume (R ∪ (R⁻¹))[x,y].
  Then R[x,y] or (R⁻¹)[x,y].
  Case R[x,y]:
    Then S[x,y] by subrelation.
  qed.
  Case (R⁻¹)[x,y]:
    Then R[y,x] by relationInverse.
    Then S[y,x] by subrelation.
    Then S[x,y] by symmetry.
  qed.
  Hence S[x,y].
  Hence (R ∪ (R⁻¹)) ⊆ S.
qed.

Figure 17: Correct ELFE proof about relations.

CSEDU 2018 - 10th International Conference on Computer Supported Education