• Because mathematicians want to communicate unambiguously, they tend to use a relatively small set of phrases to express their ideas, and these phrases have a standard interpretation. About 700 phrases suffice for the essential part of mathematics (definitions, theorems, proofs, etc.), but this does not cover the more informal motivational part (Trzeciak, 1995).
• Mathematicians use words and phrases in a very rigid way. The language of mathematics is simple: there is very little variation in tense, person, etc. (Ganesalingam, 2009).
• Another reason why mathematics is apt to be
represented by a machine is that in mathematics
we are in the (probably unique) position that ev-
ery meaningful rigorous statement can, at least
in principle, be translated into a formal language.
Therefore, it is possible for a machine to faithfully
represent the complete content of an arbitrary (but
meaningful) mathematical statement.
However, we do not intend to allow general natural language as input, even though we expect only relatively simple sentences. Instead, we intend to exploit the fact that mathematical language is simple by defining a controlled natural language (CNL) that is expressive enough to fulfill the needs of mathematicians while still sounding like natural language.
For formulas, since LaTeX has been the de facto standard in the mathematical community for decades, we envision a reasonable subset of LaTeX as the main input format.
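To illustrate the idea of a controlled language built from a small set of fixed phrases, the following Python sketch recognizes three sentence patterns with embedded LaTeX formulas. The patterns, the function name `parse_sentence`, and the returned tuples are hypothetical and chosen only for this example; they are not the grammar of the system described here.

```python
import re

# Hypothetical phrase patterns; formulas are delimited by $...$ as in LaTeX.
PATTERNS = [
    ("definition", re.compile(r"^Let \$(?P<var>[^$]+)\$ be an? (?P<type>[a-z ]+)\.$")),
    ("assumption", re.compile(r"^Assume that \$(?P<formula>[^$]+)\$\.$")),
    ("claim", re.compile(r"^Then \$(?P<formula>[^$]+)\$\.$")),
]

def parse_sentence(sentence):
    """Return (kind, fields) for a recognized sentence, or None otherwise."""
    for kind, pattern in PATTERNS:
        match = pattern.match(sentence.strip())
        if match:
            return kind, match.groupdict()
    return None
```

A sentence outside the fixed patterns is rejected rather than guessed at, which is the trade-off a controlled language makes in exchange for an unambiguous interpretation.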
5 THINGS ALREADY
IMPLEMENTED
• Optimization problems can be represented in the semantic memory, and a description of the problem can then be automatically generated in the algebraic modeling language AMPL and in almost natural language. Below is an example of a simple optimization problem, formulated in Figure 3 as a mathematician would do, and then in Figure 4 in the AMPL format. Both texts have been generated automatically from a common representation in the semantic memory comprising about 550 relations of the form (1).
• For the semantic memory we have two implemen-
tations, one written in Matlab where the seman-
tic memory is a sparse matrix and the objects are
natural numbers, and another one in Soprano, a
framework for RDF data.
Multi-dimensional knapsack.
Let integer $N$ be number of contract, let integer $M$ be number of budget, let $c_j$ be contract volume of project $j$ for $j = 1, \dots, N$, let $A_{i,j}$ be estimated cost of budget $i$ for project $j$ for $i = 1, \dots, M$ and $j = 1, \dots, N$, let $B_i$ be available amount of budget $i$ for $i = 1, \dots, M$ and let $x_j = 1$ if project $j$ is selected, and let $x_j = 0$ otherwise for $j = 1, \dots, N$.
Problem: Given integer $N$, integer $M$, vector $c$, matrix $A$ and vector $B$ find binary vector $x$ such that $\sum_{j=1}^{N} c_j x_j$ is maximal under the constraint $\sum_{j=1}^{N} A_{i,j} x_j \le B_i$ for $i = 1, \dots, M$.
Figure 3: The knapsack-problem in (almost) natural mathematical language.
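For very small instances, the problem stated in Figure 3 can be checked by enumerating all binary vectors. The following Python sketch does this; the function name `solve_knapsack` and the sample data are invented for illustration, and exhaustive search is used only because it mirrors the problem statement directly, not because it is a practical solution method.

```python
from itertools import product

def solve_knapsack(c, A, B):
    """Maximize sum_j c[j]*x[j] subject to sum_j A[i][j]*x[j] <= B[i] for all i."""
    n = len(c)
    best_value, best_x = None, None
    for x in product((0, 1), repeat=n):
        # Check every budget constraint for this candidate selection.
        feasible = all(
            sum(A[i][j] * x[j] for j in range(n)) <= B[i] for i in range(len(B))
        )
        if not feasible:
            continue
        value = sum(c[j] * x[j] for j in range(n))
        if best_value is None or value > best_value:
            best_value, best_x = value, x
    return best_value, best_x

# Invented sample data: N = 3 projects, M = 2 budgets.
c = [10, 7, 4]
A = [[5, 4, 2],
     [3, 3, 3]]
B = [7, 6]
best_value, best_x = solve_knapsack(c, A, B)
```

With this data the best feasible selection is projects 1 and 3, with total contract volume 14; selecting projects 1 and 2 together would exceed the first budget.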
param N;
param M;
param c {j in 1..N};
param A {i in 1..M, j in 1..N};
param B {i in 1..M};
var x {j in 1..N} binary;
maximize target: sum {j in 1..N} (c[j] * x[j]);
subject to constraint_3014 {i in 1..M}:
    sum {j in 1..N} (A[i,j] * x[j]) <= B[i];
Figure 4: The knapsack-problem in AMPL.
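Generating AMPL text from a single underlying representation can be sketched in a few lines of Python. The dictionary layout, the function `to_ampl`, and the restriction to parameter and variable declarations are assumptions made for illustration; the actual system works from the semantic memory and also produces the objective and the constraints.

```python
# Hypothetical common representation of the declarations in Figures 3 and 4.
SYMBOLS = [
    {"name": "N", "kind": "param", "index": []},
    {"name": "M", "kind": "param", "index": []},
    {"name": "c", "kind": "param", "index": [("j", "N")]},
    {"name": "A", "kind": "param", "index": [("i", "M"), ("j", "N")]},
    {"name": "B", "kind": "param", "index": [("i", "M")]},
    {"name": "x", "kind": "var", "index": [("j", "N")], "attr": "binary"},
]

def to_ampl(symbol):
    """Render one declaration in AMPL syntax."""
    text = f"{symbol['kind']} {symbol['name']}"
    if symbol["index"]:
        # Each index runs from 1 to its upper bound, e.g. "j in 1..N".
        ranges = ", ".join(f"{i} in 1..{n}" for i, n in symbol["index"])
        text += f" {{{ranges}}}"
    if "attr" in symbol:
        text += f" {symbol['attr']}"
    return text + ";"

ampl_lines = [to_ampl(s) for s in SYMBOLS]
```

A second renderer over the same `SYMBOLS` list could emit the "let ... be ..." sentences of Figure 3, which is the point of keeping one representation and several output formats.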
• We created an interface to the controlled natural language of the Naproche project (Kühlwein et al., 2009), a project carried out at the University of Bonn that enables the checking of proofs written in a controlled natural language.
• Creation of LaTeX output of simple general mathematical text represented in the semantic memory: basic forms of definitions, assumptions, inferences, etc.
• Grammatically correct text-output is generated
via an interface to the Grammatical Framework
(Ranta, 2004), a programming language and soft-
ware package for multilingual grammar applica-
tions.
• We implemented a parser for problem files of the TPTP (Thousands of Problems for Theorem Provers, available at http://www.tptp.org/), and parsed and represented large parts in the semantic memory, adding up to several thousand
KEOD 2010 - International Conference on Knowledge Engineering and Ontology Development