In (18), das Gebäude/the building (R1) points to the house (A) introduced in (17). Such a reference often spans several steps in the subordination hierarchy and the transitivity of SUB must be considered:
(A1) (x SUB y) ∧ (y SUB z) → (x SUB z)
For the referent der Keller/the basement (R2) in sentence (19), there is no immediate antecedent in sentence (17). Here we need an axiom governing the inheritance of part-whole relationships:
(A2) (d1 SUB d2) ∧ (d3 PARS d2) → ∃d4 [(d4 SUB d3) ∧ (d4 PARS d1)]
and the common sense knowledge (Keller PARS Haus) or (basement PARS house), i.e. a typical house has a basement.⁴
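As an illustration only, the following Python sketch applies (A2) in the forward direction to a tiny fact set. The triple encoding, the function name `apply_a2`, and the generated node names (`sk1`, ...) standing in for the existentially quantified d4 are assumptions made for this example; they are not part of MultiNet or its tools.

```python
from itertools import count

# Facts are (subject, relation, object) triples mirroring the paper's notation.
facts = {
    ("c1501", "SUB", "house"),      # a concrete house node
    ("basement", "PARS", "house"),  # common sense: a typical house has a basement
}

def apply_a2(facts):
    """Forward application of (A2):
    (d1 SUB d2) ∧ (d3 PARS d2) → ∃d4 [(d4 SUB d3) ∧ (d4 PARS d1)].
    Each existential d4 is replaced by a fresh, hypothetical node name."""
    fresh = count(1)
    subs = sorted((s, o) for s, r, o in facts if r == "SUB")
    parts = sorted((s, o) for s, r, o in facts if r == "PARS")
    derived = set()
    for d1, d2 in subs:
        for d3, whole in parts:
            if whole == d2:
                d4 = f"sk{next(fresh)}"       # stands in for the existential d4
                derived.add((d4, "SUB", d3))  # d4 is a basement
                derived.add((d4, "PARS", d1)) # d4 is part of the concrete house
    return derived

new_facts = apply_a2(facts)
```

Under these assumptions, the house node c1501 receives a new part node that is itself subordinated to basement, which is the inheritance of the generic part-whole relationship that (A2) expresses.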
Logical Recurrence and Bridging References.
Bridging references are a type of reference where the antecedent is not directly mentioned in the foregoing text, i.e. an implicitly introduced antecedent has to be made explicit by logical inferences and background knowledge. A typical example is given by sentences (17) and (19), where meronymic knowledge (general properties of the part-whole relation PARS, and a part-whole relationship between two generic concepts) is needed to find the antecedent c_a for the concept c_r = c1511 described by der Keller/the basement. The semantic description D(c_r) of this phrase with the variable c_r is represented by (c_r SUB Keller/basement); this is also the question to be answered over the semantic network shown in Figure 4, where the meaning of sentence (17), denoted sem(17) for short in the following, is represented on the left side by node c1508. The meaning of sentence (19) is represented on the right side by node c1509 (before the assimilation, the partial networks represented by nodes c1509 and c1508 are separate, and in particular (c1511 PARS c1501) is missing).
The background knowledge of the previous paragraph and (A2) lead to the antecedent in sem(17) by means of the following backwards deduction:
(1) (c_r SUB Keller/basement) (Start with question)
(2) Unification of (1) with the right side of (A2), substituting basement for d3 and a fresh constant c1000 for c_r, yields the new goal (d1 SUB d2) ∧ (basement PARS d2).
(3) The first literal can be proved from the network sem(17) by the arc (c1501 SUB house) of sem(17), substituting c1501 for d1 and house for d2.
(4) The second literal can be derived from the meronymic background knowledge that (basement PARS house).
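The deduction steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' inference engine: the triple encoding and the function name `bridge_antecedents` are hypothetical, and the unification of step (2) is hard-wired into the goal instead of being computed by a general unifier.

```python
def bridge_antecedents(concept, network, background):
    """Find discourse nodes d1 that can serve as bridging antecedents.

    After unifying (c_r SUB concept) with the consequent of (A2), the
    remaining goal is (d1 SUB d2) ∧ (concept PARS d2). Returns all
    (d1, d2) bindings that satisfy it."""
    bindings = []
    for d1, rel, d2 in sorted(network):
        if rel != "SUB":
            continue
        # Step (3): (d1 SUB d2) is proved by an arc of the text network.
        # Step (4): (concept PARS d2) must hold in the background knowledge.
        if (concept, "PARS", d2) in background:
            bindings.append((d1, d2))
    return bindings

sem17 = {("c1501", "SUB", "house")}           # arc taken from sem(17)
background = {("basement", "PARS", "house")}  # meronymic background knowledge

matches = bridge_antecedents("basement", sem17, background)
# Assimilation then adds the previously missing part-whole arc for c_r = c1511.
new_arcs = [("c1511", "PARS", d1) for d1, _ in matches]
```

With the two facts used in the text, the only binding found is d1 = c1501 and d2 = house, so the arc (c1511 PARS c1501) is the one added during assimilation.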
⁴ Axiom (A2) means: If a concept d2 superordinated to a concept d1 is known to have a part d3, then there must exist a more specific part d4 of d1 subordinated to d3.
Applying the proposed assimilation mechanism to the inclusion reference for das Gebäude/the building in sentence (18), D(c_r) = (c_r SUB building), and using as a KB sem(17), axiom (A1), and the relationship (house SUB building), one obtains node c1501 of representation (17) (left side in Figure 4) as the antecedent c_a to be identified with c_r.
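The inclusion case can be sketched in the same spirit. The Python fragment below checks (x SUB z) under the transitivity axiom (A1) by following SUB arcs, then filters the discourse nodes that satisfy D(c_r) = (c_r SUB building). The helper name `is_sub`, the pair encoding of SUB arcs, and the list `discourse_nodes` are assumptions introduced for this example.

```python
def is_sub(x, z, sub_arcs):
    """True if (x SUB z) follows from the given SUB arcs by transitivity (A1)."""
    frontier, seen = [x], set()
    while frontier:
        node = frontier.pop()
        if node in seen:
            continue  # guard against cycles in the arc set
        seen.add(node)
        for lower, upper in sub_arcs:
            if lower == node:
                if upper == z:
                    return True
                frontier.append(upper)
    return False

sub_arcs = {
    ("c1501", "house"),     # arc from sem(17)
    ("house", "building"),  # background relationship (house SUB building)
}
discourse_nodes = ["c1501"]  # candidate antecedents introduced by sentence (17)

antecedents = [n for n in discourse_nodes if is_sub(n, "building", sub_arcs)]
```

Here c1501 qualifies only through the two-step chain c1501 SUB house SUB building, which is why (A1) is needed in addition to the arcs stored in the network.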
From the above, it can be seen that assimilation itself depends heavily on the availability of background knowledge, especially common sense knowledge. Thus, in building a large KB, one has to use a kind of bootstrapping process. Starting with a kernel of knowledge that is manually prepared using the workbench of the knowledge engineer, NLP techniques based on MultiNet technology can be used to automatically enlarge the background KB (vor der Brück and Helbig, 2010; vor der Brück, 2010). This knowledge can in turn be used in the assimilation process to build even larger KBs.
4 CONCLUSIONS
The assimilation of knowledge derived from pieces of textual information into existing KBs plays a crucial role in AI. In this task, the knowledge representation formalism MultiNet and its software tools can be used as the central technological means. To the best of our knowledge, there is no other approach that integrates all linguistic and logical processes, as well as the computational lexicon and the background knowledge, so seamlessly and consistently into one complex system for automatically building large KBs from textual archives. The power of this approach is demonstrated by several real-life NLP applications developed in this framework, such as question answering systems (Hartrumpf, 2005) based on corpora with millions of sentences, and NL interfaces to databases (Leveling, 2006).
Semantic representations by means of the MultiNet formalism are applicable across different languages, which is being investigated in a machine translation project (German-Chinese) and in a prototype of a semantically based search engine working on English documents. The MultiNet paradigm was also used for building large semantically based computational lexica (Hartrumpf et al., 2003). The techniques described in this paper were used for automatically translating the German Wikipedia with its 60 million sentences into a coherent MultiNet KB.
The tremendous amount of information contained in such KBs is also the reason why it is practically impossible to use traditional measures from information retrieval (like precision and recall) to directly evalu-