and automated QA will only succeed if the QA system avoids posting wrong answers, because users must not be spammed with incorrect responses. This demand for precision rules out traditional, retrieval-based QA techniques, since these cannot provide the required justified answers.
LogAnswer, by contrast, uses logical reasoning for
finding and validating answers, thus promising a bet-
ter self-assessment of answer quality.
The questions found in QA forums are also challenging due to the variety of question types, whereas many existing QA systems can only handle definition questions and factual questions asking for a limited number of named-entity types such as PERSON or ORGANIZATION. In our view, the types of questions that can be tackled by QA techniques and those that are best answered by human users (like questions asking for an opinion) are complementary.
While a QA system can answer questions of the first
kind (such as factoid questions) directly, it makes
more sense to treat the second kind by techniques for
FAQ finding, i.e. by looking up results in a repository
of questions with known (human-generated) answers.
This process of FAQ finding, too, can potentially be
improved by incorporating techniques originally de-
veloped for question answering.
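For concreteness, the following minimal sketch illustrates the kind of crossover we have in mind: ranking stored question–answer pairs by lexical similarity to a new question. The repository format, the similarity measure and all identifiers are our own illustrative assumptions and are not part of LogAnswer.

# Illustrative sketch of FAQ finding: rank stored question/answer pairs by
# lexical similarity to a new question. The similarity measure and all names
# are assumptions made for illustration; they are not taken from LogAnswer.
import math
from collections import Counter

def tokens(text):
    """Rough tokenizer: lowercased alphanumeric words."""
    return ''.join(c.lower() if c.isalnum() else ' ' for c in text).split()

def cosine(a, b):
    """Cosine similarity of two bags of tokens."""
    ca, cb = Counter(a), Counter(b)
    num = sum(ca[t] * cb[t] for t in set(ca) & set(cb))
    den = math.sqrt(sum(v * v for v in ca.values())) * \
          math.sqrt(sum(v * v for v in cb.values()))
    return num / den if den else 0.0

def find_faq(question, repository):
    """Return the stored (question, answer) pair most similar to the new question."""
    q = tokens(question)
    return max(repository, key=lambda qa: cosine(q, tokens(qa[0])), default=None)

repository = [
    ("Wie hoch ist der Eiffelturm?", "Der Eiffelturm ist etwa 330 Meter hoch."),
    ("Welches Betriebssystem ist besser?", "Das hängt ganz von den Anforderungen ab."),
]
print(find_faq("Wie hoch ist eigentlich der Eiffelturm?", repository))

Section 2 describes how LogAnswer combines shallow criteria of this kind with logical answer validation.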
This paper is organized as follows: In Section 2 we
give a short description of LogAnswer. Section 3 pro-
vides an overview of QA forums and investigates dif-
ferent adaptation methods for LogAnswer. In the final
Section 4 we summarize our results and outline some
topics for future work.
2 THE LogAnswer SYSTEM
LogAnswer is designed as a German-language QA system on the web, intended as an alternative to conventional search engines. It is accessible via a web interface (www.loganswer.de) similar to that of a
search engine. The user enters a question into a text
box, and LogAnswer then presents the three best an-
swers, which are highlighted in the relevant textual
sources to provide a context. The answers are derived
from an extensive knowledge base, which has been
obtained by automatically translating a snapshot of
the German Wikipedia into a semantic network rep-
resentation in the MultiNet (Multilayered Extended
Semantic Networks) formalism (Helbig, 2006). The
WOCADI parser for German (Hartrumpf, 2003) was
used for that purpose. Based on its large semantically based lexicon (Hartrumpf et al., 2003), combined with a robust treatment of unknown words, it achieves a full-parse rate of 51.8 percent (80.6 percent including chunk parses) on the German Wikipedia. WOCADI
uses rule-based and statistical techniques for handling
the various kinds of ambiguities that occur in NL analysis (e.g. prepositional attachment, resolution of
pronouns). For a description of these techniques in-
cluding a detailed evaluation, see (Hartrumpf, 2003).
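To give a rough impression of the target representation, the following schematic example pictures a parsed sentence as labeled edges between concept nodes. The relation names loosely follow MultiNet's agent, object and subordination relations, but the encoding is deliberately simplified and is not actual WOCADI output.

# Schematic, simplified picture of a semantic-network representation of the
# sentence "Ein Hund jagt eine Katze." ("A dog chases a cat."). Relation and
# concept names are illustrative only; this is not real WOCADI/MultiNet output.
semantic_network = {
    "nodes": ["c1", "c2", "e1"],         # two discourse entities, one situation
    "edges": [
        ("c1", "SUB",  "hund.1.1"),      # c1 is an instance of the concept 'dog'
        ("c2", "SUB",  "katze.1.1"),     # c2 is an instance of the concept 'cat'
        ("e1", "SUBS", "jagen.1.1"),     # e1 is a situation of type 'chase'
        ("e1", "AGT",  "c1"),            # c1 acts as the agent of e1
        ("e1", "OBJ",  "c2"),            # c2 is the object affected by e1
    ],
}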
The MultiNet representations generated by WOCADI for 29.1 million sentences of the German Wikipedia, together with an additional 12,000 logical rules and facts, constitute the general background knowledge of LogAnswer. Among other things, these additional rules connect the various ways in which a temporal or local specification can be expressed. Examples are shown in (Glöckner, 2007). A large part of the back-
ground knowledge is concerned with connecting the
meaning representations of verbs and deverbal nouns
(e.g. ‘read’ – ‘reader’), or with connecting adjectives
and corresponding attributes expressed as nouns (e.g.
‘high’ – ‘height’). Obviously, the results of Log-
Answer for a specific application can be improved by
adding domain-specific rules that cover paraphrases
and other inferences of relevance to the domain.
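To convey the flavor of such rules (the actual rule set is formulated over MultiNet relations and is more involved; the relation names below are simplified for illustration), the adjective–attribute correspondence 'high'–'height' could be expressed by a first-order rule along the lines of

∀x (prop(x, high) → ∃h (attr(x, h) ∧ sub(h, height) ∧ val(h, high))),

i.e. whatever has the property 'high' carries an attribute of type 'height' whose value is 'high'.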
To make the semantic networks accessible to mod-
ern theorem provers, the MultiNet knowledge base
has been translated into First-Order Logic (FOL).
The expressivity of MultiNet exceeds that of FOL, so some aspects of MultiNet are lost in the translation; we approximate the missing expressivity by using logic extensions such as equality and arithmetic evaluation. See (Furbach et al., 2010) for a detailed example
of the logical translation and subsequent processing.
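As a rough, simplified impression of what the prover receives (the predicate and constant names here are invented for illustration and do not reproduce the actual translation), a question like 'Wer schrieb Faust?' ('Who wrote Faust?') is turned into an FOL query with a free answer variable, roughly

∃e (subs(e, schreiben.1.1) ∧ agt(e, X) ∧ obj(e, faust)),

where a successful proof must bind X to a ground term denoting the author.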
The complete knowledge base is too large to be
handled by an automated theorem prover, especially since the usage model of LogAnswer requires short response times. Therefore, the
initial processing steps for a user-provided question
serve to narrow down the knowledge base to the frag-
ment relevant to the task at hand. The query is trans-
lated into a MultiNet representation, and a machine-
learning (ML) ranking technique then uses shallow
linguistic criteria (like lexical overlap) to find the text
passages most likely to contain an answer. The as-
sessment of these criteria relies on pre-indexed infor-
mation and can be rapidly computed. The FOL repre-
sentations (answer candidates) of the most promising
text passages are then individually tested by the theo-
rem prover E-KRHyper (Pelzer and Wernhard, 2007).
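Before turning to the details of each prover call, the following sketch summarizes this retrieve-then-prove control flow. All types and interfaces (Passage, the prove callback, the binding result) are placeholders chosen for illustration and do not reflect the actual LogAnswer or E-KRHyper API.

# Abstract sketch of the retrieve-then-prove loop. All types and interfaces
# are placeholders for illustration, not the actual LogAnswer/E-KRHyper API.
from dataclasses import dataclass

@dataclass
class Passage:
    terms: frozenset       # shallow, pre-indexed representation (e.g. lemmas)
    fol_axioms: list       # FOL translation of the passage (answer candidate)

def lexical_overlap(query_terms, passage):
    """Shallow criterion: fraction of query terms also occurring in the passage."""
    q = set(query_terms)
    return len(q & passage.terms) / max(len(q), 1)

def rank_passages(query_terms, passages, top_k=10):
    """Rank pre-indexed passages by rapidly computable shallow criteria."""
    return sorted(passages, key=lambda p: lexical_overlap(query_terms, p),
                  reverse=True)[:top_k]

def find_answers(fol_query, query_terms, passages, prove, time_limit=1.0):
    """Test the most promising candidates individually with a theorem prover.

    `prove` stands in for the prover call: given the candidate axioms, the
    (implicit) background knowledge and the query, it returns a binding of
    the query's answer variables to ground terms on success, or None.
    """
    answers = []
    for passage in rank_passages(query_terms, passages):
        binding = prove(passage.fol_axioms, fol_query, time_limit)
        if binding is not None:        # successful proof: the answer variables
            answers.append(binding)    # are instantiated with ground terms
    return answers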
In each test the candidate is combined with the back-
ground knowledge and the logical query representa-
tion as the input for the prover. A successful proof in-
stantiates variables in the FOL question with ground
terms representing the answer. Query relaxation tech-
niques increase the likelihood of finding a proof in
short time, at the cost of lowering the probability that