Björn Pelzer
University Koblenz-Landau, Koblenz, Germany
Ingo Glöckner, Tiansi Dong
University of Hagen, Hagen, Germany
Keywords: Logic-based Question Answering, Question-Answer portals, Answer validation, Answer bots.
LogAnswer is a question answering (QA) system for the German language. By providing concise answers
to questions of the user, LogAnswer provides more natural access to document collections than conventional
search engines do. QA forums provide online venues where human users can ask each other questions and
give answers. We describe an ongoing adaptation of LogAnswer to QA forums, aiming at creating a virtual
forum user who can respond intelligently and efficiently to human questions. This serves not only as a more
accurate evaluation method of our system, but also as a real world use case for automated QA. The basic idea
is that the QA system can disburden the human experts from answering routine questions, e.g. questions with
known answer in the forum, or questions that can be answered from the Wikipedia. As a result, the users
can focus on those questions that really demand human judgement or expertise. In order not to spam users,
the QA system needs a good self-assessment of its answer quality. Existing QA techniques, however, are not
sufficiently precision-oriented. The need to provide justified answers thus fosters research into logic-oriented
QA and novel methods for answer validation.
1 INTRODUCTION
The field of question answering (QA) aims at auto-
matically finding concise answers to arbitrary ques-
tions. A QA system communicates with the user in a
natural language (NL), its input has the form of prop-
erly phrased questions, and the answers are derived
from an extensive knowledge base built from a doc-
ument collection. Conventional search engines with
their keyword based search, by contrast, merely pro-
duce documents which must be studied further, an
approach which is impractical for users with specific
questions. Conversely, a QA system only delivers the
information that is requested, saving time and allow-
ing a satisfactory result presentation even on compact
mobile devices. Its NL interface can be used intu-
itively, making it suitable for casual users. Drawbacks
of QA are the difficulties of dealing with the ambigu-
ities of NL, the need to construct the knowledge base,
and the complexity of deriving answers.
LogAnswer (Furbach et al., 2010; Glöckner and
Pelzer, 2009) is a QA system for the German lan-
guage. It works with a knowledge base derived from
a local document collection, and the answers are pro-
duced using a combination of deep linguistic process-
ing and automated theorem proving. Evaluating QA
systems is difficult, as the quality of answers cannot
yet be judged automatically. LogAnswer is currently
accessible by a web-based interface. To expand its us-
age and to achieve a better evaluation, we are in the
process of adapting LogAnswer to QA forums. Such
internet forums normally provide their users with a
venue for asking and answering each others’ ques-
tions (a well-known example is the QA portal an-
swers.yahoo.com). By opening up our system to such
forums we hope that in the future we can draw from
the experiences of a vast number of users in a real-
world application of LogAnswer.
In our view, providing technological support for
QA forums is an ideal scenario for developing and
evaluating QA systems. The potential benefit for
users in the forum is evident, even if the
experts in the forum are only disburdened from
answering questions with a known answer in the
forum or in the Wikipedia, and even if those asking a
question in the forum get an instant, automatically
generated answer only for some of their questions.
However, the envisioned integration of human QA
and automated QA will only succeed if the QA
system avoids posting wrong answers, because the users
should not be spammed with incorrect responses.
This requirement for precision-orientation disqualifies
traditional, retrieval-based QA techniques, since
these cannot provide the required justified answers.
LogAnswer, by contrast, uses logical reasoning for
finding and validating answers, thus promising a
better self-assessment of answer quality.

Pelzer B., Glöckner I. and Dong T.
DOI: 10.5220/0003294304920497
In Proceedings of the 3rd International Conference on Agents and Artificial Intelligence (ICAART-2011), pages 492-497
ISBN: 978-989-8425-40-9
2011 SCITEPRESS (Science and Technology Publications, Lda.)
The questions found in QA forums are also chal-
lenging due to the variety of question types, while
many existing QA systems can only handle defini-
tion questions and factual questions that ask for a
limited number of named-entity types like PERSON,
ORGANIZATION etc. In our view, the types of ques-
tions that can be tackled by QA techniques and those
that should best be answered by human users (like
questions asking for an opinion) are complementary.
While a QA system can answer questions of the first
kind (such as factoid questions) directly, it makes
more sense to treat the second kind by techniques for
FAQ finding, i.e. by looking up results in a repository
of questions with known (human generated) answers.
This process of FAQ finding, too, can potentially be
improved by incorporating techniques originally de-
veloped for question answering.
This paper is divided as follows: In Section 2 we
give a short description of LogAnswer. Section 3 pro-
vides an overview of QA forums and investigates dif-
ferent adaptation methods for LogAnswer. In the final
Section 4 we summarize our results and outline some
topics for future work.
2 THE LogAnswer SYSTEM
LogAnswer is designed as a German language QA
system on the web, which can serve as an alternative
to conventional search engines. It is accessible by a
web interface (www.loganswer.de) similar to that of a
search engine. The user enters a question into a text
box, and LogAnswer then presents the three best an-
swers, which are highlighted in the relevant textual
sources to provide a context. The answers are derived
from an extensive knowledge base, which has been
obtained by automatically translating a snapshot of
the German Wikipedia into a semantic network rep-
resentation in the MultiNet (Multilayered Extended
Semantic Networks) formalism (Helbig, 2006). The
WOCADI parser for German (Hartrumpf, 2003) was
used for that purpose. Based on its large semantic-
based lexicon (Hartrumpf et al., 2003), combined
with a robust treatment of unknown words, it achieves
a 51.8 percent rate of full parses (80.6 percent in-
cluding chunk parses) on the Wikipedia. WOCADI
uses rule-based and statistical techniques for handling
the various kinds of ambiguities that occur in NL
analysis (e.g. prepositional attachment, resolution of
pronouns). For a description of these techniques in-
cluding a detailed evaluation, see (Hartrumpf, 2003).
The MultiNet representations for 29.1 million sentences
in the Wikipedia as generated by WOCADI,
and an additional 12,000 logical rules and facts con-
stitute the general background knowledge of Log-
Answer. Among other things, these additional rules
connect the various ways in which a temporal or lo-
cal specification can be expressed. Examples are
shown in (Glöckner, 2007). A large part of the background
knowledge is concerned with connecting the
meaning representations of verbs and deverbal nouns
(e.g. ‘read’ – ‘reader’), or with connecting adjectives
and corresponding attributes expressed as nouns (e.g.
‘high’ – ‘height’). Obviously, the results of LogAnswer
for a specific application can be improved by
adding domain-specific rules that cover paraphrases
and other inferences of relevance to the domain.
To make the semantic networks accessible to mod-
ern theorem provers, the MultiNet knowledge base
has been translated into First-Order Logic (FOL).
The expressivity of MultiNet exceeds that of FOL,
and while some aspects of MultiNet are lost in the
translation, we approximate the expressivity by using
logic extensions like equality and arithmetic evalua-
tion. See (Furbach et al., 2010) for a detailed example
of the logical translation and subsequent processing.
The complete knowledge base is too large to be
handled by an automated theorem prover, in partic-
ular when considering that the usage model of Log-
Answer requires short response times. Therefore the
initial processing steps for a user-provided question
serve to narrow down the knowledge base to the frag-
ment relevant to the task at hand. The query is trans-
lated into a MultiNet representation, and a machine-
learning (ML) ranking technique then uses shallow
linguistic criteria (like lexical overlap) to find the text
passages most likely to contain an answer. The as-
sessment of these criteria relies on pre-indexed infor-
mation and can be rapidly computed. The FOL repre-
sentations (answer candidates) of the most promising
text passages are then individually tested by the theorem
prover E-KRHyper (Pelzer and Wernhard, 2007).
In each test the candidate is combined with the back-
ground knowledge and the logical query representa-
tion as the input for the prover. A successful proof in-
stantiates variables in the FOL question with ground
terms representing the answer. Query relaxation tech-
niques increase the likelihood of finding a proof in
short time, at the cost of lowering the probability that
the answer is relevant for the query, i.e. decreasing
the quality of the answer for QA. A second ML phase
ranks all proofs according to their quality. The answer
terms are collected from the best proofs and translated
back into NL answers that are displayed to the user.
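The first filtering step described above can be illustrated with a minimal sketch. It assumes that lexical overlap is simply the share of question words that reappear in a passage; the actual system combines several shallow criteria in a learned ranker. The function names and the sample passages are hypothetical.

```python
import re

def tokens(text):
    """Lowercased word tokens; a stand-in for real lemmatization."""
    return set(re.findall(r"\w+", text.lower()))

def lexical_overlap(question, passage):
    """Fraction of question tokens that also occur in the passage."""
    q = tokens(question)
    if not q:
        return 0.0
    return len(q & tokens(passage)) / len(q)

def select_candidates(question, passages, k=2):
    """Keep the k passages with the highest overlap (ties keep input order)."""
    ranked = sorted(passages,
                    key=lambda p: lexical_overlap(question, p),
                    reverse=True)
    return ranked[:k]

# Hypothetical mini collection, loosely based on the questions in Table 1.
passages = [
    "The Berlin Wall was built in 1961.",
    "The Moon orbits the Earth.",
    "Ian Fleming was a British author.",
]
best = select_candidates("When was the Berlin Wall built?", passages, k=1)
```

Only the passages selected this way would be handed to the prover, which keeps the expensive logical test off the bulk of the knowledge base.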
Considerable effort has been spent to make the
system robust to gaps in the background knowledge
and to parsing failure. If a parse of the query fails,
then the system is still able to find answer sentences
based on robust techniques like predictive annotation
(Prager et al., 2000). The robustness-enhancing techniques
that were developed for LogAnswer are described
in (Glöckner and Pelzer, 2008).
The evaluation of a QA system is difficult for several
reasons: Few questions have a single correct answer,
as answers can be paraphrased, or the question may be
unspecific. A user’s acceptance of an answer is very
subjective, depending on aspects like intellect and tolerance
for malformed answers. Hence it is insufficient
to evaluate a QA system using a fixed library of ques-
tions with known correct results.
One attempt at a standardized testing is the annual
competition of the Cross-Language Evaluation Forum
(CLEF: www.clef-campaign.org). A set of questions
is translated for each participating QA system, and
a panel of judges then assesses the answers. Since
2009 the questions refer to documents of the Euro-
pean Parliament, as these texts already exist in mul-
tiple translations, but this also means that they have
a specialized legal content. While LogAnswer has
performed well at CLEF (Glöckner and Pelzer, 2010;
Glöckner and Pelzer, 2009), the competition is not
representative of a real-world application with actual
human users. Moreover, the questions are often
closely based on specific documents. Compare
for example the question “Which additives may be
used in the manufacture of peeled tomatoes?” and
the answer-containing text passage “As additives in
the manufacture of peeled tomatoes only citric acid (E
330) and calcium chloride (509) may be used.” When
question and text passage are very similar, a logical
proof using their FOL representations is trivial. The
deep reasoning capabilities of a theorem prover are
not utilized, instead the bulk of the work is already
done once the information retrieval phase has found
the passage.
LogAnswer operates on German texts, but for better under-
standing all examples throughout this paper are in English.
To better evaluate both the real-world applica-
bility and the reasoning aspect of LogAnswer we
must therefore look beyond CLEF, and QA forums
provide an opportunity for this. We focus on Ger-
man forums here, but most have English versions
as well. In general, the QA forums allow users
to ask questions regarding any topic. Users can
also give answers to questions that have been asked,
and they can browse questions and answers, with
new unanswered questions being listed prominently.
Many forums allow multiple answers for a question,
and the questioner can mark the most helpful answer.
Frag Wikia! (frag.wikia.com) allows only
one answer per question, but all users may edit and
improve this answer. QA forums are usually financed
by online advertising, as for example Frag
Wikia!, COSMiQ (www.cosmiq.de) and WikiAnswers,
or they may be paid directly
by the questioners (JustAnswer, www.justanswer.de).
Our preliminary experiments with LogAnswer con-
centrate on Frag Wikia! due to the permissive Cre-
ative Commons license for its approximately 80,000
questions. A tighter integration of LogAnswer with
any QA forum will require a cooperation with the fo-
rum owners, so once our research has progressed we
can consider an adaptation to a larger commercial QA forum.
3.1 LogAnswer as a User
The most obvious application of LogAnswer is for
the system to assume the role of a virtual forum user
who answers the questions that have been posted. At
the time of this writing Frag Wikia! has a growing
backlog of about 5,300 unanswered questions. Log-
Answer can reply to questions immediately after they
have been posted, so that the questioner does not have
to wait an indefinite time for an answer. Alterna-
tively, since forum users do not expect immediate an-
swers, we can allow more time for the logical process-
ing than in the current usage model of LogAnswer,
thus enabling deeper reasoning and answers of higher
quality, while still ensuring that every new question is
answered within a few minutes. Table 1 shows exam-
ples of LogAnswer answering forum questions. By
covering such encyclopedic questions, LogAnswer
disburdens the users in the forum from writing an-
swers to those questions whose answer can easily be
found in the Wikipedia. Thus, the users can focus on
answering difficult questions that call for specialized
knowledge, human judgement, and advice from expe-
Table 1: Examples for questions from the Frag Wikia! QA forum, with answers provided by LogAnswer.

Question                          Reply from LogAnswer
How far is the Moon from Earth?   384,000 km
When was the Berlin Wall built?   1961
Who was Ian Fleming?              British author

Some questions are well suited to the encyclopedic
knowledge of LogAnswer. However, forum users
also frequently ask about current events which are not
yet covered by the knowledge base of LogAnswer.
Many questions are also ambiguous or overly opti-
mistic, for example asking for the phone numbers of
celebrities. Nevertheless such questions must be ex-
pected in a real-world application.
In a first experiment, we collected a random sample
of 200 Frag Wikia! questions. We found
that 41% of the questions exhibit syntactic mistakes
(mostly spelling errors and wrong capitalization),
making them difficult to parse automatically. Still,
the WOCADI parser used by LogAnswer constructs
a logical representation for 81% of the questions. De-
spite the Frag Wikia! policy that questions with an
obvious answer in the Wikipedia should be avoided,
a manual search revealed that 61% of the questions in
our sample have an answer in the German Wikipedia,
so that LogAnswer has a chance of answering them.
Judging from the results pages produced by Log-
Answer that always show three answers, the system
achieved a 30% answer rate for these Wikipedia-
related questions.
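To make the reported percentages easier to compare, the following sketch converts them into approximate question counts. The text gives percentages only, so the counts are rounded estimates. Reading the 30% as a share of the Wikipedia-related questions, and deriving an overall rate from it, is an assumption of this sketch.

```python
sample_size = 200          # questions in the random sample (from the text)
share_with_errors = 0.41   # questions with syntactic mistakes
share_parsed = 0.81        # questions for which WOCADI built a representation
share_in_wikipedia = 0.61  # questions with an answer in the German Wikipedia
answer_rate = 0.30         # answer rate on the Wikipedia-related questions

# Rounded counts implied by the percentages (estimates, not reported figures).
with_errors = round(sample_size * share_with_errors)
parsed = round(sample_size * share_parsed)
in_wikipedia = round(sample_size * share_in_wikipedia)
answered = round(in_wikipedia * answer_rate)

# Share of the whole sample that would be answered under this reading.
overall_rate = answered / sample_size
```

Under these assumptions roughly 37 of the 200 sampled questions are answered, which is a little under one fifth of the whole sample.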
3.2 LogAnswer Compares Questions
As a QA forum is frequented by a multitude of users,
the same questions are bound to be asked several
times by different people. Most forums try to avoid
repeatedly listing such questions as unanswered each
time they are posted. Instead an attempt is made
to redirect the respective user to an earlier instance
of the question, so that old answers can be reused.
This requires the forum software to compare the cur-
rent questions with the older questions in the forum
archive. Typically this is done by a search based
on keywords from the current question. As a result
QA forums may not find a semantically equivalent
question if it is a paraphrase of the current question.
For example, Frag Wikia! contains the question
Q1: “What was the name of the first German
Chancellor?”, and asking the same question again
will produce this archived question together with an
answer. However, rephrasing Q1 into Q2: “Who was
the first Chancellor of Germany?” will not lead to the
archived question Q1 and its answer, as Q2 is
treated as unique in-
see http://www.loganswer.de/resources/icaart2011.xml for the
questions, results of LogAnswer, and annotations.
WikiAnswers shows a similar behaviour: here
the paraphrased question is in the archive and will be
found when the identical wording is used. The origi-
nal question is not found, though. Instead the forum
search engine suggests other archived questions con-
taining keywords like “German” or “Chancellor”,
but none of them are relevant for the original question.
LogAnswer can offer an improvement by per-
forming a semantic comparison between questions.
This requires a new knowledge base derived from the
archived questions. For this purpose the questions
must be treated like the Wikipedia text passages for
the original knowledge base: the questions are parsed
and then translated into their MultiNet and FOL rep-
resentations. The translation largely corresponds to
the way LogAnswer would normally translate a ques-
tion for question answering, with the exception that
constants are used instead of variables. The represen-
tations are then indexed by lexical content and by the
expected answer types of the questions, allowing effi-
cient access to the question-derived knowledge base.
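A minimal sketch of such an index is given below. It assumes that each archived question has already been reduced to a list of content words and one expected answer type; the class name, the method names and the answer type LENGTH are hypothetical, and the real system indexes MultiNet and FOL representations rather than plain words.

```python
from collections import defaultdict

class QuestionIndex:
    """Toy index over archived questions (hypothetical structure)."""

    def __init__(self):
        self.by_word = defaultdict(set)         # word -> question ids
        self.by_answer_type = defaultdict(set)  # answer type -> question ids

    def add(self, qid, words, answer_type):
        """Register one archived question under its words and answer type."""
        for word in words:
            self.by_word[word.lower()].add(qid)
        self.by_answer_type[answer_type].add(qid)

    def lookup(self, words, answer_type):
        """Ids with the same answer type and at least one shared word,
        ordered by number of shared words (then by id)."""
        same_type = self.by_answer_type.get(answer_type, set())
        counts = defaultdict(int)
        for word in {w.lower() for w in words}:
            for qid in self.by_word.get(word, set()):
                if qid in same_type:
                    counts[qid] += 1
        return sorted(counts, key=lambda qid: (-counts[qid], qid))

index = QuestionIndex()
index.add(1, ["name", "first", "german", "chancellor"], "PERSON")
index.add(2, ["height", "eiffel", "tower"], "LENGTH")
index.add(3, ["first", "man", "moon"], "PERSON")
hits = index.lookup(["first", "chancellor", "germany"], "PERSON")
```

In this toy run `hits` lists question 1 before question 3, because it shares two words with the new question instead of one. A lookup of this kind only narrows the candidates; deciding equivalence is left to the prover.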
When a user asks a new question, LogAnswer can
locate potentially matching archived questions using
the same ML filtering phase as for the text passages
(see Section 2). The theorem prover then tests the
filtered questions for equivalence to the user ques-
tion. In each test two proofs must be found, one
with the user question as a negated conjecture refuted
by the archived question and the general background
knowledge, and one proof vice versa which refutes
the archived question. For example, when attempting
to match the aforementioned archived question Q1
and the user question Q2, for one proof the prover
would operate on the following input:
Q1: pmod(c18, first, chancellor) ∧ prop(c16, german) ∧
    sub(c16, c18) ∧ ...
¬Q2: ¬pmod(X1, first, chancellor) ∨ ¬attch(X2, FOCUS) ∨
    ¬attr(X2, X3) ∨ ¬sub(FOCUS, X1) ∨
    ¬sub(X3, name) ∨ ¬val(X3, germany).
Note that our rephrased question Q2 has since been answered
by a different user.
The input has been simplified to improve legibility, focussing
on those literals which retain a resemblance to the NL questions. The
original input has over 60 literals and uses more complex symbols
to account for different word meanings.
Successful proofs in both directions indicate the se-
mantic equivalence of user question and archived
question. Relaxation may be used, possibly weaken-
ing the probability of the proofs accurately represent-
ing equivalence. In the example LogAnswer finds a
refutational proof for Q2, using both relaxation and a
background knowledge axiom which essentially ex-
presses that having a nationality (“German Chancel-
lor”) and being of that nation (“Chancellor of Ger-
many”) are equivalent. When several archived ques-
tions have been tested this way for one user question,
then a second ML phase sorts the proofs to determine
the archived questions with the highest relevance for
the questioner. This process is similar to the second
ML phase described in Section 2, except that only a
subset of the criteria can be used since a comparison
of questions does not produce an answer. LogAnswer
then presents the best matching questions from the
archive, and the user can examine these and any ex-
isting answers.
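The two-directional test can be sketched in a few lines. This is not E-KRHyper and uses neither background knowledge nor relaxation. It assumes that a question is a list of literals, that terms starting with an uppercase letter are variables, and that one direction succeeds when every literal of one question can be matched against the other question once its variables are frozen into constants. All names are hypothetical.

```python
def is_var(term):
    """Terms starting with an uppercase letter count as variables (assumed)."""
    return term[:1].isupper()

def freeze(literals):
    """Turn a question into ground facts: each variable becomes a constant."""
    return [(pred,) + tuple("c_" + a.lower() if is_var(a) else a for a in args)
            for pred, *args in literals]

def entails(facts, query, binding=None):
    """Backtracking search: can all query literals be matched to facts
    under one consistent variable binding?"""
    binding = binding or {}
    if not query:
        return True
    (pred, *args), rest = query[0], query[1:]
    for fact in facts:
        if fact[0] != pred or len(fact) != len(args) + 1:
            continue
        trial = dict(binding)
        ok = True
        for arg, value in zip(args, fact[1:]):
            if is_var(arg):
                if trial.setdefault(arg, value) != value:
                    ok = False
                    break
            elif arg != value:
                ok = False
                break
        if ok and entails(facts, rest, trial):
            return True
    return False

def equivalent(q_a, q_b):
    """Equivalence requires a successful match in both directions."""
    return entails(freeze(q_a), q_b) and entails(freeze(q_b), q_a)

archived = [("pmod", "X1", "first", "chancellor"),
            ("prop", "X2", "german"), ("sub", "X2", "X1")]
reordered = [("sub", "A", "B"), ("prop", "A", "german"),
             ("pmod", "B", "first", "chancellor")]
general = [("pmod", "X1", "first", "chancellor")]
```

Here `archived` and `reordered` match in both directions, while `general` is implied by `archived` but not the other way round, which is why a single proof is not enough. Matching “German Chancellor” against “Chancellor of Germany” would additionally need the background axiom mentioned above, which this sketch leaves out.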
3.3 Evaluation of LogAnswer by Forum
Most QA forums offer an evaluation system which al-
lows questioners to grade the answers they receive or
the users who provide them. When LogAnswer is
integrated into a QA forum this grading functional-
ity can form the basis for an evaluation of our sys-
tem. Unfortunately such grading is still in the plan-
ning stage in Frag Wikia!, which explains our moti-
vation to move on to a more full-featured commercial
forum once we have completed our internal adapta-
tions and tests of LogAnswer. For the time being we
can utilize the fact that Frag Wikia! users may edit
and improve answers which they find flawed, thereby
indirectly providing us with information about the ac-
ceptance of LogAnswer.
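One conceivable way to turn such edits into a number is to compare the answer text that was posted with its current revision. The sketch below uses Python's standard difflib for this. The threshold of 0.8 and the function names are assumptions made for illustration, not part of LogAnswer.

```python
from difflib import SequenceMatcher

def retained_share(posted, current):
    """Similarity in [0, 1] between the posted answer and its current revision."""
    return SequenceMatcher(None, posted, current).ratio()

def looks_accepted(posted, current, threshold=0.8):
    """Treat an answer as accepted if users left it largely unchanged.
    The threshold is an arbitrary illustrative value."""
    return retained_share(posted, current) >= threshold
```

An untouched answer yields a similarity of 1.0, whereas an answer that users replaced with different content falls below the threshold.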
We believe QA forums to be an ideal environment for
a system like LogAnswer. They attract a large num-
ber of human users who have questions and who are
willing to grade the responses they receive, thereby
providing a large-scale evaluation of anyone who is
able to deliver answers in quantity, including an au-
tomated QA system. The questions found on QA fo-
rums concern arbitrary topics, and generally they are
asked by people who do not know the answers. They
might also be ambiguous, malformed, or based on
partial or even false information. This means that fo-
rum questions require a significant degree of flexibil-
ity from a QA system, and a reasoning power which
goes beyond pure information retrieval. An adapta-
tion of LogAnswer to QA forums will enable a thor-
ough evaluation of all aspects of our system, allowing
us to improve LogAnswer drawing on real-world QA
experience. The experimental results give us confi-
dence that this adaptation is feasible.
In our view, combining question answering com-
munities with technologies for automatic question an-
swering is potentially advantageous both for the users
in the forum and for QA research. For users ask-
ing a question, the most apparent benefit compared
to human answerers is the very quick response time
of the QA system. But introducing the QA agent also
helps the forum experts, who no longer need to waste
their time with repeated questions or routine ques-
tions that can easily be answered from the Wikipedia.
In our concrete example forum (frag.wikia.com), forum
policy discourages asking questions that target
facts from the Wikipedia. Still, we found that 61%
of the questions in our sample have an answer in the
Wikipedia, so a QA system that uses the Wikipedia as
its document collection can indeed be useful. How-
ever, such a combination will only be acceptable for
the forum users if the involved QA system can re-
alistically judge its answer quality and avoid post-
ing wrong answers. Moreover, the system should be
able to integrate automated question answering and
FAQ finding from the repository of questions with
known answers. This integration is especially im-
portant for questions asking for opinions, advice or
judgements, where retrieving a human-provided an-
swer is currently the only realistic option.
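The requirement to avoid wrong postings amounts to letting the system abstain. The following sketch assumes that every candidate answer carries a confidence score between 0 and 1, for instance from the proof ranking. The threshold, the scores and the function names are hypothetical.

```python
def choose_answer(candidates, threshold=0.9):
    """Return the best answer text, or None if no candidate is trusted enough.
    candidates: list of (answer_text, confidence) pairs."""
    if not candidates:
        return None
    text, confidence = max(candidates, key=lambda pair: pair[1])
    return text if confidence >= threshold else None

def precision_and_coverage(scored, threshold):
    """scored: list of (confidence, is_correct) pairs from a labeled sample.
    Returns (precision of posted answers, share of questions answered)."""
    posted = [ok for conf, ok in scored if conf >= threshold]
    coverage = len(posted) / len(scored) if scored else 0.0
    # With nothing posted there are no wrong postings, so precision is set to 1.
    precision = sum(posted) / len(posted) if posted else 1.0
    return precision, coverage

# Made-up labeled sample: (confidence, answer was correct).
scored = [(0.95, True), (0.9, True), (0.7, False), (0.6, True), (0.3, False)]
strict = precision_and_coverage(scored, 0.8)
lenient = precision_and_coverage(scored, 0.5)
```

On this made-up sample the stricter threshold posts fewer answers but none of them is wrong, which is the trade-off a forum bot would have to tune.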
In the long run, our aim is to develop a virtual and
learned internet user who can communicate with real
human users, answering questions in several natural
languages. Two critical cognitive capabilities for this
are: understanding natural languages which, more of-
ten than not, contain syntax errors, and finding an-
swers efficiently from a huge knowledge base. As to
the first challenge, we are aware that psychologists do
not view human errors as malfunctions of the human
system, but rather as windows to explore the nature
of the human system (Tversky, 1992). Consequently,
we do not treat sentences with syntax errors as excep-
tions. Instead, our language processing mechanism
should “understand” sentences as long as humans un-
derstand them. To achieve this, we will extend our
current method from using MultiNet alone to integrat-
ing MultiNet seamlessly with FrameNet, WordNet,
and OpenCyc. The basic idea is that the rich back-
ground knowledge provided by these resources can
often guide linguistic analysis to the intended meaning.
Concerning the second problem, we recall the dis-
tinction of implicit knowledge (e.g. intuitions, expe-
riences, procedural knowledge) vs. explicit knowl-
edge (e.g. formal knowledge) (Anderson, 1983). Our
current approach in finding answers, which is purely
logical, is suited for explicit knowledge but would
become awkward for questions on implicit knowl-
edge. Such questions asking for advice, preferences,
or judgements are frequent in QA forums, though, so
that supporting them seems rewarding. Problems that
we will address in this context are: What is the im-
plicit knowledge for a given sentence? How shall we
represent it? How can implicit knowledge be con-
nected with our current knowledge representation and
reasoning system?
REFERENCES
Anderson, J. R. (1983). The Architecture of Cognition.
Harvard University Press, Cambridge, MA.
Furbach, U., Glöckner, I., and Pelzer, B. (2010). An
application of automated reasoning in natural language
question answering. AI Communications, 23(2-3):241–265.
PAAR Special Issue.
Glöckner, I. (2007). Filtering and fusion of question-answering
streams by robust textual inference. In Proceedings
of KRAQ’07, pages 43–48, Hyderabad, India.
Glöckner, I. and Pelzer, B. (2008). Exploring robustness
enhancements for logic-based passage filtering. In
Knowledge Based Intelligent Information and Engineering
Systems (Proc. of KES2008, Part I), LNAI
5117, pages 606–614. Springer.
Glöckner, I. and Pelzer, B. (2009). The LogAnswer
project at CLEF 2009. In Results of the CLEF
2009 Cross-Language System Evaluation Campaign,
Working Notes for the CLEF 2009 Workshop, Corfu, Greece.
Glöckner, I. and Pelzer, B. (2010). The LogAnswer
project at ResPubliQA 2010. In
CLEF 2010 Labs and Workshops, Notebook Papers.
http://clef2010.org/resources/proceedings/clef2010labs submission 30.pdf.
Hartrumpf, S. (2003). Hybrid Disambiguation in Natural
Language Analysis. Der Andere Verlag, Osnabrück, Germany.
Hartrumpf, S., Helbig, H., and Osswald, R. (2003).
The semantically based computer lexicon HaGenLex.
Traitement automatique des langues, 44(2):81–105.
Helbig, H. (2006). Knowledge Representation and the
Semantics of Natural Language. Springer.
Pelzer, B. and Wernhard, C. (2007). System Description:
E-KRHyper. In Automated Deduction - CADE-21,
Proceedings, pages 508–513.
Prager, J., Brown, E., Coden, A., and Radev, D. (2000).
Question-answering by predictive annotation. In
SIGIR ’00: Proceedings of the 23rd Annual International
ACM SIGIR Conference on Research and Development
in Information Retrieval, pages 184–191,
New York, NY. ACM Press.
Tversky, B. (1992). Distortions in Cognitive Maps.
Geoforum, 23(2):131–138.