LogAnswer IN QUESTION ANSWERING FORUMS
Björn Pelzer
University Koblenz-Landau, Koblenz, Germany
Ingo Glöckner, Tiansi Dong
University of Hagen, Hagen, Germany
Keywords:
Logic-based Question Answering, Question-Answer portals, Answer validation, Answer bots.
Abstract:
LogAnswer is a question answering (QA) system for the German language. By providing concise answers to user questions, LogAnswer offers more natural access to document collections than conventional search engines do. QA forums provide online venues where human users can ask each other questions and give answers. We describe an ongoing adaptation of LogAnswer to QA forums, aiming at creating a virtual forum user who can respond intelligently and efficiently to human questions. This serves not only as a more accurate evaluation method for our system, but also as a real-world use case for automated QA. The basic idea is that the QA system can disburden the human experts from answering routine questions, e.g. questions with a known answer in the forum, or questions that can be answered from Wikipedia. As a result, the users can focus on those questions that really demand human judgement or expertise. In order not to spam users, the QA system needs a good self-assessment of its answer quality. Existing QA techniques, however, are not sufficiently precision-oriented. The need to provide justified answers thus fosters research into logic-oriented QA and novel methods for answer validation.
1 INTRODUCTION
The field of question answering (QA) aims at auto-
matically finding concise answers to arbitrary ques-
tions. A QA system communicates with the user in a
natural language (NL), its input has the form of prop-
erly phrased questions, and the answers are derived
from an extensive knowledge base built from a doc-
ument collection. Conventional search engines with their keyword-based search, by contrast, merely produce documents which must be studied further, an approach which is impractical for users with specific questions. A QA system delivers only the information that is requested, saving time and allowing a satisfactory result presentation even on compact mobile devices. Its NL interface can be used intuitively, making it suitable for casual users. Drawbacks
of QA are the difficulties of dealing with the ambigu-
ities of NL, the need to construct the knowledge base,
and the complexity of deriving answers.
LogAnswer (Furbach et al., 2010; Glöckner and Pelzer, 2009) is a QA system for the German lan-
guage. It works with a knowledge base derived from
a local document collection, and the answers are pro-
duced using a combination of deep linguistic process-
ing and automated theorem proving. Evaluating QA
systems is difficult, as the quality of answers cannot
yet be judged automatically. LogAnswer is currently
accessible by a web-based interface. To expand its us-
age and to achieve a better evaluation, we are in the
process of adapting LogAnswer to QA forums. Such
internet forums normally provide their users with a
venue for asking and answering each other’s ques-
tions (a well-known example is the QA portal an-
swers.yahoo.com). By opening up our system to such
forums we hope that in the future we can draw from
the experiences of a vast number of users in a real-
world application of LogAnswer.
In our view, providing technological support for
QA forums is an ideal scenario for developing and
evaluating QA systems. The potential benefit for
users in the forum is just too evident even if the
experts in the forum are only disburdened from an-
swering questions with a known answer in the fo-
rum or in the Wikipedia, and even if those asking a
question in the forum get an instant, automatically
generated answer only for some of their questions.
However, the envisioned integration of human QA
and automated QA will only succeed if the QA sys-
tem avoids posting wrong answers, because the users
should not be spammed with incorrect responses.
This requirement for precision-orientation disquali-
fies traditional, retrieval-based QA techniques, since
these cannot provide the required justified answers.
LogAnswer, by contrast, uses logical reasoning for
finding and validating answers, thus promising a bet-
ter self-assessment of answer quality.
The questions found in QA forums are also challenging due to the variety of question types: many existing QA systems can only handle definition questions and factual questions that ask for a limited number of named-entity types like PERSON, ORGANIZATION etc. In our view, the types of ques-
tions that can be tackled by QA techniques and those
that should best be answered by human users (like
questions asking for an opinion) are complementary.
While a QA system can answer questions of the first
kind (such as factoid questions) directly, it makes
more sense to treat the second kind by techniques for
FAQ finding, i.e. by looking up results in a repository
of questions with known (human generated) answers.
This process of FAQ finding, too, can potentially be
improved by incorporating techniques originally de-
veloped for question answering.
This paper is organized as follows: In Section 2 we
give a short description of LogAnswer. Section 3 pro-
vides an overview of QA forums and investigates dif-
ferent adaptation methods for LogAnswer. In the final
Section 4 we summarize our results and outline some
topics for future work.
2 THE LogAnswer SYSTEM
LogAnswer is designed as a German language QA
system on the web, which can serve as an alternative
to conventional search engines. It is accessible by a
web interface (www.loganswer.de) similar to that of a
search engine. The user enters a question into a text
box, and LogAnswer then presents the three best an-
swers, which are highlighted in the relevant textual
sources to provide a context. The answers are derived
from an extensive knowledge base, which has been
obtained by automatically translating a snapshot of
the German Wikipedia into a semantic network rep-
resentation in the MultiNet (Multilayered Extended
Semantic Networks) formalism (Helbig, 2006). The
WOCADI parser for German (Hartrumpf, 2003) was
used for that purpose. Based on its large semantic-
based lexicon (Hartrumpf et al., 2003), combined
with a robust treatment of unknown words, it achieves a 51.8 percent rate of full parses (80.6 percent including chunk parses) on the Wikipedia corpus. WOCADI
uses rule-based and statistical techniques for handling
the various kinds of ambiguities that occur in NL analysis (e.g. prepositional attachment, resolution of
pronouns). For a description of these techniques in-
cluding a detailed evaluation, see (Hartrumpf, 2003).
The MultiNet representations for 29.1 million sen-
tences in the Wikipedia as generated by WOCADI,
and an additional 12,000 logical rules and facts con-
stitute the general background knowledge of Log-
Answer. Among other things, these additional rules
connect the various ways in which a temporal or lo-
cal specification can be expressed. Examples are
shown in (Gl¨ockner, 2007). A large part of the back-
ground knowledge is concerned with connecting the
meaning representations of verbs and deverbal nouns
(e.g. ‘read’ – ‘reader’), or with connecting adjectives
and corresponding attributes expressed as nouns (e.g. ‘high’ – ‘height’). Obviously, the results of Log-
Answer for a specific application can be improved by
adding domain-specific rules that cover paraphrases
and other inferences of relevance to the domain.
To make the semantic networks accessible to mod-
ern theorem provers, the MultiNet knowledge base
has been translated into First-Order Logic (FOL).
The expressivity of MultiNet exceeds that of FOL,
and while some aspects of MultiNet are lost in the
translation, we approximate the expressivity by using
logic extensions like equality and arithmetic evalua-
tion. See (Furbach et al., 2010) for a detailed example
of the logical translation and subsequent processing.
The complete knowledge base is too large to be
handled by an automated theorem prover, in partic-
ular when considering that the usage model of Log-
Answer requires short response times. Therefore the
initial processing steps for a user-provided question
serve to narrow down the knowledge base to the frag-
ment relevant to the task at hand. The query is trans-
lated into a MultiNet representation, and a machine-
learning (ML) ranking technique then uses shallow
linguistic criteria (like lexical overlap) to find the text
passages most likely to contain an answer. The as-
sessment of these criteria relies on pre-indexed infor-
mation and can be rapidly computed. The FOL repre-
sentations (answer candidates) of the most promising
text passages are then individually tested by the theorem prover E-KRHyper (Pelzer and Wernhard, 2007).
In each test the candidate is combined with the back-
ground knowledge and the logical query representa-
tion as the input for the prover. A successful proof in-
stantiates variables in the FOL question with ground
terms representing the answer. Query relaxation tech-
niques increase the likelihood of finding a proof in
short time, at the cost of lowering the probability that
the answer is relevant for the query, i.e. decreasing
the quality of the answer for QA. A second ML phase
ranks all proofs according to their quality. The answer
terms are collected from the best proofs and translated
back into NL answers that are displayed to the user.
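To give a rough picture of the relaxation loop just described, the following Python sketch shows its general shape; the prover interface and all names in it are hypothetical simplifications for illustration and do not reflect the actual LogAnswer or E-KRHyper implementation.

def prove_with_relaxation(query_literals, prove, max_skips=2):
    # prove(literals) is an assumed prover call returning answer
    # bindings from a successful refutation proof, or None on failure.
    literals = list(query_literals)
    skipped = 0
    while literals:
        bindings = prove(literals)
        if bindings is not None:
            return bindings, skipped   # answer terms plus relaxation count
        if skipped == max_skips:
            break
        literals.pop()                 # relax: discard one query literal
        skipped += 1
    return None, skipped

The number of skipped literals can then serve as one feature in the second ML phase, since every relaxation step weakens the guarantee that the proved answer actually fits the question.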
Considerable effort has been spent to make the
system robust to gaps in the background knowledge
and to parsing failure. If a parse of the query fails,
then the system is still able to find answer sentences
based on robust techniques like predictive annotation
(Prager et al., 2000). The robustness enhancing tech-
niques that were developed for LogAnswer are de-
scribed in (Glöckner and Pelzer, 2008).
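The following sketch illustrates the general idea behind such a fallback, assuming the passages were annotated with named-entity classes at indexing time; the data layout and names are our own illustrative choices and not the method of (Prager et al., 2000) in detail.

def fallback_candidates(question_words, expected_type, passages):
    # Keyword retrieval restricted to passages containing at least
    # one entity of the expected answer type (e.g. PERSON, DATE).
    keywords = set(question_words)
    scored = []
    for p in passages:
        if expected_type not in p["entity_types"]:
            continue                 # no candidate answer of the right type
        overlap = len(keywords & set(p["words"]))
        if overlap > 0:
            scored.append((overlap, p["id"]))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [pid for _, pid in scored]

Even when the question cannot be parsed, this keeps the system responsive: it returns sentences that at least contain an entity of the right type together with some of the question keywords.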
3 QUESTION ANSWERING
FORUMS
The evaluation of a QA system is difficult for several
reasons: Few questions have a single correct answer,
as answers can be paraphrased, or the question may be
unspecific. A user’s acceptance of an answer is very
subjective, depending on aspects like intellect and tol-
erance for malformed answers. Hence it is insufficient
to evaluate a QA system using a fixed library of ques-
tions with known correct results.
One attempt at a standardized testing is the annual
competition of the Cross-Language Evaluation Forum
(CLEF: www.clef-campaign.org). A set of questions
is translated for each participating QA system, and
a panel of judges then assesses the answers. Since
2009 the questions refer to documents of the Euro-
pean Parliament, as these texts already exist in mul-
tiple translations, but this also means that they have
a specialized legal content. While LogAnswer has
performed well at CLEF (Glöckner and Pelzer, 2010; Glöckner and Pelzer, 2009), the competition is not
representative for a real-world application with ac-
tual human users. Moreover, the questions are of-
ten closely based on specific documents. Compare for example the question “Which additives may be used in the manufacture of peeled tomatoes?” and the answer-containing text passage “As additives in the manufacture of peeled tomatoes only citric acid (E 330) and calcium chloride (E 509) may be used.” (LogAnswer operates on German texts, but for better understanding all examples throughout this paper are in English.) When
question and text passage are very similar, a logical
proof using their FOL representations is trivial. The
deep reasoning capabilities of a theorem prover are
not utilized; instead, the bulk of the work is already
done once the information retrieval phase has found
the passage.
To better evaluate both the real-world applica-
bility and the reasoning aspect of LogAnswer we
must therefore look beyond CLEF, and QA forums
provide an opportunity for this. We focus on Ger-
man forums here, but most have English versions
as well. In general, the QA forums allow users
to ask questions regarding any topic. Users can
also give answers to questions that have been asked,
and they can browse questions and answers, with
new unanswered questions being listed prominently.
Many forums allow multiple answers for a question,
and the questioner can mark the most helpful an-
swer. Frag Wikia! (frag.wikia.com) allows only one answer per question, but all users may edit and improve this answer. QA forums are usually financed by online advertising, as for example Frag Wikia!, COSMiQ (www.cosmiq.de) and WikiAnswers (de.answers.com), or they may be paid directly by the questioners (JustAnswer, www.justanswer.de).
Our preliminary experiments with LogAnswer con-
centrate on Frag Wikia! due to the permissive Cre-
ative Commons license for its approximately 80,000
questions. A tighter integration of LogAnswer with
any QA forum will require a cooperation with the fo-
rum owners, so once our research has progressed we
can consider an adaptation to a larger commercial QA
forum.
3.1 LogAnswer as a User
The most obvious application of LogAnswer is for
the system to assume the role of a virtual forum user
who answers the questions that have been posted. At
the time of this writing Frag Wikia! has a growing
backlog of about 5,300 unanswered questions. Log-
Answer can reply to questions immediately after they
have been posted, so that the questioner does not have
to wait an indefinite time for an answer. Alterna-
tively, since forum users do not expect immediate an-
swers, we can allow more time for the logical process-
ing than in the current usage model of LogAnswer,
thus enabling deeper reasoning and answers of higher
quality, while still ensuring that every new question is
answered within a few minutes. Table 1 shows exam-
ples of LogAnswer answering forum questions. By
covering such encyclopedic questions, LogAnswer
disburdens the users in the forum from writing an-
swers to those questions whose answer can easily be
found in Wikipedia. Thus, the users can focus on
answering difficult questions that call for specialized
knowledge, human judgement, and advice from expe-
rience.
Table 1: Examples of questions from the Frag Wikia! QA forum, with answers provided by LogAnswer.

    Question                          Reply from LogAnswer
    How far is the Moon from Earth?   384,000 km
    When was the Berlin Wall built?   1961
    Who was Ian Fleming?              British author

Some questions are well suited to the encyclopedic knowledge of LogAnswer. However, forum users
also frequently ask about current events which are not
yet covered by the knowledge base of LogAnswer.
Many questions are also ambiguous or overly opti-
mistic, for example asking for the phone numbers of
celebrities. Nevertheless, such questions must be ex-
pected in a real-world application.
In a first experiment, we collected a random sample of 200 Frag Wikia! questions (see http://www.loganswer.de/resources/icaart2011.xml for the questions, the results of LogAnswer, and our annotations). We found
that 41% of the questions exhibit syntactic mistakes
(mostly spelling errors and wrong capitalization),
making them difficult to parse automatically. Still,
the WOCADI parser used by LogAnswer constructs
a logical representation for 81% of the questions. De-
spite the Frag Wikia! policy that questions with an
obvious answer in Wikipedia should be avoided,
a manual search revealed that 61% of the questions in
our sample have an answer in the German Wikipedia,
so that LogAnswer has a chance of answering them.
Judging from the result pages produced by LogAnswer, which always show three answers, the system
achieved a 30% answer rate for these Wikipedia-
related questions.
3.2 LogAnswer Compares Questions
As a QA forum is frequented by a multitude of users,
the same questions are bound to be asked several
times by different people. Most forums try to avoid
repeatedly listing such questions as unanswered each
time they are posted. Instead an attempt is made
to redirect the respective user to an earlier instance
of the question, so that old answers can be reused.
This requires the forum software to compare the cur-
rent questions with the older questions in the forum
archive. Typically this is done by a search based
on keywords from the current question. As a result, QA forums may not find a semantically equivalent question if it is a paraphrase of the current ques-
tion. For example, Frag Wikia! contains the ques-
tion Q1: “What was the name of the first German Chancellor?”, and asking the same question again will produce this archived question together with an answer. However, rephrasing Q1 into Q2: “Who was the first Chancellor of Germany?” will not lead to the archived Q1 and its answer, as Q2 is treated as unique instead. (Note that our rephrased question Q2 has since been answered by a different user.) WikiAnswers shows a similar behaviour: here
the paraphrased question is in the archive and will be
found when the identical wording is used. The origi-
nal question is not found, though. Instead the forum
search engine suggests other archived questions con-
taining keywords like “German” or “Chancellor”,
but none of them are relevant for the original ques-
tion.
LogAnswer can offer an improvement by per-
forming a semantic comparison between questions.
This requires a new knowledge base derived from the
archived questions. For this purpose the questions
must be treated like the Wikipedia text passages for
the original knowledge base: the questions are parsed
and then translated into their MultiNet and FOL rep-
resentations. The translation largely corresponds to
the way LogAnswer would normally translate a ques-
tion for question answering, with the exception that
constants are used instead of variables. The represen-
tations are then indexed by lexical content and by the
expected answer types of the questions, allowing effi-
cient access to the question-derived knowledge base.
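A minimal sketch of such an index, assuming a simple dictionary layout with hypothetical field names, could look as follows; it only illustrates the two access paths (lexical content and expected answer type).

def build_question_index(archived_questions):
    # archived_questions: dicts with "id", "content_words" and
    # "answer_type" fields (illustrative layout, not the real one).
    by_word, by_type = {}, {}
    for q in archived_questions:
        for w in q["content_words"]:
            by_word.setdefault(w, set()).add(q["id"])
        by_type.setdefault(q["answer_type"], set()).add(q["id"])
    return by_word, by_type

def candidate_ids(new_question, by_word, by_type):
    # Collect archived questions sharing content words with the new
    # question, then keep those with the same expected answer type.
    ids = set()
    for w in new_question["content_words"]:
        ids |= by_word.get(w, set())
    return ids & by_type.get(new_question["answer_type"], set())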
When a user asks a new question, LogAnswer can
locate potentially matching archived questions using
the same ML filtering phase as for the text passages
(see Section 2). The theorem prover then tests the
filtered questions for equivalence to the user ques-
tion. In each test two proofs must be found, one
with the user question as a negated conjecture refuted
by the archived question and the general background
knowledge, and one proof vice versa which refutes
the archived question. For example, when attempt-
ing to match the aforementioned archived question
Q1 and the user question Q2, for one proof the prover would operate on the following input (simplified to improve legibility, focusing on those literals which retain a semblance to the NL questions; the original input has over 60 literals and uses more complex symbols to account for different word meanings):
Q1: pmod(c18, first, chancellor)   prop(c16, german)   sub(c16, c18) ...
Q2: ¬pmod(X1, first, chancellor)   ¬attch(X2, FOCUS)   ¬attr(X2, X3)
    ¬sub(FOCUS, X1)   ¬sub(X3, name)   ¬val(X3, germany).
Successful proofs in both directions indicate the se-
mantic equivalence of user question and archived
question. Relaxation may be used, at the cost of lowering the probability that the proofs accurately represent equivalence. In the example LogAnswer finds a refutational proof for Q2, using both relaxation and a
background knowledge axiom which essentially ex-
presses that having a nationality (“German Chancel-
lor”) and being of that nation (“Chancellor of Ger-
many”) are equivalent. When several archived ques-
tions have been tested this way for one user question,
then a second ML phase sorts the proofs to determine
the archived questions with the highest relevance for
the questioner. This process is similar to the second
ML phase described in Section 2, except that only a
subset of the criteria can be used since a comparison
of questions does not produce an answer. LogAnswer
then presents the best matching questions from the
archive, and the user can examine these and any ex-
isting answers.
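Schematically, the bidirectional test can be pictured as in the following sketch; refute is an assumed prover call (axioms plus a negated conjecture, returning a proof object or None) and does not mirror the actual E-KRHyper interface.

def questions_equivalent(user_q, archived_q, background_kb, refute):
    # Direction 1: the archived question (as facts) plus background
    # knowledge must refute the negated user question.
    proof_1 = refute(archived_q["facts"] + background_kb, user_q["negated"])
    # Direction 2: vice versa, refuting the negated archived question.
    proof_2 = refute(user_q["facts"] + background_kb, archived_q["negated"])
    # Only success in both directions indicates semantic equivalence;
    # proofs obtained with relaxation deserve less trust.
    return proof_1 is not None and proof_2 is not None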
3.3 Evaluation of LogAnswer by Forum
Users
Most QA forums offer an evaluation system which al-
lows questioners to grade the answers they receive or
the users who provide them. When LogAnswer is
integrated into a QA forum this grading functional-
ity can form the basis for an evaluation of our sys-
tem. Unfortunately, such grading is still in the plan-
ning stage in Frag Wikia!, which explains our moti-
vation to move on to a more full-featured commercial
forum once we have completed our internal adapta-
tions and tests of LogAnswer. For the time being we
can utilize the fact that Frag Wikia! users may edit
and improve answers which they find flawed, thereby
indirectly providing us with information about the ac-
ceptance of LogAnswer.
4 CONCLUSIONS AND
OUTLOOK
We believe QA forums to be an ideal environment for
a system like LogAnswer. They attract a large num-
ber of human users who have questions and who are
willing to grade the responses they receive, thereby
providing a large-scale evaluation of anyone who is
able to deliver answers in quantity, including an au-
tomated QA system. The questions found on QA fo-
rums concern arbitrary topics, and generally they are
asked by people who do not know the answers. They
might also be ambiguous, malformed, or based on
partial or even false information. This means that fo-
rum questions require a significant degree of flexibil-
ity from a QA system, and a reasoning power which
goes beyond pure information retrieval. An adapta-
tion of LogAnswer to QA forums will enable a thor-
ough evaluation of all aspects of our system, allowing
us to improve LogAnswer by drawing on real-world QA
experience. The experimental results give us confi-
dence that this adaptation is feasible.
In our view, combining question answering com-
munities with technologies for automatic question an-
swering is potentially advantageous both for the users
in the forum and for QA research. For users ask-
ing a question, the most apparent benefit compared
to human answerers is the very quick response time
of the QA system. But introducing the QA agent also
helps the forum experts, who must no longer waste
their time with repeated questions or routine ques-
tions that can easily be answered from Wikipedia.
In our concrete example forum (frag.wikia.com), fo-
rum policy discourages asking questions that target facts from Wikipedia. Still, we found that 61% of the questions in our sample have an answer in Wikipedia, so a QA system that uses Wikipedia as
its document collection can indeed be useful. How-
ever, such a combination will only be acceptable for
the forum users if the involved QA system can re-
alistically judge its answer quality and avoid post-
ing wrong answers. Moreover, the system should be
able to integrate automated question answering and
FAQ finding from the repository of questions with
known answers. This integration is especially im-
portant for questions asking for opinions, advice or
judgements, where retrieving a human-provided an-
swer is currently the only realistic option.
In the long run, our aim is to develop a virtual and
learned internet user who can communicate with real
human users, answering questions in several natural
languages. Two critical cognitive capabilities for this
are: understanding natural languages which, more of-
ten than not, contain syntax errors, and finding an-
swers efficiently from a huge knowledge base. As to
the first challenge, we are aware that psychologists do
not view human errors as malfunctions, but rather as windows to explore the nature of the human cognitive system (Tversky, 1992). Consequently,
we do not treat sentences with syntax errors as excep-
tions. Instead, our language processing mechanism
should “understand” sentences as long as humans un-
derstand them. To achieve this, we will extend our
current method from using MultiNet alone to integrat-
ing MultiNet seamlessly with FrameNet, WordNet,
and OpenCyc. The basic idea is that the rich back-
ground knowledge provided by these resources can
often guide linguistic analysis to the intended mean-
ing.
Concerning the second problem, we recall the dis-
tinction of implicit knowledge (e.g. intuitions, expe-
riences, procedural knowledge) vs. explicit knowl-
edge (e.g. formal knowledge) (Anderson, 1983). Our current approach to finding answers, which is purely
logical, is suited for explicit knowledge but would
become awkward for questions on implicit knowl-
edge. Such questions asking for advice, preferences,
or judgements are frequent in QA forums, though, so
that supporting them seems rewarding. Problems that
we will address in this context are: What is the im-
plicit knowledge for a given sentence? How shall we
represent it? How can implicit knowledge be con-
nected with our current knowledge representation and
reasoning system?
REFERENCES
Anderson, J. R. (1983). The Architecture of Cognition.
Harvard University Press, Cambridge, MA.
Furbach, U., Glöckner, I., and Pelzer, B. (2010). An
application of automated reasoning in natural lan-
guage question answering. AI Communications, 23(2-
3):241–265. PAAR Special Issue.
Glöckner, I. (2007). Filtering and fusion of question-
answering streams by robust textual inference. In Pro-
ceedings of KRAQ’07, pages 43–48, Hyderabad, In-
dia.
Glöckner, I. and Pelzer, B. (2008). Exploring robustness
enhancements for logic-based passage filtering. In
Knowledge Based Intelligent Information and Engi-
neering Systems (Proc. of KES2008, Part I), LNAI
5117, pages 606–614. Springer.
Glöckner, I. and Pelzer, B. (2009). The LogAnswer
project at CLEF 2009. In Results of the CLEF
2009 Cross-Language System Evaluation Campaign,
Working Notes for the CLEF 2009 Workshop, Corfu,
Greece.
Glöckner, I. and Pelzer, B. (2010). The Log-
Answer project at ResPubliQA 2010. In
CLEF 2010 Labs and Workshops, Notebook Pa-
pers. http://clef2010.org/resources/proceedings/clef2010labs_submission_30.pdf.
Hartrumpf, S. (2003). Hybrid Disambiguation in Natural
Language Analysis. Der Andere Verlag, Osnabrück,
Germany.
Hartrumpf, S., Helbig, H., and Osswald, R. (2003).
The semantically based computer lexicon HaGenLex.
Traitement automatique des langues, 44(2):81–105.
Helbig, H. (2006). Knowledge Representation and the Se-
mantics of Natural Language. Springer.
Pelzer, B. and Wernhard, C. (2007). System Description: E-
KRHyper. In Automated Deduction - CADE-21, Pro-
ceedings, pages 508–513.
Prager, J., Brown, E., Coden, A., and Radev, D. (2000).
Question-answering by predictive annotation. In SI-
GIR ’00: Proceedings of the 23rd Annual Interna-
tional ACM SIGIR Conference on Research and De-
velopment in Information Retrieval, pages 184–191,
New York, NY. ACM Press.
Tversky, B. (1992). Distortions in Cognitive Maps. Geofo-
rum, 23(2):131–138.