answers (Voorhees, 2001). IBM’s QA system Watson (Ferrucci et al., 2010) is a state-of-the-art QA system with many advanced features, but Watson does not handle questions with multiple answers. Furthermore, most QA systems, including Watson, use a pre-collected text corpus rather than the open Web as in our approach.
Techniques for answering list questions in QA systems are relevant to our work. In (Wang et al., 2008), the authors proposed a method to expand a set of answers from selected answer seeds. The idea of this method is similar to our TAE algorithm, except that their expansion depends only on the global correlation between two candidate answers. According to our experimental results in Section 6.3, global correlation turns out to be the least effective among the several types of correlations we evaluated; specifically, local correlation and combined correlation are significantly better for expanding truthful alter-units (answers). In (Jijkoun et al., 2007), answers are clustered
based on their similarity, and all answers in the same cluster are treated as one unit in the answer-ranking process. A similar idea is found in (Ko et al., 2007), except that the similarity computation is extended from string distance metrics to semantic similarity based on WordNet, Wikipedia, etc. Essentially, their solutions accept multiple answers that are “synonyms” of each other, which corresponds to one of our inference rules. The work in (Razmara, 2008) is the most
relevant to our TG algorithm. Both methods perform clustering on candidate answers (alter-units) based on the correlations among them, but they also have several significant differences. First, different correlations are used: we use a combined correlation, whereas the method in (Razmara, 2008) uses a correlation based on sentences extracted from some documents (no global correlation, no proximity information, and no SRRs are used). Second, the clustering processes also differ: our method has three sub-steps, and the best option for correlation computation (i.e., synonym-based) is not used in (Razmara, 2008). Finally, and most importantly, we would like to emphasize that the fact statements we consider and the questions QA systems consider are very different concepts. The main difference is the information available about the doubt unit.
Each fact statement we consider has an instance of the doubt unit, while questions in QA systems carry only type information about the doubt unit (e.g., from a question starting with “Where”, it can easily be inferred that the type of the doubt unit is Location). An instance carries significantly more information than a type: we can usually infer a more precise type from the instance. For example, from “New York City” we can infer the type City, which is more specific than Location. Furthermore, the instance itself provides valuable information, as it may be used to find clues (via different relationships, such as correlation relationships) for truthful alter-units. Our approach takes advantage of this difference.
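The answer-clustering idea discussed above (grouping near-duplicate candidate answers and treating each cluster as one unit) can be sketched as follows. This is a minimal illustration that uses plain string similarity from Python’s difflib as a stand-in for the correlation and semantic-similarity measures used in the cited systems; the function names, the candidate list, and the 0.7 threshold are all illustrative, not part of any of the described algorithms:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """String similarity in [0, 1]; a simple stand-in for the
    correlation or WordNet-based measures used in the cited work."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def cluster_answers(candidates, threshold=0.7):
    """Greedy single-link clustering: a candidate joins the first
    cluster that already contains a sufficiently similar member;
    otherwise it starts a new cluster of its own."""
    clusters = []
    for cand in candidates:
        for cluster in clusters:
            if any(similarity(cand, member) >= threshold for member in cluster):
                cluster.append(cand)
                break
        else:
            clusters.append([cand])
    return clusters

answers = ["New York City", "NYC", "New York", "Los Angeles", "L.A."]
print(cluster_answers(answers))
```

Note that a purely string-based measure groups “New York City” with “New York” but not with “NYC”, which is precisely why extending similarity to semantic resources (as in Ko et al., 2007) or to correlation evidence matters.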
8 CONCLUSION
In this paper, we investigated the challenging problem of processing doubtful fact statements that have multiple alternative answers for a specified doubt unit. The goal is to find all truthful answers for such doubtful statements. We first evaluated a Top-k solution and showed that none of the variations of this solution is sufficiently accurate. We then presented solutions for two types of MTA statements (compatible concepts and multi-valued attributes). Our solutions explored some fundamental relationships among truthful alter-units, such as synonym, is-a, part-of, and co-occurrence correlation relationships. Based on the different ways in which these relationships are utilized, we proposed three algorithms (TAE, TP, and TG) for selecting the truthful alter-units. We carefully evaluated the effectiveness of the different algorithms and the different types of correlations on both types (CC and MVA) of MTA statements. Our experimental results indicate that the TG algorithm is the most effective overall, with an F-score of around 90%.
ACKNOWLEDGEMENT
This work was supported in part by the following NSF grants: IIS-1546441 and CNS-0958501. Part of this work was done while the first two authors were visiting the SA Center for Big Data Research hosted at Renmin University of China; the Center is partially funded by the Chinese National “111” Project “Attracting International Talents in Data Engineering and Knowledge Engineering Research”.
REFERENCES
Brin, S. and Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1–7):107–117.
Ferrucci, D. A., Brown, E. W., Chu-Carroll, J., Fan, J., Gondek, D., Kalyanpur, A., Lally, A., Murdock, J. W., Nyberg, E., Prager, J. M., Schlaefer, N., and Welty, C. A. (2010). Building Watson: An overview of the DeepQA project. AI Magazine, 31(3):59–79.
WEBIST 2016 - 12th International Conference on Web Information Systems and Technologies