other approaches (Yang and Chua, 2004a; Yang and Chua, 2004b), which download and process more than 1000 full web documents, or submit more than 20 queries to different search engines, finishing with an F1 score between .464 and .469 on TREC 2003. Our strategy can strengthen theirs, especially their classification and clustering of full documents.
In contrast to the observations in TREC 2001 (Voorhees, 2001), duplicate answers have a considerable impact on performance, because answers are taken from many different sources. One particular case is the multiple spellings and misspellings of an answer. For instance, ListWebQA retrieved three different spellings/misspellings of Chuck Berry’s song “Maybelline” (also found as “Maybellene” and “Maybeline”). Additionally, inexact or incomplete answers also affect performance. For example, John Updike’s novel “The Poorhouse Fair” was also found as “Poorhouse Fair”.
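The effect of such near-duplicates can be illustrated with a minimal sketch that groups answer candidates by string similarity. This is not part of ListWebQA: the function names, the use of Python's difflib, the article stripping, and the 0.85 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(answer: str) -> str:
    """Lowercase, drop one leading article, and collapse whitespace."""
    tokens = answer.lower().split()
    if tokens and tokens[0] in {"the", "a", "an"}:
        tokens = tokens[1:]
    return " ".join(tokens)


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """True when the normalized strings are close under difflib's ratio.

    The threshold is a hypothetical value, not one reported in the paper.
    """
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


def group_duplicates(answers, threshold: float = 0.85):
    """Greedily place each answer in the first group whose first member it resembles."""
    groups = []
    for answer in answers:
        for group in groups:
            if similar(answer, group[0], threshold):
                group.append(answer)
                break
        else:
            # No existing group matched, so this answer starts a new one.
            groups.append([answer])
    return groups


if __name__ == "__main__":
    candidates = [
        "Maybelline", "Maybellene", "Maybeline",
        "The Poorhouse Fair", "Poorhouse Fair",
    ]
    for group in group_duplicates(candidates):
        print(group)
```

Under these assumptions, the five strings mentioned above collapse into two groups, one per distinct answer. A greedy comparison against the first member of each group keeps the sketch short; a real system would likely need a more careful clustering step and a threshold tuned on held-out data.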
5 CONCLUSIONS AND FUTURE WORK
This paper presented ListWebQA, a question answering system aimed specifically at extracting answers to list questions from web snippets. Our results indicate that it is feasible to discover answers in web snippets. We envisage that these answers will help to select the most promising documents and, afterwards, to detect the portions of those documents where the answers occur.
Additionally, we envision that dependency trees can be used to increase the accuracy of the recognition of answer candidates, and that extra search queries can be formulated to boost the recall of answers in web snippets. For this last purpose, we expect Google n-grams and on-line encyclopaedias to be particularly useful.
ACKNOWLEDGEMENTS
This work was partially supported by a research grant from the German Federal Ministry of Education, Science, Research and Technology (BMBF) to the DFKI project HyLaP (FKZ: 01 IW F02) and the EC-funded project QALL-ME.
REFERENCES
Cederberg, S. and Widdows, D. (2003). Using LSA and noun coordination information to improve the precision and recall of automatic hyponymy extraction. In Conference on Natural Language Learning (CoNLL-2003), pages 111–118, Edmonton, Canada.
Hearst, M. (1992). Automatic acquisition of hyponyms from large text corpora. In Fourteenth International Conference on Computational Linguistics, pages 539–545, Nantes, France.
Katz, B., Bilotti, M., Felshin, S., Fernandes, A., Hildebrandt, W., Katzir, R., Lin, J., Loreto, D., Marton, G., Mora, F., and Uzuner, O. (2004). Answering multiple questions on a topic from heterogeneous resources. In TREC 2004, Gaithersburg, Maryland.
Katz, B., Lin, J., Loreto, D., Hildebrandt, W., Bilotti, M., Felshin, S., Fernandes, A., Marton, G., and Mora, F. (2003). Integrating web-based and corpus-based techniques for question answering. In TREC 2003, pages 426–435, Gaithersburg, Maryland.
Katz, B., Marton, G., Borchardt, G., Brownell, A., Felshin, S., Loreto, D., Louis-Rosenberg, J., Lu, B., Mora, F., Stiller, S., Uzuner, O., and Wilcox, A. (2005). External knowledge sources for question answering. In TREC 2005, Gaithersburg, Maryland.
Schone, P., Ciany, G., Cutts, R., Mayfield, J., and Smith, T. (2005). QACTIS-based question answering at TREC 2005. In TREC 2005, Gaithersburg, Maryland.
Shawe-Taylor, J. and Cristianini, N. (2004). Kernel methods for pattern analysis, chapter 10, pages 335–339. Cambridge University Press.
Shinzato, K. and Torisawa, K. (2004a). Acquiring hyponymy relations from web documents. In HLT-NAACL 2004, pages 73–80, Boston, MA, USA.
Shinzato, K. and Torisawa, K. (2004b). Extracting hyponyms of prespecified hypernyms from itemizations and headings in web documents. In COLING ’04, pages 938–944, Geneva, Switzerland.
Sombatsrisomboon, R., Matsuo, P., and Ishizuka, M. (2003). Acquisition of hypernyms and hyponyms from the WWW. In 2nd International Workshop on Active Mining, Maebashi, Japan.
Voorhees, E. M. (2001). Overview of the TREC 2001 question answering track. In TREC 2001, pages 42–51, Gaithersburg, Maryland.
Voorhees, E. M. (2003). Overview of the TREC 2003 question answering track. In TREC 2003, pages 54–68, Gaithersburg, Maryland.
Wu, L., Huang, X., Zhou, Y., Zhang, Z., and Lin, F. (2005). FDUQA on TREC2005 QA track. In TREC 2005, Gaithersburg, Maryland.
Yang, H. and Chua, T. (2004a). Effectiveness of web page
classification on finding list answers. In SIGIR ’04,
pages 522–523, Sheffield, United Kingdom.
Yang, H. and Chua, T. (2004b). Web-based list question answering. In Proceedings of COLING ’04, pages 1277–1283, Geneva, Switzerland.