line of thought, on an extremely pessimistic view of the prospects of QATD, it will not work for any parser or question/answer set: sorting answers by tree distance would be no better than generating a random permutation. On an optimistic view, at least for some parsers and some question/answer sets, the syntactic structures can be taken as an approximation of semantic structures, and sorting by tree distance will be useful. For 2 different parsers and 2 QATD tasks, we have found evidence for the optimistic view: improvements to parse quality lead to improved QATD performance. The 2 tasks are:
The Library Manual QATD Task: in this case $Q$ is a set of 88 hand-created queries, and $COR_q$, shared by all the queries, consists of the sentences of the manual of the GNU C Library².
The TREC11 QATD Task: in this case $Q$ was the 500 questions of the TREC11 QA track [3], whose answers are drawn from a large corpus of newspaper articles. $COR_q$ was taken to be the sentences of the top 50 articles from the top-1000 ranking of articles provided by TREC11 for each question ($|COR_q| \approx 1000$). Answer correctness was determined using the TREC11 answer regular expressions.
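The ranking at the heart of both tasks, sorting the candidate sentences in $COR_q$ by their tree distance to the parse of the query, can be sketched as follows. This is a minimal sketch under stated assumptions: trees are hypothetical `(label, children)` tuples, and the distance is plain unit-cost ordered tree edit distance computed with the textbook recursion on forests (delete, insert, or map the rightmost roots). The costs and distance variants actually used for QATD may differ.

```python
from functools import lru_cache
from typing import Tuple

# Hypothetical representation: a tree is (label, children), where children is
# a tuple of trees; a forest is a tuple of trees. Tuples keep everything hashable.
Tree = Tuple[str, tuple]


def size(forest: tuple) -> int:
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_dist(f: tuple, g: tuple) -> int:
    """Unit-cost ordered tree edit distance between forests f and g (memoised)."""
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    (v_label, v_kids), (w_label, w_kids) = f[-1], g[-1]
    return min(
        forest_dist(f[:-1] + v_kids, g) + 1,  # delete the rightmost root of f
        forest_dist(f, g[:-1] + w_kids) + 1,  # insert the rightmost root of g
        forest_dist(v_kids, w_kids)           # map root v onto root w ...
        + forest_dist(f[:-1], g[:-1])
        + (0 if v_label == w_label else 1),   # ... paying 1 if labels differ
    )


def tree_dist(s: Tree, t: Tree) -> int:
    """Edit distance between two single trees."""
    return forest_dist((s,), (t,))


def rank_by_tree_distance(query: Tree, candidates: list) -> list:
    """Indices of candidates sorted by increasing tree distance to the query."""
    return sorted(range(len(candidates)), key=lambda i: tree_dist(query, candidates[i]))
```

Because `sorted` is stable, candidates at equal distance keep their original order; a real system would need an explicit tie-breaking policy.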
The performance on these QATD tasks has been determined for some variants of a home-grown parsing system – call it the trinity parser – and the Collins parser [2] (Model 3 variant). Space precludes giving all the details, but the basic finding is that better parse quality does translate into better QATD performance. The left-hand data in Table 1 refers to various reductions of the linguistic knowledge bases of the trinity parser (thin50 = random removal of a 50% subset, manual = manual removal of a subset, flat = entirely flat parses, gold = hand-correction of query parses and their correct answers). The right-hand data in Table 1 refers to experiments in which the repertoire of moves available to the Collins parser, as defined by its grammar file, was reduced to random subsets of itself of different sizes.
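Both kinds of random reduction, the thin50 setting for the trinity parser and the shrunken Collins grammars, amount to keeping a random fraction of a rule inventory. A minimal sketch, assuming the knowledge base or grammar file can be treated as a flat list of opaque entries (the real file formats are not described here, and the function name is hypothetical):

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def random_subset(rules: Sequence[T], keep_fraction: float, seed: int = 0) -> List[T]:
    """Keep a random `keep_fraction` of `rules`, preserving their original order."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must lie in [0, 1]")
    rng = random.Random(seed)                    # fixed seed => reproducible subset
    k = round(keep_fraction * len(rules))        # number of rules to retain
    kept = sorted(rng.sample(range(len(rules)), k))  # retained indices, in file order
    return [rules[i] for i in kept]
```

For example, `{pct: random_subset(grammar, pct / 100, seed=pct) for pct in (55, 75, 85, 100)}` would build one variant per setting for a hypothetical `grammar` list, and `random_subset(kb, 0.5)` corresponds to a thin50-style reduction. Keeping the retained entries in their original order avoids perturbing any order-sensitive loading in the parser.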
Table 1. Distribution of Correct Cutoff across query set $Q$ in different parse settings. Left-hand data = GNU task, trinity parser; right-hand data = TREC11 task, Collins parser.

Left-hand data (GNU task, trinity parser):

Parsing   1st Qu.   Median    Mean     3rd Qu.
flat      0.1559    0.2459    0.2612   0.3920
manual    0.0215    0.2103    0.2203   0.3926
thin50    0.01418   0.02627   0.157    0.2930
full      0.00389   0.04216   0.1308   0.2198
gold      0.00067   0.0278    0.1087   0.1669

Right-hand data (TREC11 task, Collins parser):

Grammar subset (%)  1st Qu.   Median    Mean     3rd Qu.
55                  0.3157    0.6123    0.5345   0.7664
75                  0.02946   0.1634    0.2701   0.4495
85                  0.0266    0.1227    0.2501   0.4380
100                 0.01256   0.08306   0.2097   0.2901
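Table 1 reports Correct Cutoff, which this section does not define. The sketch below assumes it is the proportion of a query's ranked candidate list lying above the first correct answer, so lower is better, which fits the gold parses scoring lowest. Correctness is checked TREC-style by matching answer regular expressions against each sentence, and the per-query values are summarised with the four statistics in the table. The "inclusive" quantile method (which interpolates like R's default quantile type) and all function names are assumptions.

```python
import re
import statistics
from typing import Dict, Sequence


def is_correct(sentence: str, answer_patterns: Sequence[str]) -> bool:
    """TREC-style check: correct if any answer regular expression matches."""
    return any(re.search(p, sentence) for p in answer_patterns)


def correct_cutoff(ranked_sentences: Sequence[str], answer_patterns: Sequence[str]) -> float:
    """ASSUMED definition: fraction of the ranked list above the first correct answer.

    Raises ValueError if no sentence in the list is correct.
    """
    for rank, sentence in enumerate(ranked_sentences):
        if is_correct(sentence, answer_patterns):
            return rank / len(ranked_sentences)
    raise ValueError("no correct answer in the ranked list")


def summarise(cutoffs: Sequence[float]) -> Dict[str, float]:
    """Quartile summary with the same columns as Table 1."""
    q1, median, q3 = statistics.quantiles(cutoffs, n=4, method="inclusive")
    return {"1st Qu.": q1, "Median": median,
            "Mean": statistics.fmean(cutoffs), "3rd Qu.": q3}
```

Applied per parse setting, `summarise([correct_cutoff(ranked[q], patterns[q]) for q in queries])` would yield one row of the table, given hypothetical per-query `ranked` lists and `patterns`.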
The basic notion of tree distance can be varied in many ways, some of which are:
Sub-tree: in this variant, the sub-tree distance is the cost of the least-cost mapping from a sub-tree of the source.
Sub-traversal: the least-cost mapping from a sub-traversal of
² www.gnu.org