passages can be done in a number of ways. For example, boundaries can be derived from textual markers such as paragraph tags (<p>) or newline characters (\n), or a passage can be defined by a fixed number of words. A passage may thus be a single sentence, a group of sentences, or an entire paragraph. Passages can be treated as discrete units with no intersection, or viewed as overlapping passages, as in the sketch below.
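As a concrete illustration, the following sketch splits a document into fixed-size word windows with a configurable overlap; the window and step sizes are illustrative choices, not values prescribed in this paper.

```python
def word_windows(text, size=50, overlap=25):
    """Split text into passages of `size` words, each starting
    `size - overlap` words after the previous one. With overlap=0
    the passages are discrete; otherwise they intersect."""
    words = text.split()
    step = max(size - overlap, 1)
    passages = []
    for start in range(0, len(words), step):
        window = words[start:start + size]
        if window:
            passages.append(" ".join(window))
        if start + size >= len(words):
            break
    return passages
```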
In this paper we introduce several similarity functions used to generate new document rankings: each function computes a passage-level similarity score, and that score (or its combination with the document-level similarity score) is used to rank the overall document.
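As a minimal sketch of this score-combination idea (the interpolation weight lam and the use of the best-passage score are assumptions made for illustration, not the paper's prescribed method):

```python
def combined_score(doc_score, passage_scores, lam=0.5):
    """Interpolate the document-level score with the best
    passage-level score. lam=1.0 ranks purely by document score;
    lam=0.0 ranks purely by the best passage."""
    best_passage = max(passage_scores) if passage_scores else 0.0
    return lam * doc_score + (1.0 - lam) * best_passage
```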
The main focus of our work is to examine how effectively passage-level evidence affects document retrieval. Factors such as the different means of defining passage boundaries are not a major concern for us at present.
We have used the WebAp (Web Answer Passage) test collection (https://ciir.cs.umass.edu/downloads/WebAP/), which is derived from the 2004 TREC Terabyte Track Gov2 collection, and the Ohsumed test collection (Hersh et al., 1994), which comprises titles and/or abstracts from 270 medical journals referenced in Medline. The results show that the different similarity functions behave differently across the two test collections.
The paper is organized as follows: Section 2 presents a brief overview of previous work in passage-level retrieval. Section 3 gives an overview of the methodology employed, detailing the different similarity functions, the passage boundary approach, and the evaluation measures adopted in the experiments. Section 4 briefly describes the test collections used in the experiments and the assumptions made for them. Section 5 discusses the experimental results obtained. Finally, Section 6 provides a summary of the main conclusions and outlines future work.
2 RELATED WORK
Passage-level retrieval has been studied in information retrieval from several perspectives. A number of approaches have been used to define passage boundaries: bounded passages, overlapping windows, text-tiling, language models, and arbitrary passages (Callan, 1994; Hearst, 1997; Bendersky and Kurland, 2008b; Kaszkiel and Zobel, 2001; Clarke et al., 2008) are among the main techniques. Window-based approaches use the word count to separate passages from each other, irrespective of the written structure of the document. Overlapping windows have been shown to be more effective and useful for document retrieval (Callan, 1994), and a variant of the same approach was used by Liu and Croft (2002).
Jong et al. (2015) proposed an approach that scores passages with an evaluation function to retrieve documents effectively in a question answering system. Their evaluation function calculates the proximity of the query terms within different passages and takes the maximum proximity score for document ranking.
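As a rough sketch of this idea, the minimum-span definition of proximity below is one common notion, assumed here for illustration rather than taken from Jong et al.:

```python
import itertools

def proximity_score(passage_words, query_terms):
    """Proximity of query terms in a passage: the inverse length of
    the smallest window containing all query terms (0.0 if some
    query term is missing from the passage)."""
    positions = {t: [i for i, w in enumerate(passage_words) if w == t]
                 for t in query_terms}
    if any(not p for p in positions.values()):
        return 0.0
    best_span = None
    # Brute force over one occurrence per term; fine for short passages.
    for combo in itertools.product(*positions.values()):
        span = max(combo) - min(combo) + 1
        if best_span is None or span < best_span:
            best_span = span
    return len(query_terms) / best_span

def doc_proximity(passages, query_terms):
    """Document score: the maximum proximity over its passages."""
    return max((proximity_score(p.split(), query_terms) for p in passages),
               default=0.0)
```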
Callan (1994) demonstrated that ranking documents by the score of their best passage can be up to 20% more effective than standard document ranking. Similarly, for certain test collections, it was concluded that combining the document score with the best-passage score gives improved results. Buckley et al. (1995) also combined both scores, in a more complex manner, to generate ranking scores. Moreover, Hearst and Plaunt (1993) showed that, instead of using only the best passage with the maximum score, adding further passages gives a better overall ranking compared to the ad hoc document ranking approach, as in the sketch below.
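One simple way to use more than the single best passage is to sum the top k passage scores; the value of k and the plain summation are illustrative assumptions, not Hearst and Plaunt's exact formulation:

```python
def top_k_passage_score(passage_scores, k=3):
    """Rank a document by the sum of its k highest passage scores,
    so that strong evidence spread over several passages counts
    more than a single high-scoring passage."""
    return sum(sorted(passage_scores, reverse=True)[:k])
```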
Salton et al. (1993) discussed another way to calculate the similarity of passages to the query. They re-ranked the documents and filtered out those with low associated passage scores. They took all the passages scoring higher than their overall document score and used these scores to raise, or lower, the final document rank. In this way, a document with a low document-level score but high-scoring individual passages receives a better final ranking score.
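A loose sketch of this re-ranking idea follows; the additive adjustment and the weight alpha are assumptions made for illustration, as Salton et al. describe the combination only informally here:

```python
def rerank_score(doc_score, passage_scores, alpha=0.5):
    """Adjust a document's score using the passages that beat its
    document-level score: their average surplus raises the final
    score, while documents with no such passage keep their
    original score."""
    strong = [p for p in passage_scores if p > doc_score]
    if not strong:
        return doc_score
    surplus = sum(p - doc_score for p in strong) / len(strong)
    return doc_score + alpha * surplus
```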
Different language modelling approaches at the passage level and the document level have been used in the past to improve document ranking (Liu and Croft, 2002; Lavrenko and Croft, 2001). A similar approach was used by Bendersky and Kurland (2008b), who used measures of document homogeneity and heterogeneity to combine the document and passage similarity with the query