2 RELATED WORK
To begin with, (Alonso et al., 2009) introduced the
notion of the time-centred snippet as a useful way of
representing documents for exploratory search and
document retrieval. The core idea is to exploit
sentences that carry relevant units of time (chronons)
for building document surrogates.
Specifically, (Alonso et al., 2009) noticed that
chronons can be incorporated into web-pages as meta-
data or in the form of temporal expressions. The vital
aspect of these chronons is their relevance for presen-
tation and for highlighting the importance of a doc-
ument given a query. More precisely, they are a key
factor in the construction of more descriptive snippets
that include essential temporal information. In order
to detect chronons, (Alonso et al., 2009) analysed
documents for temporal anchors by means of
time-based linguistic tools.
As for selected sentences, (Alonso et al., 2009)
only took into consideration sentences containing
explicit temporal expressions. The length of these
selected sentences was bounded. For the purpose of
ranking sentences, (Alonso et al., 2009) took into
account the position of the temporal expression within
the sentence, the number and length of the sentence,
and features regarding the particular chronon: its
appearance order and its frequency in the document and
within the sentence. Since (Alonso et al., 2009) applied
this ranking function to a web-corpus, the features
they utilised were chiefly on the surface level.
Sentences are thus ranked, and the top-ranked ones are
sorted and presented as a temporal snippet. An
interesting finding of (Alonso et al., 2009) is that
users were concerned about the lack of time-sensitive
information; that is, they were keen on seeing
time-sensitive information within search results. In
particular, users found temporally anchored snippets
very useful as surrogates of documents, and the
presentation of sorted temporal information interesting.
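The surface features listed above can be illustrated with a minimal sketch. It is not the ranking function of (Alonso et al., 2009): the year-only regular expression stands in for a temporal tagger, and the function name `sentence_features`, the `max_tokens` bound and the feature names are assumptions made for illustration only.

```python
import re
from collections import Counter

# Crude stand-in for a temporal tagger (assumption): four-digit years only.
YEAR = re.compile(r"\b\d{4}\b")


def sentence_features(sentences, max_tokens=40):
    """Return one feature dict per sentence that carries an explicit year
    and respects the length bound. All names here are hypothetical."""
    all_years = [y for s in sentences for y in YEAR.findall(s)]
    doc_freq = Counter(all_years)
    # Appearance order: rank of each chronon by its first occurrence.
    order = {y: i for i, y in enumerate(dict.fromkeys(all_years))}

    rows = []
    for index, sentence in enumerate(sentences):
        matches = list(YEAR.finditer(sentence))
        n_tokens = len(sentence.split())
        if not matches or n_tokens > max_tokens:
            continue  # no explicit temporal expression, or sentence too long
        chronon = matches[0].group(0)
        rows.append({
            "sentence_index": index,
            "length": n_tokens,
            "chronon": chronon,
            "chronon_offset": matches[0].start(),  # character position
            "appearance_order": order[chronon],
            "doc_frequency": doc_freq[chronon],
            "sentence_frequency": sum(m.group(0) == chronon for m in matches),
        })
    return rows
```

How such features are weighted and combined into a score is left open here, since the text above does not specify it.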
In contrast, (Paşca, 2008) utilised temporally
anchored text snippets to answer definition questions.
The difference between the two strategies lies in the
fact that temporally anchored answers to definition
questions must be biographical, and the chronon must
be closely related to the definiendum, whereas
temporally anchored sentences representing a document
can be more diverse in nature.
Essentially, (Paşca, 2008) also focused on techniques
that avoid deep linguistic processing for discovering
temporally anchored answers. Frequently, this
type of answer must be extracted from several docu-
ments, not only because of completeness, but also as
a means to increase the redundancy. In this way def-
inition QA systems boost the probability of detecting
a larger set of reliable and diverse answers that are
temporally anchored, and build richer chronologies
afterwards. Therefore, definition QA systems require
efficient strategies that can quickly process massive
collections of documents. In particular, (Paşca, 2008)
processed one billion documents corresponding to the
2003 Web snapshot of Google. To be more precise,
they relied solely on HTML tag removal, sentence
detection and part-of-speech (POS) tagging.
In addition, (Paşca, 2008) took advantage of a
restricted set of regular expressions to detect dates:
an isolated year (a four-digit number, e.g., 1977); a
simple decade (e.g., 1970s); a month name and year
(e.g., January 1534); or a month name, day number and
year (e.g., August, 1945). In order to increase the
accuracy of their date matching strategy, potential
dates were discarded if they were immediately followed
by a noun or noun modifier, or immediately preceded by
a noun. Further, four lexico-syntactic surface patterns
were used for selecting answer candidates:
P1: <Date [,|-|(|nil] [when] Snippet [,|-|)|.]>
P2: <[StartSent] [In|On] Date [,|-|(|nil] Snippet [,|-|)|.]>
P3: <[StartSent] Snippet [in|on] Date [EndSent]>
P4: <[Verb] [OptionalAdverb] [in|on] Date>
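The four date forms described before the patterns can be approximated with a single regular expression. The sketch below is an illustration, not the expressions used by (Paşca, 2008): the exact surface form of the day variant ("Month D, YYYY"), the use of Penn Treebank style tags in `keep_date`, and the reading of "noun modifier" as an adjective or noun tag are all assumptions.

```python
import re

MONTH = (r"(?:January|February|March|April|May|June|July|August|"
         r"September|October|November|December)")

# Alternatives are ordered from most to least specific, so the longest
# date form is tried first at any given position.
DATE_RE = re.compile(
    rf"\b(?:{MONTH} \d{{1,2}}, \d{{4}}"   # month name, day number and year
    rf"|{MONTH} \d{{4}}"                   # month name and year
    rf"|\d{{4}}s"                          # simple decade
    rf"|\d{{4}})\b"                        # isolated four-digit year
)


def find_dates(text):
    """Return all date strings matched in `text`, in order of appearance."""
    return [m.group(0) for m in DATE_RE.finditer(text)]


def keep_date(prev_tag, next_tag):
    """Hypothetical filter: discard a date that is immediately preceded by a
    noun, or immediately followed by a noun or noun modifier. Tags are
    assumed to be Penn Treebank style; None marks a sentence boundary."""
    after_noun = prev_tag is not None and prev_tag.startswith("NN")
    before_noun = next_tag is not None and next_tag.startswith(("NN", "JJ"))
    return not (after_noun or before_noun)
```

The adjacency filter is meant to reject four-digit numbers that act as quantities or identifiers rather than dates, which is the stated purpose of the discarding step.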
As a means to avoid overmatching sentences formed by
complex linguistic phenomena, they required nuggets to
contain a verb and to carry no pronoun. (Paşca, 2008)
additionally ensured that both P2 and P3 match the
start of the sentence, and that the nugget in P4
contains a noun phrase. Since the aim is building a
method with limited linguistic knowledge, this noun
phrase was approximated by the occurrence of a noun,
adjective or determiner.
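As an illustration of how such a pattern and the nugget constraints might be applied, the sketch below matches P2 over a POS-tagged sentence and then checks the verb, pronoun and approximate noun-phrase conditions. It assumes Penn Treebank style tags, restricts dates to years and decades for brevity, and uses hypothetical function names; it is not the implementation of (Paşca, 2008).

```python
import re

YEAR = re.compile(r"\d{4}s?")   # simplified date token (assumption)
OPENERS = {",", "-", "("}       # optional delimiter after the date
CLOSERS = {",", "-", ")", "."}  # delimiter that ends the snippet


def match_p2(tagged):
    """P2: <[StartSent] [In|On] Date [,|-|(|nil] Snippet [,|-|)|.]>

    `tagged` is one sentence as a list of (token, tag) pairs.
    Returns (date, snippet) or None, where snippet is a list of pairs."""
    if len(tagged) < 4:
        return None
    if tagged[0][0] not in {"In", "On"} or not YEAR.fullmatch(tagged[1][0]):
        return None
    start = 3 if tagged[2][0] in OPENERS else 2
    end = start
    while end < len(tagged) and tagged[end][0] not in CLOSERS:
        end += 1
    if end == start or end == len(tagged):
        return None  # empty snippet, or no closing delimiter found
    return tagged[1][0], tagged[start:end]


def is_valid_nugget(snippet):
    """A nugget must contain a verb and carry no pronoun."""
    tags = [tag for _, tag in snippet]
    has_verb = any(t.startswith("VB") for t in tags)
    has_pronoun = any(t in {"PRP", "PRP$"} for t in tags)
    return has_verb and not has_pronoun


def has_noun_phrase_proxy(snippet):
    """Noun phrase approximated by a noun, adjective or determiner."""
    return any(tag.startswith(("NN", "JJ")) or tag == "DT" for _, tag in snippet)
```

Working on tokens rather than raw characters keeps the hyphen delimiter from splitting hyphenated words, provided the tokeniser leaves such words intact.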
Also, (Paşca, 2008) biased their ranking strategy in
favour of: (a) snippets contained in a higher number
of documents, and (b) snippets that carry fewer
non-stop terms. By the same token, they preferred
snippets that matched the query words as a single term
over snippets with scattered query matches. Lastly,
(Paşca, 2008) ranked dates in accordance with the
relevance of the snippets supporting each date, and
within each date, snippets were also ranked relative
to one another.
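The two-level ranking can be sketched as follows. The scoring formula, the toy stop-word list, and the choice to measure a date's relevance as the sum of its snippet scores are all assumptions made for illustration; the text above only states the direction of each preference, and the query-match preference is left out for brevity.

```python
from collections import defaultdict

# Toy stop-word list (assumption); a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "was", "is"}


def snippet_score(doc_count, tokens):
    """Hypothetical score: grows with the number of supporting documents
    and shrinks with the number of non-stop terms."""
    non_stop = sum(1 for t in tokens if t.lower() not in STOPWORDS)
    return doc_count / (1 + non_stop)


def rank_chronology(candidates):
    """`candidates` is an iterable of (date, snippet_text, doc_count).

    Returns a list of (date, [(score, snippet_text), ...]) where dates are
    ordered by the summed score of their snippets (assumption), and the
    snippets of each date are ordered by score."""
    by_date = defaultdict(list)
    for date, text, doc_count in candidates:
        by_date[date].append((snippet_score(doc_count, text.split()), text))
    for scored in by_date.values():
        scored.sort(key=lambda pair: pair[0], reverse=True)
    return sorted(
        by_date.items(),
        key=lambda item: sum(score for score, _ in item[1]),
        reverse=True,
    )
```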
3 CORPUS ACQUISITION
Contrary to (Paşca, 2008; Alonso et al., 2009), our
approach is data-driven. In short, it aims at learning
regularities from training sentences (positive
examples) that are deemed to convey temporally
anchored information about the definiendum. More
precisely, these positive examples are acquired from
WEBIST 2010 - 6th International Conference on Web Information Systems and Technologies