is motivated by its:
• Popularity – calendar-based temporal expressions
occur relatively frequently, especially in news sto-
ries,
• Simplicity – one of the most common ways for users to
express temporal constraints is to use calendar
expressions; using the same time model for both
queries and the index simplifies the model,
• Expressiveness – the model should allow the semantics
of temporal expressions to be expressed as precisely as
possible; each expression should be encoded at
the granularity level at which it was expressed in
a document.
A document may then be indexed with pairs
(I, G), where I is a granule index within granularity
G (see (Bettini et al., 1998) for calendar arithmetic).
We suggest using the following granularities: a day of the
week, a day of the month, a week of the year, a month
of the year, a quarter of the year, a half of the year,
a season of the year, a year, a decade, and a century,
that is, G ∈ {DOW, ..., MTH, YER, ..., CTR}. The choice is
dictated by the relative frequency of expressions
at these granularity levels. The list obviously
does not cover all potential granularities; for example,
a day of the year and a fiscal year are missing,
but they appeared relatively rarely in the analyzed
documents. The index I of a granule within granularity G is
computed as the number of granules between the analyzed
granule and a reference granule. The reference granule
for the granularity of days is the first day of this era. For
other granularities, it is the granule that contains the
day with index 1 (DAY(1)).
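The index computation can be illustrated with a minimal sketch. It assumes a proleptic Gregorian calendar, 1-based indices (so that the reference granule itself has index 1, in line with DAY(1)), and decades and centuries counted from year 1. The function name `granule_index` and the labels DAY, DEC are hypothetical; only MTH, YER and CTR appear in the text above.

```python
from datetime import date


def granule_index(d: date, granularity: str) -> int:
    """Return a 1-based granule index for date d (assumed convention)."""
    if granularity == "DAY":
        # date.toordinal() is 1 for 0001-01-01, matching DAY(1).
        return d.toordinal()
    if granularity == "MTH":
        # Months counted from the month that contains DAY(1).
        return (d.year - 1) * 12 + d.month
    if granularity == "YER":
        return d.year
    if granularity == "DEC":
        # Assumption: the first decade covers years 1 to 10.
        return (d.year - 1) // 10 + 1
    if granularity == "CTR":
        # Assumption: the first century covers years 1 to 100.
        return (d.year - 1) // 100 + 1
    raise ValueError(f"unsupported granularity: {granularity}")


# A document mentioning February 2000 at month granularity would be
# indexed with the pair (I, G) = (granule_index(...), "MTH").
pair = (granule_index(date(2000, 2, 15), "MTH"), "MTH")
```

Periodic granularities such as a day of the week are left out of this sketch, since the text does not specify how their indices are anchored.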
This construction has two advantages. Firstly,
we do not lose semantics when automatically shifting
granularity levels (during "a week" is not the same as
during the seven consecutive days that constitute this week).
Secondly, it is easy to compare expressions at different
granularity levels. For instance, testing whether
MTH(i) ∩ YER(j) ≠ ∅ is trivial, since,
according to the definition of the calendar (Bettini et al.,
1998), both MTH and YER are defined as derivative
granularities of the granularity DAY.
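One way to picture such a comparison is to map each granule to the range of DAY indices it covers and to test the two ranges for overlap. The sketch below is only an illustration under stated assumptions: a proleptic Gregorian calendar, 1-based month indices counted from the month containing DAY(1), and hypothetical helper names `day_span` and `intersects`.

```python
from datetime import date
from typing import Tuple


def day_span(granularity: str, index: int) -> Tuple[int, int]:
    """Return the (first, last) DAY indices covered by a MTH or YER granule."""
    if granularity == "YER":
        return date(index, 1, 1).toordinal(), date(index, 12, 31).toordinal()
    if granularity == "MTH":
        # Recover the calendar year and zero-based month from the month index.
        year0, month0 = divmod(index - 1, 12)
        year = year0 + 1
        first = date(year, month0 + 1, 1).toordinal()
        if month0 == 11:
            last = date(year, 12, 31).toordinal()
        else:
            # Last day of the month = day before the first of the next month.
            last = date(year, month0 + 2, 1).toordinal() - 1
        return first, last
    raise ValueError(f"unsupported granularity: {granularity}")


def intersects(g1: str, i1: int, g2: str, i2: int) -> bool:
    """True when the two granules share at least one day."""
    a_first, a_last = day_span(g1, i1)
    b_first, b_last = day_span(g2, i2)
    return a_first <= b_last and b_first <= a_last


feb_2000 = (2000 - 1) * 12 + 2  # month index of February 2000
overlap = intersects("MTH", feb_2000, "YER", 2000)
```

Because both granularities reduce to intervals over DAY, the same overlap test would apply to any pair of granularities for which a day span can be defined.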
The calendar is used to encode the values of document
temporal features. The following features have been
defined:
Temporal Expressions. Temporal expressions relate
directly to a model of time. All the information
required to determine their values is embodied
in the expression itself, the surrounding context,
and the time model. No external knowledge
is required. For instance, "2007-01-02", "tomorrow"
or "before" are temporal expressions, but
"during the Great Depression" is not. Although
the last expression points to some time period,
setting that period precisely requires knowledge of
the beginning and ending dates of this event.
Objects and Events. Objects and events possess temporal
features. They do not themselves have a
value specified by a time model, but they exist in
time. For instance, an event may have an occurrence
date and an object exists during some time
period.
Concepts. Concepts themselves usually do not have
a meaning that relates them to particular time
periods. We may assume, however, that the
conceptualization layer is dynamic: new concepts are
created and some concepts lose popularity.
Moreover, the popularity of the concepts appearing
in documents changes over time.
The last component used to characterize the indexing
model is the normalization process. The normalization
process sets the values of temporal features in the
selected time model. In the case of the calendar model,
the granule indices and the granularity level need to be
specified for each temporal expression. The normalization
procedure is partially independent of the other components.
Often more than one normalization approach is common to
different temporal features; furthermore, a temporal
feature may be normalized using different approaches.
We can distinguish the following normalization approaches:
Rules. For some categories of temporal features,
it is possible to define the normalization
mechanism in terms of conditional statements
(IF...THEN... rules). This approach is especially
useful for calendar expressions. For
example, if the reference date is "2000-01-01",
the date to be normalized is "February", and
the narrative context indicates that the text speaks
about the future, then the year of the normalized date
should be set to the year of the reference date, i.e.
2000.
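A rule of this kind could be sketched as follows. The function `normalize_month` and its `direction` argument are hypothetical names; the text states only the "future" case for a month that has not yet passed, so the rollover to the next year and the whole "past" branch are assumptions added for completeness.

```python
from datetime import date
from typing import Tuple


def normalize_month(reference: date, month: int, direction: str) -> Tuple[int, int]:
    """Resolve a bare month (1-12) to a (year, month) pair using IF...THEN rules."""
    if direction == "future":
        # IF the month has not yet passed in the reference year
        # THEN keep the reference year (the case given in the text),
        # ELSE move to the following year (assumption).
        if month >= reference.month:
            year = reference.year
        else:
            year = reference.year + 1
    elif direction == "past":
        # Mirror rule for narration about the past (assumption).
        if month <= reference.month:
            year = reference.year
        else:
            year = reference.year - 1
    else:
        raise ValueError(f"unknown direction: {direction}")
    return year, month


# Reference date 2000-01-01, expression "February", future context.
result = normalize_month(date(2000, 1, 1), 2, "future")  # (2000, 2)
```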
DB of States/Events. Above, we used the example
of the "Great Depression". The normalization
of such an expression requires information about
the beginning and ending dates of this event. It
is possible to create a database of events/states,
which may in turn be used for indexing purposes.
The indexing model is then limited to the
events/states it has knowledge of.
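As a rough illustration, such a database could be as simple as a lookup table from an event name to a pair of granules. The table `EVENTS`, the function `normalize_event`, and the year-level dates in the single entry (the commonly cited 1929 to 1939) are assumptions for this sketch and are not taken from the text.

```python
from typing import Optional, Tuple

Granule = Tuple[str, int]  # (granularity, index), e.g. ("YER", 1929)

# Hypothetical event database; the entry is illustrative only.
EVENTS = {
    "great depression": (("YER", 1929), ("YER", 1939)),
}


def normalize_event(expression: str) -> Optional[Tuple[Granule, Granule]]:
    """Return (start, end) granules for a known event, or None if unknown."""
    key = expression.strip().lower()
    return EVENTS.get(key)


period = normalize_event("Great Depression")
unknown = normalize_event("some unlisted event")  # None: not in the database
```

The `None` result for an unlisted event reflects the limitation noted above: only events and states present in the database can be normalized.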
Distribution of Concepts in Time. We have assumed
that concepts used in text, or at least
a subset of them, including concepts used to describe
events and states, are related to time. It is possible
to build a probabilistic model which defines
TEMPORAL INFORMATION INDEXING MODEL