Section 3 discusses the methodology used to
construct the user experiment, and Section 4
reports the findings. We end the paper with
conclusions and directions for future work.
2 RELATED WORK
Uncertainty is one of the challenges in information
seeking and retrieval (Chowdhury et al., 2011). Many
attempts have been made to develop uncertainty
models by investigating human information behaviour
in the information seeking and retrieval process
(Ingwersen, 1992). A few works have proposed
natural language processing techniques, drawing on
syntactic and semantic approaches, to reduce
uncertainty (Goodman, 2008; Topka, 2013).
Several linguistic studies aim to model the use of
modality, but very few concentrate on uncertainty; one
exception is the Certainty Categorization Model
proposed by Rubin (2006). This model characterizes
uncertainty along four dimensions: Level, Perspective,
Focus and Time. In the Level dimension, expressions
such as 'will come' and 'might buy' are classified as
Absolute level and Low level, respectively. The
Perspective dimension captures whether a statement
is reported from the writer's point of view. The Focus
dimension distinguishes between Abstract and Factual
information. Finally, the Time dimension classifies
sentences as referring to the past, present or future.
Goujon (2009) later extended the Certainty
Categorization Model. The extended model includes
identification of the local source, which is important
for end users in validating the reliability of the
reported discourse. In place of the Focus dimension,
it also takes into account the reality or unreality of
the information as specified in the source text. The
extended model therefore characterizes uncertainty
along five dimensions: Level, Perspective, Time,
Reality and Source Name.
There are also a few works on measuring uncertainty
in messages. The forms of uncertainty introduced by
Mishel (1988) (ambiguity, complexity, volume of
information and unpredictability) and the dimensions
of uncertainty proposed by Babrow (1998) were
combined to form five forms of uncertainty in
messages. Uncertainty-related content within a
message appears as message characteristics such as
specific words, phrases or sentences. Hurley (2011)
then refined these into five dimensions: too little
information (volume), too much information (volume),
complex information, ambiguous information and
conflicting information. These five forms of
uncertainty were readily identified in news articles
and have been applied to cancer news articles.
In the context of TDT research, researchers have
attempted to build better document models by
developing similarity metrics or better document
representations (Chen and Ku, 2002). This has led to
a series of research efforts that concentrate on
improving document representation by applying
Named Entity Recognition (Chen and Ku, 2002).
Mohd and Mabrook (2014) investigated the potential
of named entities in TDT tasks and found that NEs
improved performance on both tasks. However, no
work has evaluated the role of NEs for uncertainty
recognition in the event detection task. This is the
first work to explore the five dimensions of
uncertainty in TDT.
3 METHOD
This work takes two approaches. First, we analysed
the distribution of named entities (NEs) across topics
(Section 3.2); second, we conducted a user experiment
(Sections 3.3 and 3.4) to explore the potential of
named entities for uncertainty recognition in the
event detection task.
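As an illustration of what a per-topic NE distribution can look like, the following is a minimal sketch, not the procedure of Section 3.2. It assumes that every document has already been tagged by some external NER tool and is given as a topic identifier plus a list of (entity text, entity type) pairs; the function name ne_distribution and the type labels PERSON and LOCATION are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical input format: (topic_id, [(entity_text, entity_type), ...]).
# Entity types are assumed to come from an external NER tagger.
Document = tuple[str, list[tuple[str, str]]]


def ne_distribution(docs: list[Document]) -> dict[str, dict[str, float]]:
    """For each topic, return the share of NE mentions belonging to each NE type."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for topic_id, entities in docs:
        for _text, ne_type in entities:
            counts[topic_id][ne_type] += 1
    # A topic only gets an entry once it has at least one entity, so total > 0.
    dist: dict[str, dict[str, float]] = {}
    for topic_id, counter in counts.items():
        total = sum(counter.values())
        dist[topic_id] = {ne_type: n / total for ne_type, n in counter.items()}
    return dist


# Toy usage with made-up tags (not drawn from the TDT corpus):
docs = [
    ("P1", [("Iraq", "LOCATION"), ("Hussein", "PERSON"), ("Iraq", "LOCATION")]),
    ("P2", [("Clinton", "PERSON"), ("China", "LOCATION")]),
]
print(ne_distribution(docs))
```

Counting mentions rather than distinct entities is a design choice made here for simplicity; counting unique entity strings per topic would be an equally reasonable variant.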
3.1 Dataset
We used 300 news documents from the Topic Detection
and Tracking (TDT) corpus, covering 2 categories
(Politics and Sports), 10 topics and 50 events, as
shown in Table 1. On average, there are 5 events and
30 documents (stories) per topic. In TDT, a topic
consists of several events, and an event consists of
several stories or documents.
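The topic, event and story hierarchy, together with the per-topic averages, can be sketched as follows. This is a minimal sketch that assumes each story is assigned to exactly one event. The class and function names are hypothetical, and the uniform split used below (5 events of 6 stories for each of 10 topics) is a placeholder chosen only to reproduce the corpus totals of 50 events and 300 stories; it is not the real per-event distribution.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    title: str
    story_ids: list[str] = field(default_factory=list)  # stories reporting this event


@dataclass
class Topic:
    topic_id: str
    events: list[Event] = field(default_factory=list)  # a topic groups several events

    def n_stories(self) -> int:
        # Assumes every story belongs to exactly one event of the topic.
        return sum(len(e.story_ids) for e in self.events)


def averages(topics: list[Topic]) -> tuple[float, float]:
    """Return (average events per topic, average stories per topic)."""
    n = len(topics)
    avg_events = sum(len(t.events) for t in topics) / n
    avg_stories = sum(t.n_stories() for t in topics) / n
    return avg_events, avg_stories


# Placeholder corpus: uniform split matching only the totals (10 topics, 50 events, 300 stories).
topics = [
    Topic(f"T{i}", [Event(f"event {j}", [f"T{i}-E{j}-S{k}" for k in range(6)]) for j in range(5)])
    for i in range(10)
]
print(averages(topics))  # (5.0, 30.0)
```

Because the averages depend only on the totals (50 / 10 = 5 events and 300 / 10 = 30 stories per topic), any split of the real corpus yields the same figures.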
Table 1: Topics and events for Politics and Sports
categories.
Topic: [P1] Current Conflict with Iraq (20015)
Event
Current Conflict with Iraq
Iraq announces it will block inspections
Iraq prevents inspection team from entering
Reaction to blocked inspection team
Inspection team withdrawn
Hussein may stop cooperating with inspections
Topic: [P2] Clinton-Jiang Debate (20096)
Event
Plans, preparations for Clinton's trip to China
Clinton leaves for China
Clinton's activities in China
Freedom of worship for Chinese citizens
Reaction to Clinton's trip