ontology reflects the background knowledge used in
writing, reading and thinking (Brewster et al., 2007).
In fact, a text tells the reader which ontology to use
to understand it (Brewster et al., 2007). The
background knowledge taken for granted by the
author is useful because it can be used by an NLP
application to determine the sense of a particular
word.
Word Sense Disambiguation (henceforth WSD)
techniques use the notion of context to determine
the sense of a word. What counts as context
differs widely across WSD methods: one may
consider a whole text, a word window, a sentence, or
some specific words (Xiaobin et al., 1995).
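The difference between these kinds of context can be illustrated with a small sketch. It is a minimal example under our own assumptions, not a method from the cited work: text is split into lowercase alphanumeric tokens, a word window and the whole text are used as two alternative contexts, and the sense is chosen with a simplified Lesk-style gloss overlap, a technique the text does not name. The sense labels and glosses are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; punctuation is dropped.
    return re.findall(r"[a-z0-9]+", text.lower())


def window_context(tokens: list[str], index: int, size: int) -> list[str]:
    # Up to `size` tokens on each side of the target, excluding the target itself.
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left + right


def text_context(tokens: list[str], index: int) -> list[str]:
    # The whole text as context: every token except the target.
    return tokens[:index] + tokens[index + 1:]


def choose_sense(context: list[str], glosses: dict[str, str]) -> str:
    # Simplified Lesk: the sense whose gloss shares the most word types with
    # the context. Ties go to the sense listed first in `glosses`.
    ctx = set(context)
    return max(glosses, key=lambda sense: len(ctx & set(tokenize(glosses[sense]))))


if __name__ == "__main__":
    # Hypothetical senses and glosses for the ambiguous word "bank".
    glosses = {
        "bank/finance": "an institution that accepts deposits and lends money",
        "bank/river": "sloping land beside a body of water such as a river",
    }
    tokens = tokenize("He sat on the bank of the river and watched the water.")
    i = tokens.index("bank")
    print(choose_sense(window_context(tokens, i, 3), glosses))  # bank/river
    print(choose_sense(text_context(tokens, i), glosses))       # bank/river
```

Both contexts agree here, but a wider context adds more words that can overlap with a gloss, which is one reason the choice of context matters across WSD methods.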
WSD techniques are necessary to access a static
KB because the concepts are static objects; however,
knowledge can then be used and developed by
reasoning. This view comes from the Dynamic
Construal of Meaning (DCM) approach (Cruise, 2002),
which we follow. The fundamental
assumption of DCM is that the meaning of a word
changes as it is used in different contexts or
language games (Sowa, 2010).
Following Chierchia (1997), we consider the
computation of meaning as a set of rules that
determine the reference of words. We treat
common nouns as classes, determiners as
restrictions on classes, entities as referents, and verbs
as relations between entities and classes. This
scheme is compatible with the RDF structure and
can also serve as a bridge between natural language
and KBs. Our approach is also related to
Wittgenstein’s language games (Wittgenstein,
1953), in that we assume that patterns of words are
needed to access an ontology. RDF triples are
atomic facts with simple semantics. The meaning of
each fact results from the meanings of its three
components:
Classes: a class can be represented by a
common noun. When we talk about presidents,
trees, cars, or carpenters, we are talking about
classes of entities.
Entities: we identify an entity with its referent. To
access an entity we use its label, and
disambiguation is done through one or more classes to
which the entity belongs.
Properties: simple or complex relations
between entities, classes and literals. We need to
disambiguate a property and get contextual
information from it.
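The three components, together with the noun-to-class mapping adopted from Chierchia (1997), can be sketched as a toy triple store. This is a minimal illustration under our own assumptions, not the authors' system: triples are plain Python tuples rather than an RDF library, the property names borrow the standard RDF/RDFS terms rdfs:label, rdf:type and rdfs:subClassOf, and every ex: identifier, the NOUN_TO_CLASS lexicon and all helper functions are hypothetical. Looking up a label can return several candidate entities; the classes suggested by the context then filter them.

```python
from typing import Optional

# Toy in-memory triple store, NOT an RDF library. Property names mirror the
# RDF/RDFS vocabulary, but every "ex:" identifier is hypothetical.
Triple = tuple[str, str, str]  # (subject, property, object)

KB: list[Triple] = [
    ("ex:Mercury_(planet)", "rdfs:label", "Mercury"),
    ("ex:Mercury_(planet)", "rdf:type", "ex:Planet"),
    ("ex:Mercury_(planet)", "ex:orbits", "ex:Sun"),  # a property between entities
    ("ex:Mercury_(element)", "rdfs:label", "Mercury"),
    ("ex:Mercury_(element)", "rdf:type", "ex:ChemicalElement"),
    ("ex:Mercury_(mythology)", "rdfs:label", "Mercury"),
    ("ex:Mercury_(mythology)", "rdf:type", "ex:Deity"),
    ("ex:Planet", "rdfs:subClassOf", "ex:CelestialBody"),
]

# Hypothetical lexicon from common nouns to classes; choosing the class for a
# noun is itself a WSD step and is simply assumed here.
NOUN_TO_CLASS = {"planet": "ex:Planet", "element": "ex:ChemicalElement", "god": "ex:Deity"}


def candidates(kb: list[Triple], label: str) -> set[str]:
    # All entities carrying this label: one label may name several entities.
    return {s for s, p, o in kb if p == "rdfs:label" and o == label}


def classes_of(kb: list[Triple], entity: str) -> set[str]:
    # Direct types plus every superclass reachable via rdfs:subClassOf.
    found = {o for s, p, o in kb if s == entity and p == "rdf:type"}
    frontier = list(found)
    while frontier:
        cls = frontier.pop()
        for s, p, o in kb:
            if s == cls and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def disambiguate(kb: list[Triple], label: str, context_classes: set[str]) -> set[str]:
    # Keep the candidates belonging to at least one class suggested by the context.
    return {e for e in candidates(kb, label) if classes_of(kb, e) & context_classes}


def resolve(kb: list[Triple], label: str, noun: str) -> Optional[str]:
    # A singular definite expression ("the planet") should denote exactly one referent.
    found = disambiguate(kb, label, {NOUN_TO_CLASS[noun]})
    return next(iter(found)) if len(found) == 1 else None


if __name__ == "__main__":
    print(sorted(candidates(KB, "Mercury")))
    print(disambiguate(KB, "Mercury", {"ex:CelestialBody"}))  # {'ex:Mercury_(planet)'}
    print(resolve(KB, "Mercury", "god"))                       # ex:Mercury_(mythology)
```

Requiring exactly one surviving candidate for a singular definite expression is a simplifying assumption of this sketch; an empty or multi-element result signals that more contextual information is needed.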
With our approach, we want to extract information
about the meaning of a text. In particular, we want to
determine which specific entities are mentioned in
the text. To do this, we use IE techniques to identify
the named entities. We can then use their names as
labels to query a KB and retrieve all the information
about those entities. But, as noted above, the
same label can refer to several entities. The
solution is to use contextual information. For
instance, in the following example taken from the
RTE5 challenge dataset:
Proper Name + Definite Expression
(CNN) -- Malawians are rallying behind
Madonna as she awaits a ruling Friday on
whether she can adopt a girl from the southern
African nation. The pop star, who has three
children, adopted a son from Malawi in 2006.
She is seeking to adopt Chifundo “Mercy”
James, 4. “Ninety-nine percent of the people
calling in are saying, let her take the baby,” said
Marilyn Segula, a presenter at Capital FM,
which broadcasts in at least five cities, including
the capital, Lilongwe.
when we find an ambiguous entity (“the pop star”), we
look for information that could disambiguate it. In
this case, the singular definite expression “the pop
star” is used to refer to the entity Madonna. The
definite expression consists of a determiner and a
common noun, which in our approach corresponds to a
class. At this point we have to establish which class
can be associated with the noun found. This step
corresponds to a WSD procedure, which serves as a
bridge between natural language and the KB. This
approach is particularly useful in coreference
resolution tasks, where the same name is associated
with different properties. In this way, coreference
resolution is performed in parallel with entity
identification. Consider another example below,
with a text taken from the same RTE5 dataset:
Definite Expression + Proper Name
The eruption happened at around 1:30 PM local
time, the United States Geological Survey
reported. The volcano had erupted four times on
Friday, billowing ash up to 51,000 feet up into
the air. These are the latest in a series of
eruptions from Mount Redoubt, which started
on March 22. The volcano had not erupted since
a four-month period in 1989-90. The Alaska
Volcano Observatory set its alert level at red,
the highest possible level, meaning that an
eruption is imminent, and that it would send a
significant emission of volcanic ash into the
atmosphere.
In this example, the name “Mount Redoubt” can
refer to different entities:
Mount Redoubt (Alaska) in Alaska, United States
KEOD 2011 - International Conference on Knowledge Engineering and Ontology Development