2003) there are actually several types of ambiguity
involved in toponym resolution (i.e. associating a
toponym with its spatial representation). The use of
contextual elements (other than toponyms), such as
words that have a geographical denotation ("river",
"town", "basin", etc.), can be extremely important
in a toponym disambiguation task (Hollenstein and
Purves, 2010).
This paper focuses on the use and adaptation of
Natural Language Processing (NLP) techniques for
recognition of the spatial representation throughout
the document. To achieve this, we have collected a
set of newspaper articles (corpus of 3809 textual doc-
uments) about the Thau basin territory from 2010 up
to 2013. NLP methods based on lexico-syntactic pat-
terns (Gaio and Nguyen, 2011) were then used to au-
tomatically annotate linguistic expressions conveying
more or less complex spatial information. In the
proposed approach, an SF appearing in a text is composed
of at least one named entity allowing a geolocation
and one or more spatial indicators specifying its
location (Lesbegueries et al., 2006). Once the SFs have
been extracted, the problem is to identify their spatial
characteristics in order to define a spatial representation
throughout the document.
The paper is structured as follows. In Section 2,
an overview of SF extraction methods is presented.
In Section 3, the method to identify SF representa-
tion is detailed. Section 4 gives a short description
of the corpus, reports experiments and lists associated
prospects. The paper ends with conclusions and some
perspectives.
2 RELATED WORK
NERC methods automatically annotate different
types of named entities: dates, people, organisa-
tions, themes, numeric values, as well as place names.
There is a significant number of systems available,
both proprietary and open source, such as OpenNLP¹
from Apache, OpenCalais² from Thomson Reuters,
and CasEN (Maurel et al., 2011). More specific meth-
ods that are solely concerned with geographical data
are known as geoparsing (Leidner and Lieberman,
2011). In our work, we focus on this category and
a first issue is to precisely identify named-entities al-
lowing a geolocation using the definition proposed in
(Lesbegueries et al., 2006).
In this model, SF can then be identified in two dif-
ferent ways:
¹ https://opennlp.apache.org/
² http://www.opencalais.com/
• an Absolute Spatial Feature (A SF) consists of one
Named Entity (NE) allowing a geolocation, optionally
accompanied by spatial indicators:
< (spatialIndicator)^*, NE of Location >
A spatialIndicator is a term contained within a
geographic lexicon ("river", "town", "mountain",
etc.). Two examples of this type of SF are
"the Thau basin" and "the town of Sète";
• a Relative Spatial Feature (R SF) consists of at least
one spatial relationship (topological or Euclidean) with
at least one SF. An R SF, whose pattern ends with an
A SF, is defined as:
< (spatialRelation)^{1..*}, A SF > or
< (spatialRelation)^{1..*}, R SF >.
Five spatial relation types are considered: orientation
("in the south of", etc.), distance ("20 kilometres
from", etc.), adjacency ("near", etc.), inclusion
("in", etc.), and geometry, which defines
the union or intersection linking two SFs ("between
A and B", etc.). An example of this type of SF is
"in the area of Cuzco", according to the pattern
< (spatialRelation)^{1..*}, A SF >.
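The two patterns above can be read as a small recursive grammar. The following is a minimal sketch of that reading, not the annotation chain used in this work: the lexicons, the place-name list standing in for a named-entity recogniser, and all function and class names are illustrative assumptions. In particular, treating "area" as a spatial indicator and "in" as the relation in "in the area of Cuzco" is only one possible analysis. The sketch ignores word order inside an A SF (so that the English "the Thau basin" is accepted) and does not handle punctuation or the geometry relation.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Union

# Toy resources: assumptions for illustration, not the lexicons of the paper.
SPATIAL_INDICATORS = {"river", "town", "mountain", "basin", "area"}
PLACE_NAMES = {"Thau", "Sète", "Cuzco"}  # stand-in for a real NE recogniser
STOP_WORDS = {"the", "of"}

# Ordered from most to least specific so that "in the south of" wins over "in".
RELATION_PATTERNS = [
    (re.compile(r"(?:in )?the south of\b", re.I), "orientation"),
    (re.compile(r"\d+ kilometres from\b", re.I), "distance"),
    (re.compile(r"near\b", re.I), "adjacency"),
    (re.compile(r"in\b", re.I), "inclusion"),
]


@dataclass
class AbsoluteSF:
    indicators: List[str]
    name: str


@dataclass
class RelativeSF:
    relation: str
    relation_type: str
    target: Union["AbsoluteSF", "RelativeSF"]


def parse_absolute(text: str) -> Optional[AbsoluteSF]:
    """Match < (spatialIndicator)^*, NE of Location >, ignoring word order."""
    indicators, names = [], []
    for token in text.split():
        lowered = token.lower()
        if lowered in STOP_WORDS:
            continue
        if lowered in SPATIAL_INDICATORS:
            indicators.append(lowered)
        elif token in PLACE_NAMES:
            names.append(token)
        else:
            return None  # unknown word: not an A SF under this toy lexicon
    if len(names) != 1:
        return None
    return AbsoluteSF(indicators, names[0])


def parse_sf(text: str) -> Optional[Union[AbsoluteSF, RelativeSF]]:
    """Match < (spatialRelation)^{1..*}, A SF > recursively, else an A SF."""
    text = text.strip()
    for pattern, relation_type in RELATION_PATTERNS:
        match = pattern.match(text)
        if match:
            target = parse_sf(text[match.end():])
            if target is None:
                return None
            return RelativeSF(match.group(0).lower(), relation_type, target)
    return parse_absolute(text)
```

With these toy resources, parse_sf("in the area of Cuzco") yields an R SF of type inclusion wrapping an A SF, and a chain of relations such as "20 kilometres from the south of the town of Sète" yields an R SF whose target is itself an R SF.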
A second issue is related to the identification of a
spatial representation for each SF. In this sub-domain,
early research in the 1990s focused
on the representation of complex qualitative relationships
such as orientation (Frank, 1991). The direction
of an SF is defined by taking the position
of a second SF as reference. To achieve this, the author proposes
a cone-based model to represent the four cardinal
points: north, east, south, west. A second proposal
was to represent the orientation relationship using a
3x3 grid in which the central cell is called the "neutral
zone". The surrounding cells represent the eight cardinal
directions: north, northeast, east, southeast, etc. In (Hernández
et al., 1995), the authors focus on the study of the
representation of qualitative distances (far, near, etc.)
and propose a representation model for flexible
distances at different levels of granularity. In (Cohn,
1996), a state of the art of SF representation is presented.
First, the author focuses on the use of an ontology
of geographic objects and presents a study on
the extension of basic physical objects (points, lines,
etc.) to build more complex figures (regions, roads,
etc.). Second, the author takes into account
the topological relationships between SFs and further relations
(orientation, distance, size and shape of related
objects). These representations, which make it possible to take
the underlying abstractions into account, are then used to
create complex qualitative models.
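To make the two orientation models mentioned above more concrete, the following is a minimal sketch of one common way to implement them. It is an illustration under stated assumptions, not the formulation of (Frank, 1991): coordinates are assumed planar with x pointing east and y pointing north, each cone is assumed to span 90 degrees centred on a cardinal axis, and the neutral zone of the grid is assumed to be the bounding box of the reference SF. The function names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def cone_direction(reference: Point, target: Point) -> str:
    """Cardinal direction of target seen from reference, using four cones."""
    dx = target[0] - reference[0]
    dy = target[1] - reference[1]
    if dx == 0 and dy == 0:
        return "same position"
    # Bearing measured clockwise from north, in [0, 360).
    bearing = math.degrees(math.atan2(dx, dy)) % 360
    names = ["north", "east", "south", "west"]
    # Shift by half a cone so that the north cone covers [315, 45).
    return names[int(((bearing + 45) % 360) // 90)]


def grid_direction(reference_box: Box, target: Point) -> str:
    """Cell of the 3x3 grid around reference_box that contains target."""
    min_x, min_y, max_x, max_y = reference_box
    x, y = target
    column = "west" if x < min_x else "east" if x > max_x else ""
    row = "south" if y < min_y else "north" if y > max_y else ""
    # An empty row and column means the target lies in the central cell.
    return (row + column) or "neutral zone"
```

The cone model only needs two points, whereas the grid model needs an extent for the reference SF, which is why the second function takes a bounding box rather than a point.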
More recently, in (Davis, 2013), the authors rely
on a corpus of citations to identify the spatial relation-
ships between spatial objects. The authors describe
KDIR 2015 - 7th International Conference on Knowledge Discovery and Information Retrieval