information resource. In this case, the image
fragment is consistent with a scale of 1:25,000 and
has a bounding rectangle defined by the pair of
coordinates ((43°15’W, 22°52’30”S), (43°07’30”W,
23°S)).
Assume that the user chooses to relate the image
fragment with “hydrographic features”, a term of
the ADL FTT that, in our running example, can be
used to classify geographic datasets.
Since the ADL Gazetteer entries have no
associated scale information, ignore the scale of the
image fragment.
Access the ADL Gazetteer, using the parameters
extracted in Step 1 and the ADL FTT terms under
“hydrographic features”, the term selected in Step
2. The query returns 9 entries, among which the first
3 are:
a. Feature(“Rodrigo de Freitas, Lagoa -
Brazil”, lakes, within)
b. Feature(“Comprido, Rio – Brazil”,
streams, within)
c. Feature(“Maracana, Rio – Brazil”,
streams, within)
Store the result of the query as a description of
the information resource, that is, as a list of pairs
(N,r), where N is a geographic feature returned in
Step 4 and r is the topological relationship between
the image and N (in this case, r is “within”).
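The steps of this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ADL Gazetteer protocol: the gazetteer is a hypothetical in-memory list, the point coordinates of its entries are placeholders rather than actual gazetteer data, the set of terms under “hydrographic features” is a small assumed subset of the ADL FTT, and “within” is read here as “the footprint of the feature lies inside the bounding rectangle of the image”.

```python
from dataclasses import dataclass


def dms_to_decimal(degrees, minutes=0, seconds=0, hemisphere="N"):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


@dataclass(frozen=True)
class BoundingBox:
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon, lat):
        return self.west <= lon <= self.east and self.south <= lat <= self.north


@dataclass(frozen=True)
class Entry:
    name: str
    feature_type: str
    lon: float  # placeholder point footprint (assumption)
    lat: float


# Step 1: bounding rectangle ((43°15'W, 22°52'30"S), (43°07'30"W, 23°S)).
box = BoundingBox(
    west=dms_to_decimal(43, 15, 0, "W"),
    north=dms_to_decimal(22, 52, 30, "S"),
    east=dms_to_decimal(43, 7, 30, "W"),
    south=dms_to_decimal(23, 0, 0, "S"),
)

# Step 2: the selected term and its narrower terms (assumed subset of the FTT).
HYDRO_TERMS = {"hydrographic features", "lakes", "streams"}

# Hypothetical gazetteer content; coordinates are illustrative placeholders.
GAZETTEER = [
    Entry("Rodrigo de Freitas, Lagoa - Brazil", "lakes", -43.21, -22.97),
    Entry("Comprido, Rio – Brazil", "streams", -43.20, -22.92),
    Entry("Maracana, Rio – Brazil", "streams", -43.22, -22.91),
    Entry("Example Lake (outside the box)", "lakes", -44.00, -22.00),
    Entry("Example Hill (wrong type)", "mountains", -43.20, -22.95),
]


def describe(bbox, terms, gazetteer):
    """Steps 4 and 5: query by rectangle and type, return the list of pairs (N, r)."""
    return [
        (entry.name, "within")
        for entry in gazetteer
        if entry.feature_type in terms and bbox.contains(entry.lon, entry.lat)
    ]


# Step 3 (scale) is skipped, since the entries carry no scale information.
description = describe(box, HYDRO_TERMS, GAZETTEER)
for pair in description:
    print(pair)
```

Only the three entries whose type falls under the selected term and whose footprint lies inside the rectangle end up in the description; the other two hypothetical entries are filtered out.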
This brief example illustrates some of the basic
ideas of the paper. First, using the gazetteer
thesaurus to classify geographic datasets as well
removes the need for a second classification
scheme, such as the ISO 19115 Topic Categories
(ISO-19115, 2002). This approach simplifies
mediating access to multiple catalogs and gazetteers,
as discussed in Section 4. However, it requires
defining the compatibility function s: T[GA]→R*.
Second, a useful description of a geographic
information resource R can be created as a list of
pairs (N,r), where N is a geographic feature and r is
the topological relationship between R and N,
obtained by querying the gazetteer. In addition, the
list contains only features whose type is compatible
with the scale of R.
4 FEDERATED ARCHITECTURE
The discussion in Section 3 assumed a single
gazetteer, with a geographic feature type thesaurus,
and a single catalog. However, as pointed out in the
introduction, a federation of gazetteers and
geographic metadata catalogs, supported by
mediators, is a more realistic architecture. Such
mediators will need a tool to align different gazetteer
thesauri that, according to the discussion in Section
3, are used to classify both gazetteer and catalog
entries.
More precisely, let G and H be two gazetteers.
Assume that they classify features using two
thesauri, T and U, respectively. Suppose that we
adopt the schema of G as the mediated schema, but
we allow changing T to accommodate terms in U
that have no counterpart in T.
This means defining a function reclass:U→V that
maps terms in U into terms of a new thesaurus V,
created from T and U. If reclass(u)=t then we say
that t is the reclassification of u. In the rest of this
section, we analyze only two cases of this
subproblem, for reasons of brevity.
Suppose first that G and H contain entries that
represent disjoint sets of features, and that T and U
represent disjoint sets of concepts. Albeit simple,
this is a common scenario.
We first graft U into T, using a term p of T as
pivot, that is, we add the root u of U as a new narrow
term of p. This operation creates a new thesaurus,
denoted T[p,U]. Now, when the mediator accesses
entries in H, it will not change their type, that is,
reclass:U→T[p,U] is the identity function.
However, note that the grafting operation requires
user intervention, since there is no failsafe way to
automatically identify p by observing just the terms
in T and U, and their definitions.
For example, let H be the list of real estate assets
of a large company, classified according to a
thesaurus U. Assume that the company operates in
Brazil, Venezuela and Argentina. Suppose that G is
a copy of the ADL Gazetteer, restricted to these
three countries. Then, to access the company’s assets
in H, using the ADL Gazetteer schema, we first add
the root of U as a new narrow term of “manmade
features”, a term of the ADL Feature Type
Thesaurus (FTT), on the grounds that the company’s
assets are neither listed in G, nor can they be
classified with the terms found in the ADL FTT. The
result of the alignment process will be a thesaurus
that contains the ADL Feature Type Thesaurus
entries plus the entries in U.
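The grafting operation T[p,U] and the example above can be sketched as follows. This is a minimal illustration under stated assumptions: a thesaurus is reduced to a dictionary that maps each term to its list of narrow terms, the ADL FTT is represented by a tiny simplified fragment, and the terms of U ("company assets", "offices", "warehouses") are hypothetical, since the paper does not list them. The pivot is passed in explicitly, reflecting the fact that choosing p requires user intervention.

```python
from typing import Dict, List

# A thesaurus as a mapping: term -> list of narrow terms (simplification).
Thesaurus = Dict[str, List[str]]


def graft(t: Thesaurus, pivot: str, u: Thesaurus, u_root: str) -> Thesaurus:
    """Return T[p,U]: a copy of T with the root of U added under the pivot p."""
    if pivot not in t:
        raise KeyError(f"pivot {pivot!r} is not a term of T")
    if u_root not in u:
        raise KeyError(f"root {u_root!r} is not a term of U")
    overlap = set(t) & set(u)
    if overlap:
        # This case assumes T and U represent disjoint sets of concepts.
        raise ValueError(f"T and U share terms: {sorted(overlap)}")
    grafted = {term: list(narrow) for term, narrow in t.items()}
    grafted.update({term: list(narrow) for term, narrow in u.items()})
    grafted[pivot].append(u_root)
    return grafted


def reclass(term: str) -> str:
    """In the disjoint case, reclass: U -> T[p,U] is the identity function."""
    return term


# Tiny, simplified fragment standing in for the ADL FTT (assumption).
T: Thesaurus = {
    "manmade features": [],
    "hydrographic features": ["lakes", "streams"],
    "lakes": [],
    "streams": [],
}

# Hypothetical company thesaurus U; these terms are not from the paper.
U: Thesaurus = {
    "company assets": ["offices", "warehouses"],
    "offices": [],
    "warehouses": [],
}

V = graft(T, "manmade features", U, "company assets")
print(V["manmade features"])
```

Because graft copies both inputs, T and U are left unchanged, and every term of U keeps its type when accessed through the mediator.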
Suppose now that G and H represent non-
disjoint sets of features, and that they have thesauri
that represent non-disjoint sets of concepts. This is a
complex, but not uncommon scenario, which occurs
when the mediator wants to access both G and H.
For brevity, we consider only the case where T,
the thesaurus of G, will remain unchanged, which
means that the range of reclass is T. We discuss how
to use a gazetteer sampling technique that takes
advantage of geo-referencing to avoid the pitfalls of
syntactical alignment.
We first define a relationship Ident ⊆ G×H such
that, for any (E,F)∈G×H, we have that (E,F)∈Ident
iff E and F denote the same (real-world) feature.
ICEIS 2006 - DATABASES AND INFORMATION SYSTEMS INTEGRATION