Implementing a Semantic Catalogue of Geospatial Data

Helbert Arenas, Benjamin Harbelot and Christophe Cruz

Laboratoire Le2i, UMR-6302 CNRS, Département d’Informatique, Université de Bourgogne,

7 Boulevard Docteur Petitjean, 21078 Dijon, France

Keywords:

CSW, OGC, Triplestore, Metadata.

Abstract:

Complex spatial analysis requires the combination of heterogeneous datasets. However, identifying a dataset of interest is not a trivial task: users need to review metadata records in order to select the most suitable datasets. We propose the implementation of a system for metadata management based on Semantic Web technologies that helps the user with this selection task. In this paper, we present a CSW (Catalogue Service for the Web) that uses a triplestore as its metadata repository. We implement a translator between Filter Encoding and SPARQL/GeoSPARQL in order to comply with basic OGC standards. Our results are promising; however, this is a novel field with room for improvement.

1 INTRODUCTION

There is a growing interest in the development of SDIs (Spatial Data Infrastructures), a term that refers to the sharing of information and resources between different institutions. The term was first used by the United States National Research Council in 1993. It refers to the set of technologies, policies and agreements designed to allow communication between spatial data providers and users (ESRI, 2010).

Currently, vast amounts of information are being deployed on the Internet through web services. However, in order to profit from this information, potential users first need to identify relevant and suitable datasets. Only then can researchers and decision makers implement smart queries, a term first employed by Goodwin (2005) that refers to the combination of heterogeneous data sources in order to solve complex problems.

In the spatial domain, this has been possible thanks in significant part to the standardization efforts of the OGC (Open Geospatial Consortium). OGC is an international industry and academic consortium whose goal is to develop open standards that enable communication between heterogeneous systems (OGC, 2012). The tasks in which OGC is interested are publishing, finding and binding spatial information. OGC provides standards that allow data providers and users to communicate using a common language. Information is offered through web services such as WFS (Web Feature Service), WMS (Web Map Service) or SOS (Sensor Observation Service). However, before users can exploit a dataset, they first need to find it using a catalogue service. The OGC standard for catalogue services is CSW (Catalogue Service for the Web).

OGC defines the interfaces and operations used to query metadata records. There are both commercial and open-source/freeware CSW implementations. Commercial implementations include ESRI ArcGIS Server and MapInfo Manager; open-source implementations include Constellation, deegree and GeoNetwork. The OGC standards do not prescribe specific software components. In the case of CSW, developers are free to select the metadata repository that best suits their preferences and requirements. However, the OGC CSW standard does define the operations, requests and metadata formats that should be supported. For instance, queries submitted to a CSW should be formatted as Filter Encoding or as CQL. The former is an XML-encoded query language, while the latter is a human-readable, text-encoded query language (OSGeo, 2012; Vretanos, 2005).
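To make the contrast between the two encodings concrete, here is a minimal Python sketch (not taken from any CSW implementation named above) that renders one constraint, "the title contains water", in both forms. It assumes Filter Encoding 2.0 (namespace http://www.opengis.net/fes/2.0) and uses `%` and `_` as wildcard characters; the helper name `title_like` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: Filter Encoding 2.0 namespace; earlier FE versions differ.
FES = "http://www.opengis.net/fes/2.0"
ET.register_namespace("fes", FES)


def title_like(word):
    """Return (filter_encoding_xml, cql_text) for 'dc:title contains <word>'.

    Hypothetical helper: both strings express the same LIKE constraint,
    using '%' as the multi-character wildcard in each encoding.
    """
    like = ET.Element(
        f"{{{FES}}}PropertyIsLike",
        {"wildCard": "%", "singleChar": "_", "escapeChar": "\\"},
    )
    ET.SubElement(like, f"{{{FES}}}ValueReference").text = "dc:title"
    ET.SubElement(like, f"{{{FES}}}Literal").text = f"%{word}%"
    fe_xml = ET.tostring(like, encoding="unicode")  # verbose XML encoding
    cql = f"dc:title LIKE '%{word}%'"                 # compact text encoding
    return fe_xml, cql


fe_xml, cql = title_like("water")
print(fe_xml)
print(cql)  # dc:title LIKE '%water%'
```

The XML form is what a client must assemble programmatically, while the CQL form can be typed by hand; this difference is why the system described below provides a query-building interface.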

Most common CSW implementations use a relational database as the metadata repository. For instance, GeoNetwork, currently one of the most popular CSW implementations, uses a McKoiDB relational database by default, although it can connect to MySQL, PostgreSQL and other RDBMSs (Dunne et al., 2012). Because of the nature of the metadata repository, queries are currently performed by matching strings against selected metadata elements.

Arenas H., Harbelot B. and Cruz C.
Implementing a Semantic Catalogue of Geospatial Data.
DOI: 10.5220/0004820101520159
In Proceedings of the 10th International Conference on Web Information Systems and Technologies (WEBIST-2014), pages 152-159
ISBN: 978-989-758-023-9
© 2014 SCITEPRESS (Science and Technology Publications, Lda.)

In this paper, we propose the use of Semantic Web technologies to store and query metadata records. These technologies allow us to take advantage of inference and reasoning mechanisms that are not available in relational databases. In Section 2, we review research conducted by other teams in the same field. In Section 3, we describe how we have implemented our model. Finally, in Section 4, we present our conclusions and outline future research.

2 RELATED RESEARCH

Spatial information is offered by different providers through web services that implement standards such as WCS, SOS, WFS or WMS. The deployed services might have heterogeneous characteristics regarding software components, languages, or providers. However, because they implement OGC standards, all of them share common request and response contents, parameters and encodings. These common elements allow a user to access different services using a proven, safe strategy. In order to be discoverable, datasets have to be published in a catalogue service that implements the CSW standard. The catalogue service obtains the metadata for the datasets through a harvest operation. To discover a specific dataset, the user submits a query. The server processes the query using a string matching process and sends a response to the user. Once the user has identified the relevant dataset, she is able to obtain the data and perform a specific analysis.

The string matching process is a major limitation of the current SDI, one that other researchers have already identified. Kammersell and Dean (2007) aim to integrate heterogeneous data sources. They propose the creation of a layer that translates the user's query, formulated in OWL, into the WFS XML request format, and then performs the inverse process on the results. Another approach is proposed by Kolas et al. (2005), who propose the implementation of five different ontologies: 1) a Base Geospatial Ontology for basic geospatial concepts, resulting from the conversion of GML schemas into OWL; 2) a Domain Ontology, which is the user's ontology and whose purpose is to link the user's concepts to the Base Geospatial Ontology; 3) a Geospatial Service Ontology, used to describe services and allow their discovery; 4) a Geospatial Filter Ontology, used to formalize filter description and use; and 5) a Feature Data Source Ontology, which represents the characteristics of the features returned by the WFS. Another approach is described by Harbelot et al. (2013), who suggest integrating data from OGC services into a triplestore, with a focus on WFS filters. Janowicz et al. (2010; 2012) propose adding semantic annotations at each level of a geospatial semantic chain process that involves OGC services. For instance, they propose specific semantic annotations at the level of the OGC service Capabilities document, which would apply to all the datasets managed by the service, while other annotations would correspond to specific data layers. Spatial data with semantic annotations could later be processed and semantically analysed using custom-made reasoning services. To achieve this goal, they propose deploying OGC services capable of interacting with libraries such as Sapience, which would result in richer data and data descriptions. However, there has been little development in this direction; at the moment, semantic annotations are rarely used in OGC capabilities documents.

Gwenzi (2010) describes the limitations of CSW by evaluating GeoNetwork. According to the author, there are three possible ways to add semantic capabilities to a CSW: 1) linking keywords to concepts in the GetCapabilities response; 2) adding an ontology browser to the GeoNetwork client interface; and 3) using ebRIM extensions to add ontologies to the CSW. Gwenzi (2010) implements the third option.

Another experience is presented by Yue et al. (2006). In this work, the authors extend the ebRIM CSW specification by: 1) adding new classes based on existing ebRIM classes; and 2) adding Slots to existing classes, thus creating new attributes. With these additions, the authors are able to store richer metadata records in the catalogue. The authors identified two options to implement an upgraded search functionality: 1) create an external component without further modification of the CSW schemas; or 2) modify the CSW by adding semantic functionalities to the existing CSW schemas. Yue et al. (2006) opted for the first option. Yue et al. (2011) extend this work with a focus on geoservices.

Lopez-Pellicer et al. (2010) use a different approach. Their goal is to provide access to data stored in CSWs as Linked Data. To achieve this goal, the authors developed CSW2LD, a middle layer on top of a conventional CSW-based server. It allows the server to mimic other Linked Data sources and publish metadata records. CSW2LD wraps the following CSW requests: GetCapabilities, GetRecords and GetRecordById.

A very interesting work in progress is described by Pigot (2012): a website describing a proposal by a team from the GeoNetwork developer community. The authors intend to make a major change in GeoNetwork, allowing it to store metadata as RDF facts in an RDF repository, and to use SPARQL/GeoSPARQL to retrieve data. The website describes technical characteristics of GeoNetwork and mentions areas that require work in order to implement the project. Currently, queries in GeoNetwork are formatted as Filter Encoding or as CQL. Any implementation of an RDF metadata repository would need a mechanism to translate the current query formats into SPARQL, a W3C recommendation (DuCharme, 2011). GeoNetwork currently handles spatial constraints using GeoTools; in the Semantic Web domain, spatial queries are performed using GeoSPARQL (Kolas and Batle, 2012). According to the authors, it is not clear whether GeoSPARQL is mature enough to handle spatial queries over metadata. Moreover, there is no mechanism to translate spatial constraints into GeoSPARQL. Despite the advantages that Semantic Web technologies might bring to CSWs, research on this topic is scarce. At the time of writing, there had been no further development of the proposal (Pigot, 2012), and the website was last updated at the end of October 2012.

3 IMPLEMENTATION

A regular implementation of the OGC CSW works as a web service that communicates with a data repository storing metadata records. According to the OGC standard, the catalog should accept requests formatted as Filter Encoding, an XML-based language designed to express queries. The web service translates these queries into a format suitable for communicating with its data repository. In most CSW implementations, the data repository is a relational database.

In this paper, we present a proof-of-concept implementation designed to show the benefits of using ontologies in a geospatial data catalog service. We have developed a minimalistic implementation of the CSW standard. A major difference between our system and traditional implementations is that we use a triplestore as our metadata repository. We chose the Parliament triplestore for its spatial capabilities, which come from its support for GeoSPARQL. We developed an ontology in the triplestore and mapped the metadata records to instances of classes specified in the ontology. Thanks to this, it is possible to use superclass-subclass relationships in the metadata search process.

In traditional CSW implementations, a spatial search uses the values of the bounding box of the dataset, as specified in the metadata record. However, few users understand coordinate systems well enough to search for datasets using only bounding box coordinate values. In our approach, we implement an ontology class called ToponymUnit. Instances of this class are geographic features with labels familiar to the users. In our implementation, the user can search for metadata records whose bounding box has specific spatial relations with instances of the class ToponymUnit. Using this approach, users can submit queries such as: retrieve the metadata records that are within the toponym unit known as France.

Figure 1: Architecture implementation.

Our current implementation is able to respond to standard GetRecords requests submitted via HTTP POST. The response of our system follows the csw:SummaryRecord format. A crucial part of the implementation is the translation of requests from Filter Encoding to SPARQL/GeoSPARQL.

Figure 1 depicts the processes in the proposed system. In the next subsections, we further describe how we obtain the information necessary to construct the metadata records, how we map this information to an ontology, and finally how we perform queries.

3.1 Harvesting Metadata

We focus our research on metadata records for datasets available through services that implement the OGC Web Feature Service (WFS) standard. A service that implements WFS can contain one or many datasets. We identify the datasets available on a WFS and create a metadata record for each one of them. The metadata record is stored as an instance of the class abc:MetadataRecord.

In our ontology, we include the GeoSPARQL class geo:Feature, which represents features with a spatial representation. Although a metadata record is an abstract description, it does have a spatial component, represented by the bounding box of the dataset it describes. Due to this spatial nature of metadata records, we define the class abc:MetadataRecord as a subclass of geo:Feature (see Figure 2).

In our ontology, we implemented Dublin Core elements as properties of instances of the class abc:MetadataRecord.

Figure 2: Classes, instances and relationships in the proposed model.

Figure 3: Properties of instances of the class abc:MetadataRecord.

We have developed a harvest tool that queries the WFS and constructs triples from the responses. Our tool is a Java application that uses two requests that are part of the core OGC WFS standard: GetCapabilities and DescribeFeatureType. From the GetCapabilities request, we obtain a general description of the service and information regarding the datasets available on it. With the DescribeFeatureType request, we obtain the list of attributes for the actual features that compose the dataset. We have tested our tool with 17 services that implement WFS, obtaining 2690 metadata records. Figure 3 depicts the Dublin Core elements as properties of the class abc:MetadataRecord.
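The two harvest requests can be sketched in Python as follows. This is an illustration only, not the authors' Java tool: it assumes KVP (key-value pair) HTTP GET requests against a WFS 1.1.0 endpoint (WFS 2.0 renames TYPENAME to TYPENAMES), assumes the endpoint URL carries no query string of its own, and all function names and the example URL are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def capabilities_url(endpoint):
    """KVP GetCapabilities request for a WFS endpoint (assumed to have no '?')."""
    return endpoint + "?" + urlencode({"SERVICE": "WFS", "REQUEST": "GetCapabilities"})


def describe_feature_type_url(endpoint, type_name, version="1.1.0"):
    """KVP DescribeFeatureType request for one layer (WFS 1.1.0 uses TYPENAME)."""
    params = {"SERVICE": "WFS", "VERSION": version,
              "REQUEST": "DescribeFeatureType", "TYPENAME": type_name}
    return endpoint + "?" + urlencode(params)


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def feature_type_names(capabilities_xml):
    """Dataset names (FeatureType/Name) advertised in a GetCapabilities response."""
    root = ET.fromstring(capabilities_xml)
    names = []
    for element in root.iter():
        if local_name(element.tag) == "FeatureType":
            for child in element:
                if local_name(child.tag) == "Name" and child.text:
                    names.append(child.text.strip())
    return names


print(describe_feature_type_url("http://example.org/wfs", "example:name"))
```

A harvester would fetch the capabilities document (for example with `urllib.request.urlopen`), call `feature_type_names` on it, and issue one DescribeFeatureType request per name, creating one metadata record per dataset.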

The mapping between the WFS responses and the ontology properties is performed by our harvest tool. A WFS responds to requests with an XML document. Our harvest tool navigates through the response document, identifies relevant elements and maps them to specific properties in our ontology (see Figures 1 and 3).

For instance, the following code is part of a GetCapabilities response containing information regarding the WFS publishing entity.

<ows:ServiceProvider>
  <ows:ProviderName>Provider X</ows:ProviderName>
  <ows:ServiceContact>
    <ows:IndividualName>Helbert</ows:IndividualName>
    <ows:PositionName>GIS Manager</ows:PositionName>
  </ows:ServiceContact>
</ows:ServiceProvider>

This information is mapped to the Dublin Core element dc:publisher (see Figure 3).

abc:md1 a abc:MetadataRecord .
abc:md1 dc:publisher _:B01 .
_:B01 abc:ProviderName "Provider X" .
_:B01 abc:PositionName "GIS Manager" .
_:B01 abc:IndividualName "Helbert" .
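A minimal Python sketch of this mapping is shown below. It assumes the OWS 1.0 namespace (http://www.opengis.net/ows) used by WFS 1.1.0 and reuses the record and blank-node labels of the example above; `publisher_triples` and `turtle_literal` are hypothetical helpers, not part of the authors' harvest tool.

```python
import xml.etree.ElementTree as ET

# Assumption: OWS 1.0 namespace, as used by WFS 1.1.0 capabilities documents.
OWS = "http://www.opengis.net/ows"


def turtle_literal(text):
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def publisher_triples(provider_xml, record="abc:md1", node="_:B01"):
    """Map an <ows:ServiceProvider> fragment to dc:publisher triples."""
    root = ET.fromstring(provider_xml)
    triples = [f"{record} a abc:MetadataRecord .", f"{record} dc:publisher {node} ."]
    for field in ("ProviderName", "PositionName", "IndividualName"):
        # './/' also finds fields nested inside ows:ServiceContact.
        element = root.find(f".//{{{OWS}}}{field}")
        if element is not None and element.text and element.text.strip():
            triples.append(f"{node} abc:{field} {turtle_literal(element.text.strip())} .")
    return triples
```

Fields missing from a given service are simply skipped, which matters in practice because, as noted in the conclusions, the information supplied by WFS publishers is often incomplete.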

The following XML code is another part of a GetCapabilities response.

<FeatureType xmlns:example="http://www.example-provider.org/example">
  <Name>example:name</Name>
  <Title>Example dataset title</Title>
  <Abstract>Example abstract</Abstract>
  <ows:Keywords>
    <ows:Keyword>example keyword1</ows:Keyword>
  </ows:Keywords>
  <DefaultSRS>urn:x-ogc:def:crs:EPSG:4326</DefaultSRS>
  <ows:WGS84BoundingBox>
    <ows:LowerCorner>-5.84 37.75</ows:LowerCorner>
    <ows:UpperCorner>11.02 54.63</ows:UpperCorner>
  </ows:WGS84BoundingBox>
</FeatureType>

From this segment of the response, we can obtain information for dc:title, dc:subject and dc:description, as well as the bounding box given by ows:WGS84BoundingBox:

abc:md1 a abc:MetadataRecord .
abc:md1 dc:title "Example dataset title" .
abc:md1 dc:subject _:B02 .
_:B02 abc:keyword "example keyword1" .
abc:md1 dc:description _:B03 .
_:B03 abc:abstract "Example abstract" .
_:B03 abc:defaultSRS "EPSG:4326" .
abc:md1 geo:hasGeometry _:B04 .
_:B04 geo:asWKT "POLYGON((-5.84 37.75, -5.84 54.63, 11.02 54.63, 11.02 37.75, -5.84 37.75))"^^sf:wktLiteral .
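The bounding-box conversion can be sketched in Python as below. It assumes the corners arrive as "longitude latitude" strings, as ows:WGS84BoundingBox specifies, which matches the axis order GeoSPARQL uses for WKT literals without an explicit CRS; the function names are hypothetical, and the sf:wktLiteral datatype is simply copied from the example above.

```python
def bbox_to_wkt(lower_corner, upper_corner):
    """Turn ows:WGS84BoundingBox corners ("lon lat" strings) into a closed WKT polygon.

    The ring follows the order used in the example above:
    (minx miny), (minx maxy), (maxx maxy), (maxx miny), back to (minx miny).
    Coordinates are copied verbatim so no precision is lost.
    """
    minx, miny = lower_corner.split()
    maxx, maxy = upper_corner.split()
    for value in (minx, miny, maxx, maxy):
        float(value)  # raises ValueError if a corner is malformed
    ring = [(minx, miny), (minx, maxy), (maxx, maxy), (maxx, miny), (minx, miny)]
    return "POLYGON((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"


def geometry_triples(record, node, wkt):
    """Link a metadata record to its bounding-box geometry (datatype as in the example)."""
    return [f"{record} geo:hasGeometry {node} .",
            f'{node} geo:asWKT "{wkt}"^^sf:wktLiteral .']


print(bbox_to_wkt("-5.84 37.75", "11.02 54.63"))
```

Keeping the corner strings verbatim, rather than round-tripping them through floats, avoids silently truncating high-precision coordinates.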

ImplementingaSemanticCatalogueofGeospatialData

155

Additionally, our harvesting tool submits a DescribeFeatureType request for each layer of information found in the WFS. From the response, we obtain a list of attributes for the spatial features that compose the dataset. The following XML code shows part of a response to a DescribeFeatureType request.

<xsd:complexType name="country_boundsType">
  <xsd:complexContent>
    <xsd:extension base="gml:AbstractFeatureType">
      <xsd:sequence>
        <xsd:element maxOccurs="1" minOccurs="0"
            name="THE_GEOM" nillable="true"
            type="gml:MultiSurfacePropertyType"/>
        <xsd:element maxOccurs="1" minOccurs="0"
            name="AREA" nillable="true"
            type="xsd:double"/>
      </xsd:sequence>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>

Our harvesting tool browses through the response, identifies the relevant elements, creates an attribute list within the Dublin Core element dc:description, and generates the following triples:

abc:md1 a abc:MetadataRecord .
abc:md1 dc:description _:B03 .
_:B03 abc:attributeList _:B05 .
_:B05 abc:attribute _:B06 .
_:B06 abc:attributeName "THE_GEOM" .
_:B06 abc:attributeDataType gml:MultiSurfacePropertyType .
_:B05 abc:attribute _:B07 .
_:B07 abc:attributeName "AREA" .
_:B07 abc:attributeDataType xsd:double .
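Extracting the attribute list can be sketched in Python with the standard ElementTree parser. The sketch assumes the response is a complete xsd:schema document and reuses the blank-node numbering of the example above; both helper names are hypothetical, and the type QNames (e.g. gml:MultiSurfacePropertyType) are passed through unchanged, so their prefixes must be declared in the output.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"


def feature_attributes(schema_xml, type_name):
    """(name, type) pairs of the xsd:element descendants of a named complexType."""
    root = ET.fromstring(schema_xml)
    for complex_type in root.iter(f"{{{XSD}}}complexType"):
        if complex_type.get("name") == type_name:
            return [(el.get("name"), el.get("type"))
                    for el in complex_type.iter(f"{{{XSD}}}element")]
    return []  # type not found in this schema


def attribute_triples(attributes, description="_:B03", list_node="_:B05", first_id=6):
    """Attribute-list triples in the shape of the example above (labels assumed)."""
    triples = [f"{description} abc:attributeList {list_node} ."]
    for number, (name, data_type) in enumerate(attributes, start=first_id):
        node = f"_:B{number:02d}"  # _:B06, _:B07, ...
        triples += [f"{list_node} abc:attribute {node} .",
                    f'{node} abc:attributeName "{name}" .',
                    f"{node} abc:attributeDataType {data_type} ."]
    return triples
```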

3.2 Link to Concept Classification

By using Semantic Web technologies, we are not limited to string matching queries. We can also use inference mechanisms based on subsumption and on established relationships between terms and concepts. To test these capabilities, we have implemented a taxonomy of domain ontology classes. The relationships between these concepts are of the type subClassOf. Individual datasets are represented as instances of the domain ontology classes. Each dataset is described by a metadata record, which is an instance of the class abc:MetadataRecord. The link between these two instances is given by the property abc:hasDescription (see Figure 2). The following triples depict the relation between an instance of abc:MetadataRecord and an instance of the domain ontology class abc:Political.

abc:d1 a abc:Political .
abc:md1 a abc:MetadataRecord .
abc:d1 abc:hasDescription abc:md1 .

Figure 2 depicts our proposed domain ontology. Using the domain ontology, we can make inferences regarding class membership. In the previous example, because abc:Political is a subclass of abc:Boundaries and of abc:Spatial, we can infer that abc:d1 is also a member of these two classes.
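The inference above can be illustrated with a small Python sketch of rdfs:subClassOf entailment; in the described system this step is presumably left to the triplestore. The taxonomy fragment, with Political under Boundaries under Spatial, is an assumption made for the example, since Figure 2 holds the actual hierarchy.

```python
def superclasses(cls, subclass_of):
    """cls plus every class reachable from it through subClassOf links (transitive closure)."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:  # 'seen' also protects against cycles
            seen.add(current)
            stack.extend(subclass_of.get(current, ()))
    return seen


def infer_types(asserted, subclass_of):
    """Entailed memberships: an instance belongs to every superclass of its classes."""
    return {instance: set().union(*(superclasses(c, subclass_of) for c in classes))
            for instance, classes in asserted.items()}


# Assumed fragment of the domain taxonomy (the full hierarchy is in Figure 2).
taxonomy = {"abc:Political": ["abc:Boundaries"], "abc:Boundaries": ["abc:Spatial"]}
inferred = infer_types({"abc:d1": {"abc:Political"}}, taxonomy)
print(sorted(inferred["abc:d1"]))  # ['abc:Boundaries', 'abc:Political', 'abc:Spatial']
```

Without a reasoner, the same effect can be obtained at query time with a SPARQL 1.1 property path such as `?ds a/rdfs:subClassOf* abc:Boundaries`.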

The goal of this paper is to show potential uses of Semantic Web technologies for metadata record management. At the moment, our focus is not on the automatic classification of datasets within a domain ontology. This task can be achieved with a variety of methods, for instance Naive Bayes, decision rules or neural networks. For this experiment, we created an instance of the class abc:Spatial for each metadata record and then randomly assigned more specific classes to these instances, solely in order to test the query capabilities of the system. Our future development plans include the implementation of sophisticated methods for dataset classification. An interesting work in this field, although not in the spatial domain, is (Werner et al., 2012).

3.3 Toponym Elements

In order to facilitate the identification of suitable spatial datasets, we define the class abc:ToponymUnit as a subclass of geo:Feature. Instances of the class abc:ToponymUnit are geographic features with known, accepted names. Our system enables users to integrate into their metadata queries spatial relations between the spatial components of metadata records (bounding boxes) and toponym instances (see Figure 2).

To populate the class abc:ToponymUnit, we use a country political boundaries dataset from Esri and DeLorme Publishing Company, Inc., released under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License (ESRI, 2011). The dataset is a shapefile with 668 multipolygon features. Our goal is to create instances of the class abc:ToponymUnit, each instance representing an area with a known political designation.

However, before translating the political boundaries dataset into triples, it was necessary to perform the following steps: 1) convert the multipolygon features into polygons; 2) delete polygons that we considered too small for practical purposes; 3) simplify the remaining polygons by reducing the number of vertices; 4) translate the political boundaries dataset into instances and triples using a customized Java program implemented with the Jena and GeoTools libraries; and finally 5) upload the triples into our triplestore. The pre-loading processing was done using Quantum GIS and GeoTools. The final result is 3037 instances of the class abc:ToponymUnit.

Figure 4: HTML user interface: Defining a constraint using a spatial relationship with an instance of the class abc:ToponymUnit.
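Steps 1, 2 and 4 can be sketched in Python as follows. This is not the authors' Jena/GeoTools program: it assumes each multipolygon is already available as a list of closed outer rings in longitude/latitude, ignores holes, measures area in squared degrees with the shoelace formula, and uses a hypothetical IRI pattern (abc:topoN) for the new instances. Step 3 (vertex reduction) is omitted.

```python
def ring_area(ring):
    """Planar area of a closed ring (first point == last point), by the shoelace formula."""
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2.0


def explode_and_filter(multipolygons, min_area):
    """Steps 1 and 2: split each multipolygon into polygons and drop the small ones.

    multipolygons maps a country name to a list of outer rings (holes are ignored
    in this sketch). Returns (name, ring) pairs.
    """
    return [(name, ring)
            for name, rings in multipolygons.items()
            for ring in rings
            if ring_area(ring) >= min_area]


def ring_to_wkt(ring):
    return "POLYGON((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"


def toponym_triples(index, name, ring):
    """Step 4: one abc:ToponymUnit per polygon (the abc:topoN IRIs are hypothetical)."""
    unit, geom = f"abc:topo{index}", f"abc:topo{index}_geom"
    return [f"{unit} a abc:ToponymUnit .",
            f'{unit} abc:CountryName "{name}" .',
            f"{unit} geo:hasGeometry {geom} .",
            f'{geom} geo:asWKT "{ring_to_wkt(ring)}"^^sf:wktLiteral .']
```

An area threshold in squared degrees is crude, because the same threshold covers a larger ground area near the equator than near the poles; a projected area would be a more faithful criterion for "too small for practical purposes".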

3.4 Metadata Query

In order to test our implementation, we need to enable users to submit queries to the catalog service and browse through the query results. This process is composed of four sub-processes: 1) definition of the query; 2) mapping the user request into a query format suitable for the metadata repository; 3) mapping the triplestore response into a format allowed by the OGC standard; and 4) visualization of the results in the user interface (see Figure 1).

3.4.1 Definition of the Query

The OGC standard query language for services implementing CSW is Filter Encoding, an XML-based language. Filter Encoding queries can become long XML documents with a strict syntax; therefore, a software application is needed to help the user compose them. In our system, this task is performed by a user interface deployed as a web page, which uses a combination of HTML and JavaScript to enable users to compose complex queries.

To help the user compose queries, the website requests two lists from the server: 1) the list of domain ontology classes; and 2) the list of labels associated with instances of abc:ToponymUnit. It uses this information to populate combo boxes that allow the user to compose constraints (see Figure 4).

The JavaScript application receives the user input and formats it as an XML query document following the Filter Encoding specification. The application allows multiple constraints to be linked using the operators AND and OR. When the query is complete, the user submits it as an HTTP POST request. The following XML code represents a query as formatted by the JavaScript running on the website.

<GetRecords><Query><Constraint><Filter><And>
  <PropertyIsLike>
    <ValueReference>dc:title</ValueReference>
    <Literal>water</Literal>
  </PropertyIsLike>
  <DescribesInstanceOf>abc:Harvest</DescribesInstanceOf>
  <sfIntersects>
    <ValueReference>BoundingBox</ValueReference>
    <ToponymUnit>Netherlands</ToponymUnit>
  </sfIntersects>
</And></Filter></Constraint>
</Query></GetRecords>
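The client in the paper is written in JavaScript; to keep all examples in one language, the following Python sketch composes the same kind of document with ElementTree. The element names mirror the example above, including its extension elements (DescribesInstanceOf, ToponymUnit, sfIntersects); the (kind, value) convention accepted by the hypothetical `build_get_records` function is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET


def build_get_records(constraints, operator="And"):
    """Compose a GetRecords document shaped like the example above.

    constraints: list of (kind, value) pairs (hypothetical convention):
      "like"    -> value = (property, literal), e.g. ("dc:title", "water")
      "class"   -> value = domain ontology class, e.g. "abc:Harvest"
      "spatial" -> value = (operator element, toponym), e.g. ("sfIntersects", "Netherlands")
    """
    root = ET.Element("GetRecords")
    query = ET.SubElement(root, "Query")
    filter_el = ET.SubElement(ET.SubElement(query, "Constraint"), "Filter")
    # Several constraints are wrapped in a logical operator element (And / Or).
    parent = ET.SubElement(filter_el, operator) if len(constraints) > 1 else filter_el
    for kind, value in constraints:
        if kind == "like":
            prop, literal = value
            like = ET.SubElement(parent, "PropertyIsLike")
            ET.SubElement(like, "ValueReference").text = prop
            ET.SubElement(like, "Literal").text = literal
        elif kind == "class":
            ET.SubElement(parent, "DescribesInstanceOf").text = value
        elif kind == "spatial":
            spatial_op, toponym = value
            spatial = ET.SubElement(parent, spatial_op)
            ET.SubElement(spatial, "ValueReference").text = "BoundingBox"
            ET.SubElement(spatial, "ToponymUnit").text = toponym
        else:
            raise ValueError(f"unknown constraint kind: {kind}")
    return ET.tostring(root, encoding="unicode")


print(build_get_records([("like", ("dc:title", "water")),
                         ("class", "abc:Harvest"),
                         ("spatial", ("sfIntersects", "Netherlands"))]))
```

Generating the document with an XML library rather than by string concatenation guarantees well-formed output and escapes user input such as `&` or `<` automatically.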

3.4.2 From Filter Encoding to SPARQL/GeoSPARQL

Since our metadata repository is a triplestore, requests must be translated from Filter Encoding to SPARQL/GeoSPARQL. This task is performed by our servlet application. When the XML query arrives, the application decomposes it into its constituent constraints.

In a SPARQL query, we distinguish three components: 1) the specific elements or nodes we are requesting; 2) a set of triples that define a pattern the SPARQL engine is going to look for; and 3) the filter component, where we define a set of boolean conditions for the triples that match the previously defined pattern.

Our servlet application translates each constraint separately into the respective set of triple patterns and filter conditions. Using the interface, the user is able to define three types of constraints:

1. Alphanumeric attributes in the metadata record: The user can select one attribute in the metadata record and perform string matching using the operators PropertyIsEqualTo and PropertyIsLike. In the latter case, the SPARQL implementation requires the definition of a filter component. For example, the constraint:

<PropertyIsLike>
<ValueReference>dc:title</ValueReference>
<Literal>water</Literal>
</PropertyIsLike>

would be translated to the triple pattern:

?md a abc:MetadataRecord.
?md dc:title ?xTitle.

with the filter component:

(regex(?xTitle,"water","i"))

2. Domain ontology class membership: Each metadata record describes an entity with a class membership. This type of constraint allows the user to specify the class membership of the described entity. For example, the constraint:

<DescribesInstanceOf>abc:Harvest</DescribesInstanceOf>

is translated into the SPARQL triples:

?md a abc:MetadataRecord.
?ds abc:hasDescription ?md.
?ds a abc:Harvest.

ImplementingaSemanticCatalogueofGeospatialData

157

3. Using toponym elements and spatial relationships: In this case, the user identifies a toponym of interest and then defines a spatial relationship between the bounding box of the metadata record and the geometry of the selected toponym. We implement the spatial operators Disjoint, Intersects, Contains and Within (see Figure 4). For example, the following XML extract indicates that the bounding box of the metadata record should intersect the geometry of the Netherlands:

<sfIntersects>
<ValueReference>BoundingBox</ValueReference>
<ToponymUnit>Netherlands</ToponymUnit>
</sfIntersects>

The constraint is translated into the following triple pattern:

?md a abc:MetadataRecord.
?md geo:hasGeometry ?boundingbox.
?boundingbox geo:asWKT ?boundingbox_wkt.
?topoUnit a abc:ToponymUnit.
?topoUnit abc:CountryName "Netherlands".
?topoUnit geo:hasGeometry ?topoGeo.
?topoGeo geo:asWKT ?topoWKT.

plus the additional filter component:

(geof:sfIntersects(?boundingbox_wkt,?topoWKT))
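The per-constraint translation described in the list above can be sketched as a Python function that returns triple patterns and filter expressions. It is an illustration rather than the authors' servlet: the element names sfDisjoint, sfContains and sfWithin are assumed by analogy with sfIntersects, PropertyIsEqualTo is assumed to place the literal directly in the triple pattern (consistent with only PropertyIsLike needing a filter), and the variable names are fixed as in the examples, so each constraint type can appear only once per query.

```python
import xml.etree.ElementTree as ET

# Assumed element names, by analogy with sfIntersects in the example.
SPATIAL_OPERATORS = {"sfDisjoint", "sfIntersects", "sfContains", "sfWithin"}


def sparql_literal(text):
    """Quote a string for SPARQL, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def translate_constraint(element):
    """Return (triple_patterns, filter_expressions) for one constraint element."""
    patterns = ["?md a abc:MetadataRecord."]
    filters = []
    tag = element.tag
    if tag in ("PropertyIsLike", "PropertyIsEqualTo"):
        prop = element.findtext("ValueReference").strip()
        literal = element.findtext("Literal").strip()
        if tag == "PropertyIsEqualTo":
            # Exact match: the literal goes straight into the triple pattern.
            patterns.append(f"?md {prop} {sparql_literal(literal)}.")
        else:
            # LIKE: bind the value, then test it with a case-insensitive regex.
            var = "?x" + prop.split(":")[-1].capitalize()  # dc:title -> ?xTitle
            patterns.append(f"?md {prop} {var}.")
            filters.append(f'(regex({var},{sparql_literal(literal)},"i"))')
    elif tag == "DescribesInstanceOf":
        patterns += ["?ds abc:hasDescription ?md.", f"?ds a {element.text.strip()}."]
    elif tag in SPATIAL_OPERATORS:
        toponym = element.findtext("ToponymUnit").strip()
        patterns += ["?md geo:hasGeometry ?boundingbox.",
                     "?boundingbox geo:asWKT ?boundingbox_wkt.",
                     "?topoUnit a abc:ToponymUnit.",
                     f"?topoUnit abc:CountryName {sparql_literal(toponym)}.",
                     "?topoUnit geo:hasGeometry ?topoGeo.",
                     "?topoGeo geo:asWKT ?topoWKT."]
        filters.append(f"(geof:{tag}(?boundingbox_wkt,?topoWKT))")
    else:
        raise ValueError(f"unsupported constraint: {tag}")
    return patterns, filters


example = ET.fromstring("<DescribesInstanceOf>abc:Harvest</DescribesInstanceOf>")
print(translate_constraint(example))
```

The literal is passed to regex unescaped, so user input containing regular-expression metacharacters (such as `.` or `(`) would change the match; a production translator would escape it first, and would also map any Filter Encoding wildcards in the Literal onto regex syntax.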

Once all the constraints have been translated, the triples and filter elements are merged into a syntactically correct SPARQL/GeoSPARQL query, which is submitted to the triplestore. The following code depicts the SPARQL query resulting from combining all the triple patterns and filter components from the previous examples:

SELECT DISTINCT ?md ?xTitle
WHERE {
  ?md a abc:MetadataRecord.
  ?md dc:title ?xTitle.
  ?md geo:hasGeometry ?boundingbox.
  ?boundingbox geo:asWKT ?boundingbox_wkt.
  ?ds abc:hasDescription ?md.
  ?ds a abc:Harvest.
  ?topoUnit a abc:ToponymUnit.
  ?topoUnit abc:CountryName "Netherlands".
  ?topoUnit geo:hasGeometry ?topoGeo.
  ?topoGeo geo:asWKT ?topoWKT.
  FILTER((regex(?xTitle,"water","i")) &&
         (geof:sfIntersects(?boundingbox_wkt,?topoWKT)))
}
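Merging can be sketched as follows, assuming all constraints are combined with AND. The hypothetical `merge_query` function takes the (patterns, filters) pair produced for each constraint, emits repeated patterns such as ?md a abc:MetadataRecord. only once, and joins the filters with &&. PREFIX declarations are not generated because the abc: namespace IRI is not given here.

```python
def merge_query(fragments, projection=("?md", "?xTitle")):
    """Merge (patterns, filters) pairs, joined by AND, into one SELECT query.

    Repeated patterns are emitted once, in first-seen order. OR is deliberately
    not handled: it cannot be expressed by concatenating patterns and would
    need UNION, or '||' between filters with care about unbound variables.
    """
    patterns, filters = [], []
    for fragment_patterns, fragment_filters in fragments:
        for pattern in fragment_patterns:
            if pattern not in patterns:
                patterns.append(pattern)
        filters.extend(fragment_filters)
    lines = [f"SELECT DISTINCT {' '.join(projection)}", "WHERE {"]
    lines += ["  " + p for p in patterns]
    if filters:
        lines.append("  FILTER(" + " && ".join(filters) + ")")
    lines.append("}")
    return "\n".join(lines)


print(merge_query([
    (["?md a abc:MetadataRecord.", "?md dc:title ?xTitle."], ['(regex(?xTitle,"water","i"))']),
    (["?md a abc:MetadataRecord.", "?ds abc:hasDescription ?md.", "?ds a abc:Harvest."], []),
]))
```

Because the constraints share the variable ?md, concatenating their patterns in one basic graph pattern already expresses the conjunction; only the filters need an explicit && operator.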

The servlet then formats the response from the triplestore as csw:SummaryRecord elements and sends it to the client website. The results are displayed on the website, allowing the user to examine the metadata records (see Figure 5).
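Formatting the results can be sketched in Python as below. The sketch assumes CSW 2.0.2 namespaces, no paging (so the matched and returned counts are equal), and result rows shaped like the SELECT bindings above; it emits only dc:identifier and dc:title, whereas a fuller csw:SummaryRecord may carry further elements such as dc:subject or ows:BoundingBox. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: CSW 2.0.2 and Dublin Core element namespaces.
CSW = "http://www.opengis.net/cat/csw/2.0.2"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("csw", CSW)
ET.register_namespace("dc", DC)


def get_records_response(rows):
    """Wrap result rows (dicts with 'md' and optional 'xTitle') as csw:SummaryRecord elements."""
    rows = list(rows)
    response = ET.Element(f"{{{CSW}}}GetRecordsResponse")
    ET.SubElement(response, f"{{{CSW}}}SearchStatus")
    results = ET.SubElement(response, f"{{{CSW}}}SearchResults", {
        "numberOfRecordsMatched": str(len(rows)),   # no paging in this sketch
        "numberOfRecordsReturned": str(len(rows)),
        "elementSet": "summary",
    })
    for row in rows:
        record = ET.SubElement(results, f"{{{CSW}}}SummaryRecord")
        ET.SubElement(record, f"{{{DC}}}identifier").text = row["md"]
        ET.SubElement(record, f"{{{DC}}}title").text = row.get("xTitle", "")
    return ET.tostring(response, encoding="unicode")


print(get_records_response([{"md": "abc:md1", "xTitle": "Example dataset title"}]))
```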

3.5 Smart Queries

A smart query requires the combination of diverse data sources, as described in (Goodwin, 2005). However, the researcher must first be able to identify the most suitable datasets for the analysis. Our implementation aims to help users with this task. By using a domain ontology, we improve the user's query capabilities. Our use of toponyms allows the user to select areas of interest by name and to establish specific spatial relationships with the datasets of interest. The actual features of a dataset can later be obtained using the value of abc:GetFeaturesURL in the metadata record.

Figure 5: HTML user interface showing the results of the example query.

4 CONCLUSIONS

In this work, we present a simplified CSW implementation with a triplestore as its metadata repository. Our implementation has a working translator that is able to convert Filter Encoding queries into SPARQL/GeoSPARQL queries. The system allows complex queries that take advantage of the inference mechanisms provided by Semantic Web technologies. At this point, our system only uses inference based on class-subclass relationships. However, we plan to extend these capabilities to include relationships between concepts and automatic determination of class membership using a domain ontology.

Our approach to capturing metadata information is generic and takes advantage of the OGC standard interfaces. With our harvesting tool, we were able to create 2690 metadata records. However, the information supplied by the WFS publishing entities has limitations and is in many cases incomplete. Our metadata records contain 1384 distinct keywords, including 383 actual URLs. However, in no case did the URLs refer to an ontology or a formalized vocabulary. Of the URLs, 217 were links to HTML documents and 52 to XML documents; in both cases, the documents contained extended metadata descriptions of the datasets. All the datasets with URLs of extended descriptions were provided by one single WFS deployment (giswebservices.massgis.state.ma.us/geoserver/wfs?); the rest of the keywords were strings with no formal semantics associated. Our metadata harvesting tool also obtained the names of the attributes of the datasets. In total, we obtained 6331 individual attribute names, all of them strings with no formal semantics associated.

The use of extended descriptions in XML and HTML documents is not a standard practice among WFS publishing entities. However, if we find more documents of this kind, we can upgrade the harvesting tool to obtain information from the associated documents.

The results of our current implementation are promising. In the near future, we will implement an automatic classification of metadata records based on the harvested metadata, using a domain ontology.

ACKNOWLEDGEMENTS

This research is supported by: 1) Conseil régional de Bourgogne; 2) Direction Générale de l’Armement, see: http://www.defense.gouv.fr/dga/.

REFERENCES

DuCharme, B. (2011). Learning SPARQL. O’Reilly Media, Inc.

Dunne, D., Leadbetter, A., and Lassoued, Y. (2012). ICAN Semantic Interoperability Cookbooks. Technical report, International Coastal Atlas Network.

ESRI (2010). GIS Best Practices: Spatial Data Infrastructure (SDI). http://www.esri.com/library/bestpractices/spatial-data-infrastructure.pdf. Accessed: July 2013.

ESRI, D. (2011). World administrative units. http://resources.arcgis.com/content/data-maps/10.0/world. Accessed: May 2013.

Goodwin, J. (2005). What have ontologies ever done for us - potential applications at a national mapping agency. In OWL: Experiences and Directions (OWLED).

Gwenzi, J. (2010). Enhancing spatial web search with semantic web technology and metadata visualization. Master of Science thesis, University of Twente.

Harbelot, B., Arenas, H., and Cruz, C. (2013). Semantics for Spatio-Temporal “Smart Queries”. Poster presentation at the 9th International Conference on Web Information Systems and Technologies, Aachen, Germany.

Janowicz, K., Schade, S., Bröring, A., Keßler, C., Maué, P., and Stasch, C. (2010). Semantic Enablement for Spatial Data Infrastructures. Transactions in GIS, 14(2):111–129.

Janowicz, K., Scheider, S., Pehle, T., and Hart, G. (2012). Geospatial Semantics and Linked Spatiotemporal Data - Past, Present and Future. Semantic Web - Interoperability, Usability and Applicability, 3(4):1–10.

Kammersell, W. and Dean, M. (2007). Conceptual Search: Incorporating Geospatial Data into Semantic Queries. In Scharl, A. and Tochtermann, K., editors, The Geospatial Web, Advanced Information and Knowledge Processing, pages 47–54. Springer London. 10.1007/978-1-84628-827-2.

Kolas, D. and Batle, R. (2012). GeoSPARQL User Guide. http://ontolog.cim3.net/file/work/SOCoP/Educational/GeoSPARQL User Guide.docx. Accessed: May 2013.

Kolas, D., Hebeler, J., and Dean, M. (2005). Geospatial Semantic Web: Architecture of Ontologies. pages 183–194.

Lopez-Pellicer, F. J., Florczyk, A., Renteria-Aguaviva, W., Nogueras-Iso, J., and Muro-Medrano, P. R. (2010). CSW2LD: a Linked Data frontend for CSW.

OGC (2012). OGC Institutional Web Site. http://www.opengeospatial.org/. Accessed: September 2013.

OSGeo (2012). CQL. http://docs.geotools.org/latest/userguide/library/cql/cql.html. Accessed: November 2012.

Pigot, S. (2012). Using RDF as Metadata Storage. http://trac.osgeo.org/geonetwork/wiki/rdfstore. Accessed: May 2013.

Vretanos, P. A. (2005). Filter Encoding Implementation Specification. Online. Accessed: May 2013.

Werner, D., Cruz, C., and Nicolle, C. (2012). Ontology-based Recommender System of Economic Articles. In WEBIST 2012, pages 725–728.

Yue, P., Di, L., Yang, W., Yu, G., and Zhao, P. (2006). Path planning for chaining geospatial web services. In Proceedings of the 6th International Conference on Web and Wireless Geographical Information Systems, W2GIS’06, pages 214–226, Hong Kong, China. Springer-Verlag.

Yue, P., Gong, J., Di, L., He, L., and Wei, Y. (2011). Integrating Semantic Web Technologies and Geospatial Catalog Services for Geospatial Information Discovery and Processing in Cyberinfrastructure. GeoInformatica, 15:273–303. 10.1007/s10707-009-0096-1.

ImplementingaSemanticCatalogueofGeospatialData

159