Our approach delivers efficient cross-disciplinary querying and processing of array data and metadata by offering a unified and user-friendly interface through xWCPS 2.0. xWCPS 2.0 builds on the first specification of the language, xWCPS 1.0 (Liakos, 2014), as defined in the EarthServer Project (EarthServer.eu, 2018). It refines the characteristics of xWCPS 1.0 so that the language is easier to implement, offers better expected query performance and is easier for users to adopt. Traditional approaches require two different queries: one to filter the semi-structured metadata and another to retrieve and process the corresponding results. In our approach, the same functionality is achieved by executing a single unified query, which minimizes the amount of data transferred to its final consumption point. To overcome the limitations of the two-query approach, we propose a query language with a clear and user-friendly syntax, and we offer a working engine for it. The engine's core components are a new metadata management engine, called FeMME, which follows a scalable NoSQL approach suited to the needs of the endeavour, and a proven array database system, Rasdaman. We combine the two efficiently to support unified processing and retrieval of array data and metadata.
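To make the contrast between the two strategies concrete, the following minimal Python sketch models both over an in-memory catalogue. Everything in it is hypothetical: the `Coverage` record, the `cloud_cover` metadata field, the coverage ids and the two helper functions are invented for illustration and are not part of xWCPS, FeMME or Rasdaman. It assumes the simplest two-query case, in which the second query ships raw arrays to the client, and counts how many items cross the wire in each case.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Coverage:
    """Hypothetical stand-in for a coverage: array values plus a metadata record."""
    coverage_id: str
    values: list
    metadata: dict = field(default_factory=dict)


# Hypothetical server-side catalogue; ids and metadata fields are invented.
CATALOGUE = [
    Coverage("scene_a", [0.1, 0.4, 0.5, 0.2], {"sensor": "S2", "cloud_cover": 5}),
    Coverage("scene_b", [0.9, 0.8, 0.7, 0.6], {"sensor": "S2", "cloud_cover": 60}),
    Coverage("scene_c", [0.3, 0.3, 0.6, 0.2], {"sensor": "L8", "cloud_cover": 10}),
]


def two_step(max_cloud):
    """Two separate queries: metadata filter first, then raw array retrieval.
    Returns (results, number of metadata fields and array cells transferred)."""
    # Query 1: the metadata store ships back every matching record.
    matching = [c for c in CATALOGUE if c.metadata["cloud_cover"] <= max_cloud]
    transferred = sum(len(c.metadata) for c in matching)
    # Query 2: the arrays of the matching coverages are shipped to the client,
    # which then computes the per-coverage average itself.
    results = {}
    for c in matching:
        transferred += len(c.values)
        results[c.coverage_id] = mean(c.values)
    return results, transferred


def unified(max_cloud):
    """One query: filtering and processing both run next to the data,
    so only the final per-coverage values are transferred."""
    results = {
        c.coverage_id: mean(c.values)
        for c in CATALOGUE
        if c.metadata["cloud_cover"] <= max_cloud
    }
    return results, len(results)
```

With `max_cloud=20`, both functions return the same averages for `scene_a` and `scene_c`. However, `two_step` transfers 12 items (4 metadata fields plus 8 array cells), while `unified` transfers only the 2 final values.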
2 CONCEPTS AND MOTIVATION
The fundamental ideas behind this work emerged from the EarthServer project series, whose objective was to establish Agile Analytics on Petabyte Data Cubes as a simple, user-friendly and scalable paradigm. The project's mandate includes the delivery of a standards-based, declarative query language that enhances geospatial data infrastructures by allowing combined filtering and processing of multidimensional data and metadata. xWCPS, now in its second version, is built on top of xWCPS 1.0. It combines two widely known specifications, XPath (W3c.org, 2018) and the Web Coverage Processing Service (WCPS) standard (Baumann, 2010), into a single FLWOR syntax (For-Let-Where-Order by-Return) to achieve this goal.
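The role of each FLWOR clause can be illustrated with a plain-Python analogue. The sketch below is not xWCPS syntax and does not reproduce its semantics exactly. Under stated assumptions, a metadata dictionary stands in for the XML metadata that XPath would navigate, and `statistics.mean` stands in for a WCPS condenser over the array. The coverage ids and metadata fields are hypothetical.

```python
from statistics import mean

# Hypothetical coverages: each pairs array values with a metadata dict that
# stands in for the XML metadata an XPath expression would navigate.
COVERAGES = [
    {"id": "cov1", "values": [3, 5, 7], "metadata": {"instrument": "MODIS", "year": 2016}},
    {"id": "cov2", "values": [1, 1, 2], "metadata": {"instrument": "OLI", "year": 2017}},
    {"id": "cov3", "values": [8, 9, 10], "metadata": {"instrument": "MODIS", "year": 2018}},
]


def flwor(coverages):
    """Evaluate a FLWOR-shaped query in plain Python, one clause per step."""
    bindings = []
    for c in coverages:                              # for: iterate over coverages
        avg = mean(c["values"])                      # let: bind an intermediate value
        if c["metadata"]["instrument"] == "MODIS":   # where: metadata predicate
            bindings.append((c, avg))
    bindings.sort(key=lambda b: b[1], reverse=True)  # order by: sort on the bound value
    return [(c["id"], avg) for c, avg in bindings]   # return: build the output items


result = flwor(COVERAGES)
```

Here the `where` step filters on metadata and the `let` step computes an array statistic. This mirrors, in miniature, the combination of metadata filtering and array processing that xWCPS expresses in a single query.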
At the root of the overall approach lies the concept of the “coverage” (OGC, 2017), a fundamental element of the Open Geospatial Consortium (http://www.opengeospatial.org/) ecosystem. A coverage comprises data and metadata representing multidimensional, space/time-varying phenomena. The OGC has introduced a number of standards and specifications for accessing, retrieving and processing coverages. Among them, the Web Coverage Service (WCS) (Baumann, 2012) supports access to raster data that are handled as coverages. WCS defines requests against these data and returns data with their original semantics (instead of rendered raster images). WCS also supports the delivery of rich metadata about coverages. However, it leaves the content and structure of those metadata largely to the data provider, so clients cannot rely on any assumptions about their nature and form.
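Because WCS requests are plain HTTP, a small sketch can show what such a request looks like. The function below builds a WCS 2.0.1 `GetCoverage` request in key-value-pair (KVP) encoding. The parameter names (`SERVICE`, `VERSION`, `REQUEST`, `COVERAGEID`, `SUBSET`, `FORMAT`) follow the WCS 2.0 KVP binding. The endpoint URL, coverage id and axis labels used below are hypothetical placeholders, not a real deployment.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def get_coverage_url(endpoint, coverage_id, subsets, fmt="image/tiff"):
    """Build a WCS 2.0.1 GetCoverage request in KVP encoding.

    `subsets` maps an axis label to a (low, high) interval. Each entry becomes
    one SUBSET=axis(low,high) parameter, since SUBSET may be repeated.
    """
    params = [
        ("SERVICE", "WCS"),
        ("VERSION", "2.0.1"),
        ("REQUEST", "GetCoverage"),
        ("COVERAGEID", coverage_id),
    ]
    for axis, (low, high) in subsets.items():
        params.append(("SUBSET", f"{axis}({low},{high})"))
    params.append(("FORMAT", fmt))
    # A list of pairs (rather than a dict) keeps the repeated SUBSET keys.
    return f"{endpoint}?{urlencode(params)}"


# Usage with a hypothetical endpoint and coverage id:
url = get_coverage_url(
    "https://example.org/rasdaman/ows",
    "S2_SCENE",
    {"Lat": (35, 36), "Long": (23, 24)},
)
# Decoding the query string shows the parameters the server would receive.
params = parse_qs(urlsplit(url).query)
```

The request carries only subsetting and format choices. It cannot express a filter on the provider-specific metadata, which is the gap that xWCPS addresses.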
Complementing WCS, the Web Coverage Processing Service offers processing capabilities on top of array data through its own query language, allowing ad-hoc processing of coverage data. Examples include deriving composite indices (e.g. a vegetation index), computing statistics, and generating derived products such as data classifications and histograms.
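As an illustration of the kind of computation WCPS evaluates on the server, the following sketch computes the Normalized Difference Vegetation Index, NDVI = (NIR − Red) / (NIR + Red), cell by cell on two small grids and then builds a histogram of the result. This is a client-side Python analogue, not WCPS code. The band values are hypothetical digital numbers. Setting cells with NIR + Red = 0 to 0.0 is a choice made for this sketch only.

```python
from bisect import bisect_right


def ndvi(nir, red):
    """Cell-wise NDVI on two equally shaped 2-D grids (lists of rows).
    Cells where NIR + Red == 0 are set to 0.0 (a choice made for this sketch)."""
    out = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            total = n + r
            row.append((n - r) / total if total else 0.0)
        out.append(row)
    return out


def histogram(grid, edges):
    """Count cells per bin [edges[i], edges[i+1]); the last bin is closed.
    Values outside [edges[0], edges[-1]] are ignored."""
    counts = [0] * (len(edges) - 1)
    for row in grid:
        for v in row:
            if edges[0] <= v <= edges[-1]:
                i = min(bisect_right(edges, v) - 1, len(counts) - 1)
                counts[i] += 1
    return counts


# Hypothetical 2x2 near-infrared and red bands.
nir = [[60, 50], [30, 0]]
red = [[20, 50], [40, 0]]
index = ndvi(nir, red)
hist = histogram(index, [-1.0, 0.0, 0.5, 1.0])
```

In a WCPS setting, the same expression would be evaluated next to the data, so only the index grid or the histogram counts would be returned to the client.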
The primary structure of the WCPS language
comprises the for-where-return clauses. The “for”
clause specifies the set of coverages that will be
examined by a query. The “return” clause specifies
the potential output that may be appended to the list
of results, in each iteration defined by the “for”
clause. Criteria used for determining if the output of
“return” is actually appended are specified by the
“where” clause.
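The for-where-return semantics described above can be mimicked with a short evaluation loop. This is a minimal sketch of the iteration logic only, assuming one iteration variable and predicates supplied as Python functions. The coverage names and values are hypothetical. The comment shows the rough shape of a comparable WCPS query, which is illustrative and has not been checked against a server.

```python
def evaluate(coverages, where, ret):
    """Mimic WCPS for-where-return: for each coverage bound by "for",
    evaluate "return" only when the "where" predicate holds, and append it."""
    results = []
    for c in coverages:             # "for": bind the iteration variable
        if where(c):                # "where": decide whether the output is kept
            results.append(ret(c))  # "return": produce and append the output
    return results


# Hypothetical coverages as (name, cell values) pairs. Roughly analogous to
# a query shaped like:  for c in (A, B, C) where max(c) > 100 return avg(c)
COVERAGES = [("A", [10, 120, 30]), ("B", [5, 6, 7]), ("C", [200, 100, 0])]

out = evaluate(
    COVERAGES,
    where=lambda c: max(c[1]) > 100,
    ret=lambda c: (c[0], sum(c[1]) / len(c[1])),
)
```

Coverage `B` is skipped because its maximum does not exceed 100. Coverages `A` and `C` contribute their averages to the result list.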
Notably, WCS and WCPS are implemented by the Rasdaman array database, which is the OGC's official Reference Implementation for WCS Core. Rasdaman (“raster data manager”) (Baumann, 1998) is a fully parallel array engine that allows storing and querying massive multi-dimensional arrays, such as sensor data, satellite imagery, and simulation data appearing in domains like earth, space, and life sciences. This array analytics engine distinguishes itself by its flexibility, performance, and scalability. From simple geo imagery services up to complex analytics, Rasdaman provides the whole spectrum of functionality on spatio-temporal raster data, on both regular and irregular grids.
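The distinction between regular and irregular grids matters when a coordinate must be mapped to an array index. The sketch below shows the general idea and is not Rasdaman's implementation. On a regular axis, the index follows arithmetically from an origin and a fixed resolution. On an irregular axis (for example, unevenly spaced time steps), the grid positions must be listed explicitly and searched. The convention that a coordinate maps to the last grid point at or before it is an assumption made for this sketch, as are the sample axis values.

```python
from bisect import bisect_right


def regular_index(coord, origin, resolution):
    """Regular axis: cell i covers [origin + i*res, origin + (i+1)*res),
    so the index is computed arithmetically."""
    return int((coord - origin) // resolution)


def irregular_index(coord, positions):
    """Irregular axis: grid positions are listed explicitly in ascending order.
    Assumed convention: return the last position at or before `coord`."""
    i = bisect_right(positions, coord) - 1
    if i < 0:
        raise ValueError(f"{coord} lies before the first grid position")
    return i


# Hypothetical axes: a regular axis with 10-unit cells starting at 0, and an
# irregular time axis whose steps (in days) grow unevenly.
days = [0, 1, 3, 7, 15]
```

The regular case costs constant time and needs only two numbers per axis. The irregular case needs the full list of positions and a binary search, which is the price paid for supporting uneven sampling.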
Although the EarthServer project approaches multi-dimensional arrays from the geospatial standpoint and expresses them as coverages, arrays are far from a domain-specific data form and play a central role in all of science, engineering, and beyond. Consequently, a significant number of approaches for retrieving data from arrays have been proposed for different applications. In practice, though, arrays typically form part of a larger data structure. Array SQL is a horizontal technology that, through its key enabling features of flexibility, scalability, and information integration, enhances all fields of data management. In this setting, it is evident that data cannot be consumed without metadata describing what they represent.