ing provides an example where the SUM aggregate is
used.
SELECT SUM(NO2) FROM NO2Time
WHERE NO2 < 10 and Time < 20-8-2017
It is worth noting that all retrieval and aggregation
commands are computed symbolically wherever possible.
By symbolic, we mean the use of mathematical
(analytical) calculations rather than numerical ones.
For example, the retrieval of measurements below a
certain threshold, as in WHERE NO2 < 10 in the previous
listing, is not computed by re-sampling the fitted
function and then filtering the samples, but by solving
the inequality that holds where the difference between
the function and the threshold is below zero.
The authors of functionDB showed empirically that
symbolic computation performs better than numerical
computation in terms of time and resource utilization.
This is one of the reasons that support our approach.
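As an illustration of the symbolic retrieval described above, the
following Python sketch assumes that the fitted function is a quadratic
polynomial a*t^2 + b*t + c. The function name below_threshold_intervals
and its parameters are hypothetical; they are not part of FSQL or
functionDB. The sketch finds the time intervals where the function lies
below a threshold from the roots of the difference, without re-sampling.

```python
import math


def below_threshold_intervals(a, b, c, threshold, t_min, t_max):
    """Intervals of [t_min, t_max] where a*t^2 + b*t + c < threshold.

    Assumption: the fitted function is at most quadratic, so the
    inequality can be solved analytically from the roots of
    g(t) = a*t^2 + b*t + (c - threshold).
    """
    c0 = c - threshold

    # Real roots of g(t) = 0 (closed form, no sampling of the function).
    if a == 0:
        roots = [] if b == 0 else [-c0 / b]
    else:
        disc = b * b - 4 * a * c0
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})

    # The roots inside the domain split it into pieces on which g keeps
    # a constant sign, so one midpoint check per piece is sufficient.
    cuts = [t_min] + [r for r in roots if t_min < r < t_max] + [t_max]
    intervals = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        if (a * mid + b) * mid + c0 < 0:
            intervals.append((lo, hi))
    return intervals


# f(t) = t^2 - 6t + 15 is below 10 exactly between t = 1 and t = 5.
print(below_threshold_intervals(1.0, -6.0, 15.0, 10.0, 0.0, 10.0))
# prints [(1.0, 5.0)]
```

Adjacent intervals are not merged in this sketch; a production
implementation would also need to handle higher-degree or piecewise
fitted functions.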
2.5 Functional Analysis
Functional Data Analysis (FDA) is a field of statis-
tics where functions are analyzed to explore, confirm,
or predict certain knowledge statements. Unlike the
conventional data mining techniques that are applied
over vectors of discrete data, FDA works only with
functions; functions are the atomic data structure in
FDA. FDA is widely used for time series, spatial data,
and spatiotemporal data. One of our framework's main
goals is to speed up and ease the transition from
asynchronous discrete data into functional views and
finally into knowledge. To that end, one important
contribution of our framework is the integration of
functional analysis operations with the high-level
FSQL. In our framework, the set of aggregates in FSQL
is extended to support functional analysis operations.
The number of functional analysis operations keeps
growing through continuous research efforts in the
field; this shows the importance of our framework as a
platform that can incubate such efforts. However, there
is a group of basic and widely used operations that are
essential in any functional analysis application.
2.5.1 Basic Operations
Functional data analysis usually starts by applying
basic exploratory operations that help the analyst
understand the data better. These range from basic
statistical operations such as the mean, variance,
covariance, and correlation of a group of functional
observations to transformation operations such as
derivation and registration. Derivation shows the rate
of change in the data, which gives the analyst a deeper
sense of the data. Registration is the process of
aligning different functional observations with respect
to function features such as valleys and peaks; this
leads to a better representation of the data. The
following listing showcases how some of these
aggregates might be used.
CREATE VIEW DerivativeMeanABCDE AS
SELECT MEAN(DERIVE(NO2TimeA, NO2TimeB,
NO2TimeC, NO2TimeD, NO2TimeE))
WHERE Time > 10-09-2017
This query creates a new continuous functional view
that represents the mean of the derivatives of five
different NO2 observations after a certain time.
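To make the semantics of this listing concrete, the following Python
sketch assumes that each functional observation is stored as a
polynomial coefficient list [c0, c1, c2, ...]. This representation and
the helper names derive, mean, and evaluate are illustrative
assumptions, not the FSQL implementation. Both operations act directly
on the coefficients, so no re-sampling is needed; the time filter of
the listing is omitted.

```python
def derive(coeffs):
    """Derivative of the polynomial sum(c_k * t**k), as a coefficient list."""
    return [k * c for k, c in enumerate(coeffs)][1:] or [0.0]


def mean(functions):
    """Pointwise mean of several polynomials, taken coefficient-wise."""
    width = max(len(f) for f in functions)
    padded = [list(f) + [0.0] * (width - len(f)) for f in functions]
    return [sum(column) / len(functions) for column in zip(*padded)]


def evaluate(coeffs, t):
    """Evaluate the polynomial at time t using Horner's scheme."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result


# Two hypothetical fitted observations:
# f_a(t) = 1 + 2t + 3t^2 and f_b(t) = 4t.
f_a = [1.0, 2.0, 3.0]
f_b = [0.0, 4.0]

mean_derivative = mean([derive(f_a), derive(f_b)])
print(mean_derivative)                  # prints [3.0, 3.0], i.e. 3 + 3t
print(evaluate(mean_derivative, 2.0))   # prints 9.0
```

The result is itself a function, which matches the idea of storing the
outcome as a continuous functional view.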
2.5.2 Advanced Operations
Advanced functional analysis operations include
Functional Principal Component Analysis (FPCA),
functional regression, and many others used in the
classification, clustering, and prediction of functional
data. FPCA is the functional version of Principal
Component Analysis (PCA), a well-known dimensionality
reduction tool used in data mining. With FPCA,
functions can be projected onto a group of principal
components that represent the system in which the
variance between the function projections is maximized.
Operations of this type can become very complex and
diverse, which is why we did not consider integrating
them into FSQL aggregates. However, we plan to build an
external module that acts on the algebraic
representation of functions. This module should be
extensible so that researchers can integrate their own
algorithms and operations.
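One possible shape for such an extensible module is a registry of named
operations, sketched below in Python. Everything here is hypothetical:
the names register and run, the coefficient-list representation of
functions, and the example operation max_degree are assumptions made
for illustration only and do not describe an existing implementation.

```python
from typing import Callable, Dict, List, Sequence

# Assumed algebraic representation: polynomial coefficients [c0, c1, ...].
Polynomial = List[float]
Operation = Callable[[Sequence[Polynomial]], object]

# Registry mapping an operation name to its implementation.
_OPERATIONS: Dict[str, Operation] = {}


def register(name: str) -> Callable[[Operation], Operation]:
    """Decorator that adds a user-defined operation under the given name."""
    def decorator(func: Operation) -> Operation:
        if name in _OPERATIONS:
            raise ValueError(f"operation {name!r} is already registered")
        _OPERATIONS[name] = func
        return func
    return decorator


def run(name: str, functions: Sequence[Polynomial]) -> object:
    """Apply a registered operation to a group of functions."""
    return _OPERATIONS[name](functions)


# A researcher could plug in an operation without touching the core code.
@register("max_degree")
def max_degree(functions: Sequence[Polynomial]) -> int:
    """Toy operation: highest polynomial degree in the group."""
    return max(len(f) - 1 for f in functions)


print(run("max_degree", [[1.0, 2.0], [0.0, 0.0, 3.0]]))  # prints 2
```

An operation such as FPCA would be registered in the same way, with its
numerical details kept outside the FSQL aggregates.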
3 RELEVANCE DEMONSTRATION
Polluscope is a multidisciplinary project that aims to
measure individuals' exposure to air pollution and
relate it to health using opportunistic mobile
monitoring. One prior task was to assess different air
pollutant sensors. The quality assessment was done
through a novel algorithm introduced in (Fishbain et
al., 2017) that compares sensor time series with an
accurate reference time series. The problem was that
the sampling rate of each time series was different, so
the algorithm could not be applied. To overcome this
problem, we developed an R library that wraps the
interpolation logic provided by the functional data
analysis library of R, fda.R (Ramsay et al., 2017). The
developed library fits two functions, one for the sensor
GISTAM 2018 - 4th International Conference on Geographical Information Systems Theory, Applications and Management