before forming a query to obtain relevant information
of interest.
Second, each SPARQL endpoint would require the
development of customized applications, which is
highly inefficient. A typical example is the European
Commission, which provides not only a SPARQL
endpoint but also visualizations of statistical data in
ten types of charts. Since such applications can provide
elaborate and highly customizable interfaces, this
option may be the most suitable alternative for
software developers. However, such applications
are frequently proprietary and integrate only a single
static data source.
Finally, some researchers have attempted to address
this problem by developing generalized solutions
(Maali et al., 2012; Salas et al., 2012; Hoefler et al.,
2014; Helmich et al., 2014; Kämpgen and Harth,
2014). The common idea of these approaches is
to build a web-based application that analyzes the
components of each dataset and provides visualizations
for it.
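Analyzing "the components of each dataset" typically starts with a SPARQL query over the Data Cube structure. The following is a minimal sketch, assuming the endpoint links each dataset to its structure through the standard qb:structure and qb:component properties; the function name `component_query` and the example IRI are hypothetical and are not taken from any of the cited tools.

```python
# Standard namespace of the W3C RDF Data Cube Vocabulary.
QB_PREFIX = "PREFIX qb: <http://purl.org/linked-data/cube#>"


def component_query(dataset_iri: str) -> str:
    """Build a SPARQL SELECT query listing the dimensions and measures of one qb:DataSet.

    Assumes (hypothetically) that the dataset is linked to its structure via
    qb:structure and qb:component, as the Data Cube Vocabulary prescribes.
    """
    # Doubled braces become literal { } in the f-string output.
    return f"""{QB_PREFIX}
SELECT ?kind ?property WHERE {{
  <{dataset_iri}> qb:structure ?dsd .
  ?dsd qb:component ?component .
  {{ ?component qb:dimension ?property . BIND("dimension" AS ?kind) }}
  UNION
  {{ ?component qb:measure ?property . BIND("measure" AS ?kind) }}
}}"""
```

Endpoints that omit the qb:structure link return no rows for such a query, so a generic tool that relies on it sees no components at all.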
However, all of these options are associated with
considerable disadvantages:
1. Dataset exploration is typically limited to viewing
raw data or using basic graphical visualizations.
This makes it difficult for users to identify trends
and study datasets in detail.
2. It is typically not possible to combine or compare
data from different datasets, which is an important
requirement in data analytics.
3. Available tools are typically not open, i.e., they do
not allow users and developers to reuse solutions
and extend them with new functionalities and
visual presentations. In the context of open data, it
is crucial to stress that the means to process and
recombine such open data should themselves be
open to maximize benefits and foster widespread
(re-)use.
4. Existing solutions typically do not cope well with
data from available SPARQL endpoints that do
not strictly follow the RDF Data Cube Vocabulary.
In Section 6, we show that available faceted
browsers and tools can analyze only a small number
of the available endpoints.
In this paper, we address these issues by introducing
a novel approach based on widgets and mashups
that allows end users to effectively explore statistical
data sources available through SPARQL endpoints.
We model and expose each dataset of a source as a
statistical widget with five salient characteristics:
(i) effective querying, (ii) standard format, (iii) automatic
chart generation, (iv) openness, and (v) linkage.
Effective querying means that end users can
quickly and easily query a dataset via an interactive
interface. Next, widgets return their results in
standard JSON-LD (JSON for Linked Data) format
(Sporny et al., 2013), even if the data source only
partly complies with the Data Cube Vocabulary. Based
on the result, the widget automatically identifies suitable
charts that provide meaningful views on the dataset.
In addition, end users can extend widgets with additional
interface components and functionality. Finally, the
system allows users to link widgets and thereby
establish relationships between statistical datasets.
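To make the "standard format" characteristic concrete, the sketch below serializes tabular observations as a JSON-LD document of qb:Observation nodes. It is an illustration under stated assumptions, not the widgets' actual output schema: the function `observations_to_jsonld`, the `ex:` namespace, and the observation identifiers are all hypothetical.

```python
import json
from typing import Any, Dict, List


def observations_to_jsonld(dataset_iri: str, rows: List[Dict[str, Any]],
                           base: str = "http://example.org/") -> str:
    """Serialize tabular observations as a JSON-LD @graph of qb:Observation nodes.

    Each row maps a component name (e.g. "Year") to its value. Component
    properties are minted in a hypothetical `ex:` namespace given by `base`.
    """
    context = {
        "qb": "http://purl.org/linked-data/cube#",
        "ex": base,
        # Coercing dataSet to @id makes its value an IRI rather than a string.
        "dataSet": {"@id": "qb:dataSet", "@type": "@id"},
    }
    graph = []
    for i, row in enumerate(rows):
        node: Dict[str, Any] = {"@id": f"ex:obs{i}", "@type": "qb:Observation",
                                "dataSet": dataset_iri}
        for name, value in row.items():
            # One JSON-LD property per table column (dimension or measure).
            node[f"ex:{name}"] = value
        graph.append(node)
    return json.dumps({"@context": context, "@graph": graph}, indent=2)
```

Because the context maps every short key to a full IRI, any JSON-LD processor can expand the result into RDF, while chart code can still read it as plain JSON.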
A prototypical implementation of the proposed ap-
proach is available at http://linkedwidgets.org/widget-
generation.
The remainder of this paper is organized as
follows. Section 2 provides background information on
the Data Cube Vocabulary, widgets, and mashups;
Section 3 discusses related work. Section 4 then
introduces our widget creation algorithm, and Section 5
outlines our approach. Finally, we evaluate our
approach and contrast it with existing alternatives in
Section 6 and conclude with an outlook on future research
in Section 7.
2 BACKGROUND
2.1 Data Cube Vocabulary
The Data Cube Vocabulary (Cyganiak and Reynolds,
2011) is a recently developed mechanism for enriching
and transforming statistical datasets and publishing
them on the web as Linked Data (Maali et al.,
2012). To illustrate the approach, we briefly describe
a statistical dataset, which represents a collection of
observations. Such collections are semantically described
by a set of dimensions, which identify what an observation
refers to (e.g., the time that the observation applies to,
or the geographic region that it covers), together with
measures, which describe the observed phenomenon (e.g.,
the number of bus users during this time, or the income
of employees in a specific region). A statistical dataset
is typically presented as a table in which each row
represents an observation. Dimensions typically
correspond to the (composite) primary key of a database
table, whereas measures represent the remaining columns.
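The primary-key analogy can be checked mechanically: the combination of dimension values must identify an observation uniquely, which mirrors the W3C Data Cube specification's constraint that no two observations in a dataset share the same values for all dimensions. The sketch below is illustrative only; the function `index_by_dimensions` and the Region/Year/Income column names are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def index_by_dimensions(rows: List[Row],
                        dimensions: Sequence[str]) -> Dict[Tuple[Any, ...], Row]:
    """Index observations by their dimension values, like a composite primary key.

    Raises ValueError if two observations share the same dimension values,
    because the dimensions would then not identify an observation uniquely.
    """
    index: Dict[Tuple[Any, ...], Row] = {}
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        if key in index:
            raise ValueError(f"duplicate observation for {dict(zip(dimensions, key))}")
        # Every column that is not a dimension is treated as a measure.
        index[key] = {k: v for k, v in row.items() if k not in dimensions}
    return index
```

Looking up an observation then amounts to supplying one value per dimension, just as a row is retrieved by its key in a relational table.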
Table 1 shows an example of a Bus Vehicle
dataset. Year is a dimension, while Pas (the number
of bus passengers, in millions) and Kmh (the average
bus speed, in km/h) are measures.
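In Data Cube terms, this split into one dimension and two measures is recorded in a data structure definition. The sketch below emits such a description in Turtle for the Bus Vehicle example; only the component names Year, Pas, and Kmh come from the text, while the `ex:` namespace, the resource names, and the helper `structure_turtle` are hypothetical.

```python
QB = "http://purl.org/linked-data/cube#"


def structure_turtle(name: str, dimensions: list, measures: list,
                     base: str = "http://example.org/") -> str:
    """Emit Turtle describing a qb:DataSet and its qb:DataStructureDefinition."""
    lines = [f"@prefix qb: <{QB}> .", f"@prefix ex: <{base}> .", ""]
    # The dataset points to its structure; the structure lists one component per column.
    lines.append(f"ex:{name} a qb:DataSet ; qb:structure ex:{name}Structure .")
    components = ([("qb:dimension", d) for d in dimensions]
                  + [("qb:measure", m) for m in measures])
    specs = [f"    qb:component [ {role} ex:{prop} ]" for role, prop in components]
    lines.append(f"ex:{name}Structure a qb:DataStructureDefinition ;")
    lines.append(" ;\n".join(specs) + " .")
    lines.append("")
    # Declare each component property with its Data Cube class.
    lines += [f"ex:{d} a qb:DimensionProperty ." for d in dimensions]
    lines += [f"ex:{m} a qb:MeasureProperty ." for m in measures]
    return "\n".join(lines)
```

Calling `structure_turtle("BusVehicle", ["Year"], ["Pas", "Kmh"])` yields a structure with one `qb:dimension` component and two `qb:measure` components; the observation values themselves would be published separately as qb:Observation resources.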
Figure 1 presents a description of the Data Cube
Widget-based Exploration of Linked Statistical Data Spaces