Widget-based Exploration of Linked Statistical Data Spaces

Ba-Lam Do, Tuan-Dat Trinh, Peter Wetz, Amin Anjomshoaa, Elmar Kiesling and A. Min Tjoa

Institute of Software Technology and Interactive Systems, Vienna University of Technology, Vienna, Austria

Keywords:

Statistical Data, RDF Data Cube Vocabulary, Widget, Mashup, Linked Data.

Abstract:

Today, public statistical data plays an increasingly important role both in public policy formation and as a

facilitator for informed decision-making in the private sector. In line with the increasing adoption of open

data policies, the amount of data published by governments and organizations on the web is growing rapidly.

To increase the value of such data, the W3C recommends the RDF Data Cube Vocabulary to facilitate the

publication of data in a more structured and interlinked manner. Although important ﬁrst steps toward building

a web of statistical Linked Datasets have been made, providing adequate facilities for end users to interactively

explore and make use of the published data remains an unresolved challenge. This paper presents a widget-

based approach to deal with this issue. In particular, we introduce a mashup platform that allows users lacking

advanced skills and knowledge of Semantic Web technologies to interactively analyze datasets through widget

compositions and visualizations. Furthermore, we provide mechanisms for the interconnection of datasets to

support sophisticated knowledge extraction.

1 INTRODUCTION

In recent years, the number of available open data

sources has increased substantially (Brunetti et al.,

2013; Hoeﬂer et al., 2014). A considerable share

of the data published is statistical data, comprising a

wide range of domains including ﬁnance, population,

transportation, employment, etc. Statisticians, scien-

tists and researchers accumulate these data through

observations and experimentation to report overall

trends, identify risks and opportunities, and to con-

duct planning.

Statistical data becomes much more useful when

published as Linked Data, which can be consumed

and manipulated without proprietary tools. A Linked

Data approach allows users to combine data from dif-

ferent sources in order to gain new insights and to

obtain higher data quality, completeness, and level

of detail. The RDF Data Cube Vocabulary (Cyga-

niak and Reynolds, 2011) is a W3C standard for the

publication of multi-dimensional Linked Data on the

web. This vocabulary follows the same principles as

SDMX (Statistical Data and Metadata eXchange), an

ISO standard for exchanging and processing statisti-

cal data. By concretizing the general syntax of the

RDF standard for statistical data, this vocabulary en-

ables data providers to publish their data as Linked

Data on the web.

A large number of organizations and governments, such as the European Commission, the United Kingdom Department for Communities and Local Government, and the Scottish Government, have adopted this vocabulary and use it to publish their data sources via dedicated SPARQL endpoints. This proliferation of

available statistical data has created enormous poten-

tial for interesting applications, but it has so far re-

sulted only in limited adoption by end users, including

developers and knowledge workers. These users need

appropriate tools to analyze, combine, remix, visual-

ize and make sense of the data. At present, however,

the means to obtain access to such data are limited to

three options.

First, users may write SPARQL queries directly.

This is a powerful information retrieval approach that

facilitates the extraction of a variety of information.

However, users are exposed to raw data output, which

is not necessarily easy to comprehend and may be of

limited use for inexperienced users aiming to deduce

insights from statistical data. These users also cannot

be expected to learn the SPARQL query language and

formulate queries by themselves. Even Semantic Web

experts typically have to invest considerable effort to

understand a dataset’s structure and its components

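European Commission: http://digital-agenda-data.eu/sparql
United Kingdom Department for Communities and Local Government: http://opendatacommunities.org/sparql
Scottish Government: http://cofog01.data.scotland.gov.uk/sparql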


Do B., Trinh T., Wetz P., Anjomshoaa A., Kiesling E. and Tjoa A..

Widget-based Exploration of Linked Statistical Data Spaces.

DOI: 10.5220/0005110102820290

In Proceedings of 3rd International Conference on Data Management Technologies and Applications (DATA-2014), pages 282-290

ISBN: 978-989-758-035-2

© 2014 SCITEPRESS (Science and Technology Publications, Lda.)

before forming a query to obtain relevant information

of interest.

Second, customized applications can be developed for each SPARQL endpoint, which is highly inefficient. A typical example is the European

Commission, which not only provides a SPARQL

endpoint, but also visualizations of statistical data by

means of ten types of visual charts. Since applications

can provide elaborate and highly customizable inter-

faces, this option may be the most suitable alternative

for software developers. However, such applications

are frequently proprietary and integrate only a single

static data source.

Finally, some researchers have attempted to deal with this problem by developing generalized solutions (Maali et al., 2012; Salas et al., 2012; Hoefler et al., 2014; Helmich et al., 2014; Kämpgen and Harth, 2014). The common idea of these approaches is to build a web-based application that can analyze the components of each dataset and provide visualizations for it.

However, all of these options are associated with

considerable disadvantages:

1. Dataset exploration is typically limited to viewing

raw data or using limited graphical visualization.

This makes it difﬁcult for users to identify trends

and study datasets in detail.

2. It is typically not possible to combine or compare

data from different datasets, which is an important

requirement in data analytics.

3. Available tools are typically not open, i.e., they do

not allow users and developers to reuse solutions

and extend them with new functionalities and vi-

sual presentations. In the context of open data, it

is crucial to stress that the means to process and

recombine such open data should themselves be

open to maximize beneﬁts and foster widespread

(re-)use.

4. Existing solutions typically do not cope well with

data from available SPARQL endpoints that do

not strictly follow the RDF Data Cube Vocabu-

lary. In Section 6, we show that available faceted

browsers and tools can only analyze a small num-

ber of available endpoints.

In this paper, we address these issues by introduc-

ing a novel approach based on widgets and mashups

that allow end users to effectively explore statistical

data sources available through SPARQL endpoints.

We model and expose each dataset of a source as a sta-

tistical widget with ﬁve salient characteristics: (i) ef-

fective querying, (ii) standard format, (iii) automatic

chart generation, (iv) openness, and (v) linkage.

Effective querying means that end users can

quickly and easily query a dataset via an interac-

tive interface. Next, widgets return their results in

standard JSON-LD (JSON for Linked Data) format

(Sporny et al., 2013), even if the data source only

partly complies with the vocabulary. Based on the result,

the widget will automatically identify suitable charts

that provide meaningful views on the dataset. In ad-

dition, end users can extend widgets with additional

interface components and functionality. Finally, the

system allows users to link widgets and thereby

establish relationships between statistical datasets.

A prototypical implementation of the proposed ap-

proach is available at http://linkedwidgets.org/widget-

generation.

The remainder of this paper is organized as fol-

lows. Section 2 provides background information on

the Data Cube Vocabulary, widgets, and mashups;

Section 3 discusses related work. Section 4 then introduces our data source analysis algorithm and Section 5 describes widget generation and mashup composition. Finally, we evaluate our approach and contrast it with existing alternatives in Section 6 and conclude with an outlook on future research

in Section 7.

2 BACKGROUND

2.1 Data Cube Vocabulary

The Data Cube Vocabulary (Cyganiak and Reynolds,

2011) is a recently developed mechanism for enrich-

ing and transforming statistical datasets and publish-

ing them on the web as Linked Data (Maali et al.,

2012). A statistical dataset represents a collection of observations. Each observation is described semantically by a set of dimensions, which define what the observation applies to (e.g., the time that the observation applies to, or the geographic region that it covers), together with measures, which describe the observed phenomenon (e.g., the number of bus users during this time, or the income of employees in a specific region). Such a statistical dataset is typically presented as a table whose rows represent observations. Dimensions typically correspond to primary keys in databases, whereas measures represent the remaining columns.

Table 1 shows an extract of a Bus Vehicle dataset. Year is a dimension, while Pas (the number of passengers taking the bus, in millions) and Kmh (the average speed of the buses, in km/h) are measures.

Figure 1 presents a description of this dataset according to the Data Cube Vocabulary. We will use this figure in the remaining sections to illustrate and explain our approach.

[Figure 1 (diagram): eg:dataset (rdf:type qb:DataSet) is linked via qb:structure to eg:dsd (qb:DataStructureDefinition). In the region labeled Component_Definition_Region, eg:dsd is linked via qb:component to eg:component, which declares the dimension eg:refYear (rdfs:label "Year") via qb:dimension and the measures eg:pas (rdfs:label "Passengers (million)") and eg:kmh (rdfs:label "Average Speed (km/h)") via qb:measure. The observations eg:observation1 (2010, 114.4, 16.7) and eg:observation2 (2011, 113.6, 17) have rdf:type qb:Observation and are linked to eg:dataset via qb:dataSet. qb is the prefix of the namespace http://purl.org/linked-data/cube#; eg is the prefix of the namespace that describes the data source.]

Figure 1: A description according to the Data Cube Vocabulary.

Table 1: An extract of the Bus Vehicle dataset.

Year  Pas    Kmh
2010  114.4  16.7
2011  113.6  17
2012  167.1  17.3

2.2 Statistical Data Source Exploration

To make a statistical data source available for inter-

active exploration through linked widgets, it is neces-

sary to identify (i) datasets in the data source, (ii) di-

mensions and measures associated with each dataset,

and (iii) a list of possible values for each dimension.

Upon completion of these steps, users can construct

meaningful data ﬁlters in order to uncover informa-

tion in large datasets.

To this end, the platform provides mechanisms for

slice-selection and visualization of a part of a dataset

(Dadzie and Rowe, 2011), created by filtering one or more dimensions by value. Two types of visualizations are available: (i) single dataset visualizations

(e.g., a line chart that describes the trend of number

of passengers taking the bus in the period from 2010

to 2012 – cf. Figure 2), and (ii) multiple dataset vi-

sualizations (e.g., a multiple column chart that com-

pares the number of passengers taking the bus, tram

and metro in the same period – cf. Figure 5).

2.3 Widgets and Mashups

Our approach is based on widgets and mashups. A

widget is “an interactive single purpose application

for displaying and/or updating local data or data on

the Web, packaged in a way to allow a single down-

load and installation on a user’s machine or mobile

device” (Cáceres, 2011). Embedding widgets into a

web page allows execution at the client and makes it

easy for users to modify them. In addition, widgets

can be connected to each other in a mashup (Trinh

et al., 2013), which can convey information to the user

and highlight features of the data in a fast and efﬁcient

manner. A mashup in this context is “a Web page,

or Web application, that uses content from more than

one source to create a single new service displayed in

a single graphical interface” (Crupi, 2010).

3 RELATED WORK

Several researchers implemented custom browsers to

facilitate exploration of statistical datasets. Using the

URL of a SPARQL endpoint or a dataset as a starting

point, they allow users to explore the data source.

CubeViz (http://cubeviz.aksw.org/) (Salas et al., 2012) is a general-purpose solution for exploring statistical data sources and provides visual presentations via five different types of charts. In our evaluation, however, we found that it only worked with two data sources from the European Commission and was not able to detect datasets using other endpoints (cf. Table 4).

Data Cube faceted browser

(Maali et al., 2012)

was able to detect datasets in a larger number of end-

points in our evaluation presented in Section 6. How-

ever, it is only suitable for datasets with a small num-

ber of observations. It is also limited in that it pro-

vides only a list of observations of each dataset with-

out any visual charts.

Linked Data Query Wizard

(Hoeﬂer et al., 2014)

applies the idea of presenting statistical datasets via a

tabular interface. As such, end users can change the

slices of a dataset by choosing one value as a ﬁlter

value. However, due to the lack of a complete list

of values for each dimension, this solution constrains

end users to a small number of slices. In addition, this

solution is restricted to a static list of speciﬁc end-

points.

Linked Data Cubes Explorer (LDCE) (Kämpgen and Harth, 2014) uses an OLAP approach to validate

and analyze statistical datasets. Unfortunately, this

tool only works with its sample datasets and returns

an error for any external dataset (cf. Table 4).

Payola

(Helmich et al., 2014) can receive an ar-

bitrary RDF source and transform it to RDF conform-

ing to the Data Cube Vocabulary. After that, it can

provide user-friendly visualizations. At present, how-

ever, this tool seems unstable and cannot run its sam-

ple experiments.

The difficulties that these tools face in analyzing data sources stem from the variability in the use of the vocabulary. A considerable number of datasets use only a part of the vocabulary. For example, without the part of Figure 1 labeled Component_Definition_Region, we cannot directly differentiate between dimensions and measures in the dataset. In addition, there are datasets which use slices (Cyganiak and Reynolds, 2011) to build subsets of observations without using the predicate qb:dataSet. We implemented an algorithm to cope with such heterogeneity and inconsistency.

Furthermore, existing solutions use open data to

provide closed applications which run on the server

side. This poses a contradiction. We provide a novel

approach that is open for adaptation and extension by

end users. To this end, our approach presents data in

a well-deﬁned standard format.

Data Cube faceted browser: http://vmsgov03.deri.ie:8080/RDF-faceted-browser/start.html
Linked Data Query Wizard: http://code.know-center.tugraz.at/search
Linked Data Cubes Explorer: http://ldcx.linked-data-cubes.org/projects/ldcx
Payola: http://datacube.payola.cz

4 DATA SOURCE ANALYSIS

To analyze data sources provided via SPARQL end-

points and automatically generate widgets for the

identiﬁed datasets, we introduce an algorithm that

involves the following steps: (i) datasets identiﬁca-

tion, (ii) dimensions and measures identiﬁcation, and

(iii) values and labels identiﬁcation. In the following,

we use Figure 1 as an illustrative example.

4.1 Datasets Identiﬁcation

The vocabulary allows a dataset (e.g., eg:dataset) to be identified through one of the following triple patterns:
(1) eg:observation -qb:dataSet→ eg:dataset,
(2) eg:dataset -rdf:type→ qb:DataSet, and
(3) eg:dataset -qb:structure→ eg:dsd.

The ﬁrst pattern is available in almost all evaluated

SPARQL endpoints, because it provides the relation-

ship between a dataset and its observations. However,

an endpoint can provide millions of observations, and therefore this query, which is essential for dataset detection, may time out. To alleviate this issue, we also apply the latter two triple patterns. They

represent a 1:1 relationship between a dataset and its

type, and between a dataset and its data structure def-

inition, respectively. Unfortunately, datasets can fo-

cus on describing observations without describing the

remaining components of the vocabulary, rendering

these triple patterns useless. Overall, however, the

combined utilization of all three triple patterns results

in high recall when detecting datasets.
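The combined use of the three patterns can be sketched as follows. This is a minimal illustration, not the platform's implementation: the function name `identify_datasets` and the `run_query` callback (assumed to send a SPARQL query to an endpoint and return the URIs bound to `?ds`, raising `TimeoutError` on a timeout) are hypothetical. The queries use the W3C spelling `qb:dataSet` of the observation predicate.

```python
QB = "http://purl.org/linked-data/cube#"

# One query per triple pattern, ordered from cheapest to most expensive.
DATASET_QUERIES = [
    # eg:dataset rdf:type qb:DataSet
    f"PREFIX qb: <{QB}> SELECT DISTINCT ?ds WHERE {{ ?ds a qb:DataSet }}",
    # eg:dataset qb:structure eg:dsd
    f"PREFIX qb: <{QB}> SELECT DISTINCT ?ds WHERE {{ ?ds qb:structure ?dsd }}",
    # eg:observation qb:dataSet eg:dataset (may touch millions of observations)
    f"PREFIX qb: <{QB}> SELECT DISTINCT ?ds WHERE {{ ?obs qb:dataSet ?ds }}",
]


def identify_datasets(run_query):
    """Return the union of dataset URIs found by the three patterns.

    run_query(query) is a caller-supplied (hypothetical) function that
    returns a list of URIs bound to ?ds, or raises TimeoutError.
    """
    found = []
    for query in DATASET_QUERIES:
        try:
            uris = run_query(query)
        except TimeoutError:
            continue  # a pattern that times out simply contributes nothing
        for uri in uris:
            if uri not in found:
                found.append(uri)
    return found
```

Taking the union rather than stopping at the first successful pattern reflects the observation above that each pattern alone can miss datasets.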

4.2 Dimensions and Measures Identification

Ideally, the predicates qb:dimension, qb:measure

or rdf:type – which are deﬁned in the vocabulary –

indicate dimensions and measures (i.e., eg:refYear,

eg:pas, eg:kmh) directly. Otherwise, we can only

derive dimensions and measures based on their URIs

and values in the observations. In statistical data,

the values of measures are typically represented via

numerical formats whereas the values of dimensions

are often either years or strings (e.g. country code:

AT for Austria, BE for Belgium, etc.). Furthermore,

URIs of components may indicate their role, such as

http://.../measure/pas for a measure.
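When the structure definition is missing, the fallback heuristics described above (URI hints first, then value types) might look like the following sketch. The function name `guess_role` and the exact rules, including the four-digit test for years, are assumptions made for illustration and are not taken from the described implementation.

```python
import re


def guess_role(component_uri, sample_values):
    """Guess whether a component is a 'dimension' or a 'measure'."""
    uri = component_uri.lower()
    # 1. The URI itself may name the role, e.g. http://.../measure/pas.
    if "/measure" in uri:
        return "measure"
    if "/dimension" in uri:
        return "dimension"

    # 2. Otherwise inspect sample values taken from the observations.
    def is_year(value):
        return re.fullmatch(r"\d{4}", str(value)) is not None

    def is_number(value):
        try:
            float(value)
            return True
        except (TypeError, ValueError):
            return False

    if sample_values and all(is_year(v) for v in sample_values):
        return "dimension"  # years are treated as dimension values
    if sample_values and all(is_number(v) for v in sample_values):
        return "measure"    # other numeric values indicate a measure
    return "dimension"      # strings such as country codes (AT, BE, ...)
```

Such heuristics are necessarily imperfect: a measure whose values all happen to be four-digit integers would be mistaken for a year dimension, which is why the predicates defined in the vocabulary are preferred whenever they are present.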

4.3 Values and Labels Identiﬁcation

A complete list of values of each dimension must be obtained, because these values serve as filter values for end users. In addition, we need to identify the labels of dimensions and measures in each dataset, because they help users understand the meaning of these components. For example, in the Bus Vehicle dataset, the measure name “Pas” is not meaningful to users, whereas its label, “Number of passengers”, is a meaningful description. To overcome limitations of query time for large datasets, we use loops to retrieve data within a given time threshold.

Figure 2: An automatically generated widget.
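One possible reading of the loop-based retrieval is paging through the values with LIMIT/OFFSET until either the last page is reached or a time budget is used up. The sketch below assumes this reading; `collect_values`, the `fetch_page` callback, the page size, and the time limit are all hypothetical.

```python
import time


def collect_values(fetch_page, page_size=1000, time_limit_s=30.0,
                   clock=time.monotonic):
    """Collect dimension values page by page until done or out of time.

    fetch_page(limit, offset) is a caller-supplied (hypothetical) function
    that runs a query of the form
    'SELECT DISTINCT ?v WHERE {...} ORDER BY ?v LIMIT <limit> OFFSET <offset>'
    and returns the values of that page as a list.
    Returns (values, complete); complete is False if the time limit was hit.
    """
    deadline = clock() + time_limit_s
    values, offset = [], 0
    while clock() < deadline:
        page = fetch_page(page_size, offset)
        values.extend(page)
        if len(page) < page_size:
            return values, True  # a short page means nothing is left
        offset += page_size
    return values, False
```

Returning the `complete` flag lets a caller decide whether a partial value list is acceptable as a set of filter options.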

5 STATISTICAL WIDGET GENERATION AND MASHUP

Figure 2 shows a sample widget. Each widget con-

sists of (i) a list of dimensions, (ii) a list of measures,

and (iii) a chart type, which allows users to impose

ﬁlter conditions in an easily comprehensible manner.

Next, depending on the options, each widget gen-

erates a SPARQL query to collect the desired data,

which is then converted into the JSON-LD format

(Sporny et al., 2013) and set as the input of the chart.

The use of JSON-LD facilitates the integration of data

between disparate systems, thereby supporting the

combination of statistical datasets. Figure 4 provides

an example of a JSON-LD description of a query.
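To make the query generation step concrete, the following sketch assembles a SELECT query of the same shape as the one in Figure 6 from the options selected in a widget. The function `build_query` and its parameters are hypothetical, and the sketch assumes that filter values are URIs; literal values would need a different serialization.

```python
def build_query(dataset_uri, dimensions, measures, filters):
    """Build a SELECT query for one dataset.

    dimensions and measures map a variable name to a property URI;
    filters maps a dimension variable name to the URI of the selected value.
    """
    lines = [
        "PREFIX qb: <http://purl.org/linked-data/cube#>",
        "SELECT DISTINCT * WHERE {",
        f"  ?o qb:dataSet <{dataset_uri}> .",
    ]
    # One triple pattern per selected dimension and measure.
    for name, prop in {**dimensions, **measures}.items():
        lines.append(f"  ?o <{prop}> ?{name} .")
    # One FILTER per dimension value chosen by the user.
    for name, value in filters.items():
        if name not in dimensions:
            raise KeyError(f"unknown dimension: {name}")
        lines.append(f"  FILTER(?{name} = <{value}>)")
    lines.append("}")
    return "\n".join(lines)
```

Dimensions without an entry in `filters` stay unconstrained, so the result contains one row per remaining combination of dimension values.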

Charts can illustrate statistical data in a meaningful way and can uncover relationships that are not obvious from studying a list of numbers. Based on the JSON-LD result of the query, the widget automatically detects which types of charts are feasible for a specific query from a list of nine common chart types, e.g., column, line, pie, bubble, and geo map charts.

Figure 3: Automatically generated widget of the European Commission endpoint.
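The paper does not spell out the rules used for chart detection, so the following sketch only illustrates what such a feasibility check could look like. The function `feasible_charts` and every rule in it are assumptions made for this example; they cover only a few of the chart types named above.

```python
def feasible_charts(rows, dimension, measures):
    """Return chart types that are plausible for the given result rows.

    rows is a list of dicts, dimension the name of the varying dimension,
    and measures the list of selected measure names.
    All rules below are illustrative assumptions.
    """
    if not rows or not measures:
        return []
    charts = ["column", "bar"]  # comparing categories is always possible
    keys = [str(row[dimension]) for row in rows]
    if len(rows) > 1 and all(k.isdigit() and len(k) == 4 for k in keys):
        charts.append("line")  # several values over years form a trend
    values = [row[measures[0]] for row in rows]
    if len(measures) == 1 and all(v is not None and v >= 0 for v in values):
        charts.append("pie")  # shares need a single non-negative measure
    if len(measures) >= 2:
        charts.append("bubble")  # two measures can span the x and y axes
    return charts
```

For the Bus Vehicle extract in Table 1 with only Pas selected, these rules would offer column, bar, line, and pie charts.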

Developers need appropriate means to modify

auto-generated widgets in order to extend the inter-

face and incorporate additional functionality. For ex-

ample, the European Commission’s endpoint offers

only one statistical dataset, but it has more than 100

indicators and each indicator requires a speciﬁc value

set for the dimensions breakdown, unit of measure,

country, and time period as shown in Figure 3. There-

fore, additional functionality is necessary to list only

suitable values for remaining dimensions whenever

users change the value of the indicator. Developers

may also, for example, impose thresholds on dimen-

sion values, query a limited list of measures, or pro-

vide a new visual chart type.

Using JSON-LD, the generated widgets can col-

laborate in a mashup to compare and combine data

from different datasets. For example, Figure 5 shows

a mashup that compares the number of passengers us-

ing bus, tram, and metro vehicles.

Chart gallery: https://developers.google.com/chart/interactive/docs/gallery

DATA2014-3rdInternationalConferenceonDataManagementTechnologiesandApplications

286

{
  "@context": {
    "vogd": "http://ogd.ifs.tuwien.ac.at/vienna/",
    "qb": "http://purl.org/linked-data/cube#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "observation": "qb:observation",
    "dimension": "qb:dimension",
    "measure": "qb:measure",
    "component": "qb:component",
    "label": "http://www.w3.org/2000/01/rdf-schema#label",
    "year": "xsd:gYear",
    "pas": "vogd:pas",
    "kmh": "vogd:kmh"
  },
  "@id": "vogd:betriebszweige2012-autobus",
  "@type": "qb:DataSet",
  "label": "Vienna Bus",
  "component": {
    "@type": "qb:ComponentSpecification",
    "dimension": [
      {
        "@id": "xsd:gYear",
        "@type": "qb:DimensionProperty",
        "label": "Year"
      }
    ],
    "measure": [
      {
        "@id": "vogd:pas",
        "@type": "qb:MeasureProperty",
        "label": "Number of passengers (million)"
      },
      {
        "@id": "vogd:kmh",
        "@type": "qb:MeasureProperty",
        "label": "Average speed (km/h)"
      }
    ]
  },
  "observation": [
    {
      "@type": "qb:Observation",
      "@id": "vogd:betriebszweige2012-autobus.9",
      "year": "2010",
      "pas": 114.4,
      "kmh": 16.7
    }
  ]
}

Figure 4: Query result in JSON-LD format.
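A consumer such as a presentation widget has to turn a result of this shape into rows for a chart. The sketch below shows one way to do so by inverting the "@context" mapping to find the observation key of each component; the function `observations_to_rows` and the trimmed example document are hypothetical and only mirror the structure of Figure 4.

```python
def observations_to_rows(doc):
    """Convert a widget result (structured as in Figure 4) into table rows.

    The first row holds the component labels; each following row holds the
    values of one observation, in the same column order.
    """
    # Map each compact IRI in the context back to its short term,
    # e.g. "vogd:pas" -> "pas".
    term_for = {iri: term for term, iri in doc["@context"].items()}
    spec = doc["component"]
    components = spec["dimension"] + spec["measure"]
    columns = [term_for[c["@id"]] for c in components]
    header = [c["label"] for c in components]
    rows = [[obs.get(col) for col in columns] for obs in doc["observation"]]
    return [header] + rows


# Trimmed, illustrative document with one dimension and one measure.
example = {
    "@context": {"year": "xsd:gYear", "pas": "vogd:pas"},
    "component": {
        "dimension": [{"@id": "xsd:gYear", "label": "Year"}],
        "measure": [{"@id": "vogd:pas",
                     "label": "Number of passengers (million)"}],
    },
    "observation": [{"year": "2010", "pas": 114.4}],
}
table = observations_to_rows(example)
```

A header row followed by value rows is the layout that many charting libraries accept as tabular input.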

Each statistical mashup can be composed of three types of widgets: (i) Dataset Widgets, which are auto-generated or modified widgets, (ii) Merger Widgets, which integrate two datasets from compatible input widgets, i.e., widgets with the same dimensions, into a single combined dataset, and (iii) General Presentation Widgets, which receive data from either a Dataset Widget or a Merger Widget and display it visually.

We can also distinguish widget types by the cardi-

nality of their inputs and outputs: dataset widgets do

not have an input, but have an output; merger widgets

have two inputs and an output; general presentation

widgets have an input, but no outputs.

To illustrate the transformations performed by merger widgets, assume that we have separate passenger datasets for tram and bus vehicles (cf. Tables 2 and 1, respectively). We use the shared dimension, i.e., Year, to group data by year, allowing users to easily compare the number of passengers using Bus and Tram for each year (cf. Table 3).

Table 2: Subset of a Tram dataset.

Year  Pas    Hast
2011  193.8  1031
2012  295.1  1056

Pas = number of passengers; Hast = number of stops

Table 3: Combined Bus & Tram dataset.

Year  Pas Bus  Pas Tram  Kmh   Hast
2010  114.4    -         16.7  -
2011  113.6    193.8     17    1031
2012  167.1    295.1     17.3  1056

We outline the algorithm to merge dimensions,

measures, and observations from two datasets in the

following:

1. Dimensions of the output dataset are the same as

the dimensions of the input datasets.

2. Measures of the output dataset comprise all measures from the two input datasets. Measures that belong to both input datasets, e.g., Pas, play an important role in comparing and contrasting data. However, since they share the same URI, each of them receives a new URI in the output dataset. The new URI and the original URI are linked via the owl:sameAs predicate.

3. Observations: the algorithm merges the observations of the two input datasets based on the values of their dimensions to build the observations of the output dataset. If two observations from the two input datasets have the same dimension value, e.g., Year = 2011, the algorithm merges them into a single observation (cf. Table 3). Otherwise, if the dimension value of an observation O from one dataset does not appear in the other dataset, e.g., Year = 2010 in Bus Vehicle, the algorithm generates an empty observation for the latter dataset and merges it with O into the output dataset.
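The three steps above can be sketched in a few lines. This is an illustrative simplification rather than the platform's code: the function `merge_datasets` is hypothetical, handles a single shared dimension, and renames shared measures by appending a dataset name where the described algorithm would mint a new URI linked via owl:sameAs.

```python
def merge_datasets(left, right, dimension, left_name, right_name):
    """Merge two lists of observation dicts on one shared dimension.

    Measures present in both inputs are renamed '<measure> <dataset name>';
    values missing on one side are None (the 'empty observation').
    """
    def measure_names(observations):
        names = []
        for obs in observations:
            for key in obs:
                if key != dimension and key not in names:
                    names.append(key)
        return names

    left_measures = measure_names(left)
    right_measures = measure_names(right)
    shared = set(left_measures) & set(right_measures)

    def rename(measure, dataset_name):
        return f"{measure} {dataset_name}" if measure in shared else measure

    columns = ([rename(m, left_name) for m in left_measures]
               + [rename(m, right_name) for m in right_measures])

    merged = {}
    for observations, name in ((left, left_name), (right, right_name)):
        for obs in observations:
            # Start from an all-empty row the first time a key is seen.
            row = merged.setdefault(obs[dimension], dict.fromkeys(columns))
            for measure, value in obs.items():
                if measure != dimension:
                    row[rename(measure, name)] = value
    return [{dimension: key, **merged[key]} for key in sorted(merged)]


bus = [
    {"Year": 2010, "Pas": 114.4, "Kmh": 16.7},
    {"Year": 2011, "Pas": 113.6, "Kmh": 17},
    {"Year": 2012, "Pas": 167.1, "Kmh": 17.3},
]
tram = [
    {"Year": 2011, "Pas": 193.8, "Hast": 1031},
    {"Year": 2012, "Pas": 295.1, "Hast": 1056},
]
combined = merge_datasets(bus, tram, "Year", "Bus", "Tram")
```

Applied to the extracts in Tables 1 and 2, `combined` contains the three rows of Table 3, with None in place of the missing tram values for 2010.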

Widget-basedExplorationofLinkedStatisticalDataSpaces

287

Figure 5: An example of a mashup for data comparison.

6 EVALUATION

We performed two tasks for a preliminary evaluation

of our approach: (i) test the data source analysis algo-

rithm with 23 available endpoints using the Data Cube

Vocabulary, and (ii) validate the features provided by

auto-generated widgets.

In our experiments for the ﬁrst task, we found that

our prototypical implementation can analyze and cre-

ate widgets for all endpoints tested. We compared our

results to those obtained using existing alternatives

with respect to four aspects: (i) datasets identiﬁed,

(ii) dimensions identiﬁed, (iii) measures identiﬁed in

each dataset, and (iv) list of values identiﬁed for each

dimension. The results in Table 4 show that the existing browsers can handle data from fewer than four SPARQL endpoints correctly. The Linked Data Cubes Explorer cannot analyze even a single dataset from the 23 tested endpoints. The Linked Data Query Wizard, by contrast, can analyze the first twelve endpoints. However, since it supports only a limited number of fixed input endpoints, we cannot evaluate its capabilities for the full set of tested endpoints.

In the latter task, we compared the features of auto-generated widgets for the endpoint of the European Commission with the visual charts designed specifically for this endpoint (http://ec.europa.eu/digital-agenda/en/graphs). Using the developed platform, users can easily impose filters on one or more dimensions to explore particular slices of the dataset (cf. Figure 3 for an example). Hence, users do not need to write complex SPARQL queries such as the one listed in Figure 6, which retrieves the same data as the widget depicted in Figure 3. Our automatic chart generation feature ensures that the widget can provide suitable charts for the selected data (e.g., column, bar, pie, donut, and geo charts as well as geo maps are available for the view in Figure 3). In addition, users can use mashups to compare the values between indicators or compare values from different countries.

We built two widgets, a Merger Widget and a General Presentation Widget, which are suitable for arbitrary statistical Dataset Widgets. Overall, our automatically

generated widget offers visualizations which are com-

parable in functionality to those provided by the ap-

plication of the European Commission. However, our

visualizations can be used not only for this speciﬁc

endpoint, but more generally for arbitrary endpoints.

PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX sdmx: <http://purl.org/linked-data/sdmx/2009/measure#>
PREFIX digital: <http://semantic.digital-agenda-data.eu/def/property/>
PREFIX dataset: <http://semantic.digital-agenda-data.eu/dataset/>
PREFIX indi: <http://semantic.digital-agenda-data.eu/codelist/indicator/>
PREFIX time: <http://reference.data.gov.uk/id/gregorian-year/>
SELECT DISTINCT * WHERE {
  ?o qb:dataSet ?ds .
  ?o sdmx:obsValue ?obsValue .
  ?o digital:indicator ?indicator .
  ?o digital:breakdown ?breakdown .
  ?o digital:ref-area ?ref_area .
  ?o digital:time-period ?time_period .
  ?o digital:unit-measure ?unit_measure .
  FILTER(?ds = dataset:digital-agenda-scoreboard-key-indicators)
  FILTER(?indicator = indi:mbb_3gcov)
  FILTER(?time_period = time:2008)
}

Figure 6: An example of a SPARQL query.

DATA2014-3rdInternationalConferenceonDataManagementTechnologiesandApplications

288

7 CONCLUSIONS

Based on the idea of Linked Data, which aims to con-

nect and reuse data rather than storing it in isolated

silos, we propose a novel approach to provide end

users with efﬁcient mechanisms to analyze, combine,

remix, visualize, and make sense of statistical data

available via SPARQL endpoints using the Data Cube

Vocabulary. Each statistical dataset is automatically

made available as a widget that allows effective query-

ing, provides output in a standard format, facilitates

automatic chart generation, and embodies principles

of openness and linkage. Two of these characteristics

– effective querying and automatic chart generation –

allow end users to effectively explore the dataset. Fur-

thermore, standard format and openness enable de-

velopers to modify and develop new functionalities

and new types of visualization. Finally, linkage al-

lows all automatically generated and manually modi-

ﬁed widgets to be combined ﬂexibly in a mashup.

We also presented a prototypical implementation

of the proposed system and evaluated it using 23

SPARQL endpoints that use the Data Cube Vocab-

ulary. We found that the approach shows great po-

tential and handles data from all identiﬁed SPARQL

endpoints well, even those that only partly follow the

RDF Data Cube Vocabulary.

Due to the problems of co-reference between

URIs, ontology mapping, etc. (Millard et al., 2010;

Schlegel et al., 2014), widget linkage is currently sup-

ported only for datasets from the same endpoint. As a

next step, we plan to address this limitation.

REFERENCES

Brunetti, J. M., Auer, S., García, R., Klímek, J., and Nečaský, M. (2013). Formal linked data visualization

model. In IIWAS’13, International Conference on In-

formation Integration and Web-based Applications &

Services.

Cáceres, M. (2011). Requirements for Standardizing Widgets.

Crupi, J. (2010). A business guide to enterprise mashups.

Cyganiak, R. and Reynolds, D. (2011). The RDF Data Cube

Vocabulary.

Dadzie, A. S. and Rowe, M. (2011). Approaches to visual-

ising linked data: A survey. Semantic Web, 2:89–124.

Helmich, J., Klímek, J., and Nečaský, M. (2014). Visualiz-

ing rdf data cubes using the linked data visualization

model. In ESWC’14, Extended Semantic Web Confer-

ence - Posters & Demonstrations Track.

Hoeﬂer, P., Granitzer, M., Veas, E., and Seifert, C. (2014).

Linked data query wizard: A novel interface for accessing SPARQL endpoints. In LDOW’14, Linked Data on

the Web Workshop.

Kämpgen, B. and Harth, A. (2014). OLAP4LD: A framework

for building analysis applications over governmental

statistics. In ESWC’14, Extended Semantic Web Con-

ference - Posters & Demonstrations Track.

Maali, F., Shukair, G., and Loutas, N. (2012). A dynamic

faceted browser for data cube statistical data. In W3C

Using Open Data Workshop.

Millard, I. C., Glaser, H., Salvadores, M., and Shadbolt,

N. (2010). Consuming multiple linked data sources:

Challenges and experiences. In COLD’10, Interna-

tional Workshop on Consuming Linked Data.

Salas, P. E. R., Mota, F. M. D., Martin, M., Auer, S., Bre-

itman, K., and Casanova, M. A. (2012). Publishing

statistical data on the web. International Journal of

Semantic Computing, 6:373–388.

Schlegel, T., Stegmaier, F., Bayerl, S., Granitzer, M., and

Kosch, H. (2014). Balloon fusion: Sparql rewrit-

ing based on uniﬁed co-reference information. In

DESWeb’14, 5th International Workshop on Data En-

gineering Meets the Semantic Web.

Sporny, M., Kellogg, G., and Lanthaler, M. (2013). JSON-

LD 1.0 - A JSON-based Serialization for Linked Data.

Trinh, T.-D., Do, B.-L., Wetz, P., Anjomshoaa, A., and

Tjoa, A. M. (2013). Linked widgets: An approach
to exploit open government data. In IIWAS’13, International Conference on Information Integration and
Web-based Applications & Services.

Widget-basedExplorationofLinkedStatisticalDataSpaces

289

APPENDIX

Table 4: Evaluation results for existing browsers.

Endpoint | Faceted browser: DS D M V | CubeViz: DS D M V | LDCE: DS D M
http://open-data.europa.eu/en/sparqlep | √ √ √ √ √ √ √ χ χ χ
http://digital-agenda-data.eu/data/sparql | √ √ χ χ √ √ √ √ χ χ χ
http://ogd.ifs.tuwien.ac.at/sparql | √ √ √ √ √ χ χ χ χ χ χ
http://zaire.dimis.fim.uni-passau.de:8890/sparql | ∞ ∞ ∞ ∞ χ χ χ χ χ χ χ
http://ecb.270a.info/sparql | ∞ ∞ ∞ ∞ χ χ χ χ χ χ χ
http://fao.270a.info/sparql | ∞ ∞ ∞ ∞ χ χ χ χ χ χ χ
http://imf.270a.info/sparql | ∞ ∞ ∞ ∞ χ χ χ χ χ χ χ
http://oecd.270a.info/sparql | ∞ ∞ ∞ ∞ χ χ χ χ χ χ χ
http://transparency.270a.info/sparql | √ √ √ √ √ χ χ χ χ χ χ
http://worldbank.270a.info/sparql | ∞ ∞ ∞ ∞ √ χ χ χ χ χ χ
http://datameti.go.jp/sparql | √ √ √ √ √ χ χ χ χ χ χ
http://semantic.eea.europa.eu/sparql | ∞ ∞ ∞ ∞ χ χ χ χ χ χ χ
http://gov.tso.co.uk/coins/sparql | ∞ ∞ ∞ ∞ √ χ χ χ χ χ χ
http://openuplabs.tso.co.uk/sparql/gov-coins | ∞ ∞ ∞ ∞ χ χ χ χ χ χ χ
http://agencies.publicdata.eu/sparql | √ χ χ χ χ χ χ χ χ
http://unodc.publicdata.eu/sparql | √ χ χ χ χ χ χ χ χ
http://cofog01.data.scotland.gov.uk/sparql | ∞ ∞ ∞ ∞ χ χ χ χ χ χ χ
http://eur-lex.publicdata.eu/sparql | √ χ χ χ χ χ χ χ χ
http://prelex.publicdata.eu/sparql | √ χ χ χ χ χ χ χ χ
http://n-lex.publicdata.eu/sparql | √ χ χ χ χ χ χ χ χ
http://eventmedia.eurecom.fr/sparql | √ χ χ χ χ χ χ χ χ χ χ
http://opendatacommunities.org/sparql | ∞ ∞ ∞ ∞ χ χ χ χ χ χ χ
http://open-data.europa.eu/linked-data | ∞ ∞ ∞ ∞ χ χ χ χ χ χ χ

DS: Dataset, D: Dimensions, M: Measures, V: Values of Dimension
√: Yes; χ: No; ∞: No result after one hour

DATA2014-3rdInternationalConferenceonDataManagementTechnologiesandApplications

290