sures). Furthermore, the framework does not rely on semantic descriptions of charts and offers just a few very ordinary chart types.
3 WORKFLOW OVERVIEW
3.1 The CODE Platform
The CODE project⁴ offers a platform to structure research data and release them as Linked Data (Bizer et al., 2009). Linked Data describes methods to publish and interlink structured data (meta-data) on the World Wide Web. The intent of these methods is to connect data with semantic technologies, making them automatically readable by computers. In the CODE project, Linked Data acts as a basis to publish and interlink research data (Seifert et al., 2013), thereby strongly focusing on their content and not only on structural attributes.
Figure 1 shows the CODE approach to organising and analysing research publications. To bring published data into the hands of the user in an intuitive manner, CODE envisions a workflow with the following major steps: (a) Data Extraction, (b) Integration and Aggregation, (c) Analysis, and (d) Visualisation. Automation of this workflow is essential for analysts who have to integrate huge amounts of research knowledge in a short time. For instance, the first two steps deal with the automated extraction and integration of the research knowledge into a common meta-model (hereafter, vocabulary), e.g. from unstructured text stored in PDF format to structured Linked Open Data (LOD) in RDF, while the last step offers automated support for visualising that knowledge. Section 4 looks further at the process of suggesting visualisations.
3.2 Visualisation Workflow
Consider the following scenario: While looking at a publication, Jane feels overwhelmed by numbers spread across tables throughout the pages. To make sense of these data, she quickly exports them to the CODE visual analysis tools. Before visualising the data, Jane has to specify the dimensions and measures (hereafter, RDF Data Cube components) of the data. The Vis Wizard then suggests appropriate visualisations, which Jane can fine-tune to her liking.
Figure 1 shows how the above scenario is realised with CODE components. The first step, extracting the data, is automated by the CODE PDF Extractor (Klampfl et al., 2013), which extracts tables from scientific publications. In the second step, exporting the data in an appropriate format, the CODE Data Extractor (Schlegel et al., 2013) is used to semantically annotate the table (i.e. specify dimensions and measures, and their types), producing an RDF Data Cube. We chose the RDF Data Cube Vocabulary (RDF-DCV) because it was developed by the W3C to represent statistical data (e.g. the research results from tables in a publication) (Salas et al., 2012).
⁴ CODE: code-research.eu/
Once a data set is available in the RDF Data Cube format, it is passed to the Vis Wizard. In this third step, mapping the data onto visualisations, a mapping algorithm uses the semantic descriptions of visual components and the semantic annotations of the data to suggest visualisations suitable for that particular data set. The user only needs to choose a visualisation by pressing one of the enabled buttons, and the chosen visualisation will be automatically generated and displayed. The fourth step, visualising the data set, allows the user to modify the mapping of visual channels (i.e. visual attributes such as axes, size or colour of visual items) to the structured data. The user has the option to re-adjust how the data columns are mapped to the visual channels, whereby only meaningful mappings are permitted. It is also possible to generate additional visualisations for the same data set, which are displayed within the same browser window, empowering the user to analyse different aspects of a heterogeneous data set in a combined view.
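The idea of permitting only meaningful mappings can be illustrated with a minimal Python sketch. It assumes that each chart description lists its visual channels together with the component role and datatypes each channel accepts. The chart catalogue, the names `Component`, `Channel`, `ChartDescription` and `suggest_charts`, and the matching rule itself are hypothetical illustrations, not the Vis Wizard's actual vocabulary or algorithm.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Component:
    """A column of the data cube (hypothetical simplification)."""
    name: str
    role: str       # "dimension" or "measure"
    datatype: str   # e.g. "string", "number", "date"


@dataclass(frozen=True)
class Channel:
    """A visual channel of a chart and what it accepts (assumed description)."""
    name: str
    role: str
    datatypes: frozenset


@dataclass(frozen=True)
class ChartDescription:
    name: str
    channels: tuple


def valid_mappings(chart, components):
    """Return every assignment of components to channels that satisfies
    the role and datatype constraints of all channels."""
    if len(chart.channels) != len(components):
        return []
    result = []
    for ordering in permutations(components):
        if all(c.role == ch.role and c.datatype in ch.datatypes
               for ch, c in zip(chart.channels, ordering)):
            result.append({ch.name: c.name
                           for ch, c in zip(chart.channels, ordering)})
    return result


def suggest_charts(charts, components):
    """Map chart name -> permitted mappings; charts without any are omitted."""
    suggestions = {}
    for chart in charts:
        mappings = valid_mappings(chart, components)
        if mappings:
            suggestions[chart.name] = mappings
    return suggestions


# Hypothetical chart catalogue, for illustration only.
CHARTS = [
    ChartDescription("bar chart", (
        Channel("x-axis", "dimension", frozenset({"string", "date"})),
        Channel("y-axis", "measure", frozenset({"number"})),
    )),
    ChartDescription("scatter plot", (
        Channel("x-axis", "measure", frozenset({"number"})),
        Channel("y-axis", "measure", frozenset({"number"})),
    )),
]

# Invented example cube: one categorical dimension and one numeric measure.
cube = [Component("team", "dimension", "string"),
        Component("score", "measure", "number")]
print(suggest_charts(CHARTS, cube))
```

In this sketch only the bar chart is suggested for the example cube, since the scatter plot would need two numeric measures. A chart with no valid mapping corresponds to a disabled button, and the list of valid mappings corresponds to the re-adjustments the user would be allowed to make.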
3.3 The Data Representation
The RDF-DCV represents data as a collection of so-called observations, each consisting of a set of dimensions and measures. Dimensions identify the observation; measures are related to concrete values. For example, in the dataset for the PAN⁵ scientific challenge, which evaluates software for uncovering plagiarism developed by different teams, the RDF-DCV representation includes a collection of observations with dimensions describing the teams and with concrete values for the challenge result (Figure 2 shows a sample visualisation).
Therefore, using the RDF-DCV, one such observation is created for each of the statistical values in a publication. The format guarantees a uniform representation for all (unstructured) statistics, thereby enabling the Vis Wizard to access data in a standard way defined by the RDF Data Cube specification.
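As a rough illustration of this representation, the following Python sketch turns the rows of an annotated table into observations and serialises them as triples in the style of the RDF Data Cube vocabulary. The terms `qb:Observation` and `qb:dataSet` come from the W3C vocabulary. The `ex:` namespace, the column names, the sample values and the helper names are invented for illustration and are not taken from the PAN dataset or from the CODE implementation.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One statistical value plus the dimension values that identify it."""
    dimensions: dict   # e.g. {"team": "Team A"}
    measures: dict     # e.g. {"score": 0.5}


def table_to_observations(rows, dimension_cols, measure_cols):
    """Split each table row into dimension and measure values."""
    return [
        Observation(
            dimensions={c: row[c] for c in dimension_cols},
            measures={c: row[c] for c in measure_cols},
        )
        for row in rows
    ]


def literal(value):
    """Format a Python value as a simple Turtle-style literal."""
    if isinstance(value, (int, float)):
        return str(value)
    return '"' + str(value).replace('"', '\\"') + '"'


def to_triples(observations, dataset="ex:dataset1"):
    """Yield (subject, predicate, object) triples for each observation."""
    for i, obs in enumerate(observations, start=1):
        subject = f"ex:obs{i}"
        yield (subject, "rdf:type", "qb:Observation")
        yield (subject, "qb:dataSet", dataset)
        # Property names in the ex: namespace are assumptions of this sketch.
        for name, value in {**obs.dimensions, **obs.measures}.items():
            yield (subject, f"ex:{name}", literal(value))


# Invented sample rows; real values would come from the extracted table.
rows = [
    {"team": "Team A", "score": 0.5},
    {"team": "Team B", "score": 0.25},
]
observations = table_to_observations(rows, ["team"], ["score"])
for s, p, o in to_triples(observations):
    print(s, p, o, ".")
```

Because every observation has the same shape, a consumer can iterate over the observations without knowing which publication or table they came from.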
⁵ PAN: http://pan.webis.de/