Suggesting Visualisations for Published Data

Belgin Mutlu

, Patrick Hoeﬂer

, Gerwald Tschinkel

, Eduardo Veas

, Vedran Sabol

1,3

, Florian

Stegmaier

and Michael Granitzer

Know-Center, Graz, Austria

University of Passau, Passau, Germany

Graz University of Technology, Graz, Austria

Keywords:

Linked Data, RDF Data Cube, Visualisation, Visual Mapping, Research Data.

Abstract:

Research papers are published in various digital libraries, which deploy their own meta-models and tech-

nologies to manage, query, and analyze scientiﬁc facts therein. Commonly they only consider the meta-data

provided with each article, but not the contents. Hence, reaching into the contents of publications is inherently

a tedious task. On top of that, scientiﬁc data within publications are hardcoded in a ﬁxed format (e.g. tables).

So, even if one manages to get a glimpse of the data published in digital libraries, it is close to impossible

to carry out any analysis on them other than what was intended by the authors. More effective querying and

analysis methods are required to better understand scientiﬁc facts. In this paper, we present the web-based

CODE Visualisation Wizard, which provides visual analysis of scientiﬁc facts with emphasis on automating

the visualisation process, and present an experiment of its application. We also present the entire analytical

process and the corresponding tool chain, including components for extraction of scientiﬁc data from publica-

tions, an easy to use user interface for querying RDF knowledge bases, and a tool for semantic annotation of

scientiﬁc data sets.

1 INTRODUCTION

We are currently confronted with a continuous, mas-

sive increase of published content, and the applied

methods used by current digital libraries are not suf-

ﬁcient anymore. They mainly expose the research

knowledge using domain-speciﬁc meta-models and

technologies, such as the widely used Dublin Core

meta-model (Powell et al., 2005). But these meta-

models mostly focus on structural attributes like title,

author or keywords, and rarely consider the content

of the publications. Hereby, the ability to effectively

ﬁnd the desired information is limited due to weakly-

deﬁned querying attributes. In addition, scientiﬁc data

or facts included in publications are unstructured or,

at best, in tabular format, so that once the information

is found, there is hardly a way to reuse it.

Our goal is to provide a tool to automatically ex-

tract data from scientiﬁc publications and propose

the appropriate means to visualise the facts and data

therein. Figure 1 illustrates the envisioned workﬂow

with a scenario, starting with extraction of data from

a publication to visualisation. The main concern of

this paper is the automated suggestion of visualisa-

tions appropriate for the data contained in a publi-

cation. Furthermore, we strictly focus on suggesting

only proper visual tools (e.g. visualisations that really

apply to the data), avoiding failure cases that render

the analysis process tedious. To do so, the CODE Vi-

sualisation Wizard

(Vis Wizard) (Mutlu et al., 2013)

relies on a prior extraction and organization of the un-

structured, scientiﬁc data in a publication into Linked

Open Data (LOD) by using the RDF Data Cube Vo-

cabulary

. The strength of LOD lies in its interlinking

structured data in a format that can be read and pro-

cessed by computers. The Vis Wizard then applies

semantic technologies to derive and propose visual

analysis tools.

In order to automate the process of generating and

proposing visual analysis tools, it is necessary to in-

tegrate visualisation aspects into the Semantic Web.

Our contribution is, on the one hand, a vocabulary,

CODE Visualisation Wizard: http://code.know-center.

tugraz.at/vis

RDF Data Cube Vocabulary: www.w3.org/TR/vocab-data-

cube/

267

Mutlu B., Hoeﬂer P., Tschinkel G., Veas E., Sabol V., Stegmaier F. and Granitzer M..

Suggesting Visualisations for Published Data.

DOI: 10.5220/0004674902670275

In Proceedings of the 5th International Conference on Information Visualization Theory and Applications (IVAPP-2014), pages 267-275

ISBN: 978-989-758-005-5

 2014 SCITEPRESS (Science and Technology Publications, Lda.)

Figure 1: Visual analysis of published data: one scenario of application for automated extraction and visualisation of data. (a)

a pdf with a bunch of tables containing interesting data. (b) the UI to structure the extracted data from the pdf (c) visualisations

suggested for the current data, grayed-out options cannot be reiﬁed. (d) visual analysis UI allows to reassign mappings of

visual channels (e.g. axes, colors, sizes etc.) to data.

which describes the semantics of visualisations and

of their mapping to the corresponding data. On the

other hand, we introduce the process that drives the

automated visualisation workﬂow. After introducing

relevant work in Section 2, we summarise the com-

plete process in Section 3, carried out in the frame of

the EU funded CODE project (Stegmaier et al., 2012)

. Thereafter, in Section 4, we concentrate on the main

topic, our approach to automatically generating visu-

alisations. In Section 5 we give a brief summary of its

evaluation, and ﬁnally we conclude and outline some

important topics for the ongoing work in Section 7.

2 RELATED WORK

In this section, we overview of most relevant ap-

proaches with focus on methods for describing and

analyzing visualisations in general. We also review

related approaches on visualisation of structured data.

2.1 Semantics of Visualisations

The general method to describe and structure the

Linked Data is called the Resource Description

Framework

(RDF). Semantic description of visual-

isations using RDF is a new research topic and the

literature, up to now, offers just a few related works.

The most signiﬁcant research (Dumontier et al., 2010)

comes from the biomedical domain, called the Statis-

tical Graph Ontology (SGO), and presents a new ap-

proach to manage statistical graph knowledge by se-

mantic annotation of graphs. Describing graphs se-

mantically has been recognized as a very effective

way for querying and analyzing research data visu-

ally (Voigt et al., 2013). Because of the tight coupling

between visualisations and statistical data achieved in

RDF: www.w3.org/RDF/

this way, the querying is considerably simpliﬁed and

the data can be directly mapped onto visualisations.

While SGO provides a sophisticated ground for

describing statistical graphs, some attribute are miss-

ing in this ontology to enable automated visualisation

like the attribute datatype, instantiation occurrence,

etc. (more details about this in Section 4.1). These

attributes may be useful to build a model that deﬁnes

the mapping between data and visualisations.

2.2 Visualization and Analysis of Papers

Close to our concept of analysing scientiﬁc facts con-

tained in research publications, Utopia Documents

(Attwood et al., 2010) developed a PDF reader to sim-

plify searches for relevant information. This intelli-

gent tool provides additional data (links to suitable

web resources and metadata) and methods to inter-

actively analyse the content of the evaluation results

contained in papers (in table to be exact). For the vi-

sualisation, the table content and ﬁgures are enriched

with provenance information. Visualizations are cre-

ated following a very common principle: columns are

statically mapped to axes of a scatterplot. There is

no vocabulary deﬁning the structure of this data and,

therefore, no intermediate model which would allow

to make more sophisticated visualization mappings.

2.3 Visualisation of RDF Data Cubes

An approach which is very similar to our Vis Wizard

is realized within the scope of the CubeViz Frame-

work (Salas et al., 2012). This framework supports

the visualisation and visual querying for statistical

data sets, stored in form of the RDF Data Cube. How-

ever, in contrast to our approach, there is no support

for automated suggestions of possible visualisations

and also no support for large cubes (i.e. cubes hav-

ing varying number of dimensions and multiple mea-

IVAPP2014-InternationalConferenceonInformationVisualizationTheoryandApplications

268

sures). Furthermore, the framework does not rely on

semantic description of charts and it offers just few

very ordinary chart types.

3 WORKFLOW OVERVIEW

3.1 The CODE Platform

The CODE project

offers a platform to structure re-

search data and release them as Linked Data (Bizer

et al., 2009). Linked Data describes methods to pub-

lish and to interlink structured data (meta-data) on

the World Wide Web. The intent of these methods

is to connect data with semantic technologies, mak-

ing them automatically readable by computers. In the

CODE project, Linked Data act as a basis to publish

and to interlink research data (Seifert et al., 2013),

thereby strongly focusing on their content and not

only on structural attributes.

Figure 1 shows the CODE approach to organize

and analyze research publications. To bring published

data to the hands of the user in an intuitive man-

ner, CODE envisions a workﬂow with the following

major steps: (a) Data Extraction, (b) Integration and

Aggregation, and (c) Analysis and (d)Visualisation.

Automation of this workﬂow is essential for ana-

lysts who have to integrate huge amounts of research

knowledge in short time. For instance, the ﬁrst two

steps deal with the automated extraction and integra-

tion of the research knowledge into a common meta-

model (in further text, vocabulary - e.g. from the

unstructured text stored in the PDF format, to struc-

tured LOD in RDF), while the last step offers the au-

tomated support for visualising that knowledge. Sec-

tion 4 looks further at the process to suggest visuali-

sations.

3.2 Visualisation Workﬂow

Consider the following scenario: While looking at

a publication, Jane feels overwhelmed with numbers

spread across tables throughout the pages. To make

sense of these data, she quickly exports it to the

CODE visual analysis tools. Before visualizing the

data, Jane has to specify the dimensions and measures

(in further text, RDF Data Cube Components) of the

data. The Vis Wizard then suggests appropriate visu-

alisations, which Jane can then ﬁne-tune to her liking.

Figure 1 shows how the above scenario is realised

with CODE components. The ﬁrst step, extracting

the data, is automated by the CODE PDF Extractor

CODE: code-research.eu/

(Klampﬂ et al., 2013) , which extracts tables from

scientiﬁc publications. In the second step, export-

ing the data in an appropriate format the CODE Data

Extractor (Schlegel et al., 2013) is used to semanti-

cally annotate the table (i.e. specify dimensions and

measures, and their types), producing an RDF Data

Cube. We chose the RDF Data Cube Vocabulary

(RDF-DCV) because it was developed by the W3C

to represent statistical data (e.g. the research results

from tables in a publication) (Salas et al., 2012).

Once a data set is available in the RDF Data Cube

format it is passed to the Vis Wizard. In this third

step, mapping the data onto visualisations, a map-

ping algorithm uses the semantic descriptions of vi-

sual components and the semantic annotations of the

data to suggest visualisations, suitable for that partic-

ular data set. The user only needs to choose a visuali-

sations by pressing one of the enabled buttons and the

chosen visualisation will automatically generated and

displayed. The fourth step, visualising the data set,

allows the user to modify the mapping of visual chan-

nels (i.e. visual attributes such as axes, size or colour

of visual items, etc.) to the structured data. The user

has the option to re-adjust how the data columns are

mapped to the visual channels, whereby only mean-

ingful mappings are permitted. It is also possible to

generate additional visualisations for the same data

set, which are displayed within the same browser win-

dow, empowering the user to analyse different aspects

of a heterogeneous data set in a combined view.

3.3 The Data Representation

The RDF-DCV represents data as a collection of so

called observations, each consisting of a set of dimen-

sions and measures. Dimensions identify the obser-

vation, measures are related to concrete values. For

example: in the dataset for the PAN

scientiﬁc chal-

lenge, that evaluates software for uncovering plagia-

rism developed by different teams. The RDF-DCV

includes a collection of observations with dimensions

describing the teams and with concrete values for the

challenge result (Figure 2 shows a sample visualisa-

tion).

Therefore, using the RDF-DCV, one such obser-

vation is created for each of the statistical values in a

publication. The format guarantees a uniform repre-

sentation for all (unstructured) statistics, thereby en-

abling the Vis Wizard to access data in a standard way

deﬁned by the RDF Data Cube speciﬁcation.

PAN: http://pan.webis.de/

SuggestingVisualisationsforPublishedData

269

Figure 2: The automatically generated visualisation of PAN

Data with Team as dimension and with a Challenge Result

as measure.

4 AUTOMATED

VISUALISATIONS

Once the data are extracted, the Vis Wizard under-

takes the complex task of suggesting only the appro-

priate visualisations. To do so, it relies on semantic

descriptions of visualisations in terms of visual chan-

nels and mappings, supplemented with a process for

semantic mapping.

Figure 3 shows the layered architecture of the

process to automate the suggestion of visualisations.

Similar to the common vocabulary used in CODE to

structure heterogeneous data as Linked Data, we de-

ﬁne a common vocabulary to describe and integrate

visualisations in the aforementioned CODE work-

ﬂow. In the following, we describe involved vocab-

ularies and their relevance to research data and the

visualisation process.

4.1 Describing Visualisations

To support the process of mapping visualisations to

data described in RDF-DCV, we developed a Vi-

sual Analytics Vocabulary (VA Vocabulary

) that de-

scribes visualisations semantically as an OWL

ontol-

ogy. Our semantic description strictly focuses on de-

scribing the visual encoding process, hence we rep-

resent visualisations in terms of their visual chan-

nels (Bertin, 1983). However, instead of pursuing a

thorough speciﬁcation encompassing all known facts

about visual perception as (Voigt et al., 2012), we

concentrate on pragmatic, simple facts that will aid

VA Vocabulary: code-research.eu/ontology/visual-

analytics

Web Ontology Language: www.w3.org/TR/owl-features/

the sensible mapping (e.g. (Mackinlay, 1986)), ex-

tending the description to many different types of vi-

sualisations. Thus, we have separated our VA Vo-

cabulary into two parts: (1) the model of an abstract

visualisation (i.e. an abstract visualisation type) that

captures only the commonalities shared between all

concrete visualisations, and (2) concrete visualisa-

tion models, which capture just speciﬁc information.

Concrete visualisations reﬁne the abstract visualisa-

tion model depending on their type. The abstract vi-

sualisation model speciﬁes most important structural

components that any concrete visualisation may have.

These are:

• Name: Identiﬁcation for a visualisation.

• Visual Channel: A container for data, which

have to be visualised. It contains structural rules

required to correctly map statistical data to vi-

sualisation. For example, a visual channel for a

bar chart is reﬁned to represent its x-axis and

y-axis.

• Description: A collection of non-mandatory

components (e.g. textual description or image

such as SVG ﬁgure for a concrete visualisation).

The difference between concrete visualisations

lies in their reiﬁcation of visual channels. For ex-

ample, a bar chart has only two visual channels,

x-axis and y-axis. According to their type deﬁni-

tion, y-axis always represents a numeric (e.g. dec-

imal, ﬂoat, integer, etc.), whereby x-axis supports

more types (e.g., categorical, string). Further, this vi-

sualization will be suggested only if both visual chan-

nels are provided and have data (we say here, they are

instantiated). In this case, both visual channels are

mandatory for the bar chart. In contrast, parallel co-

ordinates, require at least one x-axis and additional

instances of that visual channel are optional, a char-

acteristic shared by tabular visualisation and the scat-

terplot matrix.

To capture such differences in our VA Vocabulary,

we characterize visual channels with the following at-

tributes:

• Datatype: Deﬁnes a set of primitive datatypes

that a visual channel can support.

• Occurrence: Deﬁnes the cardinality of a visual

channel (i.e. how many instances are allowed for

the concrete visual channel).

• Persistence: Deﬁnes whether a visual channel

is mandatory part of the concrete visualisation or

not.

The occurrence attribute identiﬁes whether a vi-

sual channel can be instantiated only once (e.g. bar

IVAPP2014-InternationalConferenceonInformationVisualizationTheoryandApplications

270

RDF Cube Datasets

Suggested Visualisations

Suggested Mapping

Generated Visualisations

RDF Data Cube Vocabulary

Visual Analytics Vocabulary Visual Analytics Vocabulary Visualisation Technology

(1) Dataset Query

(2) Visualisation Suggestion

(3) Mapping Suggestion (4) Visualisation Generation

Dataset

x-Axis

Country

y-Axis Value

Mapping Vocabularies by common datatypes Mapping by code generators

VOCABULARIES MAPPINGVISUALISATION PROCESS

Figure 3: Main parts of the Visualisation Wizard: automated visualisation process (bottom), vocabularies (middle) and map-

ping vocabularies (top).

chart x-axis and y-axis, see Figure 2) or multi-

ple times (e.g. parallel coordinates x-axis). There

are two different values for this attribute: one and

many. The occurrence many is used for visualising

high-dimensional RDF Data Cubes. In contrast, the

occurrence one deﬁnes a ﬁxed cardinality.

The persistence attribute helps deﬁne more com-

plex cases. For example, a visualisation with three

mandatory and two optional visual channels. Hereby,

the case with the parallel coordinates can be alterna-

tively deﬁned as follows: one mandatory visual chan-

nel with the occurrence one, and another one, which

has an occurrence many and no persistence.

4.2 Suggesting Visualisations

The mapping between both mentioned vocabularies,

the RDF Data Cube and the VA Vocabulary, is a re-

lation from dimensions and measures in the former to

the corresponding visual channels of a visualisation in

the latter. This relation is valid only if the datatypes

of the cube components and visual channels are com-

patible. Datatype compatibility in our context means

having exactly the same primitive datatypes, both

conforming to the XSD datatype deﬁnitions

. Beyond

datatype compatibility, a valid mapping needs to ac-

count for structural compatibility, since visualisations

from the VA Vocabulary may have ﬁxed or varying

number of visual channels. To clarify this, let us con-

sider the bar chart from the Figure 2. It has two visual

channels, x-axis and y-axis, and can visualise data

only if both channels are instantiated. That means, it

can plot the RDF Data Cubes with exactly one dimen-

XSD Datatypes: www.w3.org/TR/2001/REC-xmlschema-

2-20010502/

sion and one measure. The additional requirement is

that both visual channels support datatypes, which are

compatible to datatypes of the RDF Data Cube model.

Visualisations with optional visual channels sup-

port different structures of the RDF Data Cube model

(i.e. number of dimensions and measures). From

these observations, we derive the following require-

ments for a valid mapping:

• Structural Compatibility: The instantiation of

dimensions and measures in the RDF Data Cube

is unbounded. That is, we can deﬁne observations

with arbitrary dimensions and measures. There-

fore, possible instantiation patterns (i.e. in the for-

mat dimension:measure) for each observation are:

(1) 1:1, (2) 1:n, (3) n:1 and (4) n:n. In order to

ﬁnd a valid mapping, we have to ﬁnd in the VA

Vocabulary visualisations with the same instantia-

tion patterns. Not all visualisations have the same

structural deﬁnition as a RDF Data Cube; there-

fore not all visualisations are able to display arbi-

trary Cubes. Thus, only those visualisations with

the instantiation patterns that match the observa-

tions of the RDF Data Cube are candidates for the

valid mapping. This is called structural compati-

bility in our context.

• Datatype Compatibility: The structural compat-

ibility is not sufﬁcient to claim the correctness of

the mapping. Therefore, in addition to the struc-

tural compatibility, the visual channels and related

RDF Data Cube components have to be compati-

ble regarding their datatypes.

If for a given RDF Cube model at least one

visualisation does match both requirements above,

there is a valid mapping and the RDF Data Cube can

be visualised. The following pseudo-code shows the

SuggestingVisualisationsforPublishedData

271

mapping algorithm.

Data: RDF Data Cube

Result: set(mapping suggestions)

get visualisation candidates;

get observation components;

while visualisation candidates exist do

instantiate visual channels;

generate all combinations for datatypes of

visual channels;

while datatype combinations exist do

map combination to instantiated visual

channels;

pack mapping conﬁguration;

if ( ((occurrence matches) and

(persistence matches)) and (type

matches) ) then

add to mapping suggestion set;

else

throw invalid mapping

conﬁguration;

Algorithm 1: Simpliﬁed algorithm for determining

feasible mapping suggestions.

The attributes of visual channels form the basis

to prove both types of the compatibility. For a given

RDF Data Cube, the algorithm returns either a list

of visualisation candidates and concrete mappings be-

tween data dimensions, measures and visual channels,

or nothing. According to the algorithm above, we

ﬁrst get all concrete visualisations from the reposi-

tory and observation components from the RDF Data

Cube. For each visualisation candidate, we instantiate

its visual channels according to the structure of the

observations (i.e. according to the number of exist-

ing dimensions and measures within an observation).

This means, we produce the structure, which is simi-

lar to the one of the observations. Hereby, we achieve

the structural compatibility between both models, but

it needs to be proved. In the next step, we gener-

ate all combinations of the form visual channel →

datatype and pack those combinations as a mapping

conﬁguration. The mapping conﬁgurations are candi-

dates for a valid mapping. In the last step, we verify

each of these mapping conﬁgurations. First, the struc-

tural compatibility is veriﬁed by checking the occur-

rence and persistence attributes of the visual channels.

Based on these attributes, we identify whether a visual

channel is missing or not allowed. The mapping con-

ﬁgurations which pass this step, satisfy the structural

compatibility requirement and are forwarded to type

checking, where datatypes of the visual channels are

compared with the datatypes of the observations. Fi-

nally, only those mapping conﬁgurations, which pass

this last veriﬁcation step, are considered valid sugges-

tions.

5 PRELIMINARY EVALUATION

We performed a preliminary evaluation, principally

to identify usability gaps, but also to explore the re-

action to recommended visualisations as well as the

mapping suggestions. Instead of following the com-

plete workﬂow from extraction to visualisation, we

concentrated on the latter part. Thus we used datasets

that had been previously structured as RDF-DC. This

also insured the independence from other services and

components of the CODE workﬂow, offering a more

controlled evaluation environment.

During the experiment, participants were mainly

exposed to the Query Wizard and the Vis Wizard.

The Query Wizard (H

oﬂer et al., 2013) aims to help

users select relevant data from Linked Data reposito-

ries, and is one of the components for visual analy-

sis inside of the CODE platform. The user only per-

forms a keyword search in the CODE Semantic Web

endpoints and gets the resulting data presented in an

easy-to-use web-based interface.

5.1 Procedure

The evaluation procedure started with a demonstra-

tion of the Vis Wizard. Three different example

datasets were visualized, each with incremental com-

plexity: one dataset with one dimension and one mea-

sure, one dataset with two dimensions and one mea-

sure, and the last one with three dimensions and one

measure. During the demonstration the user received

a description on how a visualisation would be sug-

gested. We also introduced the fact that the Vis Wiz-

ard proposes a visualisation with a default mapping,

and described how to set a new mapping.

After this demonstration, users were presented

with one of ten datasets randomly chosen from the

Query Wizard to visualize it and answered evalua-

tion questions based on a simple analysis task. Every

one of these datasets collects data from the European

Union (EU), referring different statistics to funding

amounts per country. The datasets were constrained

to two dimensions and one measure (country, year,

and funding, respectively), which corresponds with

our mid-level complexity in the initial demonstrator.

The task was chosen to let the participants get a feel-

ing of using the interfaces to obtain data from these

massive repositories. Thus, the task was to analyze

the funding amounts distributed over countries and

IVAPP2014-InternationalConferenceonInformationVisualizationTheoryandApplications

272

Figure 4: The result of the second Task of the evaluation.

ﬁnd the country that was assigned the largest amount

in 2010.

The next task was to ﬁgure out the funding re-

ceived by countries in 2010, in ascending order, from

lowest to largest. This simple task had particular im-

plications, since it forced participants to interact with

the mappings for a given visualisation to ﬁgure out

the results.

5.2 Participants

The heuristic evaluation was performed by ten IT ex-

perts: 8 males and 2 females. Some of the participants

were experienced in the visualisation of Linked Data

whereas the others had little or no experience in this

area. As our Vis Wizard suggests all possible visuali-

sations for a given dataset, and participants were free

to choose different ones, we did not collect any quan-

titative measures. We did, however collect subjective

feedback towards the overall usage of the Vis Wizard,

and the appreciation of interacting with mappings.

5.3 Results

The results of this preliminary evaluation are as fol-

lows:

• All participants found the Vis Wizard easy to use

after a short introduction. They especially liked

the aesthetic of the website, according to their

opinion there is neither too much nor too little in-

formation, buttons or icons on the website.

• The user perception was very good concerning the

limitations on selecting invalid mappings, since

the Vis Wizard only allowed the selection of sug-

gested and valid visualisation combinations.

• The collection of the charts was sufﬁcient for all

users.

• The ﬁrst mapping done by the server was not al-

ways satisfactory by the users. However, the abil-

ity to easily set a new mapping variant changed

this.

• Sometimes there was too much data which has

been visualized so that the identiﬁcation of the

data was difﬁcult. This is the reason why the user

prefers to have the option to take only a part the

data in order to have a clear visual representation.

• The user also wanted to have the option to zoom,

to ﬁlter and to select the data on the visualisations.

During the ﬁrst task of ﬁnding the country which

received the largest funding in 2010, users interacted

for example with parallel coordinates or scatterplot

matrix. For the second task, one example solution

was to select a time based visualisation and organize

the mappings so that year was on the x-axis. As a

result, country data is distributed across the other axis

(see Figure 4).

From our preliminary evaluation, we argue that

automating the visualisations for statistical data can

be very beneﬁcial for target user group (i.e. re-

searchers, students, etc.). In order to cover many

query scenarios, it is necessary to complement

the visualisation-based approach with the traditional

query such as a tabular one, as shown in this evalua-

tion.

6 DISCUSSION

We have described a full analytical process, going

from unstructured data in publications, through ex-

traction and structuring of the data, to visual anal-

ysis. We also leverage the wealth of information

present in the Linked Open Data Cloud, by making

it easily searchable and accessible for visual analysis.

We covered a tool-chain instantiating every stage of

this workﬂow. It is available through our website or

through tools integrated by our partners (e.g. Mende-

SuggestingVisualisationsforPublishedData

273

ley desktop

). The motivation was to visualize scien-

tiﬁc data from publications. However, our tools are

not constrained to the scientiﬁc domain, they can also

be deployed in any other domain that requires extract-

ing data from published text, such as governmental

reports. Many institutions require manuals for main-

tenance and ﬁnding the threshold numbers, for exam-

ple, for a calibration procedure, is always a tedious

task. The application scenarios for the technology we

propose in this paper span numerous areas, both sci-

entiﬁc and industrial. Although we have numerous

datasets from published Open Data (e.g., EU Open

Data), ad-hoc analysis of arbitrary publications is lim-

ited by the data extraction process (precision 79%, re-

call 76%, on ICDAR dataset

). For visual analysis,

the Vis Wizard can combine visualizations in a single

view, but it cannot yet suggest sensible combinations,

and interaction across views is limited.

7 CONCLUSIONS

Organizing and analyzing research publications using

current technologies of digital libraries remains a te-

dious task. The continuous increase of the published

content drives a need to ﬁnd more effective solutions

to manage that content.

In this paper, we have outlined and instrumented

a workﬂow, whereby the research data has to traverse

several stages, starting from the original and unstruc-

tured text to its ﬁnal structured form and visualisa-

tion. The essential aspect of the this approach is the

automated support for this workﬂow. Automating the

visualisations allows users to easily ﬁnd and to ana-

lyze research data. In this context, we have developed

a common vocabulary for deﬁning the visualisations

semantically. Further, in order to identify the match-

ing visualisations for given research data, we have de-

ﬁned a mapping of this vocabulary to the existing vo-

cabulary of that data. Based on these vocabularies and

their mapping, we are able to automatically suggest

visualisations.

The development of the Vis Wizard will continue

throughout the rest of the year and includes the on-

going topic, visualisation reﬁnement. In further the

user should have the possibility to aggregate, to ﬁlter

and to select the data for the visualisations. These re-

ﬁnements will depend on the visualisation features to

serve users with intelligent processing options.

Mendeley: www.mendeley.com/

ICDAR: dag.cvc.uab.es/icdar2013competition/

ACKNOWLEDGEMENTS

This work is partially funded by the EC 7th Frame-

work projects CODE (grant 296150) and EEXCESS

(grant 600601). The Know-Center GmbH is funded

within the Austrian COMET Program Competence

Centers for Excellent Technologies of the Aus-

trian Federal Ministry of Transport, Innovation and

Technology, the Austrian Federal Ministry of Econ-

omy, Family and Youth and by the State of Styria.

COMET is managed by the Austrian Research Pro-

motion Agency (FFG).

REFERENCES

Attwood, T. K. et al. (2010). Utopia documents: linking

scholarly literature with research data. Bioinformatics,

26(18).

Bertin, J. (1983). Semiology of graphics. University of Wis-

consin Press.

Bizer, C. et al. (2009). Linked data - the story so far. Int. J.

Semantic Web Inf. Syst., 5(3):1–22.

Dumontier, M. et al. (2010). Modeling and querying graph-

ical representations of statistical data. Web Semant.,

8(2-3):241–254.

oﬂer, P. et al. (2013). Linked data query wizard: A tabular

interface for the semantic web. In ESWC (Satellite

Events), pages 173–177.

Klampﬂ, S. et al. (2013). An unsupervised machine learning

approach to body text and table of contents extraction

from digital scientiﬁc articles. In International Con-

ference on Theory and Practice of Digital Libraries

2013, Valetta, Malta.

Mackinlay, J. (1986). Automating the design of graphical

presentations of relational information. ACM Trans.

Graph., 5(2):110–141.

Mutlu, B. et al. (2013). Automated visualization support for

linked research data. In I-Semantics 2013.

Powell, A. et al. (2005). Dublin core metadata initia-

tive - abstract model. White paper, Eduserv Founda-

tion, UK, KMR Group, CID, NADA, KTH, Sweden,

DCMI.

Salas, P. et al. (2012). Publishing statistical data on the web.

In Semantic Computing (ICSC), 2012 IEEE Sixth In-

ternational Conference on, pages 285–292.

Schlegel, K. et al. (2013). Trusted facts: Triplifying primary

research data enriched with provenance infoamtion. In

ESWC 2013.

Seifert, C. et al. (2013). Crowdsourcing fact extrac-

tion from scientiﬁc literature. In Workshop on

Human-Computer Interaction and Knowledge Dis-

covery, Maribor, Slovenia. Springer.

Stegmaier, F. et al. (2012). Unleashing semantics of re-

search data. In Second Workshop on Big Data Bench-

marking, Pune, India.

IVAPP2014-InternationalConferenceonInformationVisualizationTheoryandApplications

274

Voigt, M. et al. (2012). Context-aware recommendation of

visulization components. In The Fourth International

Conference on Information, Process, and Knowldege

Management.

Voigt, M. et al. (2013). Capturing and reusing empirical vi-

sualization knowledge. In 1st International Workshop

on User-Adaptive Visualization.

SuggestingVisualisationsforPublishedData

275