Managing Discipline-Specific Metadata Within an Integrated Research Data Management System

Marius Politze, Sarah Bensberg and Matthias S. Müller

IT Center, RWTH Aachen University, Templergraben 55, Aachen, Germany

Keywords:

Semantic Web, Linked Data, Knowledge Graph, Service Oriented Architecture, Web Service, Data Repository.

Abstract:

Our university intends to improve the central IT support for the management of research data. A core demand is supporting the FAIR guiding principles. In order to make research data findable for future research projects, an application for the creation and storage of structured meta data for research data was developed. The created meta data repository enables creating, maintaining and querying research data based on discipline-specific properties. Since a large number of meta data standards exist for different scientific domains, technologies from the areas of Linked Data and Semantic Web are used to process and store meta data. This work describes the requirements, the design and the implementation of the meta data application that can be integrated into existing research work flows and gives an overview of the technical background used for creating the meta data repository.

1 INTRODUCTION

Within a research data management system we in-

tend to create IT services allowing researchers to

store, retrieve and work with research data. Since re-

search data is fundamental for scientiﬁc knowledge,

one challenge therefore is the long-term storage and

availability for re-use within future research projects.

Only preserving the raw data is not enough to make it available to future researchers. In order to endorse good data management, the FAIR Guiding Principles were defined to support the discoverability of scientific data (Wilkinson et al., 2016). These principles place requirements

towards both, the actual research data and describing

meta data to allow researchers and their peers the re-

trieval and meaningful interpretation of re-used data.

It is therefore our goal as a university IT-service

provider to foster the creation, consolidation and us-

ability of IT services, resources and infrastructure to

provide additional value to researchers across scientific disciplines. There is a tension between discipline-specific use cases, which show how tailored, decentralized IT systems can support scientific processes (Kirsten, Kiel, Wagner, Rühle, and Löffler, 2017; Curdt et al., 2016), and approaches focusing on generic and centralized support of the research process that do not account for discipline-specific needs (Van Garderen, 2010; Kraft et al., 2016). Discipline-specific services often provide higher value to the researcher, whereas generic systems offer higher scalability. To combine both scalability and individualization we established a distributed infrastructure that offers services from various providers to build a generic system that can be integrated into discipline-specific research processes (Politze, Decker, and Eifert, 2017).

https://orcid.org/0000-0003-3175-0659

While working with research data, each data set

can be processed in different environments that de-

ﬁne its visibility and accessibility to researchers and

their peers. The domain model formalizes that re-

search data is processed in and transferred between

different domains of access from personal via group

to persistent (Klar and Enke, 2013). Other mod-

els depict research data management as a life cycle

with data passing through different phases of Collec-

tion and Analysis to Preservation and Re-Use (“Re-

search Data Lifecycle”, 2012). The goal of a research

data management system thus is to provide tools for

the researchers to allow transitions between phases

(Schmitz and Politze, 2018). Both models share the

assumption that data is used more actively within initial phases. Thus, more knowledge about data sets is available: when research data is produced, meta information like authors, associated research projects or used instruments is easily available. As the data passes through the life cycle, this implicit knowledge is often not transferred and is then lost for peers re-using the data. In accordance with the FAIR guiding principles, it is our goal to retain this information. Meta data associated with data sets can provide immediate value if used by peers within a research group. Additionally, meta data allows data sets to cross domain boundaries to long-term storage and persistence and forms the basis for re-use of data in future contexts.

Politze, M., Bensberg, S. and Müller, M.
Managing Discipline-Specific Metadata Within an Integrated Research Data Management System.
DOI: 10.5220/0007725002530260
In Proceedings of the 21st International Conference on Enterprise Information Systems (ICEIS 2019), pages 253-260
ISBN: 978-989-758-372-8

Bibliographic information like authorship, de-

scriptions or licensing are widely standardized by

meta data schemas like the DataCite Metadata

Schema (DataCite Metadata Working Group, 2017).

Looking at the diverse disciplines of research at the university shows that each of these disciplines poses its own discipline-specific requirements towards meta data. Recording discipline-specific meta data within an integrated system used across disciplines thus demands high flexibility from data models. Models provided by linked data applications give exactly this flexibility by describing digital (and also real world) objects using triples of the form (subject, predicate, object).
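To illustrate the triple model, the following minimal Python sketch stores a few statements about one data set as (subject, predicate, object) tuples and matches them against a pattern. The data set URL, the namespace chosen for the dc prefix and the `match` helper are assumptions made for illustration only and are not taken from the described system.

```python
from typing import Optional

# Hypothetical identifiers, for illustration only.
DATASET = "https://example.org/pid/1234"       # assumed PID URL of a data set
DC = "http://purl.org/dc/elements/1.1/"        # assumed namespace for the dc prefix

# A graph is simply a set of (subject, predicate, object) triples.
graph = {
    (DATASET, DC + "title", "Solving salt in water"),
    (DATASET, DC + "creator", "John Doe"),
    (DATASET, DC + "subject", "http://udcdata.info/030042"),
}


def match(graph, s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# All titles recorded in the graph, regardless of subject.
titles = [o for (_, _, o) in match(graph, p=DC + "title")]
```

Because every statement has the same shape, properties from any discipline-specific schema can be added without changing the data structure.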

Based on a previously developed linked data

model (Politze and Decker, 2016), our goal thus is to

provide an integrated application allowing researchers

of various disciplines to describe their research data

using discipline-speciﬁc meta data schemas and trace

their data sets along the research data life cycle

(RDLC). The application should integrate into the ex-

isting research data management system and research

processes within research groups. Considering the

data stored in this kind of repository, we therefore aim

at building a queryable knowledge graph, as proposed

by Galkin et al. (Galkin, Auer, Vidal, and Scerri,

2017) or Decker (Decker, 2017), to make meta data

accessible and interoperable and thus make research

data ﬁndable within and across organization bound-

aries.

2 SOFTWARE ARCHITECTURE

The application and work ﬂows are designed to ﬁt

into an existing decentralized IT-service landscape.

Even though services are provided by different or-

ganizations within the university, they share a com-

mon understanding and nomenclature in terms of the

supported business processes. This is achieved by in-

troducing abstraction layers (see Fig. 1). Individual

services are merged to consistent minimal valuable

processes that are in turn used by different applications. The connection and integration of the systems is governed by a transcendent layer providing means for security and authorization.

Figure 1: Overview of the integrated system landscape supporting research processes. [Labels shown in the diagram: Discipline-Specific Applications (Installed Applications, Connected Instruments, Web Applications, …); Integrated Business-Processes (Publish, Archive, Collab., Sync & Manage, …); REST Interface; Integrated Services (Meta Data: Knowledge Graph, Data: Object Store, Identifiers: PID, …); Local Data; Transcendent Authorization & Security Layer.]

2.1 Data Model

The Resource Description Framework (RDF) data model defines triples of the form (subject, predicate, object) that form graphs of connected entities. In order to identify data sets within the graph, they need an identifier compatible with Internationalized Resource Identifiers (IRIs) that allows the flexibility to follow the data along the RDLC. While several ways to create such Persistent Identifiers (PIDs) exist, the application uses the EPIC service that in turn is based on the Handle system (Kálmán, Kurzawe, and Schwardmann, 2012). EPIC allows the creation of PIDs for data sets as soon as they are created. Additional key-value pairs that are part of the PIDs’ meta data allow further tracing the state of data sets. Following this idea, every data set is identified by the URI of the PID connecting data and meta data. Data sets thus are nodes in a graph of meta data forming a knowledge graph of linked resources (Bizer, Heath, and Berners-Lee, 2009). RDF offers several serialization formats like Turtle (Beckett, Berners-Lee, Prud’hommeaux, and Carothers, 2014), RDF/XML (Gandon and Schreiber, 2014) and JSON-LD (Sporny, Longley, Kellogg, Lanthaler, and Lindström, 2014) that are well suited for conveying machine-readable information. It is unlikely that researchers author meta data in these formats due to their complexity. To provide researchers with a more comprehensible interface that matches discipline-specific requirements and restricts the expressiveness, the following entities are gradually identified for each discipline.

Unfortunately, these entities are used ambiguously throughout literature and contexts. Within this paper the following definitions are used.

Vocabularies provide sets of terms and their relationships. Relations are often hierarchical but can also present other structures. Terms refer to digital or real world objects or any other type of concept, including but not limited to concepts like numbers, texts or dates.

Schemas are sets of properties that make up the

meta data and their relationships. A property is an

attribute that describes a certain detail of an entity.

The range of each of these properties is a vocabulary.

Standards additionally deﬁne a set of required at-

tributes from one or more schemas in order to fulﬁll

the standard.

(Application) Proﬁles select properties from mul-

tiple schemas and combine them to discipline-speciﬁc

templates. Proﬁles may further narrow the range of

properties or deﬁne default values used in provided

interfaces. Often proﬁles are used to extend standards

with additional properties.

Meta Data (Sets) then are instances of a proﬁle

describing a real world object or a digital data set.

By creating application proﬁles based on common

schemas and standards, meta data remain compatible

and form a consistent graph. There exist several languages to define requirements towards RDF graphs, like SHACL (Knublauch and Kontokostas, 2017) or ShEx (Prud’hommeaux, Boneva, Labra Gayo, and Kellogg, 2017), and these are in principle feasible for building profiles. Without loss of generality, in a first approach profiles are described in RDF. RDF Schema (RDFS) already supplies general properties like label and range, but additional properties need to be defined to support consistency or increase usability of generated interfaces, for example: position defines the order in which properties appear, calculatedValue defines a template for a default value.

As an illustrative and simpliﬁed example, the

code below shows such a proﬁle that combines prop-

erties from two meta data schemas, Dublin Core

(ISO, 2017) and the Core Scientiﬁc Metadata Model

(Matthews and Fisher, 2013):

dc:creator
  a owl:AnnotationProperty ;
  md:calculatedValue "{ME}" ;
  md:position 1 ;
  rdfs:label "Lab Technician"@en ;
  rdfs:range rdfs:Literal .

dc:title
  a owl:AnnotationProperty ;
  md:position 2 ;
  rdfs:label "Description"@en .

dc:subject
  a owl:AnnotationProperty ;
  rdfs:range <http://udcdata.info/029653> ;
  md:position 3 ;
  rdfs:label "Subject Area"@en .

:solute
  rdfs:subPropertyOf csmd:sampletype_molecularFormula ;
  a owl:AnnotationProperty ;
  md:position 4 ;
  rdfs:label "Solute"@en .

:solvent
  rdfs:subPropertyOf csmd:sampletype_molecularFormula ;
  a owl:AnnotationProperty ;
  md:position 5 ;
  rdfs:label "Solvent"@en .

Figure 2: Integration of meta data management in a research data management process. [Labels shown in the diagram: actors Researcher, Connected Instrument, Storage Service, MetaData Service and PID Service; steps Store research data, Gather research data meta data, Validate conformity, Save meta data, Create PID, Set meta data URL and Set data URL.]

The code defines meta data properties, e.g. creator. Some attributes of a property are defined or overridden in the profile: calculatedValue, label and range. The label for the property is overridden to be Lab Technician instead of Creator. The range is defined as Literal, meaning any kind of plain text. The calculatedValue is converted at application run time to the name of the user as a default value for the property. In the case of subject the range is defined by referencing a fixed subset of terms from the Universal Decimal Classification vocabulary. Finally, solute and solvent define two discipline-specific properties used to describe the conditions of a chemical experiment.
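How such a profile could drive a generated form can be sketched in a few lines of Python. The dictionary layout, the `default_form` function and the exact handling of the {ME} template are assumptions for illustration; the paper only states that position defines the order and that calculatedValue is a template resolved at run time.

```python
# Profile properties as plain dictionaries; the keys mirror the md: and rdfs:
# terms of the example profile (the layout itself is an assumption).
PROFILE = [
    {"property": "dc:title", "label": "Description", "position": 2},
    {"property": "dc:creator", "label": "Lab Technician", "position": 1,
     "calculatedValue": "{ME}"},
    {"property": "dc:subject", "label": "Subject Area", "position": 3},
]


def default_form(profile, user_name):
    """Order properties by position and expand the {ME} template.

    Properties without a calculatedValue get an empty default.
    """
    fields = []
    for prop in sorted(profile, key=lambda p: p["position"]):
        template = prop.get("calculatedValue", "")
        fields.append((prop["label"], template.replace("{ME}", user_name)))
    return fields


form = default_form(PROFILE, "John Doe")
```

The resulting list holds (label, default value) pairs in display order, which is all a simple form renderer would need.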

To satisfy requirements of researchers for

discipline-speciﬁc support, proﬁles need to be care-

fully crafted for each discipline to select adequate

meta data properties. Researchers therefore are to be

guided by librarians and information scientists before

using the application. In the long run this allows the creation of a consistent ontology as the superset of all profiles, schemas and vocabularies.

Each data set is assigned a PID allowing reso-

lution using the handle system. As such, PIDs can

be transferred between applications and are then en-

riched using application speciﬁc capabilities. A pro-

cess, as shown in Fig. 2, is established within the in-

tegrated research data management system. This pro-

cess allows distributed handling of data sets. Appli-

cations only take a speciﬁc role like handling data or

meta data. Applications store their state within the PIDs’ meta data, through which it becomes accessible in the process. This allows tracing data sets across different phases of the RDLC.
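The interplay of the services can be sketched as follows. This is one plausible ordering of the steps named in Fig. 2, not the actual implementation: `PidService`, `register`, the key names `data_url` and `metadata_url` and the example URLs are all hypothetical, and the PID service is replaced by an in-memory dictionary.

```python
import uuid


class PidService:
    """In-memory stand-in for a PID service such as EPIC (hypothetical interface)."""

    def __init__(self):
        self.records = {}

    def create(self):
        pid = "pid:" + uuid.uuid4().hex
        self.records[pid] = {}
        return pid

    def set_value(self, pid, key, value):
        # Key-value pairs attached to the PID carry the state of the data set.
        self.records[pid][key] = value


def register(pids, store, data_url, metadata, required):
    """Create a PID, link the stored data, then validate, save and link meta data."""
    pid = pids.create()
    pids.set_value(pid, "data_url", data_url)            # Set data URL
    missing = [key for key in required if key not in metadata]
    if missing:                                          # Validate conformity
        raise ValueError("missing properties: " + ", ".join(missing))
    store[pid] = dict(metadata)                          # Save meta data
    pids.set_value(pid, "metadata_url", "https://metadata.example/" + pid)
    return pid


pids, store = PidService(), {}
pid = register(pids, store, "https://storage.example/obj/1",
               {"Description": "Solving salt in water"}, ["Description"])
```

In this sketch a failed validation leaves a PID that only points to the data, so the incomplete state of the data set remains traceable through the PID record.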


2.2 Functional Requirements

The application will be prototyped as an HTML5

web application. While this provides an easy-to-use

way for researchers to manually register meta data,

future use-cases should instead use a RESTful API

that allows automation and integration into digital re-

search work ﬂows. Nevertheless, the prototype will

use AJAX-Requests and thus the same API intended

for future applications. Depending on the afﬁliation

of the researcher, different proﬁles can be selected.

These are used to create a user interface (UI) and to

validate the input provided by the researcher. In addition to the entered meta data, the application also records the user’s affiliation and selected profile. The

API and prototype therefore need to support the fol-

lowing use cases:

F1: Retrieve Available Proﬁles: The application

should present the user a list of the available proﬁles.

The available proﬁles depend on the afﬁliation of the

user.

F2: Save New Meta Data Set: The application

should validate meta data according to a proﬁle and

save them to the data store. Meta data is converted

from a simpliﬁed JSON serialization to RDF. Data

sets are associated with a PID that identiﬁes the data

set.

F3: Metadata Visibility: A level of visibility for

meta data allows users to make meta data publicly

ﬁndable, to keep it for a research group or just pri-

vately for themselves.

F4: Show All Own Meta Data Sets: All meta data

sets created by the user can be retrieved by the appli-

cation.

F5: Edit Stored Meta Data Set: Users who created a meta data set can also edit the meta data. If the meta data set is visible to a research group, all researchers within that group should be able to edit the meta data.

F6: Query Stored Meta Data Sets: Researchers

can search the data store to ﬁnd data sets according

to a query. Queries should allow searching within

meta data properties and account for hierarchical relationships defined by vocabularies. As in F2, the API should be based on a comprehensible JSON serialization that is translated to conform to the RDF data store.

F7: Suggestions for Vocabulary Ranges: While

searching and editing meta data users may require

suggestions for ranges deﬁned by a vocabulary. The

application needs to give human readable suggestions

according to ranges deﬁned by proﬁles.

F8: Render Meta Data form based on Schema:

The prototype should provide the functionality to cre-

ate an input form from a proﬁle as an easy-to-use way

to store meta data.

Figure 3: Different contexts encapsulate meta data and definition of profiles. [Graphs shown in the diagram: default (terms and vocabularies, meta data sets, profile labels and affiliations); properties (properties from all meta data schemas); profile1 … profileN (profiles overriding meta data schemas).]

2.3 Non-functional Requirements

The prototype should ensure that it can be integrated

within the RDLC and that meta data sets can be transferred across domains:

N1: Internationalization: By default all terminol-

ogy should be presented in English. However, the

application should allow switching the language and

support multi language proﬁles and meta data sets.

N2: Compatibility to DCAT: The Data Catalog

Vocabulary (DCAT) (Maali and Erickson, 2014) is

(according to the deﬁnition above) a meta data stan-

dard to allow interoperability of published data cat-

alogs. Meta data sets registered by the application

should be able to be mapped to DCAT to allow future

integration with data repositories.

N3: Dublin Core as Cross Discipline Standard:

Well established cross discipline standards should be adhered to. For example, DCAT and DataCite reuse

ﬁelds from the Dublin Core meta data schema (ISO,

2017). Assuring that meta data is compatible to these

standards allows seamless integration with applica-

tions in other domains of the RDLC.

2.4 Database Model

For the database management system Virtuoso is used

as it allows to efﬁciently store and retrieve RDF

triples. Virtuoso offers a SPARQL (“SPARQL 1.1

Overview”, 2013) endpoint to query or manipulate

the stored triples. Within one database instance Vir-

tuoso supports storing multiple graphs. This property

of Virtuoso is used to save different proﬁles as shown

in Fig. 3.

Queries concerning meta data are run against the

default graph holding information about meta data,

deﬁnitions of terms, vocabularies and labels of de-

ﬁned proﬁles. Additionally, the default graph holds

information about the association of proﬁles with af-

ﬁliated research groups.

A properties graph contains the definition of all properties from schemas and therefore forms a superset of all defined profiles. An explicit record of all

properties, labels and potential ranges deﬁned by any

of the proﬁles is necessary to allow more efﬁcient

computation of suggestions needed for query inter-

faces.

Every proﬁle is an additional graph. This gives the

ﬂexibility to re-use properties from the properties

graph but also allows narrowing down ranges or over-

riding of properties for discipline-speciﬁc applica-

tions. All proﬁles are associated with a URI identify-

ing their current version. The set of proﬁles therefore

forms a vocabulary in the default graph. Likewise,

also afﬁliations are deﬁned as a vocabulary to deﬁne

the ownership of proﬁles and meta data.

3 IMPLEMENTATION

To allow integration within already existing digital re-

search work ﬂows, the main focus of the application

lies on the implementation of a RESTful API. This

API provides a discipline speciﬁc interface for the re-

searchers and translates calls to RDF and SPARQL

accordingly.

3.1 RESTful API

To create new meta data, the API endpoint Create

accepts the meta data in JSON-LD as well as a sim-

pliﬁed JSON serialization. In the simpliﬁed format all

triples within the meta data are assumed to have the

data set as subject. If no PID is specified as a URL for the data set, a new PID is created and used. Within the simplified JSON serialization triples are therefore reduced to pairs of (predicate, object). Rather than using the URIs defining the predicates, substitution of properties by the labels defined in the profile allows submitting meta data sets as simple JSON key-value pairs. Based on the profile in the example above, the endpoint accepts a document of the form:

{
    "Description": "Solving salt in water",
    "Lab Technician": "John Doe",
    "Subject Area": "http://udcdata.info/030042",
    "Solute": "NaCl",
    "Solvent": "H2O"
}
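The expansion of such a document into triples can be sketched in Python as follows. The `LABELS` mapping is derived by hand from the example profile, and the `to_triples` function and the PID value used below are hypothetical; the actual back end presumably resolves labels against the profile graph instead.

```python
import json

# Label -> predicate mapping as it could be derived from the example profile.
LABELS = {
    "Description": "dc:title",
    "Lab Technician": "dc:creator",
    "Subject Area": "dc:subject",
    "Solute": ":solute",
    "Solvent": ":solvent",
}


def to_triples(document: str, pid: str):
    """Expand simplified JSON key-value pairs into triples.

    Every pair becomes one triple with the data set (its PID) as subject.
    """
    pairs = json.loads(document)
    unknown = [label for label in pairs if label not in LABELS]
    if unknown:
        raise KeyError("labels not defined in profile: " + ", ".join(unknown))
    return [(pid, LABELS[label], value) for label, value in pairs.items()]


triples = to_triples('{"Solute": "NaCl", "Solvent": "H2O"}', "pid:example")
```

Rejecting unknown labels is a design choice of this sketch; it keeps submitted documents within the properties the selected profile defines.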

If the provided values exactly match with the URI

of an object within the range of the property, this ob-

ject is used. Otherwise, the back end will then retrieve

the most likely properties based on similarity accord-

ing to the labels by querying the graph of the proﬁle

using a SPARQL query:

SELECT ?s WHERE {
    GRAPH <profileN> {
        ?s rdfs:label ?label .
        FILTER REGEX(STR(?label), "Value", "i") .
    }
}
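The two-step resolution described above, exact URI match first and label similarity second, can be mimicked in plain Python. The `TERMS` table and the `resolve` function are made up for illustration. Note one deliberate difference: the SPARQL REGEX function interprets the value as a regular expression, whereas this sketch escapes it and therefore matches it literally.

```python
import re

# URI -> label pairs as they might be stored in a profile graph (made-up values).
TERMS = {
    "http://example.org/terms/chemistry": "Chemistry",
    "http://example.org/terms/biochemistry": "Biochemistry",
    "http://example.org/terms/physics": "Physics",
}


def resolve(value, terms):
    """Use the value as-is if it is a known URI, else match labels case-insensitively."""
    if value in terms:
        return [value]
    pattern = re.compile(re.escape(value), re.IGNORECASE)
    return [uri for uri, label in terms.items() if pattern.search(label)]


candidates = resolve("chem", TERMS)
```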

In order to retrieve a list of all available schemas,

the GetAllProfiles endpoint retrieves a list of pro-

ﬁles available for the afﬁliation of the user. The

GetProfile endpoint then retrieves a single proﬁle

with full information about all used properties.

The API endpoint GetAll allows retrieving all stored meta data sets. The method uses additional properties to assess the affiliation and visibility of the meta data. To achieve this behaviour the back end translates the request into a SPARQL query:

SELECT ?s ?title ?author WHERE {
    GRAPH <default> {
        ?s dc:title ?title .
        ?s dc:creator ?author .
        ?s md:hasOwner ?owner .
        ?s md:publishMetadata ?visibility .
        ?s md:hasAffiliation ?affiliation .
        FILTER (
            ?owner = "#U#" ||
            ?visibility = md:metadataIsPublic ||
            ?visibility = md:metadataIsProtected &&
            ?affiliation = #A#
        )
    }
}

Here, the placeholders #U# and #A# are replaced with identifiers of the user and the affiliation. The GetAll endpoint then provides URIs of

all described data sets visible for the user. Retrieval

of a single meta data set identiﬁed by its URI is then

performed by the Get endpoint.
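The visibility rule encoded in the FILTER of the GetAll query can be restated as a small Python predicate. In SPARQL, && binds more tightly than ||, so the condition reads: own meta data, or public meta data, or protected meta data of the same affiliation. The record layout, the `is_visible` function and the value md:metadataIsPrivate are assumptions for illustration.

```python
PUBLIC = "md:metadataIsPublic"
PROTECTED = "md:metadataIsProtected"


def is_visible(record, user, affiliation):
    """Mirror the FILTER expression; '&&' binds tighter than '||' in SPARQL."""
    return (
        record["owner"] == user
        or record["visibility"] == PUBLIC
        or (record["visibility"] == PROTECTED
            and record["affiliation"] == affiliation)
    )


# "md:metadataIsPrivate" is a hypothetical value for private meta data sets.
records = [
    {"id": "a", "owner": "alice", "visibility": "md:metadataIsPrivate", "affiliation": "org1"},
    {"id": "b", "owner": "bob", "visibility": PROTECTED, "affiliation": "org1"},
    {"id": "c", "owner": "bob", "visibility": PROTECTED, "affiliation": "org2"},
    {"id": "d", "owner": "carol", "visibility": PUBLIC, "affiliation": "org2"},
]
visible = [r["id"] for r in records if is_visible(r, "alice", "org1")]
```

For the user alice in org1 this selects her own private set, the protected set of her own group and the public set, but not the protected set of the other group.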

Especially for vocabulary ranges, it is necessary

to get valid terms for a property. The GetRange endpoint therefore allows retrieving those terms matching a query. Especially for hierarchies, vocabularies sometimes use different ways to describe the relationship between the terms. In order to resolve hierarchies, GetRange uses the following strategies:

RDF Schema Class Hierarchies: Given a class C

all other classes ?c that transitively satisfy the condi-

tion ?c rdfs:subClassOf C are within the range.

RDF class instantiations or DCMI Abstract

Model members: Given a class C all terms ?t that

satisfy one of the conditions ?t rdf:type C or

?t dcam:memberOf C are within range.

Simple Knowledge Organization System (SKOS) Instances (Miles and Bechhofer, 2009): Given a skos:Concept C all other terms ?t transitively satisfying the condition ?t skos:broader C are within range.


In order to transitively resolve matching terms, the

SPARQL query additionally needs the TRANSITIVE

option. In the case of SKOS the query with place-

holders for the given Concept #C# and a query #Q#

looks as follows:

SELECT DISTINCT ?subject STR(?label) AS ?label WHERE {
    ?subject skos:prefLabel ?label
    {
        SELECT ?subject ?y WHERE {
            GRAPH <default> {
                ?subject skos:broader ?y
            }
        }
    }
    OPTION(TRANSITIVE, t_in(?y), t_out(?subject))
    FILTER(?y = #C#) .
    FILTER regex(STR(?label), "#Q#", "i")
}
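What the transitive option computes can be shown outside the database with a breadth-first traversal. The vocabulary in `BROADER` and both functions are made up for illustration, and the label filter is simplified to a case-insensitive substring test instead of a regular expression.

```python
from collections import deque

# (narrower term, broader term) pairs, i.e. ?subject skos:broader ?y
BROADER = [
    ("chemistry", "science"),
    ("inorganic chemistry", "chemistry"),
    ("salts", "inorganic chemistry"),
    ("physics", "science"),
]


def narrower_transitive(concept, broader_pairs):
    """Return every term that reaches `concept` via one or more broader links."""
    children = {}
    for child, parent in broader_pairs:
        children.setdefault(parent, []).append(child)
    found, seen, queue = [], {concept}, deque([concept])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:          # guards against cycles in the vocabulary
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found


def get_range(concept, query, broader_pairs):
    """Keep only those transitively narrower terms whose label contains the query."""
    return [term for term in narrower_transitive(concept, broader_pairs)
            if query.lower() in term.lower()]


suggestions = get_range("science", "chem", BROADER)
```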

After storing meta data sets in the repository, re-

searchers furthermore are able to retrieve data sets

based on the provided meta data. The Find endpoint

queries both, meta data sets created by the user and

public meta data sets according to their visibility and

retrieves a list of matching meta data sets according to

a query. This query can either be a full text search or

conform to a query syntax that allows targeting single

meta data ﬁelds. The placeholder #Q# creates such a

full text query:

SELECT ?s ?title ?author WHERE {
    GRAPH <default> {
        ?s dc:title ?title .
        ?s dc:creator ?author .
        ?s ?p ?o .
        FILTER REGEX(STR(?o), "#Q#", "i") .
    }
}

Slightly more complex queries can be performed using a JSON query language supplying labels and respective values in the form

{
    "property1": "...",
    "property2": "..."
}

The algorithm for matching properties that was al-

ready discussed for the Create endpoint is used to

select properties from the proﬁle and a dynamic query

is created to target the speciﬁed properties. At the mo-

ment the Find endpoint only allows logical conjunc-

tion of properties. More advanced queries need an

extension of the simpliﬁed query language or require

the user to formulate them directly in SPARQL.
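A dynamic query of this kind could be assembled as in the following Python sketch. The function names, the `label_to_predicate` mapping and the shape of the generated query are assumptions; the sketch only demonstrates that one triple pattern and one filter per property yields a logical conjunction. Values are escaped as string literals but are still interpreted as regular expressions by REGEX.

```python
def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for use inside a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_find_query(criteria: dict, label_to_predicate: dict) -> str:
    """Build one triple pattern and one FILTER per property.

    All patterns share the subject ?s, so a data set must satisfy every
    criterion to be returned (conjunction).
    """
    lines = ["SELECT ?s WHERE {", "    GRAPH <default> {"]
    for i, (label, value) in enumerate(criteria.items()):
        predicate = label_to_predicate[label]
        lines.append(f"        ?s {predicate} ?o{i} .")
        lines.append(
            f'        FILTER REGEX(STR(?o{i}), "{escape_literal(value)}", "i") .'
        )
    lines += ["    }", "}"]
    return "\n".join(lines)


query = build_find_query(
    {"Solute": "NaCl", "Solvent": "H2O"},
    {"Solute": ":solute", "Solvent": ":solvent"},
)
```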

To build an interface for the user that allows build-

ing this kind of queries, the GetProperties endpoint

retrieves a list of properties from the properties

graph according to a query. Since properties can have

multiple labels within different proﬁles, all labels are

retrieved, however, the label provided by the meta

data schema is highlighted.

Table 1: RDF ranges mapped to HTML5 field types.

RDF Range               HTML5 Type
rdfs:Literal            text
xml:dateTime            date
md:metadataVisibility   radio
none                    text
other                   select

The requirements towards internationalization re-

sult in an optional parameter to supply the language

for all endpoints discussed above. Since RDF supports internationalization, the provided language, for example en or de, can then be directly passed to the database within a SPARQL statement. RDF allows the specification of labels with language tags:

dc:title
  a owl:AnnotationProperty ;
  rdfs:label "Title"@en, "Titel"@de .

When accessing labels from a SPARQL query, a filter can be supplied to access only labels of a specific language:

SELECT STR(?label)
WHERE {
    ?s rdfs:label ?label .
    FILTER (LANG(?label) = "en")
}
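The same selection can be expressed on the client side. In the following Python sketch the `LABELS` structure and the `label_for` function are hypothetical, and the fallback to English is an assumption based on requirement N1, which names English as the default language.

```python
# Labels with language tags, as in "Title"@en, "Titel"@de.
LABELS = {"dc:title": {"en": "Title", "de": "Titel"}}


def label_for(prop, lang, default="en"):
    """Pick the label in the requested language, falling back to the default."""
    by_lang = LABELS[prop]
    return by_lang.get(lang, by_lang[default])


german = label_for("dc:title", "de")
fallback = label_for("dc:title", "fr")   # no French label, English is used
```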

3.2 HTML5 Prototype

With the endpoints as a back end the prototype is re-

quired to build a presentation layer that can be eas-

ily used by researchers to manually register data sets

to the system. Depending on label and range of

the properties selected in the proﬁle, the UI should

present different input fields. For presentation within a web application, Table 1 defines the mapping between range and HTML5 field type.
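The mapping of Table 1 amounts to a lookup with two fallbacks, as the following Python sketch shows. The `field_type` function is hypothetical; the range names and field types are those listed in Table 1.

```python
from typing import Optional

# Explicit rows of Table 1.
FIELD_TYPES = {
    "rdfs:Literal": "text",
    "xml:dateTime": "date",
    "md:metadataVisibility": "radio",
}


def field_type(rdf_range: Optional[str]) -> str:
    """Map the range of a property to an HTML5 field type following Table 1."""
    if rdf_range is None:          # row "none": no range defined
        return "text"
    # Row "other": any remaining range is treated as a vocabulary to select from.
    return FIELD_TYPES.get(rdf_range, "select")


subject_field = field_type("http://udcdata.info/029653")
```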

Especially the ﬁelds of the type select require

dedicated attention. These ﬁelds present the re-

searcher suggestions based on the range of the prop-

erty. Each term is therefore identiﬁed by its URI that

can be used to link the term deﬁnition within the meta

data set. Instead of displaying the URI to users its

label should be used. Researchers need to be able to

submit a query to ﬁnd a desired term based on its la-

bel. This is done using the

GetRange endpoint. A

fully rendered meta data form is shown in Fig. 4.

The prototype also features a basic search inter-

face allowing researchers to retrieve stored meta data

sets. The interface lets users pick from a list of avail-

able properties and therefore allows building a query

that can be processed by the Find endpoint. The

resulting meta data sets and their URIs are then dis-

played to the users.


Figure 4: Screenshot of generated UI for a meta data schema.

Table 2: Block coverage by unit tests.

Area                  Coverage   Blocks (covered/total)
SPARQL                95.92%     47/49
RDFWrapperSchema      100.00%    68/68
RDFWrapperMetadata    91.32%     442/484
API                   85.50%     171/200
MetadataSchema        94.44%     34/36
Metadata              92.65%     189/204
Total                 91.35%     951/1041

4 FIRST EVALUATION RESULTS

The back end application has been tested using an

extensive set of automated white box tests. The test

cases have been developed for various external and in-

ternal methods and cover both error and normal cases.

Table 2 shows that with 91.35% block coverage the current testing methodology achieves satisfactory results.

The biggest drawback of the current application is the necessity of defining application profiles in RDF. While RDF powers the flexibility of the application, it has proven to be a quite complex task to find adequate meta data schemas and use them to build the necessary profiles. Currently this is done by a joint team of the university library and the IT Center forming a service unit to support the researchers. To further develop the application it is necessary to at least lower the threshold of building profiles based on available schemas without in-depth knowledge of the underlying semantic data model.

To allow interoperability with other repositories it is possible to map the registered meta data to the data model of DCAT. DCAT defines four main classes, Catalog, CatalogRecord, Dataset and Distribution, that can be mapped as follows:

Catalog: The application deﬁnes multiple cata-

logs. Each researcher manages a private catalog. By

associating each meta data set with an affiliation, a catalog for each organization is created. Additionally,

there is the public catalog that contains all meta data

sets that are publicly visible. A meta data set can be

part of multiple catalogs.

CatalogRecord: For each meta data set some

properties like the afﬁliation and user are automat-

ically assessed. This information maps to catalog

records.

Dataset: All properties provided by the researcher

form the data set. This includes minimal information

like author and title but also other discipline-speciﬁc

properties deﬁned in the proﬁle.

Distribution: The PID used as an identiﬁer allows

resolving the research data. The integrated character

of the system allows gathering necessary information.

If the data is accessible this can be retrieved from ad-

ditional properties of the PID.

5 CONCLUSION

The presented application is a building block for a

continuous support of the RDLC. The central leverage point for the integration into the research work flow is the use of PIDs throughout the research

process. PIDs and meta data are therefore not only

used as a virtual reference for the data set, but are also

used to store references to the data set and directly

query and retrieve them using standardized and open

protocols within the integrated research data manage-

ment system at our university.

By setting a minimal requirement towards the pro-

ﬁles, the application fosters FAIR guiding princi-

ples: Centrally collecting meta data and making pub-

lic meta data sets queryable by local and external re-

searchers using simpliﬁed JSON and complex, stan-

dardized SPARQL endpoints makes the data sets ﬁnd-

able and meta data accessible. All data sets are addi-

tionally assigned a PID to clearly identify them and to

be able to reference data sets throughout the research

process. Using the linked data model based on RDF

also allows sharing and distributing proﬁles and meta

data schemas according to the FAIR guiding princi-

ples.

The application for creating and querying meta

data was successfully launched with several pilot

users at our university and is gradually being introduced to other research groups. It currently fulfills the discussed requirements. Discipline-specific profiles that are based on meta data schemas are formulated in RDF

and are then made available to the researchers via

the RESTful API and user interface. Researchers

can make use of these easy-to-use interfaces to in-

tegrate the registration of meta data sets at early

phases within the RDLC. We have shown that our ap-

Managing Discipline-Speciﬁc Metadata Within an Integrated Research Data Management System

259

proach to save and manage discipline-speciﬁc meta

data within an discipline-agnostic repository and

database can be successfully implemented as central

and scalalabe service at our university.

REFERENCES

Beckett, D., Berners-Lee, T., Prud’hommeaux, E., &

Carothers, G. (2014). RDF 1.1 Turtle. W3C. Re-

trieved June 10, 2018, from https://www.w3.org/TR/

turtle/

Bizer, C., Heath, T., & Berners-Lee, T. (2009). Linked

Data - The Story So Far. International Journal on

Semantic Web and Information Systems, 5(3), 1–22.

doi:10.4018/jswis.2009081901

Curdt, C., Hoffmeister, D., Jekel, C., Udelhoven, K.,

Waldhoff, G., & Bareth, G. (2016). Implementa-

tion of a centralized data management system for

the CRC Transregio 32 ’Patterns in Soil-Vegetation-

Atmosphere-Systems’. In C. Curdt & C. Wilmes

(Eds.), Proceedings of the 2nd Data Management

Workshop (pp. 27–33). Kölner Geographische Ar-

beiten. doi:10.5880/TR32DB.KGA90.6

DataCite Metadata Working Group. (2017). DataCite Metadata Schema Documentation for the Publication and Citation of Research Data v4.1. doi:10.5438/0014

Decker, S. (2017). Rethinking access to Scientific Knowledge: Knowledge Graphs. Retrieved February 3, 2018, from https://www.linkedin.com/pulse/rethinking-scientific-knowledge-graphs-stefan-decker/

Galkin, M., Auer, S., Vidal, M.-E., & Scerri, S.

(2017). Enterprise Knowledge Graphs: A Seman-

tic Approach for Knowledge Management in the

Next Generation of Enterprise Information Systems.

In Proceedings of the 19th International Confer-

ence on Enterprise Information Systems (pp. 88–98).

doi:10.5220/0006325200880098

Gandon, F., & Schreiber, G. (2014). RDF 1.1 XML Syn-

tax. W3C. Retrieved June 10, 2018, from http://

www.w3.org/TR/rdf-syntax-grammar/

ISO. (2017). Information and documentation - The Dublin

Core metadata element set. Geneva, Switzerland:

ISO.

Kálmán, T., Kurzawe, D., & Schwardmann, U. (2012).

European Persistent Identiﬁer Consortium - PIDs für

die Wissenschaft. In R. Altenhöner & C. Oellers

(Eds.), Langzeitarchivierung von Forschungsdaten

(pp. 151–164). Berlin, Germany: Scivero Verl.

Kirsten, T., Kiel, A., Wagner, J., Rühle, M., & Löfﬂer, M.

(2017). Selecting, Packaging, and Granting Access

for Sharing Study Data. In M. Eibl & M. Gaedke

(Eds.), INFORMATIK 2017: Digitale Kulturen (pp.

1381–1392). GI Edition Lecture Notes in Informatics

Proceedings (LNI). doi:10.18420/in2017_138

Klar, J., & Enke, H. (2013). Projekt RADIESCHEN:

Rahmenbedingungen einer disziplinübergreifenden

Forschungsdateninfrastruktur, Report “Organisation

und Struktur”. doi:10.2312/RADIESCHEN_005

Shapes Constraint Language (SHACL). (2017). W3C. Re-

trieved June 10, 2018, from https://www.w3.org/TR/

shacl/

Kraft, A., Razum, M., Potthoff, J., Porzel, A., Engel, T.,

Lange, F., . . . Furtado, F. (2016). The RADAR Project

- A Service for Research Data Archival and Publica-

tion. ISPRS International Journal of Geo-Information,

5(3), 28. doi:10.3390/ijgi5030028

Data Catalog Vocabulary (DCAT). (2014). W3C. Retrieved June 10, 2018, from http://www.w3.org/TR/vocab-dcat/

Matthews, B., & Fisher, S. (2013). CSMD: the Core Scien-

tiﬁc Metadata Model. Retrieved June 10, 2018, from

http://icatproject-contrib.github.io/CSMD/csmd-

4.0.html

SKOS Simple Knowledge Organization System Refer-

ence. (2009). W3C. Retrieved June 10, 2018, from

https://www.w3.org/TR/skos-reference/

Politze, M., & Decker, B. (2016). Ontology Based Se-

mantic Data Management for Pandisciplinary Re-

search Projects. In C. Curdt & C. Wilmes

(Eds.), Proceedings of the 2nd Data Manage-

ment Workshop. Kölner Geographische Arbeiten.

doi:10.5880/TR32DB.KGA96.10

Politze, M., Decker, B., & Eifert, T. (2017). pSTAIX - A

Process-Aware Architecture to Support Research Pro-

cesses. In M. Eibl & M. Gaedke (Eds.), INFOR-

MATIK 2017: Digitale Kulturen (pp. 1369–1380).

GI Edition Lecture Notes in Informatics Proceedings

(LNI). doi:10.18420/in2017_137

Shape Expressions Language 2.0. (2017). Retrieved from

http://shex.io/shex-semantics/

Research Data Lifecycle. (2012). Retrieved February 13,

2019, from https://www.ukdataservice.ac.uk/manage-

data/lifecycle

Schmitz, D., & Politze, M. (2018). Forschungsdaten man-

agen – Bausteine für eine dezentrale, forschungsnahe

Unterstützung. o-bib. Das offene Bibliotheksjournal,

5(3), 76–91. doi:10.5282/o-bib/2018H3S76-91

SPARQL 1.1 Overview. (2013). W3C. Retrieved June

10, 2018, from https://www.w3.org/TR/sparql11-

overview/

Sporny, M., Longley, D., Kellogg, G., Lanthaler, M., & Lindström, N. (2014). JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C. Retrieved June 10, 2018, from https://www.w3.org/TR/json-ld/

Van Garderen, P. (2010). Archivematica: Using mi-

croservices and open-source software to deliver a

comprehensive digital curation solution. In A.

Rauber (Ed.), Proceedings of the 7th International

Conference on Preservation of Digital Objects (pp.

145–149). Books@ocg.at. Vienna, Austria: Österre-

ichische Computer Gesellschaft.

Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J. J., Appleton, G., Axton, M., Baak, A., . . . Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3. doi:10.1038/sdata.2016.18

ICEIS 2019 - 21st International Conference on Enterprise Information Systems
