persistent identifier are uniqueness, which can be
addressed by defining namespaces or using special
identifier generation strategies, and resolvability,
which means that the identifier can be resolved
persistently. Further important properties of PID
systems include the association of metadata with the
identifier, the ability to incorporate legacy
identifiers or identifiers of other types, and the
handling of versioning, granularity and management of
the PIDs (Ball and Duke, 2012). In general, we can
distinguish two categories of systems that assign
persistent identifiers: systems that store metadata
associated with the PID and systems that do not. Most
of the systems that store metadata have a basic
metadata schema, which often consists of Dublin Core
elements.
The DOI Foundation provides a managed resolution
system for identifiers. A DOI name may be represented
as a URL by prefixing it with the string
http://dx.doi.org/ (e.g., the DOI name 10.4232/1.11380
can be resolved via http://dx.doi.org/10.4232/1.11380).
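
The mapping from a DOI name to a resolvable URL can be illustrated with a minimal Python sketch. The function name `doi_to_url` and the rough sanity check on the DOI syntax are assumptions made for illustration only; they are not part of the da|ra system or of any DOI Foundation API.

```python
from urllib.parse import quote

# Resolver prefix as quoted in the text.
RESOLVER_PREFIX = "http://dx.doi.org/"


def doi_to_url(doi_name: str, resolver: str = RESOLVER_PREFIX) -> str:
    """Turn a DOI name into a URL by prefixing it with the resolver address.

    Hypothetical helper: the check below is only a rough sanity test
    (DOI names start with "10." and contain a "/" separating prefix and
    suffix), not a full validation of the DOI syntax.
    """
    doi_name = doi_name.strip()
    if not doi_name.startswith("10.") or "/" not in doi_name:
        raise ValueError(f"not a plausible DOI name: {doi_name!r}")
    # Percent-encode characters that are unsafe in a URL path, keep "/".
    return resolver + quote(doi_name, safe="/")


print(doi_to_url("10.4232/1.11380"))
# prints http://dx.doi.org/10.4232/1.11380
```

Keeping the resolver prefix as a parameter means the same DOI name can be combined with a different resolver address without touching the identifier itself.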
One of the biggest PID systems is Crossref (Crossref,
2012), which mainly registers DOI names for different
literature types. DataCite registers DOI names as
well, but its focus is on PIDs for datasets. It also
provides a very general metadata schema for datasets
of all types. Furthermore, several institutions, e.g.
national libraries, allow the registration of URNs
(Daigle et al., 2002) for publications. We build our
system on the services provided by DataCite, since
its purpose is to promote science and research, which
closely matches our use cases. Thus, we use DOI names
as PIDs (Hausstein, 2012).
3 METADATA SCHEMA
The main goal of the da|ra information system is to
register datasets from the social and economic
sciences and to make the metadata of these research
datasets searchable. Typical data in the social
sciences are empirical primary data from survey
research, historical social research and texts for
content analyses. Typical economic data are
statistical data collected in surveys of individuals,
companies or states, but also data representing
experiment results.
The main requirements when developing the
da|ra metadata schema to describe the data were the
following: (1) Interoperability with other standards
such as the DDI metadata specification (DDI, 2012)
and the Dublin Core Metadata Initiative (DCMI); (2)
Quality assurance of metadata; (3) Sustainability,
e.g. the availability for semantic web applications.
The metadata schema of da|ra is implemented as an XML
Schema Definition (Hausstein et al., 2012) and is
partially based on the metadata schema of the
Metadata Store of DataCite (Starr et al., 2011). As
we are interfacing with the DataCite services, we
incorporated all required fields of the Metadata Store
schema in our schema, but also adapted existing fields
and introduced new ones. The following fields are
considered the minimal set required for the citation
of a dataset: Title; Principal Investigator;
Publication Agent; DOI; URL; Publication Date. Since
da|ra does not store the data itself but only the
metadata, the mandatory field ‘Availability’
additionally holds information about the access status
of the dataset.
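
The minimal citation set and the mandatory ‘Availability’ field can be sketched as a simple completeness check. The actual da|ra schema is an XML Schema Definition; the flat dictionary representation, the function name `missing_mandatory_fields` and all values in the example record are assumptions chosen for illustration.

```python
# Field names as listed in the text; the dictionary layout is an assumption.
MANDATORY_FIELDS = (
    "Title",
    "Principal Investigator",
    "Publication Agent",
    "DOI",
    "URL",
    "Publication Date",
    "Availability",
)


def missing_mandatory_fields(record: dict) -> list:
    """Return the mandatory fields that are absent or empty in `record`."""
    return [
        field
        for field in MANDATORY_FIELDS
        if not str(record.get(field) or "").strip()
    ]


# Placeholder record; every value here is made up for the example.
record = {
    "Title": "Example survey dataset",
    "Principal Investigator": "Jane Doe",
    "Publication Agent": "Example Data Archive",
    "DOI": "10.1234/example",
    "URL": "http://example.org/datasets/1",
    "Publication Date": "2012",
    "Availability": "",
}

print(missing_mandatory_fields(record))
# prints ['Availability']
```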
The da|ra schema includes 28 optional fields that
allow users to describe social and economic science
data in detail, e.g. Data Collector, Sampled Universe,
Sampling, Temporal Coverage, Time Dimension,
Collection Mode, Data, and Publication. These
additional fields also increase the visibility of the
datasets and make them easier for domain experts to
find.
In the da|ra system, the editing of metadata is
backed by controlled vocabularies in order to support
quality assurance and standardization. Hence, some
fields of the da|ra metadata schema accept only values
from controlled vocabularies from the social and
economic sciences, such as TheSoz (Thesaurus Social
Sciences) (Zapilko et al., 2012) or STW (Thesaurus for
Economics) (Gastmeyer, 1998). To increase flexibility,
each controlled field is accompanied by a free-text
field.
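
The pairing of a controlled field with a free-text field could look roughly as follows. This is only a sketch under stated assumptions: the class `ControlledField`, the function `validate_controlled` and the three vocabulary terms are hypothetical placeholders and are not taken from TheSoz, STW or the da|ra implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

# Placeholder vocabulary; real thesauri such as TheSoz or STW are far larger.
COLLECTION_MODE_VOCABULARY: FrozenSet[str] = frozenset(
    {"interview", "questionnaire", "observation"}
)


@dataclass(frozen=True)
class ControlledField:
    """A controlled term together with its accompanying free-text value."""

    term: Optional[str] = None       # must come from the vocabulary if set
    free_text: Optional[str] = None  # unrestricted description


def validate_controlled(
    value: ControlledField, vocabulary: FrozenSet[str]
) -> List[str]:
    """Return error messages; only the controlled term is restricted."""
    errors = []
    if value.term is not None and value.term not in vocabulary:
        errors.append(f"term {value.term!r} is not in the controlled vocabulary")
    return errors


ok = ControlledField(term="interview", free_text="face-to-face, paper-based")
bad = ControlledField(term="telepathy")

print(validate_controlled(ok, COLLECTION_MODE_VOCABULARY))   # prints []
print(len(validate_controlled(bad, COLLECTION_MODE_VOCABULARY)))  # prints 1
```

Restricting only the controlled term keeps the metadata standardized where a vocabulary entry exists, while the free-text value remains available for details that the vocabulary does not cover.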
Versioning and granularity are recurring issues in the
context of persistent identifiers. In da|ra, we offer
a comprehensive versioning mechanism and let the
publication agents decide how to use it. For example,
publication agents can register a new DOI name for
each version of the metadata or update the existing
metadata, e.g. to remove typos. Publication agents
are also free to decide on the granularity of the
datasets, which means that it is also possible to
assign a DOI name to a package, e.g. a CD containing
several datasets.
4 SYSTEM ARCHITECTURE
In this section, we give an overview of the
architecture of the da|ra information system, which is
visualized in Figure 1. On the left, we see the two
user groups, Publication
Agents and Researchers. The main difference be-
WEBIST 2013 - 9th International Conference on Web Information Systems and Technologies