and communicate best practices of data curation to
other stakeholders (authors of research papers, in the
last example). A consistent and clearly formulated
framework will make a collaborative data curation
effort much better defined and communicated, and
the best data curation practices more readily adopted
by the research community. Supervision of various
kinds of information throughout the research lifecycle
will then help to create rich data aggregations and
reproducible research workflows, with contributions
made naturally by the different lifecycle stakeholders.
The next challenge and opportunity is presented
by the emergence of research services such as the
aforementioned UK National Crystallography
Service. This trend raises questions about user
management, research proposal management and
data management in facilities science. Just one
example is the future role and content of the
data management policies which some facilities
impose on their users as a precondition for
obtaining a facility resource for research. The policy
may ask users to agree with the public release of
their experimental data after a period of exclusive
access (typically a few years), or contain the
requirement to submit the list of resulting
publications back to the facility user office. This
works well in the traditional business model of
facilities science but does not take into account the
emergence of service intermediaries, who may
need to be subject to the data management policy,
too, so that it becomes a multilateral agreement.
The format of the data management policy, which is
now just plain text, is also questionable as it cannot
be interpreted without a human; this is unlikely to be
enough for automated research proposal
management and data release management across
different facilities. The development of licences for
data re-use, or the adoption of suitable existing ones,
could alleviate the problem, but licences might need
proper machine-oriented modelling for policy
enforcement; an indication of what is possible with
respect to structured modelling and automation of
data licences can be seen in the recent formation of
the Linked Content Coalition
(www.linkedcontentcoalition.org), endorsed by the
European Commission and some national
governments. Again, information departments of
large research facilities might consider borrowing
the advanced practices and models of data licensing
for their re-use in facilities science.
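To illustrate what a machine-interpretable policy could look like, the
following is a minimal sketch in Python. It is not taken from any actual
facility policy or from the Linked Content Coalition models; the class
`DataPolicy`, its fields and the functions `release_date` and
`is_publicly_releasable` are hypothetical names introduced here, and the
handling of 29 February is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class DataPolicy:
    """Hypothetical machine-readable form of a facility data policy."""
    facility: str
    embargo_years: int                  # period of exclusive access
    publication_list_required: bool     # must users report publications?
    parties: Tuple[str, ...]            # e.g. facility, user, intermediary


def release_date(policy: DataPolicy, experiment_end: date) -> date:
    """Date on which the experimental data may be released publicly."""
    target_year = experiment_end.year + policy.embargo_years
    try:
        return experiment_end.replace(year=target_year)
    except ValueError:
        # 29 February in a non-leap target year: use 28 February (assumption).
        return experiment_end.replace(year=target_year, day=28)


def is_publicly_releasable(policy: DataPolicy,
                           experiment_end: date,
                           today: date) -> bool:
    """True once the period of exclusive access has elapsed."""
    return today >= release_date(policy, experiment_end)
```

With such a representation, a service intermediary becomes one more entry
in `parties`, and the release decision can be evaluated by software across
facilities rather than read from plain text by a human.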
Another important consideration is the
interoperability of metadata models and their actual
implementations for different research facilities. The
idealized metadata model for facilities science that
we call Core Scientific MetaData (CSMD)
(Matthews et al., 2012) is derived from a generic
research lifecycle in facilities science:
Figure 1: Generic research lifecycle in facilities science.
The different stages of the research lifecycle produce
data artefacts (research proposals, user records,
datasets, publications etc.) that are similar across
research facilities, so having a common metadata
model like CSMD seems sensible. However, it may
be applied differently by different facilities; there are
a few CSMD implementations in data catalogues
across Europe by virtue of the ICAT platform
(http://code.google.com/p/icatproject/), but the
model and the actual use of its elements may vary
among implementations. This may result in extra
design and implementation overheads when we
consider federated services for a few facilities (even
when based on the same software platform). Also,
there is no guarantee that once we have a federated
solution agreed and implemented, it will not be
affected sooner or later by the diverging business
needs of different participants. The common data
curation framework for facilities science might help
to have these needs permanently monitored, properly
communicated and effectively reconciled, thus
serving as a well-structured business analysis
wrapper for technology solutions.
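The kind of divergence described above can be made visible with a simple
mapping check. The sketch below is illustrative only: the element names,
facility names and field mappings are hypothetical and do not reproduce
the actual CSMD schema or any ICAT deployment.

```python
from typing import Dict, List, Tuple

# Hypothetical common element names; NOT the actual CSMD schema.
COMMON_ELEMENTS = ("investigation_title", "instrument", "principal_investigator")

# Hypothetical per-facility mappings from local field names to common elements.
FACILITY_MAPPINGS: Dict[str, Dict[str, str]] = {
    "facility_a": {
        "title": "investigation_title",
        "instrument": "instrument",
        "pi": "principal_investigator",
    },
    "facility_b": {
        "name": "investigation_title",
        "beamline": "instrument",
    },
}


def to_common(facility: str,
              record: Dict[str, str]) -> Tuple[Dict[str, str], List[str], List[str]]:
    """Return (common record, missing common elements, unmapped local fields)."""
    mapping = FACILITY_MAPPINGS[facility]
    # Translate every local field that has a known common counterpart.
    common = {mapping[key]: value for key, value in record.items() if key in mapping}
    # Common elements this facility's record does not supply.
    missing = [element for element in COMMON_ELEMENTS if element not in common]
    # Local fields with no place in the common model: candidates for reconciliation.
    unmapped = sorted(key for key in record if key not in mapping)
    return common, missing, unmapped
```

The `missing` and `unmapped` lists are where diverging local needs surface;
a federated service would have to monitor and reconcile them over time.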
An interesting development that may be
considered a part of the emerging data curation
framework, but that has exposed certain challenges, too,
is the recent effort of minting Digital Object
Identifiers for investigations performed at the ISIS
neutron facility (Wilson, 2012). Having permanent
identifiers minted for particular investigations
(experiments) should be enough for linking them to
datasets and publications, but in order to have a
structured and linkable representation of a facility
research environment, other parts of it, such as
scientific instruments, experimental techniques,
people, organizations, software, derived data sets
etc., need identifiers minted or borrowed for them,
too. There is currently no sustainability model for
this activity, or for the steady production and
support of the landing Web pages to which the
permanent identifiers (of all kinds) should ideally
resolve. The different aspects – modelling,
technological, operational – of the permanent
identifiers management should be an important part