of identifiers that can be used to access content through
several methods exposed by SInBAD’s web services.
5 GRITO
GRITO is a Portuguese digital preservation project, funded by FCT (GRID/GRI/81872/2006), that aims to use data grids as a support for digital preservation. To avoid investing in expensive resources, GRITO plans to leverage existing resources in research grids (extending them so that they can be used for digital preservation) while also creating a grid cluster dedicated exclusively to digital preservation. The final goal of the project is to integrate both types of grid into a low-cost digital preservation grid that public institutions can use.
The grid middleware that will be used is iRODS⁵, developed by the San Diego Supercomputer Center (SDSC). iRODS implements management policies in the form of rules. These policies can be changed at run time (Rajasekar et al., 2006), and the middleware's functionality can be tailored to specific needs: new rules can be created by combining existing microservices and writing new ones (small procedures written in the C programming language). iRODS can also operate in a federation of grid clusters and present the collections they contain as a single collection. This trait will help GRITO integrate grids controlled by different institutions into a single preservation grid.
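A small conceptual model can illustrate the idea of policies as rules built from microservices. The sketch below is written in Python rather than the C used for real microservices, and it does not use the iRODS rule language or API. The `RuleEngine` class, the event name `on_put`, and the two toy microservices are all hypothetical. The model only shows how a rule chains microservices and how replacing a rule changes behavior without restarting anything.

```python
from typing import Callable, Dict, List

# Conceptual model only: NOT the iRODS rule language or its C microservice API.
# A "microservice" here is a function that reads and updates a shared context dict.
Microservice = Callable[[dict], None]


class RuleEngine:
    """Maps a (hypothetical) event name to an ordered chain of microservices."""

    def __init__(self) -> None:
        self.rules: Dict[str, List[Microservice]] = {}

    def set_rule(self, event: str, chain: List[Microservice]) -> None:
        # Replacing a chain takes effect on the next trigger, with no restart.
        self.rules[event] = list(chain)

    def trigger(self, event: str, ctx: dict) -> dict:
        # Run every microservice of the rule, in order, on the same context.
        for microservice in self.rules.get(event, []):
            microservice(ctx)
        return ctx


def toy_checksum(ctx: dict) -> None:
    # Placeholder: sum of the bytes modulo 256, not a real digest.
    ctx["checksum"] = sum(ctx["data"]) % 256


def add_replica(ctx: dict) -> None:
    # Hypothetical: records that one more copy of the object now exists.
    ctx["replicas"] = ctx.get("replicas", 1) + 1


engine = RuleEngine()
engine.set_rule("on_put", [toy_checksum])
first = engine.trigger("on_put", {"data": b"abc"})

# Change the policy while the engine keeps running: also replicate new objects.
engine.set_rule("on_put", [toy_checksum, add_replica])
second = engine.trigger("on_put", {"data": b"abc"})
```

The first object is only checksummed. The second, ingested after the policy change, is also replicated.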
iRODS offers several ways to interact with the grid. Out of the box, it provides a set of command-line tools (itools) modeled on well-known UNIX commands, such as icp to copy files or irm to remove them. Interaction with the grid can also be achieved through a PHP client API (Prods) or a Java-based API (Jargon⁶), and on Linux systems collections can be mounted as a regular folder using a FUSE module.
This middleware was not specifically designed to handle some of the requirements that digital preservation scenarios impose, but since it is open source it can be extended (Barateiro et al., 2008) to accommodate them.
6 INTEGRATING SINBAD AND IRODS
Since SInBAD is a deployed production system, the integration must make as few changes to it as possible. This rules out any integration scheme in which SInBAD would be modified, and calls for a digital preservation system that is transparent to SInBAD.
5 https://www.irods.org/index.php/
6 http://www.sdsc.edu/srb/jargon/
Content from SInBAD can be placed into the preservation grid without any changes to SInBAD itself, because SInBAD already exposes all the needed interfaces as web services. In the future, however, SInBAD must also be able to recover content from the grid. Achieving this will require some modifications, but they should be limited to what is essential for content recovery.
OAI-PMH lets us obtain information about the content added to each collection since a given date, along with information on how to retrieve it. We can then use this access information to call SInBAD's existing web services, which retrieve the content stored in the digital library so that it can be placed into the grid (any remaining associated metadata must also be retrieved). Integration is thus achieved without disturbing the existing SInBAD infrastructure.
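The sketch below illustrates the harvesting step. It assumes that SInBAD's OAI-PMH endpoint supports the `oai_dc` metadata format and exposes each collection as an OAI-PMH set. The base URL `http://sinbad.example/oai`, the set name `theses`, and the idea that `dc:identifier` carries what is needed to call SInBAD's web services are all hypothetical. The code builds a selective `ListRecords` request using the `from` and `set` arguments, skips records flagged as deleted, and returns the `resumptionToken` needed to fetch further pages.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, since: str, set_spec: Optional[str] = None,
                     token: Optional[str] = None) -> str:
    """Build a selective OAI-PMH ListRecords request (records changed since `since`)."""
    if token:
        # resumptionToken is an exclusive argument: only `verb` may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc", "from": since}
        if set_spec:
            params["set"] = set_spec  # assumption: one OAI-PMH set per collection
    return base_url + "?" + urlencode(params)


def parse_records(xml_text: str) -> Tuple[Dict[str, List[str]], Optional[str]]:
    """Map each non-deleted record's OAI identifier to its dc:identifier values,
    and return the resumptionToken (None when the list is complete)."""
    root = ET.fromstring(xml_text)
    targets: Dict[str, List[str]] = {}
    for record in root.iterfind(".//oai:record", NS):
        header = record.find("oai:header", NS)
        if header is None or header.get("status") == "deleted":
            continue  # deleted records carry no content to preserve
        oai_id = header.findtext("oai:identifier", namespaces=NS)
        targets[oai_id] = [e.text for e in record.iterfind(".//dc:identifier", NS) if e.text]
    token_el = root.find(".//oai:resumptionToken", NS)
    token = None
    if token_el is not None:
        # An empty resumptionToken element marks the last page.
        token = (token_el.text or "").strip() or None
    return targets, token


# Hypothetical response used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:sinbad.example:123</identifier>
        <datestamp>2009-01-10</datestamp><setSpec>theses</setSpec></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>hdl:0000/123</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header status="deleted"><identifier>oai:sinbad.example:124</identifier>
        <datestamp>2009-01-11</datestamp></header>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://sinbad.example/oai", "2009-01-01", "theses")
targets, token = parse_records(SAMPLE)
```

A periodic harvester would record the date of its last successful run and pass it as `from` on the next run. It would keep following the resumption token until an empty one is returned.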
In this integration scheme, iRODS with an extended service set (Barateiro et al., 2008) will be responsible for preserving any content harvested from SInBAD. Two approaches can be taken to harvest content from SInBAD:
• Creation of an iRODS microservice.
• Creation of an intermediary system.
The merits and drawbacks of each approach will
be discussed in the following sections.
6.1 iRODS Microservice
Creating an iRODS microservice (illustrated in Figure 2) is one way to integrate iRODS and SInBAD. iRODS already provides the tools needed to execute a rule periodically, so periodic harvesting of content is straightforward.
The integration process would then become a rule composed of two microservices:
1. A first microservice would contact SInBAD using OAI-PMH and generate a list of targets to be retrieved.
2. A second microservice would take the list of targets as input and use SInBAD’s web services to retrieve the content and store it in the grid.
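The two-microservice rule can be modeled as two functions. This Python sketch covers only the data flow. It is not iRODS microservice code, which would have to be written in C. The names `fetch_page`, `fetch_content`, and `store` are hypothetical stand-ins for the OAI-PMH request, the SInBAD web-service call, and the write into the grid.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical: given a resumption token (None for the first request), return
# one page of target identifiers and the next token (None when finished).
PageFetcher = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


def harvest_targets(fetch_page: PageFetcher) -> List[str]:
    """Step 1: follow OAI-PMH resumption tokens until the target list is complete."""
    targets: List[str] = []
    token: Optional[str] = None
    while True:
        page, token = fetch_page(token)
        targets.extend(page)
        if not token:
            return targets


def retrieve_and_store(targets: Iterable[str],
                       fetch_content: Callable[[str], bytes],
                       store: Callable[[str, bytes], None]) -> List[str]:
    """Step 2: fetch each target through the web services and store it in the grid.
    Returns the identifiers that failed so that a later run can retry them."""
    failed: List[str] = []
    for ident in targets:
        try:
            store(ident, fetch_content(ident))
        except OSError:  # network errors (e.g. urllib's URLError) subclass OSError
            failed.append(ident)
    return failed


# Usage with in-memory fakes standing in for SInBAD and the grid.
pages = {None: (["id1", "id2"], "t1"), "t1": (["id3"], None)}
grid: Dict[str, bytes] = {}


def fake_fetch(ident: str) -> bytes:
    if ident == "id2":
        raise OSError("web service unavailable")
    return ident.encode()


def store_in_grid(ident: str, data: bytes) -> None:
    grid[ident] = data


targets = harvest_targets(lambda tok: pages[tok])
failed = retrieve_and_store(targets, fake_fetch, store_in_grid)
```

The function returns the failed identifiers instead of aborting the whole run. The next periodic execution of the rule can then retry them.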
Since iRODS microservices must be written in C,
we consider that ideal libraries to deal with these steps
SINBAD DIGITAL LIBRARY PRESERVATION USING IRODS DATA GRID