bookkeeping of the content storage. To take a simple example, the metadata plays the role of the superblock, inodes and journal of a file system, while the grid storage resource plays the role of the file system's storage blocks. Almost all digital library operations are closely tied to querying and updating the ARCO metadata, which consists of the following records:
volume: name, id, mirror, mirror-info, date, username, status, log;
grid node: domain-name, ip, volume-id, status, log;
file system: queried dynamically from the LDAP directory;
book: name, id, size, grid-node-ip, full-path-name, status, date, log.
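To make the record layout above concrete, here is a minimal sketch of the metadata records as Python dataclasses. Field names follow the list; the types (for example `str` for dates, a list of strings for the log, an integer id for the mirror volume) and the empty-string defaults are our assumptions, not ARCO's actual definitions. `FileSystemInfo` is a hypothetical shape for the data returned by the LDAP query, since file systems are not stored in the database.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Volume:
    name: str                      # catalogue section, e.g. "Science"
    id: int
    mirror: Optional[int] = None   # assumed: id of the mirror volume, None if unmirrored
    mirror_info: str = ""
    date: str = ""
    username: str = ""
    status: str = ""
    log: List[str] = field(default_factory=list)

@dataclass
class GridNode:
    domain_name: str
    ip: str
    volume_id: int                 # each node belongs to exactly one volume
    status: str = ""
    log: List[str] = field(default_factory=list)

@dataclass
class FileSystemInfo:
    # Hypothetical: built on demand from a node's LDAP directory, never stored.
    node_ip: str
    mount_point: str
    free_bytes: int

@dataclass
class Book:
    name: str
    id: int
    size: int                      # assumed to be in bytes
    grid_node_ip: str
    full_path_name: str
    status: str = ""
    date: str = ""
    log: List[str] = field(default_factory=list)
```

Using `default_factory` gives each record its own log list instead of one list shared by all instances.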
4.2 Operation example: Insert new book into ARCO system
Qualified operators with administrator rights can insert new digital books or files into the ARCO system. When an operator submits an “insert new book” job to the ARCO system, the following metadata items must be clearly defined (in parentheses, each item is compared with its counterpart in a traditional library):
- Book name (the book title in a traditional library);
- Source server domain name and full path (where the book is located);
- Volume id (which catalogue section the book belongs to, e.g. Society, Law or Science);
- Insert operation (the book must be a new one).
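The items an operator must supply can be gathered into a single job request that is checked before any work starts. This is a minimal sketch: the class name `InsertBookRequest`, its field names and the validation rules (non-empty strings, an absolute source path, operation fixed to `"insert"`) are hypothetical choices made for illustration, not ARCO's actual interface.

```python
from dataclasses import dataclass

@dataclass
class InsertBookRequest:
    book_name: str             # book name (title in a traditional library)
    source_host: str           # source server domain name
    source_path: str           # full path of the file on the source server
    volume_id: int             # catalogue section the book belongs to
    operation: str = "insert"  # must be a new book, not an update

    def validate(self) -> None:
        """Raise ValueError if a required item is missing or inconsistent."""
        for name in ("book_name", "source_host", "source_path"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")
        # Assumed rule: "full path" is taken to mean an absolute path.
        if not self.source_path.startswith("/"):
            raise ValueError("source_path must be an absolute path")
        if self.operation != "insert":
            raise ValueError("operation must be 'insert' for a new book")
```

Rejecting an incomplete request up front avoids starting a copy that would later fail to produce a complete metadata record.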
Next, the ARCO system looks up the PostgreSQL database to determine which grid nodes belong to this volume and whether the volume has a mirror volume:
- Domain names of the grid nodes in this volume (the bookshelves belonging to this catalogue section);
- Domain names of the grid nodes in the mirror volume.
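The lookup above amounts to two queries over the volume and grid node records. The sketch below uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a server. The table and column names (`volume`, `grid_node`, and a `mirror` column holding the mirror volume's id) and the demo data are our assumptions modelled on the metadata list, not ARCO's actual schema. With a PostgreSQL driver such as psycopg2, the query placeholder would be `%s` instead of `?`.

```python
import sqlite3
from typing import List, Tuple

def node_names(conn: sqlite3.Connection, volume_id: int) -> List[str]:
    """Domain names of all grid nodes assigned to one volume."""
    rows = conn.execute(
        "SELECT domain_name FROM grid_node WHERE volume_id = ? ORDER BY domain_name",
        (volume_id,),
    ).fetchall()
    return [row[0] for row in rows]

def nodes_for_volume(conn: sqlite3.Connection, volume_id: int) -> Tuple[List[str], List[str]]:
    """Return (nodes of the volume, nodes of its mirror volume or [] if it has none)."""
    row = conn.execute("SELECT mirror FROM volume WHERE id = ?", (volume_id,)).fetchone()
    if row is None:
        raise KeyError(f"unknown volume {volume_id}")
    mirror_id = row[0]  # NULL (None) when the volume has no mirror
    mirror_nodes = node_names(conn, mirror_id) if mirror_id is not None else []
    return node_names(conn, volume_id), mirror_nodes

def make_demo_db() -> sqlite3.Connection:
    """Hypothetical in-memory data: volume 1 is mirrored by volume 2."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE volume (id INTEGER PRIMARY KEY, name TEXT, mirror INTEGER);
        CREATE TABLE grid_node (domain_name TEXT, ip TEXT, volume_id INTEGER);
        INSERT INTO volume VALUES (1, 'Science', 2), (2, 'Science-mirror', NULL);
        INSERT INTO grid_node VALUES
            ('a.example.org', '10.0.0.1', 1),
            ('b.example.org', '10.0.0.2', 1),
            ('m.example.org', '10.0.1.1', 2);
    """)
    return conn

if __name__ == "__main__":
    print(nodes_for_volume(make_demo_db(), 1))
    # → (['a.example.org', 'b.example.org'], ['m.example.org'])
```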
We then query the LDAP server on each grid node to obtain basic information:
- All available file systems and their free space (the free space on the available bookshelves).
A suitable server and file system are then chosen according to a specific policy, e.g. the file system with the most free space first, or the first available one first. The next step is to copy the file from the source entry point to the destination. If the volume has a mirror, the book is also copied into the mirror volume. If the copy operation succeeds, the following metadata is inserted into the PostgreSQL database:
Book name, id, size, grid-node-ip, full-path-name, status, date, log, and operator.
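The selection step can be sketched as a small function over the file-system information gathered from LDAP. `FileSystemInfo`, `choose_filesystem` and the policy names are hypothetical, and we read "largest one first" as "most free space first". The paper does not say how insufficient space is handled, so here a file system qualifies only if its free space is at least the book's size.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FileSystemInfo:
    node: str          # grid node domain name
    mount_point: str   # file system as reported by the node's LDAP directory
    free_bytes: int

def choose_filesystem(candidates: List[FileSystemInfo], size: int,
                      policy: str = "largest_first") -> Optional[FileSystemInfo]:
    """Pick a file system able to hold `size` bytes, or None if none fits."""
    fitting = [fs for fs in candidates if fs.free_bytes >= size]
    if not fitting:
        return None
    if policy == "largest_first":
        # Most free space wins; on ties, max() keeps the earliest candidate.
        return max(fitting, key=lambda fs: fs.free_bytes)
    if policy == "first_available":
        # Keep the order in which the nodes were queried.
        return fitting[0]
    raise ValueError(f"unknown policy: {policy}")
```

"Largest first" tends to spread books evenly across nodes, while "first available" is cheaper because it can stop querying as soon as one file system fits.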
Books residing in a volume can be indexed, looked up and browsed through web services. If an insert operation names a book that already exists in the target volume, the system issues a warning and does nothing; replacing an old version is the job of ARCO's separate book update operation. However, the system does allow different books with the same name to be stored in different volumes.
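The duplicate-name rule maps naturally onto a uniqueness constraint on the pair (volume, book name). The sketch below again uses `sqlite3` in place of PostgreSQL, and it assumes a `volume_id` column on the book table, which the metadata list does not contain (there, a book reaches its volume through its grid node). The table layout and `insert_book` are illustrative only.

```python
import sqlite3
import warnings

def create_book_table(conn: sqlite3.Connection) -> None:
    conn.execute("""
        CREATE TABLE IF NOT EXISTS book (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            volume_id INTEGER NOT NULL,       -- assumed, denormalized column
            full_path_name TEXT,
            UNIQUE (volume_id, name)          -- same name allowed only across volumes
        )""")

def insert_book(conn: sqlite3.Connection, name: str, volume_id: int,
                full_path_name: str) -> bool:
    """Insert a new book record; warn and change nothing if the name exists in the volume."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO book (name, volume_id, full_path_name) VALUES (?, ?, ?)",
                (name, volume_id, full_path_name),
            )
        return True
    except sqlite3.IntegrityError:
        warnings.warn(f"book {name!r} already exists in volume {volume_id}; use update instead")
        return False
```

Letting the database enforce the rule avoids a race between a separate "does it exist?" query and the insert when two operators submit jobs at the same time.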
4.3 Operation example: Volume copy
In the ARCO system, basic operations can be combined to compose complex semantic structures, much as sentences are built in natural language: when we want to express ourselves or convey complex ideas, we construct our thinking from basic units of meaning. In ARCO, volumes do not share grid nodes, so a volume is logically and physically independent of its copy. After copying, the duplicated volume can be labelled as the mirror of the main volume; otherwise, the volume copy operation can simply act as an intermediate step in more complex operations.
A volume can be very large: it may include many grid nodes, each node may have many file systems, and each file system may store many books. Volume copy can therefore take a very long time to finish, so we have implemented it to execute in batch mode.
The first step of volume copy is to query the list of node names in the destination volume and then send an LDAP query to each grid node to obtain the details and free space of its file systems. The second step is to decide whether the total free space of the destination volume can hold all the new files; if it can, a job description record is created for each book. Each record carries enough information that the basic functions of the insert new book operation can be reused here. All job description records are stored in a linked list, which can be written to a file and later read back. This job linked list is the basis of batch-mode execution. Once the file transfers have finished successfully, the final step is to update the metadata of the new volume and all its books.
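The planning part of volume copy (the capacity check, one job record per book, and a job linked list that can be saved to and restored from a file) can be sketched as follows. `CopyJob` and its fields, the JSON-lines file format, and all function names are our assumptions; the paper states only that the linked list can be written to a file and read back.

```python
import json
from dataclasses import dataclass, asdict
from typing import Iterator, List, Optional

@dataclass
class CopyJob:
    # Hypothetical job description record: enough to reuse the insert-book steps.
    book_name: str
    size: int
    source_node: str
    source_path: str
    dest_volume_id: int

class JobNode:
    """One element of the singly linked job list."""
    def __init__(self, job: CopyJob, next_node: "Optional[JobNode]" = None):
        self.job = job
        self.next = next_node

def link(jobs: List[CopyJob]) -> Optional[JobNode]:
    """Build a linked list that preserves the order of `jobs`."""
    head = None
    for job in reversed(jobs):  # prepend in reverse so the first job ends up at the head
        head = JobNode(job, head)
    return head

def build_job_list(books: List[CopyJob], dest_free_bytes: List[int]) -> Optional[JobNode]:
    """Check total capacity of the destination volume, then return the job list head."""
    needed = sum(b.size for b in books)
    available = sum(dest_free_bytes)
    if needed > available:
        raise ValueError(f"destination volume too small: need {needed}, have {available}")
    return link(books)

def iter_jobs(head: Optional[JobNode]) -> Iterator[CopyJob]:
    while head is not None:
        yield head.job
        head = head.next

def save_jobs(head: Optional[JobNode], path: str) -> None:
    """Write one JSON record per line so a later batch run can pick the list up again."""
    with open(path, "w", encoding="utf-8") as f:
        for job in iter_jobs(head):
            f.write(json.dumps(asdict(job)) + "\n")

def load_jobs(path: str) -> Optional[JobNode]:
    with open(path, encoding="utf-8") as f:
        return link([CopyJob(**json.loads(line)) for line in f if line.strip()])
```

Comparing total free space, as described above, is necessary but not sufficient: each book must fit inside a single file system, so a complete planner would also assign every job to a specific destination file system, for example with the selection policy used by the insert operation.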
ARCO: A LONG-TERM DIGITAL LIBRARY STORAGE SYSTEM BASED ON GRID COMPUTATIONAL
INFRASTRUCTURE