different Digital Libraries or information
repositories, format it into a single and uniform
metadata language, and provide search facilities over
collected metadata/information. On the other hand,
we consider dynamic aggregation protocols as
methodologies that gather content from different
Digital Libraries or information repositories in real-
time. In the last methodology, the collected
metadata/information is obtained at the moment the
query is submitted to the search interfaces provided
by the systems that contain the information.
It is important to note the roles of service providers
and data providers according to each of these
methodologies. In the static model approach, the
data provider's only responsibility is to provide
metadata. The service provider must collect the
metadata, index it and provide search facilities.
In the dynamic model approach, data providers must
implement extra services, namely search facilities
over the metadata they contain. The role of the
service provider is less demanding, as, according to
this model, it does not need to store the metadata, or
index and provide search facilities over the indexed
content. Service providers act as pure information
aggregators.
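The division of responsibilities between the two models can be illustrated with a minimal Python sketch. All names below (DataProvider, StaticServiceProvider, DynamicServiceProvider, export_metadata, harvest, search) are hypothetical and chosen only for illustration; they do not correspond to any specific aggregation protocol.

```python
from collections import defaultdict


class DataProvider:
    """Hypothetical repository holding metadata records (id -> title)."""

    def __init__(self, name, records):
        self.name = name
        self.records = dict(records)

    def export_metadata(self):
        # Static model: the data provider only exposes its metadata.
        return list(self.records.items())

    def search(self, term):
        # Dynamic model: the data provider must also implement search.
        term = term.lower()
        return [(rid, title) for rid, title in self.records.items()
                if term in title.lower()]


class StaticServiceProvider:
    """Collects metadata in advance, indexes it, and searches locally."""

    def __init__(self):
        self.index = defaultdict(set)   # word -> {(provider name, record id)}
        self.titles = {}                # (provider name, record id) -> title

    def harvest(self, provider):
        for rid, title in provider.export_metadata():
            key = (provider.name, rid)
            self.titles[key] = title
            for word in title.lower().split():
                self.index[word].add(key)

    def search(self, term):
        keys = self.index.get(term.lower(), ())
        return sorted(self.titles[k] for k in keys)


class DynamicServiceProvider:
    """Stores nothing; forwards each query to the providers at query time."""

    def __init__(self, providers):
        self.providers = providers

    def search(self, term):
        hits = []
        for provider in self.providers:
            hits.extend(title for _, title in provider.search(term))
        return sorted(hits)
```

In this sketch the static service provider carries the storage and indexing cost, while the dynamic one depends on every data provider answering the query when it is submitted.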
Each of these aggregation methods has advantages
and drawbacks, which are discussed in more detail
later in the paper. At this stage, we intend
to clarify some of the concepts associated with each
methodology and, in the next section, we will
present some related work.
3 RELATED WORK
In recent years, researchers around the world have
studied the problem of aggregating information from
different repositories. The problems that arise are
related to the use of different metadata to describe
multi-format digital object content and to the
different technologies used to implement the
information systems that store and retrieve metadata
from repositories.
3.1 Z39.50
One of the first protocols available for searching
different databases simultaneously is Z39.50, an
American national standard for information
retrieval. It is formally known as ANSI/NISO
Z39.50-1995 – Information Retrieval (Z39.50):
Application Service Definition and Protocol
Specification (National Information Standards
Organization, 2003). The main purpose of this
standard is to define a communication protocol to
access and query databases stored in different
computers with different software, facilitating the
process of interconnecting computer systems.
The standard specifies the formats and procedures
involved in the exchange of messages between a
client and server, enabling a Z39.50 client to request
the server to search a database, identify records
which meet specified criteria, and to retrieve some
or all of the identified records.
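The search-then-retrieve exchange described above can be sketched in Python as follows. This is an in-memory illustration only, not an implementation of the Z39.50 wire protocol; the class ToyServer, its method names and the record layout are assumptions made for the example.

```python
class ToyServer:
    """In-memory stand-in for a server; not a real Z39.50 implementation."""

    def __init__(self, databases):
        self.databases = databases      # database name -> list of record dicts
        self.result_sets = {}           # result-set name -> matching records

    def search(self, database, field, value, result_set="default"):
        # Identify the records that meet the criterion and keep them on the
        # server as a named result set; only the hit count is returned.
        matches = [r for r in self.databases[database]
                   if value.lower() in r.get(field, "").lower()]
        self.result_sets[result_set] = matches
        return len(matches)

    def present(self, start, count, result_set="default"):
        # Retrieve some or all of the identified records (start is 1-based).
        return self.result_sets[result_set][start - 1:start - 1 + count]
```

A client would first call search to learn how many records match, and then call present to fetch only the portion of the result set it needs.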
The Z39.50 protocol also defines different record
syntaxes (Library of Congress, 2005), most of them
variants of MARC records (Library of Congress,
2007a), as well as resource format types, such as
MIME types and other file formats.
The Z39.50 information retrieval protocol is
composed of a group of facilities for accessing
database information, namely: Initialization, Search,
Retrieval, Result-set-Delete, Access Control,
Accounting / Resource Control, Sort, Browse,
Explain and Termination. These operations specify
the interaction between the Z39.50 client and the
Z39.50 server, defining the services and functions
that can be invoked by the client.
To discover the facilities that a Z39.50 server
supports, a Z39.50 client can invoke the Explain
facility. The server answers with details of the
implementation, a list of the databases available for
searching, and the schema, record syntax and
element specification definitions supported for
record content retrieval.
Another operation defined in Z39.50 is the Browse
facility. It is composed of a single service, Scan.
The Scan service is used to scan database content:
the client provides an ordered term list to scan
(subjects, names, titles, etc.), a starting term and the
number of entries to be returned.
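The behaviour of the Scan service can be approximated with a short Python sketch. It is a simplification under stated assumptions: the function name scan, its parameters and the sample title index are hypothetical, and the sketch returns only the terms themselves, ignoring any additional information a real server may attach to each entry.

```python
import bisect


def scan(term_list, start_term, number_of_entries):
    """Return up to number_of_entries terms from an ordered term list,
    beginning at the first term that is >= start_term."""
    position = bisect.bisect_left(term_list, start_term)
    return term_list[position:position + number_of_entries]


# Hypothetical ordered title index of one database.
titles = sorted(["algebra", "biology", "chemistry", "geology", "physics"])
print(scan(titles, "c", 3))  # ['chemistry', 'geology', 'physics']
```

Repeating the call with the last returned term as the new starting term lets a client walk through an entire index in fixed-size steps.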
Using these two facilities, Ray R. Larson developed
a Cross-Domain Information Server that uses Z39.50
as the protocol to implement Distributed Resource
Discovery (Larson, 2001). As stated in the article,
the author used the Z39.50 Explain Database
functionality to determine the databases and indexes
of a given server. Then, using the Scan service, the
author extracted the contents of the indexes and used
that information to build “collection documents”.
The records were retrieved using probabilistic
retrieval algorithms.
Z39.50 also defines a network protocol to transfer
information between the client and server. Port 210
is usually the default port for Z39.50 message
transfers.
Using port 210 in modern Information Systems
causes a problem in large networks. As network
ICEIS 2008 - International Conference on Enterprise Information Systems