2 PHRONESIS DATA CENTRIC
ARCHITECTURE
Phronesis follows a client-server architecture in
which the server is the main component, composed of
several internal subcomponents for searching,
retrieving, indexing, and managing documents in the
digital library. Clients are World Wide Web
interfaces built with HTTP, HTML, and Perl Common
Gateway Interface (CGI) scripts.
Phronesis clients are users who access the system
to perform actions such as document search and
retrieval as well as document submission. Phronesis
distinguishes three types of users:
Collection Contributors. They are users that
have the proper permissions to submit documents to
the collection.
Administrators. They are users that maintain a
Phronesis server.
Patrons. They are users who access the server to
search and retrieve full documents.
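The three user types above can be read as a small role-to-permission
mapping. The following is only an illustrative sketch: the names Role,
PERMISSIONS, and is_allowed are hypothetical and are not taken from the
Phronesis code base. It grants each role only the actions the text
names for it.

```python
from enum import Enum


class Role(Enum):
    CONTRIBUTOR = "collection contributor"
    ADMINISTRATOR = "administrator"
    PATRON = "patron"


# Actions named in the text for each role. Whether a role may also perform
# another role's actions is not stated in the paper, so nothing more is granted.
PERMISSIONS = {
    Role.CONTRIBUTOR: {"submit"},
    Role.ADMINISTRATOR: {"maintain"},
    Role.PATRON: {"search", "retrieve"},
}


def is_allowed(role: Role, action: str) -> bool:
    """Return True if the role's permission set contains the action."""
    return action in PERMISSIONS.get(role, set())
```

Under these assumptions, is_allowed(Role.PATRON, "search") holds while
is_allowed(Role.PATRON, "submit") does not.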
The server, the key component of the system,
performs the following tasks: administration, access
control, physical storage of documents, indexing,
and local and distributed search and retrieval.
Functionality for document storage and retrieval is
based on MG (Witten, 1999), a powerful research
tool for the compression, indexing and retrieval of
textual documents. We have extended the MG
system in order to provide all the desired
functionality in Phronesis.
Phronesis supports search and retrieval of
English and Spanish documents, and the user
interface is available in both languages. The server
implements five different types of full-text
document and/or metadata search. A search query
can include the diacritic characters common in
Spanish. When a query contains words with
diacritics, the server tolerates simple mistakes that
are common in Spanish, such as the omission of an
accent. For example, a search for the Spanish word
for computing uses the keyword “computación”. If
the keyword “computacion” (accent omitted) is used
instead, the query finds the same documents. A
stemming algorithm for Spanish is also available as
part of the system. A single Phronesis system can
interact with other Phronesis systems, and it also
interoperates with Z39.50-based libraries and Open
Archives Initiative-based libraries.
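One common way to obtain the accent tolerance described above is to
fold diacritics on both the query and the indexed words before
comparing them. The sketch below illustrates that idea only; it is not
the mechanism used by MG or Phronesis, and the function names
fold_accents and search are hypothetical. It relies on Unicode
decomposition from the Python standard library.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Lowercase the text and drop combining marks such as accents."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search(query: str, documents: dict[str, str]) -> list[str]:
    """Return ids of documents containing a word equal to the folded query."""
    key = fold_accents(query)
    return sorted(
        doc_id
        for doc_id, body in documents.items()
        if key in {fold_accents(word) for word in body.split()}
    )
```

With this folding, the queries “computación” and “computacion” select
the same documents. Note that the sketch also folds “ñ” to “n”, which a
production system for Spanish may prefer to avoid.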
The data-centric architecture of a Phronesis
server currently consists of a set of subsystems that
work together to provide the services of a digital
library. Because all of these services are tightly
coupled, making improvements is a time-consuming
process that requires considerable knowledge of the
internals of the system.
As a result, the Phronesis system has become a
complex, monolithic piece of software that is hard to
maintain and cannot evolve easily. A clear sign of the
inherent problems with the current Phronesis
architecture is that running a test case takes longer
than running the same test case did in previous
versions. Adding new functionality also takes
longer. These differences in time arise because
Phronesis' components are highly coupled and have
poor cohesion. This complexity affects the quality
assurance process, since fixing a bug in one
component may introduce problems in another
component. Therefore, a redesign of the architecture
was needed to support future improvements to the
system.
2.1 Analysis of the Data Centric
Architecture
Phronesis' subsystems are not implemented as
independent components with well-defined
boundaries. Phronesis is a highly integrated set of
programs and tools that interact by means of the
shared data in the central repository.
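The coupling that results from interaction through shared data can be
illustrated with a minimal sketch. Everything in it is hypothetical
(the repository dictionary, the record fields, and the three
functions); it only mimics the style of interaction described above and
does not reproduce any Phronesis program.

```python
# Shared central repository: every tool reads and writes it directly.
repository: dict[int, dict[str, str]] = {}


def store_document(doc_id: int, title: str, text: str) -> None:
    """Storage tool: writes a record whose layout other tools rely on."""
    repository[doc_id] = {"title": title, "text": text}


def build_index() -> dict[str, set[int]]:
    """Indexing tool: depends on the 'text' field written by store_document."""
    index: dict[str, set[int]] = {}
    for doc_id, record in repository.items():
        for word in record["text"].lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


def browse_by_title() -> list[str]:
    """Browsing tool: depends on the 'title' field of the same records."""
    return sorted(record["title"] for record in repository.values())
```

No tool calls another, yet all of them depend on the record layout. In
this sketch, renaming the "text" field in store_document would make
build_index fail, which is the kind of hidden dependency that makes a
change in one component risky for the others.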
Based on their functionality (Garza, 2003), each
program or tool can be further classified as follows:
Document Searching Subsystem. This subsystem
searches for documents across different, distributed
Phronesis repositories.
Document Browsing Subsystem. This subsystem
includes all the programs for visualizing the
documents stored in the repository. Visualization is
based on predefined categories such as document
title, year of publication, or author's name.
Document and Library Builder Subsystem. This
subsystem groups the programs for the document
processing and index creation that are required for
document searching.
Document Storage Subsystem. This subsystem
includes the programs and file structures required to
assign unique identifiers to documents. It also
includes the programs to save and to retrieve
documents, and to do the preprocessing needed prior
to the indexing of documents.
Management Subsystem. This subsystem
includes programs for the digital library
configuration.
Z39.50 Interoperability Subsystem. This
subsystem includes all the programs that allow
Phronesis to send queries to, and receive queries
from, libraries that support the Z39.50 protocol.
According to Bass (1998), the following criteria
help determine the strengths and weaknesses of
different software architectural styles.
ICEIS 2005 - INFORMATION SYSTEMS ANALYSIS AND SPECIFICATION