DISCOVERY CHALLENGES AND AUTOMATION FOR
SERVICE-BASED APPLICATIONS IN GRID
Serena Pastore
Astronomical Observatory of Padova, National Institute of Astrophysics, vicolo Osservatorio 5, 35122, Padova, Italy
Keywords: Service discovery, semantic technologies, grid technologies, web services, service discovery.
Abstract: Discovery is a necessary task; any modern distributed system must provide this for searching and finding
resources in the network according some criteria. There are many solutions for providing such tool for grid
and web service environments that are essentially based on a directory service as a specialized optimized
database. One great challenge in such complex distributed networks is that the effective automation of
process usually fails. This paper describes the discovery issue for WSDL-based applications exported in a
specific grid system by analyzing different software solutions typical to grid and web service areas. The
need for automation can be partially solved with the introduction of semantic technologies that may be
applied to the provider-client interaction to semantically describe the resource or directly to the registry,
allowing a semantic discovery for both client and provider. Several research projects are developing
software tools that will be able to be used to test the efficacy of such solutions.
1 INTRODUCTION
Discovery is a necessary task that any modern
distributed system must provide. Its aim is to allow
both users and applications to search and find
resources in the network according to some criteria.
Architectural implementations used in distributed
infrastructures usually offer a mixed environment of
grid system (Foster, I., Kesselman, C., 2003) and
web service frameworks (Cerami, E., 2002). In the
context of a project that studied the porting of
astrophysical applications into grid (Benacchio, L.,
et al., 2005), the discovery challenge has been arisen
as a key element. Once deployed in grid, any
software becomes a grid resource. Therefore it
should be searched and found by each grid user or
application throughout the distributed system. A
survey of the existing discovery solutions indicates
that there are many different ways to provide this
tool. Most of them rely on a directory service which
is a specialized database optimized for reading,
browsing and searching information to be stored.
Each method places different requirements on how
the information can be referenced and queried.
However the main challenge in this complex
environment is that the automation usually fails,
requiring a manual investigation. Semantic
technologies (Daconta, M.C., et al., 2003) aim to
solve the automation issue that is a requirement of
any discovery method. The paper describes different
approaches followed in allowing discovery for Java
web applications exported in a grid system as a set
of web services. The solutions cover the methods
used by grid information systems and also the
software implementation of web service standards as
a complementary method. Semantics may be applied
to the provider-client interaction to semantically
describe the resource or to the registry process for
allowing a semantic discovery. Many research
projects are in the process of developing software
tools that will be able to be used in order to prove
the feasibility of the solution in this specific
scenario.
2 DISCOVERY METHODS FOR
SOFTWARE RESOURCES
Grid and web services environments use different
approaches in discovery problem solving.
Information is distributed, meaning that it is spread
across many disseminated machines, all of which
cooperate to provide the distributed system. The
focus is on the methods available for discovering
333
Pastore S. (2007).
DISCOVERY CHALLENGES AND AUTOMATION FOR SERVICE-BASED APPLICATIONS IN GRID.
In Proceedings of the Third International Conference on Web Information Systems and Technologies - Internet Technology, pages 333-336
DOI: 10.5220/0001264403330336
Copyright
c
SciTePress
software resources like web applications. The
hosting environment is a Java Web Services
framework composed of an application and service
engine (Apache Tomcat, http://tomcat.apache.org
plus Apache Axis, http://ws.apache.org/axis). It
manages both the HTTP transport protocol used for
messages exchange and the structure of the
messages involved in the transaction specified by the
SOAP protocol (www.w3.org/2000/xp/Group/). The
whole framework is in turn deployed on a grid
machine that acts as a resource provider for grid
users and applications. The node is a component of a
grid site which contributes to form the INFN grid
(http://grid-it.cnaf.infn.it) part of the EGEE
(http://www.eu-egee.org) grid infrastructure. This
grid system, built on the gLite software
(http://www.glite.org), is logically organized
according to the EGEE structure (EGEE JR1, 2005)
in Virtual Organizations (VOs), each one consists of
sites that through physical machines provide grid
logic functionalities.
2.1 The Grid Web Application
Distributed technologies are largely used in the
astrophysical context both as interoperable web
services applications and grid applications. A global
framework (the Virtual Observatory or Vobs)
(McDowell, J.C., 2004) has been proposed to
provide a uniform and controlled access platform to
generic astronomical resources or VObs resources.
The use case of the web application has been
developed (Volpato, A., et al., 2004) as a VObs
resource. It consists of a set of Java Web Services
implementing specific querying tasks (i.e. a specific
selection with SQL commands) to an astronomical
catalogue. The application is described by its WSDL
(http://www.w3.org/TR/wsdl) interface. The WSDL
document says, in XML language, what operation
the service supports and how to invoke it. It gives
information about the data types used (types
element) for all exchanged messages (message
element), the operations performed by the service
(portType element), and the communication protocol
used for these operations (binding element). The set
of related endpoints (service element) are further
specified, making (port element) the combination of
binding and network address useful as an access
point. Figure 1 shows a list of such services
available through the web, meaning that their
automatically-generated descriptions are accessible
by URL. The approaches for the discovery tasks
should consider the mixed environments; thus both
grid solution and web service specifications based
implementations have been analyzed (fig. 2).
Figure 1: Examples of deployed web services accessible
by a web URL.
2.2 The Different Mechanisms
Any solution in a distributed environment should
provide a schema to describe the resources, a
repository to store the information, a query language
to interrogate them and a protocol to interact with
them. Grid resources are mainly described by using
the GLUE schema
(http://glueschema.forge.cnaf.infn.it). This schema
specifies the main features by attributes that are used
as keywords in the discovery process. Usually a grid
job submission includes the job’s requirements in a
file expressed in a specific language (Pacini, F.,
2003) that uses GLUE attributes as possible values
for the expressions. The method allows grid
components to select a resource by performing a
match between client requirements and the available
resources published by the grid information system
(IS). Each software resource, for example, is
identified with a specific string (the
RunTimeEnvironment attribute) representing its
name; it is also associated to a grid node. The
current grid IS is based on an LDAP directory
service making use of the OpenLDAP software
(http://www.openldap.org). It realizes (EGEE JR1,
2005) a hierarchical structure composed of a set of
distributed index servers (or BDII) that maintain the
list of the site’s Globus MDS2 systems. The MDS2
system consists of components (GIIS/GRIS)
working together to gather information coming from
each node as entries in an LDAP information tree
(DIT) with attributes and values. Searching tools are
comprises of what is available from the software,
together with some middleware toolkits. Each
component of the hierarchy may be queried, and the
search is based on filtering entries attributes.
WEBIST 2007 - International Conference on Web Information Systems and Technologies
334
Figure 2: Different discovery components deployed in the
grid system within each site.
However, the gLite toolkit is adopting the R-GMA
(Relational Grid Monitoring Architecture,
http://www.r-gma.org) system as its IS.
Implemented as a Java web application on a site
grid node, it refers to a central registry listing all the
deployed systems. The method realizes a consumer-
producer model and describes resources as tables in
a relational database. The query language is thus
based on SQL. Available searching tools consist of a
browser, a command line interface that supports
single query and interactive modes and other
commands. The resource schema adopted is the
same as in the previous solution (the R-GMA system
is tagged as a software resource with the R-GMA
string), but the communication protocol is different
(LDAP vs. HTTP).
Web services standards, according the web
services architecture that uses a consumer-producer-
registry model, focus instead on a registry solution
to follow the OASIS UDDI (http://www.uddi.org)
specifications. By using the Apache jUDDI software
(http://ws.apache.org/juddi) implementation, the
registry is deployed (Pastore, S., 2005) in a grid
node as a complementary approach to the discovery
software resources. All information is stored as
database tables like the R-GMA model, but the
UDDI data model fully describes this kind of
resource by specifying the provider (businessEntity)
and its services (businessService), each of which is
accessed via a number of bindings to protocols and
physical locations (bindingTemplate). The UDDI
objects refer to a technical models (tModel) structure
that is a mechanism used to identify property
namespace and categorization schemes. Search in
UDDI is based on property-based lookup (i.e. the
specific properties of a provider) or on
categorization and classifications according to
specific schemes (i.e. industry classification).
tModels are also used as references in the mapping
of WSDL features into the UDDI structure
(Colgrave, J., 2004). Searching tools are web
browsers and APIs, allowing operations to interact
with it that use the HTTP protocol and essentially
the SQL language. Table 1 summarizes the main
common and differing features of the three methods.
While the GLUE schema and the related methods
are not sufficient to exploit software functionalities,
UDDI data structures are not easily included in the
grid schema. Moreover the solutions do not
guarantee the automation of the discovery process.
Table 1: Summarization of main common and differing
features in the three methods.
Methods Common Differences
BDII/MDS2
vs. R-GMA
Glue Schema
DIT/ table model;
LDAP/HTTP; ldap
and gLite
commands/SQL
R-GMA vs.
UDDI
HTTP; tables
model; SQL
Glue Schema/UDDI
data model
3 AUTOMATION AND
SEMANTICS
All the analyzed systems require a human
intervention in the process of web application
discovering. Even if WSDL described capabilities
and its features may be integrated into a registry,
further discrimination is done by the manual
inspection of the service description. The same
manual activity is done using the grid discovery
system. Automation challenges are partially solved
with semantic technologies
(http://www.w3.org/2001/sw), a set of standards and
tools able to provide machine-processable
descriptions of the information. Each resource is
described according a semantic model in terms of
classes (a set of entities), properties and
relationships through a model (i.e. RDF, the
Resource Description Framework) and a schema
(i.e. RDF Schema), while the area of knowledge is
described by an ontology through a specific
language (i.e. OWL, the Web Ontology Language).
In order to consider different domains (astronomical
and web service knowledge), several ontologies may
be combined into a single model. Studies in an
astrophysical context are starting to develop an
OWL-based ontology of astronomy (Shaya., E.,
2006) that could better describe this area. The
Semantic Web Services arm of the DAML
(http://www.daml.org) program is developing a
language (OWL-based web service ontology) and
tools to enable the automation of services. The W3C
DISCOVERY CHALLENGES AND AUTOMATION FOR SERVICE-BASED APPLICATIONS IN GRID
335
(http://www.w3.org) has submitted a specific
language called WSDL-S to associate semantic
annotations with WSDL-based web services. They
are the technologies applicable to the studied
context. This entails two approaches:
- a client-side view adding a semantic description
to the resource (client-provider interaction);
- a server-side view adding a semantic module to
the registry (semantic discovery).
Figure 3: Relations between WSDL, UDDI and OWL-S
and the available converters.
Software tools (http://projects.semwebcentral.org/)
primarily developed by the Software Agents Group
(http://www.cs.cmu.edu/~softagents) at Carnegie
Mellon University (http://www.cmu.edu) are going
to being used to test the feasibility of the various
solutions. A WSDL2OWL-S converter provides a
partial automatic translation between the two
description languages. It is used to generate the three
ontology models that make up an OWL-S document
(Figure 3) and provide both the discovery
information and, once found, the details needed to
make use of the service. The OWL-S description of
the application and its representation differs from
that provided by UDDI. However, one way to
combine the two efforts has been (Paolucci, M.
2002) to define a mapping between the two data
structures. The mapping relates semantic models to a
UDDI tModels container; it may be automatically
performed by the OWL-S2UDDI software tool
(Figure 3). By this conversion, OWL-S web services
can be registered with UDDI. Furthermore, to
exploit semantic information for the purpose of
discover, UDDI engines need specific software
modules added that handle semantic data (i.e. an
OWL-S/UDDI Matchmaker module that allows for
the processing of the OWL-S description present in
the UDDI advertisement). With this approach a
client discovers the agreed-upon semantic model
using UDDI and loads it over standard HTTP. Then
it locates the OWL document representing the
semantic model by finding the appropriate tModel
and accesses the service category. Having identified
the relevant concepts, it navigates the mappings that
link the model to the required WSDL files.
4 CONCLUSIONS
Discovery in a distributed environment merging grid
systems and web service frameworks has proven to
be a big challenge. The existing methods offer some
characteristics in common according data schema,
protocols, and tools and each method has advantages
and disadvantages in addressing web application
discovery. In all cases they share the same problem
in providing automation. Until the introduction of
semantic technologies, the best mechanism to
facilitate searches will be through property-based
lookup and taxonomic categorization and
classification. With semantics, the web service
resource can be described and thus discovered.
Current research has led to semantic web services
described by different languages like OWL-S and to
semantic discovery which may exploit such
descriptions through the use of UDDI tools. The
availability of software tools that help the
conversion is the basis of this feasibility study aimed
at automating discovery of web software
applications in a grid system.
REFERENCES
Foster, I., Kesselman, C., 2003. The Grid 2:Blueprint for a
New Computing Infrastructure. Morgan Kaufmann.
Cerami, E., 2002. Web Services Essentials. O’Reilly.
Benacchio, L., et al. , 2005. In INAF Grid related activities
in the framework of the Grid.it project Workshop
GRID and e-Coll. for the Space Com., ESA/ESRIN.
Daconta, C.M., Obrst, J.L., Kevin, T., 2003. The Semantic
Web. Wiley Publishing.
EGEE JR1, 2005. EGEE Middleware Architecture.
EGEE-DJRA1.1-594698-v1.0.
McDowell, J.C., 2004. Downloading the sky. IEEE
Spectrum Online.
Volpato, A., et al., 2004. Astronomical database related
applications in the Grid.it project. In ADASSXIV,
Proceedings, Pasadena, California.
Pacini, F., 2003. DataGrid Job Description Language
Attributes, release 2.x, DataGrid-01-TEN-00142-0_2.
Pastore, S., 2005. Searching methods for services: an
UDDI solution for grid and web service environment”.
In ACM Proc. of the Int. Workshop on “P2P&Service
Oriented Hypermedia”, Salzburg, Austria
Colgrave, J., Januszewski, K., 2004. Using WSDL in a
UDDI Registry, Ver.2.0.2. In OASIS UDDI Spec TC.
Shaya, E., Thomas, B., Teuben, P., Huang, Z., 2006. A
Science Ontology for Goal Driven Datamining. In
Astronomy American Astr. Soc. 207th Meeting.
Paolucci, M., Kawamura, T., Payne, T. R., Sycara, T.
2002. Importing the Semantic Web in UDDI. In Proc
of Web Services, E-business and Semantic Web Work.
WEBIST 2007 - International Conference on Web Information Systems and Technologies
336