2 RELATED WORK
As stated in (Andrade, H., Kurc, T., Sussman, A.,
and Saltz, J., 2001) (Beynon, M. D., Sussman, A.,
Catalyurek, U., Kurc, T., and Saltz, J., 2001) (Dail,
H., Sievert, O., Berman, F., Casanova, H., YarKhan,
A., Vadhiyar, S., Dongarra, J., Liu, C., Yang, L.,
Angulo, D., Foster, I., 2003), several research
projects have developed distributed data
management over the Grid. The Active Proxy-G
service (Alpdemir, N., Mukherjee, A., Paton, N. W.,
Watson, P., Fernandes, A. A. A., Gounaris, A., and
Smith, J., 2003) caches query results, uses these
results to answer the parts of a query that they
cover, and submits sub-queries for the parts that
cannot be produced from the cache to application
servers that store the raw data sets, where they
undergo final processing.
(Rodríguez-Martínez, M., Roussopoulos, N.,
2000) proposed a database middleware (MOCHA)
which is designed to interconnect distributed data
sources.
Many types of environments for executing
grid-aware applications can be found in the
literature (Gounaris, A., Norman W., Alvaro A.A.,
Sakellariou, R.) (Tierney, B., Johnston, W., Lee, J.,
Hoo, G., Thompson, M). They fall into three
categories. Distributed database systems provide an
infrastructure that supports the deliberate
distribution of a database with some measure of
central control. Federated database systems allow
multiple autonomous databases to be integrated for
use within an application. In query-based
middleware, a query language is used as the
programming mechanism for expressing requests over
multiple wrapped data sources. This paper is most
closely related to the third category, in that we
consider the use of parallel query processing for
integrating various Grid resources, including
database systems.
3 PARALLEL QUERY PROCESSING
Parallel query processing is a well-established
mechanism in relational DBMSs. The objective of
parallel query processing is to translate a high-level
query into an efficient low-level execution plan and
allocate processors to each operation in such a way
that the overall query execution time is minimized
(Kossmann, D., 1998) (Lu, H., Shan, M-C., Tan, K-L.,
1991) (DeWitt, D.J., Gray, J., 1992).
3.1 Types of Parallelism
As pointed out in (Kossmann, D., 1998), the
methods for exploiting parallelism in a database
environment can be divided into three categories:
intra-operator, inter-operator (intra-query), and
inter-query parallelism. In intra-operator
parallelism, the major issue is task creation, and the
objective is to split an operation into tasks in such
a manner that the load is spread evenly across a
given number of processors. The second form,
inter-operator parallelism, means that several
operators within a query are executed in parallel.
This can be achieved either through parallel
execution of independent operations or through
pipelining. Thirdly, parallelism can be achieved by
executing multiple queries simultaneously within a
multiprocessor system. This is termed inter-query
parallelism (Turek, J., Philip, S., Chan, M.S., Wolf,
J.L).
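The three forms of parallelism can be illustrated with a minimal Python sketch. It is not taken from the system described in this paper: the function names, the round-robin partitioning scheme and the use of a thread pool are all assumptions made for illustration. Python threads show the structure of the computation rather than a real speed-up, and the generator pipeline only shows the tuple-at-a-time dataflow that pipelined operators share; in a parallel DBMS each operator would run on its own processor.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n):
    """Round-robin split so that each of the n tasks gets a near-equal share."""
    return [rows[i::n] for i in range(n)]


def select_task(rows, predicate):
    """One task of a partitioned selection operator."""
    return [r for r in rows if predicate(r)]


def parallel_select(rows, predicate, workers=4):
    """Intra-operator parallelism: a single selection split into tasks."""
    parts = partition(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(select_task, parts, [predicate] * workers)
    return [r for part in results for r in part]


# Inter-operator parallelism via pipelining: each operator consumes tuples
# from its input as they are produced, without materializing the whole input.
def scan(rows):
    for r in rows:
        yield r


def select(rows, predicate):
    for r in rows:
        if predicate(r):
            yield r


def project(rows, columns):
    for r in rows:
        yield {c: r[c] for c in columns}


def run_queries(queries, workers=2):
    """Inter-query parallelism: independent queries are submitted together."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(q) for q in queries]
        return [f.result() for f in futures]
```

For example, `project(select(scan(rows), pred), ["id"])` builds a three-operator pipeline, while `parallel_select(rows, pred, workers=3)` spreads one selection over three tasks.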
3.2 Exploiting Resources
A query can be executed at the client machine at
which it was initiated or at the server machines that
store the relevant data. There are three alternative
approaches: query shipping, in which the query is
executed at the server side; data shipping, in which
the query is executed at the client; and hybrid
shipping, which combines the two.
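As a rough illustration of how the three policies differ, the following Python sketch decides the execution site of each operator by comparing the bytes that would cross the network. The cost model (network transfer only), the `Estimate` fields and the function names are hypothetical and are not part of the system described in this paper.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    base_bytes: int        # size of the base data the operator reads at the server
    result_bytes: int      # estimated size of the operator's output
    cached_bytes: int = 0  # part of the base data already held at the client


def choose_site(est: Estimate) -> str:
    """Pick the site for one operator by comparing bytes sent over the network."""
    # Data shipping: the client must fetch whatever base data it does not hold.
    data_shipping_cost = est.base_bytes - est.cached_bytes
    # Query shipping: the server evaluates the operator and returns the result.
    query_shipping_cost = est.result_bytes
    return "client" if data_shipping_cost < query_shipping_cost else "server"


def plan(operators: dict) -> dict:
    """Hybrid shipping: the site is decided per operator, not per query."""
    return {name: choose_site(est) for name, est in operators.items()}
```

Under these assumptions, pure query shipping corresponds to every operator being placed at the server and pure data shipping to every operator being placed at the client; hybrid shipping allows a mix of the two within one query.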
Another important class of system in which
queries run over data that is distributed over a
number of physical resources is parallel databases.
Parallel databases are now a mature technology, and
experience shows that parallel query processing
techniques are able to provide cost-effective
scalability for data-intensive applications. The
purpose of a parallel database system is to improve
transaction and query response times, and the
availability of the system for centralized
applications.
4 ARCHITECTURE
The proposed design of the system is shown in
Figure 1. The system implements a multi-threaded
runtime environment in order to simultaneously
handle queries submitted by multiple users, and also
to manage multiple connections with application
servers. The system also performs resource
balancing between multiple application servers using
a replication model that employs statistics that
ICEIS 2005 - DATABASES AND INFORMATION SYSTEMS INTEGRATION