(e.g., system architecture, location of the executable
and libraries, programming language). These
attributes cannot be altered by users of the system,
but are typically specified by the developer of the
program during the process of publishing the
program on the grid. The ADS instance also includes
default values for all options, but the exact values
to be used in a particular run are not yet set.
When querying for a data mining program, the
client side components (implemented as Triana
units) use the ADS instance in order to dynamically
create a GUI, which conforms to the description of
that particular data mining program. For example,
for each option a form field is generated, where the
user can specify the values for that option. At this
stage, i.e., at run time, the user provides the exact
values for the program's run-time attributes (e.g.,
application parameter values, data inputs, and
additional requirements).
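As a rough illustration of this step, the sketch below derives one form field per option from a description and then merges the user's input with the defaults. The dictionary layout, the option names (`confidence`, `pruning`), the attribute values, and both helper functions are hypothetical; the actual ADS format and the Triana units are not reproduced here.

```python
# Hypothetical, simplified stand-in for an ADS instance.
# The real schema is not shown in this text; all names are assumptions.
ads_instance = {
    # Static attributes: set by the developer, not editable by users.
    "static": {"architecture": "PowerPC", "os": "Linux"},
    # Options: each carries a default value that the user may override.
    "options": [
        {"name": "confidence", "type": "float", "default": 0.25},
        {"name": "pruning", "type": "bool", "default": True},
    ],
}


def build_form_fields(ads):
    """Create one editable form-field descriptor per option."""
    return [
        {"label": opt["name"], "type": opt["type"], "value": opt["default"]}
        for opt in ads["options"]
    ]


def apply_user_input(fields, user_values):
    """Return the option values for a run; user input overrides defaults."""
    return {f["label"]: user_values.get(f["label"], f["value"]) for f in fields}


fields = build_form_fields(ads_instance)
values = apply_user_input(fields, {"confidence": 0.5})
print(values)
```

Only the options are turned into fields; the static attributes stay out of the form because users cannot alter them.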
A fully specified ADS instance represents a
multi-job description and is submitted to the
Resource Broker for parallel execution in the grid.
The Resource Broker uses the information
contained in the ADS instance to aggregate
appropriate resources. The following information is
particularly useful:
Static Resource Requirements: system
architecture and operating system.
Applications implemented in a hardware-
dependent language (e.g., C) typically run
only on the system architecture and operating
system they have been compiled for (e.g.,
PowerPC or Intel Itanium running Linux). For
this reason, the Resource Broker has to select
execution machines that offer the same system
architecture and operating system as required
by the application.
Modifiable Resource Requirements: memory
and disk space. While data mining
applications may require a minimal amount of
memory and disk space at start-up time,
memory and disk space demands typically rise
with the amount of data being processed and
with the solution space being explored.
Therefore, end users are allowed to specify
these requirements in accordance with the data
volume to be processed and their knowledge
of the application’s behaviour. The Resource
Broker takes these user-defined requirements
into account and selects only those machines
and resources that meet them.
Modifiable Requirements: identity of
machines. In some cases, end users may
wish to limit the list of possible
execution machines based on personal
preferences, for instance, when processing
sensitive data. To support this requirement, it
is possible for the user to specify the IP
addresses of such machines in the job
description. Such a list causes the Resource
Broker to match only the resources and
machines listed and to ignore all other
machines regardless of their capabilities.
The Total Number of Jobs. Instead of
specifying single values for each option and
data input that the selected application
requires, it is also possible to declare a list of
distinct values (e.g., true, false) or a loop (e.g.,
from 0.50 to 10.00 with step 0.25). These
represent rules for variable instantiations,
which are translated into a number of jobs
with different parameters by the Resource
Broker. This is referred to as a multi-job. As a
result, the Broker will prefer computational
resources that are capable of executing the
whole list of jobs at once in order to minimize
data transfer. Typically, such resources are
either clusters or high-performance machines
offering many distinct processors. As an
example, if the user specifies two input files
(a.txt, b.txt) for the same data input and two
loops running from 1 to 10 with step 1 as
parameters for two options, the Resource
Broker will translate this into 200 (2 x 10 x
10) distinct jobs. If no single resource capable
of executing them at once is available, the
Broker will distribute these jobs over those
resources that provide the highest capability.
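The selection criteria above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Resource Broker's actual implementation: the `Resource` record, its field names, the requirement keys, and the functions `loop`, `expand_multi_job` and `match_resources` are all hypothetical, and the machines listed are invented.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Resource:
    # Hypothetical description of an execution machine.
    ip: str
    architecture: str
    os: str
    memory_mb: int
    disk_mb: int
    cpus: int


def loop(start, stop, step):
    """Inclusive loop over values, e.g. loop(1, 10, 1) gives 1..10."""
    count = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(count)]


def expand_multi_job(variables):
    """Turn value lists and loops into one parameter set per job."""
    names = list(variables)
    return [dict(zip(names, combo)) for combo in product(*variables.values())]


def match_resources(resources, req):
    """Keep machines meeting the static, modifiable and identity requirements."""
    allowed = req.get("allowed_ips")  # None means no restriction
    matches = [
        r for r in resources
        if r.architecture == req["architecture"] and r.os == req["os"]
        and r.memory_mb >= req["min_memory_mb"]
        and r.disk_mb >= req["min_disk_mb"]
        and (allowed is None or r.ip in allowed)
    ]
    # Prefer machines that can run the most jobs at once.
    return sorted(matches, key=lambda r: r.cpus, reverse=True)


# Two input files and two loops from 1 to 10 with step 1.
jobs = expand_multi_job({
    "input": ["a.txt", "b.txt"],
    "option1": loop(1, 10, 1),
    "option2": loop(1, 10, 1),
})
print(len(jobs))  # 2 x 10 x 10 = 200

resources = [
    Resource("10.0.0.1", "Itanium", "Linux", 4096, 50000, 4),
    Resource("10.0.0.2", "Itanium", "Linux", 16384, 200000, 64),
    Resource("10.0.0.3", "PowerPC", "Linux", 32768, 200000, 128),
]
req = {"architecture": "Itanium", "os": "Linux",
       "min_memory_mb": 2048, "min_disk_mb": 10000, "allowed_ips": None}
ranked = match_resources(resources, req)
print([r.ip for r in ranked])
```

Sorting by processor count is only a stand-in for the Broker's preference for resources that can execute the whole job list at once; the ranking it actually uses is not specified here.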
In addition, the Resource Broker evaluates
further information from the job description that
becomes important at the multi-job submission
stage. This information is briefly described below:
Instructions on where the program
executables are stored, including all required
libraries, and how to start the selected
program. These are required for transferring
executables and associated libraries to
execution machines across the grid, which is
part of the stage-in process. By staging-in
programs together with the input data
dynamically at run-time, the system is capable
of executing these applications on any suitable
machine in the grid without prior installation
of the respective data mining program.
All Data Inputs and Data Outputs that have
to be transferred prior to execution.
All Option Values (Data Mining Program
Parameters) that have to be passed to the
ICSOFT 2008 - International Conference on Software and Data Technologies