
 
(e.g., system architecture, location of the executable 
and libraries, programming language). These 
attributes cannot be altered by users of the system, 
but are typically specified by the developer of the 
program during the process of publishing the 
program on the grid. The ADS instance also includes 
default values for all options, but the exact values 
are not set. 
When querying for a data mining program, the 
client side components (implemented as Triana 
units) use the ADS instance in order to dynamically 
create a GUI, which conforms to the description of 
that particular data mining program. For example, 
for each option a form field is generated, where the 
user can specify the values for that option. At this 
stage, i.e., at runtime, the user provides the exact 
values the program needs (e.g., application 
parameter values, data input, additional 
requirements). 
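The mapping from option descriptions to form fields can be illustrated with a minimal sketch. It does not reproduce the actual ADS schema or the Triana units: the dictionary keys (`name`, `type`, `default`), the widget names, and the function `build_form_fields` are all hypothetical and chosen only for illustration.

```python
# Hypothetical, simplified stand-in for the options part of an ADS instance.
# The real ADS schema is not reproduced here; these keys are assumptions.
ads_options = [
    {"name": "min_support", "type": "float", "default": 0.5},
    {"name": "prune", "type": "boolean", "default": True},
]

# Assumed mapping from option type to a generic widget kind.
WIDGET_BY_TYPE = {"boolean": "checkbox", "float": "number", "int": "number"}


def build_form_fields(options):
    """Create one form-field description per option, pre-filled with its default."""
    fields = []
    for option in options:
        fields.append({
            "label": option["name"],
            "widget": WIDGET_BY_TYPE.get(option["type"], "text"),
            "value": option.get("default"),
        })
    return fields


fields = build_form_fields(ads_options)
for field in fields:
    print(field)
```

The point of the sketch is that the GUI is derived entirely from the description, so a newly published program would need no client-side changes under this assumption.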
A fully specified ADS instance represents a 
multi-job description and is submitted to the 
Resource Broker for parallel execution in the grid. 
The Resource Broker uses the information 
contained in the ADS instance to aggregate 
appropriate resources. The following information is 
particularly useful: 
  Static Resource Requirements regarding 
system architecture and operating system. 
Applications implemented in a hardware-
dependent language (e.g., C) typically run 
only on the system architecture and operating 
system they have been compiled for (e.g., 
PowerPC or Intel Itanium running Linux). For 
this reason, the Resource Broker has to select 
execution machines that offer the same system 
architecture and operating system as required 
by the application. 
  Modifiable Resource Requirements: memory 
and disk space. While data mining 
applications may require a minimal amount of 
memory and disk space at start-up time, 
memory and disk space demands typically rise 
with the amount of data being processed and 
with the solution space being explored. 
Therefore, end users are allowed to specify 
these requirements in accordance with the data 
volume to be processed and their knowledge 
of the application’s behaviour. The Resource 
Broker takes these user-defined requirements 
into account and selects only machines and 
resources that meet them.  
  Modifiable Requirements: identity of 
machines. In some cases end users may 
wish to limit the list of possible 
execution machines based on personal 
preferences, for instance, when processing 
sensitive data. To support this requirement, it 
is possible for the user to specify the IP addresses of 
such machines in the job description. Such a 
list causes the Resource Broker to match only 
those resources and machines listed and to 
ignore all other machines independent of their 
capabilities. 
  The Total Number of Jobs. Instead of 
specifying single values for each option and 
data input that the selected application 
requires, it is also possible to declare a list of 
distinct values (e.g., true, false) or a loop (e.g., 
from 0.50 to 10.00 with step 0.25). These 
represent rules for variable instantiations, 
which are translated into a number of jobs 
with different parameters by the Resource 
Broker. This is referred to as a multi-job. As a 
result, the Broker will prefer computational 
resources that are capable of executing the 
whole list of jobs at once in order to minimize 
data transfer. Typically, such resources are 
either clusters or high-performance machines 
offering many distinct processors. As an 
example, if the user specifies two input files 
(a.txt, b.txt) for the same data input and two 
loops running from 1 to 10 with step 1 as 
parameters for two options, the Resource 
Broker will translate this into 200 (2 x 10 x 
10) distinct jobs. If no single resource capable 
of executing them at once is available, the 
Broker will distribute these jobs over those 
resources that provide the highest capability.  
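The matching rules and the multi-job expansion described above can be sketched as follows. This is an illustration under stated assumptions, not the Resource Broker's implementation: the `Machine` record and the functions `loop_values`, `expand_multi_job` and `eligible_machines` are hypothetical names. Loops are treated as inclusive of both bounds, which is consistent with the example of a loop from 1 to 10 with step 1 yielding 10 values.

```python
from dataclasses import dataclass
from decimal import Decimal
from itertools import product


def loop_values(start, stop, step):
    """Expand an inclusive loop, e.g. from 0.50 to 10.00 with step 0.25."""
    # Decimal avoids floating-point drift when accumulating fractional steps.
    start, stop, step = (Decimal(str(v)) for v in (start, stop, step))
    values = []
    current = start
    while current <= stop:
        values.append(current)
        current += step
    return values


def expand_multi_job(variables):
    """Return one parameter dict per job: the Cartesian product of all value lists."""
    names = list(variables)
    combos = product(*(variables[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


@dataclass
class Machine:
    # Hypothetical resource record; the field names are assumptions.
    ip: str
    architecture: str
    operating_system: str
    memory_mb: int
    disk_mb: int
    processors: int


def eligible_machines(machines, architecture, operating_system,
                      min_memory_mb=0, min_disk_mb=0, allowed_ips=None):
    """Filter by static and user-defined requirements, largest machines first."""
    matches = [
        m for m in machines
        if m.architecture == architecture
        and m.operating_system == operating_system
        and m.memory_mb >= min_memory_mb
        and m.disk_mb >= min_disk_mb
        and (allowed_ips is None or m.ip in allowed_ips)
    ]
    # Prefer resources with many processors, which can run more jobs at once.
    return sorted(matches, key=lambda m: m.processors, reverse=True)


# Two input files and two loops from 1 to 10 with step 1.
variables = {
    "data_input": ["a.txt", "b.txt"],
    "option_1": loop_values(1, 10, 1),
    "option_2": loop_values(1, 10, 1),
}
jobs = expand_multi_job(variables)
print(len(jobs))  # 200, i.e. 2 x 10 x 10

machines = [
    Machine("10.0.0.1", "Itanium", "Linux", 4096, 100000, 64),
    Machine("10.0.0.2", "PowerPC", "Linux", 8192, 200000, 256),
    Machine("10.0.0.3", "Itanium", "Linux", 1024, 50000, 8),
]
candidates = eligible_machines(machines, "Itanium", "Linux", min_memory_mb=2048)
print([m.ip for m in candidates])  # ['10.0.0.1']
```

Sorting by processor count is only one simple way to express a preference for resources that can run the whole list of jobs at once; a real broker would presumably weigh further criteria.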
In addition, the Resource Broker evaluates 
further information from the job description that 
becomes important at the multi-job submission 
stage. This information is briefly described below: 
  Instructions on where the program 
executables are stored, including all required 
libraries, and how to start the selected 
program. These are required for transferring 
executables and associated libraries to 
execution machines across the grid, which is 
part of the stage-in process. By staging-in 
programs together with the input data 
dynamically at run-time, the system is capable 
of executing these applications on any suitable 
machine in the grid without prior installation 
of the respective data mining program. 
  All Data Inputs and Data Outputs that have 
to be transferred prior to execution. 
  All Option Values (Data Mining Program 
Parameters) that have to be passed to the 
ICSOFT 2008 - International Conference on Software and Data Technologies