Another category of studies attempts to minimize
the total makespan without a particular focus on the
rule for allocating jobs to servers. These studies use
different criteria, such as total completion time, total
weighted completion time and makespan, in heuristic
solution approaches. In (Damodaran and Vélez-
Gallego, 2012), the authors compare the performance
of a Simulated Annealing (SA) approach, a Modified
Delay (MD) heuristic and a Greedy Randomized
Adaptive Search Procedure (GRASP) for minimizing
the makespan of parallel batch processing machines.
In (Lim and Cho, 2007), the authors propose a CPU
process scheduling algorithm using fuzzy inference
with user models. They classify tasks into three
categories (batch, interactive and real-time
processes) and model the user's preferences for each
process class. The algorithm assigns the scheduling
priority of each process according to the class of the
process and the user's preference through fuzzy
inference. Another important area of application is
database management systems, where queries must
be scheduled. Single-query optimization has been
well studied, with various strategies depending on the
architecture of the database system (Mehta et al.,
1993). However, the scheduling of batch queries in a
multi-user environment has yet to be properly
tackled.
3 BATCH DATA PROCESSES
SCHEDULING PROBLEM
The problem deals with the scheduling and
allocation of a set of data files to a set of processors
so as to maximize some performance measures while
satisfying all priority and precedence constraints.
Data files cannot be pre-assigned to processors;
only data files identified as available for CI
processing are scheduled and allocated to processors,
based on their precedence weight and a
predetermined scheduling priority. The processing
time of a data file is a function of its size (number of
records), and the sizes of data files change after
each CI processing task; it can be
assumed that the size of a data file changes
proportionally to the original file size.
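The size-dependent processing-time model above can be sketched as follows; the per-record rate and the proportional growth factor are illustrative assumptions, not values from the paper.

```python
# Sketch of the processing-time model (rate_per_record and
# growth_factor are assumed constants, chosen only for illustration).

def processing_time(num_records, rate_per_record=0.01):
    """Processing time as a function of file size (number of records)."""
    return num_records * rate_per_record

def updated_size(original_size, growth_factor=1.2):
    """After a CI processing task, the file size is assumed to change
    proportionally to the original size."""
    return original_size * growth_factor

print(processing_time(500))   # 5.0 time units for a 500-record file
print(updated_size(500))      # 600.0 records after one CI task
```

Any function of size could replace the linear rate here; the point is only that the duration is derived from the record count and that the post-processing size is proportional to the original.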
Each file may require a number of computing
instruction (CI) processing tasks and I/O
readings. CI processing and I/O readings can be
executed in parallel operating systems. A CI
processing task can be split across two or more
processors. A CI processing task runs on a
processor until it is interrupted by an I/O reading
task or reaches "end of processing".
The processor set contains several processors
running concurrently (multiprocessing). Each
processor is assigned to only one CI processing
task at a time; a single processor cannot perform
more than one task simultaneously. Reservation or
pre-allocation of processors is forbidden.
The notion of interruption must be taken into
account; it is essential to the functioning of the
operating system. Suppose a data file A is allocated
to a processor to perform a CI processing task. If the
program reaches an input/output (I/O) reading task,
then data file A is interrupted so that the processor
can run another data file that was waiting, say B. At
the end of the I/O task, data file A returns to the
queue, in a position that depends on an updated
precedence weight and a predefined scheduling
priority (dispatching priority, DP), and waits to
benefit again from the processor when its turn
comes. As a result, a task's total duration varies
with the number of concurrent tasks and the degree
of multiprocessing; we use these rules to set
priorities among the data files.
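The interruption rule can be illustrated with a small priority-queue sketch; the weight values and the update applied when a file returns from I/O are assumed for illustration, since the paper's exact precedence-weight update is not given here.

```python
import heapq

# Minimal sketch of the interruption rule: a file that hits an I/O
# reading task releases the processor and later re-enters the queue
# at a position set by its updated precedence weight and DP.

class DataFile:
    def __init__(self, name, precedence_weight, dispatch_priority):
        self.name = name
        self.precedence_weight = precedence_weight
        self.dispatch_priority = dispatch_priority

    def key(self):
        # lower key = served first; ties broken by dispatching priority (DP)
        return (self.precedence_weight, self.dispatch_priority, self.name)

queue = []
def enqueue(f):
    heapq.heappush(queue, (f.key(), f))

a = DataFile("A", precedence_weight=2, dispatch_priority=1)
b = DataFile("B", precedence_weight=1, dispatch_priority=3)
enqueue(b)

running = a                        # A holds the processor, B is waiting
# A reaches an I/O reading task: suspend A, hand the processor to B
running, suspended = heapq.heappop(queue)[1], running
# When A's I/O completes, it re-enters the queue with an updated weight
suspended.precedence_weight = 0    # assumed update rule
enqueue(suspended)

print(running.name)                # prints "B": B now occupies the processor
print(queue[0][1].name)            # prints "A": A waits at the head of the queue
```

A heap is one natural realization of the queue; any ordered structure keyed on (precedence weight, DP) would serve equally well.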
4 PROPOSED APPROACH –
BATCH DATA PROCESSES
SCHEDULING (BDPS)
ALGORITHM
We present a dynamic optimization framework for
batch data processes scheduling based on a
dynamic algorithm. The batch data processes
scheduling (BDPS) algorithm is an iterative procedure
that optimizes the allocation of several processors to
different tasks while scheduling the batch data processes.
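The iterative structure of BDPS (Steps 1 to 4, described next) can be sketched as a loop skeleton; the precedence data, weights, and in particular the Step 4 solver are simplified stand-ins, not the paper's actual integer network optimization.

```python
# Skeleton of the BDPS iteration: Step 1 initializes, then each pass
# advances the clock (Step 2), builds the subset of available files
# (Step 3), and allocates them (Step 4, here a trivial placeholder).

def bdps(files, predecessors, horizon=100):
    """files: {name: remaining work units};
    predecessors: {name: set of files that must finish first}."""
    t = 0                                        # Step 1: initial set-up
    done = set()
    while len(done) < len(files) and t < horizon:
        t += 1                                   # Step 2: advance clock one unit
        # Step 3: files with no unfinished predecessor are available
        available = [f for f in files
                     if f not in done and predecessors[f] <= done]
        # Step 4 (placeholder): BDPS solves an integer network
        # optimization here; this sketch just processes every available file
        for f in available:
            files[f] -= 1
            if files[f] == 0:
                done.add(f)
    return t, done
```

For example, with two one-unit files where B must follow A, `bdps({"A": 1, "B": 1}, {"A": set(), "B": {"A"}})` finishes both files in two clock ticks.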
BDPS uses an iterative approach. Step 1 is a
preparatory stage where the initial data is set up and
the BDPS algorithm begins; BDPS then enters a loop
of repeated steps. The first task in this loop, Step 2,
is to increment the iteration clock by one time unit
(estimated as the time it takes to process the smallest
size unit of a data file). At each time T the BDPS
algorithm considers allocating the data files available
for CI processing to several processors, so Step 3
sets up the data file subset: only data files that are
not immediately preceded by other data files can be
CI processed at the current time T and, therefore,
belong to the data file subset. In Step 3 we also set
the data file weights based on the
precedence/dependency matrix. Next, in Step 4,
BDPS solves an integer network optimization
ICORES 2014 - International Conference on Operations Research and Enterprise Systems