nificantly. Science Grid systems have a large number
of resources, and the basic Shadow Routing algorithm
requires searching over all resources to update all the
virtual queues. The algorithm will be more efficient
if it reduces the number of times it triggers the virtual
queue updating process.
When a new task arrives in the system, our
scheduling algorithm considers three factors for
choosing a resource: the current load on each re-
source, the estimated execution time of the arriving
task on each resource, and the effect of the incoming
task on each resource load. So, we define the first
step of our algorithm as follows: the algorithm compares
the quantity $(Q_r + \hat{L}_k/\hat{\mu}_r) \times \hat{L}_k/\hat{\mu}_r$ for all resources,
and chooses the resource with the smallest value. The
algorithm thus trades off between a resource that
would finish its currently assigned tasks earlier and a
resource that would execute the incoming task faster.
It also aims to minimize the load that is going to be
added to each resource. The algorithm adds the
estimated execution time of the task on the selected
resource to the virtual queue of that resource, which is
given by $\hat{L}_k/\hat{\mu}_m$ for task k on resource m.
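
The selection and update step described above can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: it assumes that the estimated execution time of a task on a resource is the estimated task length divided by the estimated execution rate of that resource, and the names `select_resource`, `route_task`, `queues`, `est_rates` and `est_length` are hypothetical.

```python
def select_resource(queues, est_rates, est_length):
    """Return the index r minimizing (Q_r + t_r) * t_r.

    t_r = est_length / est_rates[r] is the assumed estimated execution
    time of the arriving task on resource r (hypothetical reading).
    Ties are broken in favor of the lowest index.
    """
    def score(r):
        t = est_length / est_rates[r]
        return (queues[r] + t) * t

    return min(range(len(queues)), key=score)


def route_task(queues, est_rates, est_length):
    """Choose a resource and add the task's estimated execution time
    to that resource's virtual queue. Mutates `queues` in place."""
    m = select_resource(queues, est_rates, est_length)
    queues[m] += est_length / est_rates[m]
    return m


# Two resources: resource 0 is twice as fast as resource 1.
queues = [0.0, 0.0]
rates = [2.0, 1.0]
chosen = route_task(queues, rates, 10.0)  # scores: 25.0 vs 100.0
```

In this toy setting the faster resource is chosen while its virtual queue is short; once its queue is long enough, the product for the slower resource becomes smaller and the slower resource is selected instead.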
If the loads on faster resources increase so much
that the proper balancing of loads would be violated,
the total virtual queue length of all resources
will reach a predefined limit. In this case, the virtual
queue lengths of all resources are reduced by a specific
amount. This normalization step makes the virtual
queue lengths of slower resources smaller and
increases the chance that slower resources are chosen
for executing future tasks. The parameter η should be
chosen based on the features of the system. For the
workloads considered in this work, we conclude that
a good value of η is 1/300.
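
A minimal sketch of the normalization step is given below. The passage does not state here how the limit or the reduction amount is derived from η, so `limit` and `decrement` are hypothetical parameters supplied by the caller. Clamping the queues at zero is also an assumption of this sketch.

```python
def normalize(queues, limit, decrement):
    """Reduce all virtual queues once their total reaches `limit`.

    `limit` and `decrement` are hypothetical parameters; their relation
    to the parameter eta is not specified in this passage.
    Queues are clamped at zero (an assumption of this sketch).
    Returns True if normalization was triggered, False otherwise.
    """
    if sum(queues) < limit:
        return False
    for r in range(len(queues)):
        queues[r] = max(0.0, queues[r] - decrement)
    return True


# The fast resource has accumulated most of the virtual load.
queues = [20.0, 4.0]
triggered = normalize(queues, limit=24.0, decrement=10.0)
```

Because the full search over all resources happens only when the total reaches the limit, a larger limit triggers the update less often, which is the efficiency concern raised at the start of this section.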
Since the Grid Shadow Routing algorithm makes
each decision based on the values of the virtual queues,
if task input rates or resource execution rates change
in the system, no explicit detection of such an event
(or any other input rate measurement/estimation) is
necessary. The virtual queues automatically readjust
and the algorithm starts routing along the new best
matchings of resources to tasks.
4 EXPERIMENTAL SET-UP
We use simulation to evaluate the scheduling algo-
rithms. This section gives details of the simulation
toolkit used, the performance metrics applied, and
the experimental set-up. Simulation models were im-
plemented with the Java package GridSim (Buyya
and Murshed, 2002). Depending on the Grid sce-
nario and applications run in the system, there ex-
ist different performance metrics for evaluating Grid
scheduling algorithms. We use two of the most impor-
tant performance metrics to evaluate the algorithms
from different aspects. The metrics that we consider
are: Makespan (the maximum completion time of all
tasks), and Flowtime (the average completion time of
all tasks).
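
Following the definitions above, the two metrics can be computed from the list of task completion times. This is a small sketch: the function names are hypothetical, and it assumes all completion times are measured from a common time origin, such as the start of the simulation.

```python
def makespan(completion_times):
    """Maximum completion time over all tasks."""
    return max(completion_times)


def flowtime(completion_times):
    """Average completion time over all tasks."""
    return sum(completion_times) / len(completion_times)


# Hypothetical completion times (in seconds) of three tasks.
times = [3.0, 7.0, 5.0]
m = makespan(times)
f = flowtime(times)
```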
We consider a Grid system consisting of 50 ded-
icated resources with different CPU speeds, working
in parallel with an overall high load. To simulate a
widely distributed Grid system, and because the band-
width between elements of the system which are far
from each other is low, we set the bandwidth inside
the elements of the system to be 1 Gbps, and the band-
width between the scheduler and each of the 50 re-
sources to be 10 Mbps.
As mentioned before, our proposed algorithm is
mostly advantageous for EGEE Grids, so we evaluate
it on a real workload from the CERN Grid project.
We use the LCG trace, version 0.1, which is provided
by the Grid Workloads Archive (Iosup et al., 2006)
in Grid Workloads Format (GWF). This workload was
collected from the LCG project; the LCG testbed
represents the Large Hadron Collider (LHC) Computing
Grid. We use the first 20,000 tasks in this trace for
our experiment: we run the simulation until 20,000
tasks have arrived in the system and then wait until
the system becomes empty.
Our algorithm uses estimates of the task lengths
and resource execution rates. However, various estimation
methods may have different levels of accuracy.
So, we evaluate our algorithm in a system that has var-
ious levels of error in the estimated task lengths and
resource execution rates. In order to completely study
the robustness of our algorithm, we examine cases
that have 0% to 40% error in our estimates; however,
typically these errors are on the order of 10% (Akioka
and Muraoka, 2004). We evaluate our proposed al-
gorithm by considering the error model discussed in
(Iosup et al., 2008) for estimating task lengths and re-
source execution rates. Generally the two models of
error in these estimates are:
• Over and Under Estimation Error. In our simulations,
$\hat{L}_k$ and $\hat{\mu}_r$ are obtained using the following
relations: $\hat{L}_k = L_k \times (1 + E_k)$ and
$\hat{\mu}_r = \mu_r \times (1 + E_r)$. Here, $E_k$ and $E_r$ are the errors for
task lengths and resource execution rates, respectively,
which are sampled from the uniform distribution
on $[-I, +I]$, and $I$ is the maximum error.
• Over Estimation Error. The main error models
are obtained using the relations $\hat{L}_k = L_k \times (1 + E'_k)$
ICSOFT 2011 - 6th International Conference on Software and Data Technologies