Figure 9: Synthetic CPFL simulation scaling to 1024 cores versus other scheduling algorithms (64 to 1024 cores; series: FCFS, MINMIN, MAXMIN, HEFT and CPFL simulated on NFS, and CPFL simulated on RamDisk+disk+SSD).
where some of them shared the same input data files. Techniques such as caching shared input files are desirable to avoid reading the same file multiple times and to improve system I/O performance.
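The sketch below illustrates the caching idea with a per-node cache that uses least-recently-used (LRU) eviction. LRU is a policy chosen here for illustration and is not one prescribed by the paper. Repeated reads of a shared input file are served from memory, so only the first read reaches shared storage. The class name, the `read_backend` callback and the file names are hypothetical.

```python
from collections import OrderedDict


class SharedInputCache:
    """Hypothetical read-through cache for workflow input files (LRU eviction).

    `read_backend(name) -> bytes` stands in for a read from shared storage
    such as NFS; `backend_reads` counts how often that slow path was taken.
    """

    def __init__(self, capacity_bytes, read_backend):
        self.capacity_bytes = capacity_bytes
        self.read_backend = read_backend
        self.entries = OrderedDict()  # name -> data, least recently used first
        self.used_bytes = 0
        self.backend_reads = 0

    def read(self, name):
        if name in self.entries:
            # Hit: refresh recency and serve the file without touching storage.
            self.entries.move_to_end(name)
            return self.entries[name]
        data = self.read_backend(name)
        self.backend_reads += 1
        if len(data) <= self.capacity_bytes:
            # Evict least recently used files until the new one fits.
            while self.used_bytes + len(data) > self.capacity_bytes:
                _, evicted = self.entries.popitem(last=False)
                self.used_bytes -= len(evicted)
            self.entries[name] = data
            self.used_bytes += len(data)
        return data


# Hypothetical workload: three tasks read the same reference file.
storage = {"ref.fa": b"A" * 400, "reads_1.fq": b"C" * 300}
cache = SharedInputCache(1000, lambda name: storage[name])
for task_input in ["ref.fa", "ref.fa", "reads_1.fq", "ref.fa"]:
    cache.read(task_input)
print(cache.backend_reads)  # 2: one storage read per distinct file
```

With a cache too small to hold both files, the same pattern degrades to one storage read per access that follows an eviction. A real replacement policy would have to weigh that cost against the capacity of each tier.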
Using NFS as the initial storage and placing reference files and temporary files on a local RamDisk, local HDD, or local SSD, we obtained up to a 69% makespan improvement on simulated large-scale clusters, with an error between 0.9% and 3%.
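This section reports the makespan improvement and the simulation error as percentages but does not restate how they are computed. The sketch below assumes the usual definitions. Improvement is measured relative to the all-NFS baseline makespan. Error is the deviation of a simulated makespan from the one measured on the real cluster. Both definitions, the function names, and the numbers are illustrative assumptions, not results from the paper.

```python
def makespan_improvement(baseline_s, improved_s):
    """Percentage reduction in makespan relative to a baseline run (assumed definition)."""
    return 100.0 * (baseline_s - improved_s) / baseline_s


def relative_error(simulated, measured):
    """Percentage deviation of a simulated value from a measured one (assumed definition)."""
    return 100.0 * abs(simulated - measured) / measured


# Hypothetical makespans in seconds, chosen only to exercise the formulas.
print(makespan_improvement(1000.0, 310.0))  # 69.0
print(relative_error(101.5, 100.0))  # 1.5
```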
Simulations of synthetic workflow applications ran correctly on a simulated cluster tuned to behave like a real IBM cluster.
Although a synthetic workflow was used to test scalability, our extension can also simulate other types of workflows thanks to the new storage hierarchy added to WorkflowSim and the storage-aware scheduler.
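This section does not detail the storage hierarchy or the storage-aware scheduler. As a rough illustration only, the sketch below places workflow files across a hypothetical local hierarchy (RamDisk, SSD, HDD), with NFS as the shared fallback. Files read by more tasks get first claim on the fastest tier. This greedy policy, the `Tier` class, and all capacities and bandwidths are assumptions made for the example. It is not the CPFL algorithm or WorkflowSim's API.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    """One storage level; all numbers used below are hypothetical."""
    name: str
    capacity_mb: float
    bandwidth_mb_s: float
    used_mb: float = 0.0
    files: list = field(default_factory=list)

    def fits(self, size_mb):
        return self.used_mb + size_mb <= self.capacity_mb


def place_files(files, local_tiers, shared):
    """Greedy placement of (name, size_mb, readers) tuples.

    Files read by more tasks are placed first, each on the fastest local
    tier that still has room; anything that fits nowhere stays on `shared`.
    Returns a dict mapping file name -> tier name.
    """
    fastest_first = sorted(local_tiers, key=lambda t: t.bandwidth_mb_s, reverse=True)
    placement = {}
    for name, size_mb, _readers in sorted(files, key=lambda f: f[2], reverse=True):
        target = next((t for t in fastest_first if t.fits(size_mb)), shared)
        target.used_mb += size_mb
        target.files.append(name)
        placement[name] = target.name
    return placement


def read_time_s(files, placement, tiers):
    """Total read time if every reader fetches the file from its assigned tier."""
    bandwidth = {t.name: t.bandwidth_mb_s for t in tiers}
    return sum(readers * size_mb / bandwidth[placement[name]]
               for name, size_mb, readers in files)


# Hypothetical inputs: one reference file shared by 8 tasks, two private files.
ramdisk = Tier("ramdisk", 512, 2000)
ssd = Tier("ssd", 2048, 500)
hdd = Tier("hdd", 8192, 150)
nfs = Tier("nfs", float("inf"), 100)
files = [("ref.fa", 400, 8), ("tmp.bam", 1500, 1), ("reads.fq", 300, 1)]

placement = place_files(files, [ramdisk, ssd, hdd], nfs)
all_nfs = {name: "nfs" for name, _, _ in files}
tiers = [ramdisk, ssd, hdd, nfs]
print(placement)
print(read_time_s(files, all_nfs, tiers), read_time_s(files, placement, tiers))
```

Ordering by reader count is one simple heuristic for sending heavily shared reference files to the fastest tier. A scheduler could equally weigh file size, task priority, or locality across nodes.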
As future work, we are considering several options for data replacement policies in the RamDisk, local disk, and SSD to further increase the efficiency of the
Looking forward, we plan to integrate this simulator into a large cluster-based scientific workflow manager such as Galaxy (Goecks et al., 2010), a well-known workflow management system in the bioinformatics community.
This work has been supported by project number TIN2014-53234-C2-1-R of the Spanish Ministerio de Ciencia y Tecnología (MICINN). This work is co-funded by the EGI-Engage project (Horizon 2020) under Grant number 654142.
A Data-aware MultiWorkflow Scheduler for Clusters on WorkflowSim