[Figure 9 plot. Panel: Synthetic Random. x-axis: Cores (64, 128, 256, 512, 1024); y-axis: Seconds (0 to 40000). Series: FCFS Sim - nfs, MINMIN Sim - nfs, MAXMIN Sim - nfs, HEFT Sim - nfs, CPFL Sim - nfs, CPFL Sim - ramd+disk+ssd.]
Figure 9: Synthetic CPFL simulation scaling to 1024 cores
versus other scheduling algorithms.
where some of them were sharing the same data files
as input. Techniques like caching shared input files
are desirable to prevent multiple file reads and to im-
prove the performance of the system I/O.
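The benefit of caching shared input files can be illustrated with a minimal sketch. It is not the simulator's implementation: the class `SharedInputCache`, the task names, and the file names are hypothetical, and the sketch only counts how many reads go to shared storage versus a node-local copy when several tasks share one reference file.

```python
class SharedInputCache:
    """Hypothetical node-local cache keyed by file name (illustration only)."""

    def __init__(self):
        self._cached = set()
        self.shared_reads = 0  # reads served by shared storage (e.g. NFS)
        self.local_reads = 0   # reads served by the node-local copy

    def read(self, filename):
        # First access fetches from shared storage and keeps a local copy;
        # later accesses to the same file are served locally.
        if filename in self._cached:
            self.local_reads += 1
        else:
            self.shared_reads += 1
            self._cached.add(filename)


# Hypothetical tasks: all three share the same reference file as input.
tasks = {
    "t1": ["ref.fa", "a.fq"],
    "t2": ["ref.fa", "b.fq"],
    "t3": ["ref.fa", "c.fq"],
}

cache = SharedInputCache()
for inputs in tasks.values():
    for name in inputs:
        cache.read(name)

print("shared-storage reads:", cache.shared_reads)
print("local reads:", cache.local_reads)
```

Without the cache, every one of the six reads would go to shared storage; with it, the shared reference file is fetched only once.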
We used NFS as the initial storage. Placing reference
files and temporary files on local RAM disk, local
HDD, or local SSD yielded a makespan improvement of
up to 69% on simulated large-scale clusters, with an
error between 0.9% and 3%.
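As a reading aid, the two percentages can be computed as in the sketch below. The definitions are assumptions, since this section does not state the formulas: improvement is taken as the relative makespan reduction against a baseline, and error as the relative difference between a simulated and a measured makespan. The function names and all numeric values are hypothetical and are not results from this work.

```python
def makespan_improvement(baseline_s, improved_s):
    """Relative makespan reduction in percent (assumed definition)."""
    return (baseline_s - improved_s) / baseline_s * 100.0


def simulation_error(simulated_s, measured_s):
    """Relative simulated-vs-measured difference in percent (assumed definition)."""
    return abs(simulated_s - measured_s) / measured_s * 100.0


# Hypothetical makespans in seconds, chosen only to illustrate the arithmetic.
improvement = makespan_improvement(baseline_s=20000.0, improved_s=15000.0)
error = simulation_error(simulated_s=1020.0, measured_s=1000.0)

print(f"improvement: {improvement:.1f}%")
print(f"error: {error:.1f}%")
```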
The simulation of synthetic workflow applications
executed correctly on a simulated cluster
tuned to behave like a real IBM cluster.
Although a synthetic workflow was used to
test scalability, our extension can simulate other
types of workflows thanks to the new storage hierarchy
added to WorkflowSim and the storage-aware scheduler.
As future work, we are considering several options
for data replacement policies in RAM disk, local
disk and SSD to further increase efficiency.
Looking forward, we plan to integrate this simulator
into a large cluster-based scientific workflow manager
such as Galaxy (Goecks et al., 2010), which is a
well-known workflow management system in the
bioinformatics community.
ACKNOWLEDGEMENTS
This work has been supported by project number
TIN2014-53234-C2-1-R of the Spanish Ministerio de
Ciencia y Tecnología (MICINN). This work is
co-funded by the EGI-Engage project (Horizon 2020)
under Grant number 654142.
REFERENCES
Acevedo, C., Hernandez, P., Espinosa, A., and Mendez, V.
(2016). A data-aware multiworkflow cluster sched-
uler. In Proceedings of the 1st International Confer-
ence on Complex Information Systems, pages 95–102.
SCITEPRESS.
Ananthanarayanan, G., Ghodsi, A., Wang, A., Borthakur,
D., Kandula, S., Shenker, S., and Stoica, I. (2012).
Pacman: coordinated memory caching for parallel
jobs. In Proceedings of the 9th USENIX conference
on Networked Systems Design and Implementation,
pages 20–20. USENIX Association.
Barbosa, J. and Monteiro, A. (2008). A list scheduling
algorithm for scheduling multi-user jobs on clusters. High
Performance Computing for Computational Science -
VECPAR 2008, pages 123–136.
Bolze, R., Desprez, F., and Isnard, B. (2009). Evaluation
of Online Multi-Workflow Heuristics based on List-
Scheduling Algorithms. Gwendia report L.
Bryk, P., Malawski, M., Juve, G., and Deelman, E. (2016).
Storage-aware algorithms for scheduling of workflow
ensembles in clouds. Journal of Grid Computing,
14(2):359–378.
Calheiros, R. N., Ranjan, R., Beloglazov, A., De Rose,
C. A., and Buyya, R. (2011). Cloudsim: a toolkit for
modeling and simulation of cloud computing environ-
ments and evaluation of resource provisioning algo-
rithms. Software: Practice and Experience, 41(1):23–
50.
Chen, W. and Deelman, E. (2012). WorkflowSim: A toolkit
for simulating scientific workflows in distributed envi-
ronments. In 2012 IEEE 8th International Conference
on E-Science, e-Science 2012.
Costa, L. B., Yang, H., Vairavanathan, E., Barros, A., Ma-
heshwari, K., Fedak, G., Katz, D., Wilde, M., Ri-
peanu, M., and Al-Kiswany, S. (2015). The case for
workflow-aware storage: An opportunity study. Jour-
nal of Grid Computing, 13(1):95–113.
Dean, J. and Ghemawat, S. (2008). MapReduce: simplified
data processing on large clusters. Communications of
the ACM, 51(1):107–113.
Delgado Peris, A., Hernández, J. M., and Huedo, E. (2016).
Distributed late-binding scheduling and cooperative
data caching. Journal of Grid Computing, pages 1–
22.
Goecks, J., Nekrutenko, A., Taylor, J., and Team, T. G.
(2010). Galaxy: a comprehensive approach for
supporting accessible, reproducible, and transparent
computational research in the life sciences. Genome
biology.
Hirales-Carbajal, A., Tchernykh, A., Röblitz, T., and
Yahyapour, R. (2010). A grid simulation framework
to study advance scheduling strategies for complex
workflow applications. In Parallel & Distributed Pro-
cessing, Workshops and Phd Forum (IPDPSW), 2010
IEEE International Symposium on, pages 1–8. IEEE.
Hönig, U. and Schiffmann, W. (2006). A meta-algorithm
for scheduling multiple dags in homogeneous sys-
tem environments. In Proceedings of the eighteenth
IASTED International Conference on Parallel and
Distributed Computing and Systems (PDCS’06).
A Data-aware MultiWorkflow Scheduler for Clusters on WorkflowSim