ulation results show that the bio-backfill proposal can improve average workflow turnaround time by 8.6% and resource utilization by 3.8% compared to the state-of-the-art Firstfit and Bestfit backfill policies. These experiments show the performance gains that can be obtained by adapting backfill policies to the needs of bioinformatics workflow applications, demonstrating the viability of the bio-backfill scheduler. To further develop and test the proposal, we are currently applying it to larger environments with more nodes, PUs, and workflows. Future steps also include extending the set of applications and comparing against other backfill policies.
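The comparison above uses the Firstfit and Bestfit backfill policies as baselines. The following Python sketch is a rough illustration of how those two selection rules differ. It assumes a single reservation for the job at the head of the queue (`shadow_time`), resources counted only as idle PUs, and user-supplied runtime estimates. `Job`, `fits`, `first_fit`, `best_fit`, and `backfill` are hypothetical names. This is neither the bio-backfill policy nor the simulator used in the experiments. BestFit is taken here as one common variant: pick the fitting job that occupies the most idle PUs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Job:
    name: str
    pus: int        # processing units requested
    runtime: float  # estimated runtime (hypothetical time units)


def fits(job: Job, free_pus: int, now: float, shadow_time: float) -> bool:
    # A queued job may jump ahead only if it fits in the idle PUs and is
    # estimated to finish before the head job's reservation (shadow_time).
    return job.pus <= free_pus and now + job.runtime <= shadow_time


def first_fit(queue: List[Job], free_pus: int, now: float,
              shadow_time: float) -> Optional[Job]:
    # FirstFit: take the first job in queue order that fits.
    for job in queue:
        if fits(job, free_pus, now, shadow_time):
            return job
    return None


def best_fit(queue: List[Job], free_pus: int, now: float,
             shadow_time: float) -> Optional[Job]:
    # BestFit (assumed variant): among fitting jobs, take the one that
    # occupies the most idle PUs, i.e. leaves the smallest hole.
    candidates = [j for j in queue if fits(j, free_pus, now, shadow_time)]
    return max(candidates, key=lambda j: j.pus, default=None)


Selector = Callable[[List[Job], int, float, float], Optional[Job]]


def backfill(queue: List[Job], free_pus: int, now: float,
             shadow_time: float, select: Selector) -> List[str]:
    # Repeatedly start the selected job until no remaining job fits.
    pending = list(queue)  # copy so the caller's queue is not modified
    started: List[str] = []
    while True:
        job = select(pending, free_pus, now, shadow_time)
        if job is None:
            return started
        pending.remove(job)
        free_pus -= job.pus
        started.append(job.name)


queue = [Job("a", 2, 5.0), Job("b", 7, 4.0), Job("c", 3, 20.0), Job("d", 6, 3.0)]
print(backfill(queue, 8, 0.0, 10.0, first_fit))  # ['a', 'd']
print(backfill(queue, 8, 0.0, 10.0, best_fit))   # ['b']
```

With 8 idle PUs and a reservation at t = 10, FirstFit starts `a` and `d` and keeps all 8 PUs busy. This BestFit variant starts only `b` (7 PUs), after which nothing else fits. Neither rule dominates in general, which is why the choice of backfill policy matters for workload-specific schedulers.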
COMPLEXIS 2018 - 3rd International Conference on Complexity, Future Information Systems and Risk