to improve the makespan of workflows. As mentioned
above, we also discovered small gaps in the results of
our experiments. They are most likely caused by the way
we implemented our system. We will investigate these gaps
and look for optimisations that close them. Furthermore, we
will improve our cloud manager so that it can create multiple
agents with a given capability set in parallel.
REFERENCES
Altintas, I., Berkley, C., Jaeger, E., Jones, M., Ludascher,
B., and Mock, S. (2004). Kepler: an extensible sys-
tem for design and execution of scientific workflows.
In Proceedings of the 16th International Conference on
Scientific and Statistical Database Management,
pages 423–424. IEEE.
Apache Airflow (2020). Apache Airflow Website.
https://airflow.apache.org/. Last accessed: 2020-04-14.
Bar-Yossef, Z., Jayram, T. S., Kumar, R., Sivakumar, D.,
and Trevisan, L. (2002). Counting distinct elements in
a data stream. In Rolim, J. D. P. and Vadhan, S., edi-
tors, Randomization and Approximation Techniques in
Computer Science, pages 1–10. Springer Berlin Hei-
delberg.
Berriman, G. B., Deelman, E., Good, J. C., Jacob, J. C.,
Katz, D. S., Kesselman, C., Laity, A. C., Prince, T. A.,
Singh, G., and Su, M.-H. (2004). Montage: a grid-
enabled engine for delivering custom science-grade
mosaics on demand. In Optimizing Scientific Return
for Astronomy through Information Technologies, vol-
ume 5493, pages 221–233. International Society for
Optics and Photonics.
Binato, S., Hery, W. J., Loewenstern, D. M., and Resende,
M. G. C. (2002). A GRASP for Job Shop Scheduling,
pages 59–79. Springer US.
Blythe, J., Jain, S., Deelman, E., Gil, Y., Vahi, K., Man-
dal, A., and Kennedy, K. (2005). Task scheduling
strategies for workflow-based applications in grids.
In Proceedings of the IEEE International Symposium
on Cluster Computing and the Grid (CCGrid), pages
759–767.
Carbone, P., Katsifodimos, A., Ewen, S., Markl, V., Haridi,
S., and Tzoumas, K. (2015). Apache Flink: Stream
and batch processing in a single engine. Bulletin of
the IEEE Computer Society Technical Committee on
Data Engineering, 36(4):28–38.
Casanova, H., Legrand, A., Zagorodnov, D., and Berman, F.
(2000). Heuristics for scheduling parameter sweep ap-
plications in grid environments. In Proceedings of the
9th Heterogeneous Computing Workshop HCW, pages
349–363.
Chircu, V. (2018). Understanding the 8 fallacies of
distributed systems.
https://dzone.com/articles/understanding-the-8-fallacies-of-distributed-syste.
Last accessed: 2020-02-18.
Deelman, E., Peterka, T., Altintas, I., Carothers, C. D., van
Dam, K. K., Moreland, K., Parashar, M., Ramakrish-
nan, L., Taufer, M., and Vetter, J. (2018). The fu-
ture of scientific workflows. The International Jour-
nal of High Performance Computing Applications,
32(1):159–175.
Deelman, E., Vahi, K., Juve, G., Rynge, M., Callaghan, S.,
Maechling, P. J., Mayani, R., Chen, W., Ferreira da
Silva, R., Livny, M., and Wenger, K. (2015). Pegasus:
a workflow management system for science automa-
tion. Future Generation Computer Systems, 46:17–
35.
Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P.,
Palumbo, E., and Notredame, C. (2017). Nextflow enables
reproducible computational workflows. Nature
Biotechnology, 35(4):316–319.
Foster, I. and Kesselman, C., editors (1998). The Grid:
Blueprint for a New Computing Infrastructure. Mor-
gan Kaufmann Publishers Inc., San Francisco, CA,
USA.
Freund, R. F., Gherrity, M., Ambrosius, S., Campbell, M.,
Halderman, M., Hensgen, D. Z., Keith, E., Kidd, T.,
Kussow, M., Lima, J. D., Mirabile, F., Lantz, M.,
Rust, B., and Siegel, H. J. (1998). Scheduling re-
sources in multi-user heterogeneous computing envi-
ronments with SmartNet. Calhoun: The NPS Institu-
tional Archive.
Gherega, A. and Pupezescu, V. (2011). Multi-agent re-
source allocation algorithm based on the xsufferage
heuristic for distributed systems. In Proceedings of
the 13th International Symposium on Symbolic and
Numeric Algorithms for Scientific Computing, pages
313–320.
Giardine, B., Riemer, C., Hardison, R. C., Burhans, R.,
Elnitski, L., Shah, P., Zhang, Y., Blankenberg, D.,
Albert, I., Taylor, J., Miller, W., Kent, W. J., and
Nekrutenko, A. (2005). Galaxy: a platform for interactive
large-scale genome analysis. Genome Research,
15(10):1451–1455.
Graves, R., Jordan, T. H., Callaghan, S., Deelman, E., Field,
E., Juve, G., Kesselman, C., Maechling, P., Mehta, G.,
Milner, K., Okaya, D., Small, P., and Vahi, K. (2011).
CyberShake: A physics-based seismic hazard model
for Southern California. Pure and Applied Geophysics,
168(3):367–381.
Hamad, S. A. and Omara, F. A. (2016). Genetic-based
task scheduling algorithm in cloud computing envi-
ronment. International Journal of Advanced Com-
puter Science and Applications, 7(4):550–556.
Hemamalini, M. (2012). Review on grid task scheduling in
distributed heterogeneous environment. International
Journal of Computer Applications, 40(2):24–30.
Hull, D., Wolstencroft, K., Stevens, R., Goble, C., Pocock,
M. R., Li, P., and Oinn, T. (2006). Taverna: a tool for
building and running workflows of services. Nucleic
Acids Research, 34:W729–W732.
Ibarra, O. H. and Kim, C. E. (1977). Heuristic algorithms
for scheduling independent tasks on nonidentical pro-
cessors. Journal of the ACM, 24(2):280–289.
Johnson, D. S. and Garey, M. R. (1979). Computers
and Intractability: A guide to the theory of NP-
completeness. WH Freeman.