TASK SCHEDULING IN A FEDERATED CLOUD INFRASTRUCTURE FOR BIOINFORMATICS APPLICATIONS

C. A. L. Borges, H. V. Saldanha, E. Ribeiro, M. T. Holanda, A. P. F. Araujo, M. E. M. T. Walter

Abstract

Task scheduling is difficult in federated cloud environments because the federation spans many cloud providers, each with distinct capabilities that the scheduler must take into account. In bioinformatics, many tools and databases that require large resources for processing and storing enormous amounts of data are provided by physically separate institutions. This article addresses the problem of task scheduling in BioNimbus, a federated cloud infrastructure for bioinformatics applications. We propose DynamicAHP, a scheduling algorithm based on the Analytic Hierarchy Process (AHP) that selects the best available resources to execute each submitted task. We ran experiments with real biological data on a BioNimbus federation formed by three cloud providers running on Amazon EC2. The results show that DynamicAHP significantly improves the makespan of bioinformatics applications executing in BioNimbus when compared to the Round Robin algorithm.
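
To make the idea of AHP-based resource selection concrete, the following is a minimal sketch of generic AHP ranking, not the scheduling algorithm evaluated in the paper, whose details are not given in this abstract. The criteria (free_cpu, bandwidth), the provider names, the numeric values, and the function names are all hypothetical. The priority vector is approximated with the row geometric mean method rather than the principal eigenvector.

```python
from math import prod


def ahp_weights(pairwise):
    """Approximate an AHP priority vector with the row geometric mean method."""
    n = len(pairwise)
    geo_means = [prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def pairwise_from_values(values):
    """Build a reciprocal comparison matrix from positive, higher-is-better values."""
    return [[a / b for b in values] for a in values]


def rank_providers(providers, criteria, criteria_matrix):
    """Combine criterion weights with per-criterion provider priorities."""
    criterion_weights = ahp_weights(criteria_matrix)
    names = list(providers)
    scores = {name: 0.0 for name in names}
    for weight, criterion in zip(criterion_weights, criteria):
        values = [providers[name][criterion] for name in names]
        local = ahp_weights(pairwise_from_values(values))
        for name, priority in zip(names, local):
            scores[name] += weight * priority
    return scores


# Hypothetical snapshot of three federated providers (illustrative values only).
providers = {
    "cloud_a": {"free_cpu": 8, "bandwidth": 100},
    "cloud_b": {"free_cpu": 2, "bandwidth": 400},
    "cloud_c": {"free_cpu": 4, "bandwidth": 100},
}
criteria = ["free_cpu", "bandwidth"]
# Assumed judgment: free CPU is moderately more important than bandwidth (3 on Saaty's scale).
criteria_matrix = [[1, 3], [1 / 3, 1]]

scores = rank_providers(providers, criteria, criteria_matrix)
best = max(scores, key=scores.get)
print(best, scores)
```

A Round Robin scheduler would instead cycle through the providers in a fixed order regardless of their current state. In this sketch the ranking is recomputed from whatever resource snapshot is passed in, so a scheduler could call it again for each task as conditions change.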



Paper Citation


in Harvard Style

Borges, C. A. L., Saldanha, H. V., Ribeiro, E., Holanda, M. T., Araujo, A. P. F. and Walter, M. E. M. T. (2012). TASK SCHEDULING IN A FEDERATED CLOUD INFRASTRUCTURE FOR BIOINFORMATICS APPLICATIONS. In Proceedings of the 2nd International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, ISBN 978-989-8565-05-1, pages 114-120. DOI: 10.5220/0003932801140120


in Bibtex Style

@conference{closer12,
author={C. A. L. Borges and H. V. Saldanha and E. Ribeiro and M. T. Holanda and A. P. F. Araujo and M. E. M. T. Walter},
title={TASK SCHEDULING IN A FEDERATED CLOUD INFRASTRUCTURE FOR BIOINFORMATICS APPLICATIONS},
booktitle={Proceedings of the 2nd International Conference on Cloud Computing and Services Science - Volume 1: CLOSER},
year={2012},
pages={114-120},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003932801140120},
isbn={978-989-8565-05-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Cloud Computing and Services Science - Volume 1: CLOSER
TI - TASK SCHEDULING IN A FEDERATED CLOUD INFRASTRUCTURE FOR BIOINFORMATICS APPLICATIONS
SN - 978-989-8565-05-1
AU - Borges, C. A. L.
AU - Saldanha, H. V.
AU - Ribeiro, E.
AU - Holanda, M. T.
AU - Araujo, A. P. F.
AU - Walter, M. E. M. T.
PY - 2012
SP - 114
EP - 120
DO - 10.5220/0003932801140120