Hugo Saldanha, Edward Ribeiro, Maristela Holanda, Aleteia Araujo, Genaina Rodrigues, Maria Emilia Walter, João Carlos Setubal, Alberto Dávila


Cloud computing has emerged as a promising platform for large-scale, data-intensive scientific research, i.e., processing tasks that use hundreds of hours of CPU time and petabytes of data storage. Although this is an active research area, existing efforts to run such processing in clouds rely mainly on MapReduce. This article describes the BioNimbus project, which aims to define an architecture and create a framework for the easy and flexible integration and distributed execution of bioinformatics tools in a cloud environment, without being tied exclusively to the MapReduce paradigm. As a result, we leverage cloud elasticity and fault tolerance while significantly improving the storage capacity and execution time of bioinformatics tasks, mainly those of large-scale genome sequencing projects.


