using a greater number of resources. The results suggest
that both cluster and cloud can be used to reduce
run-time, but the cloud scenario incurs a higher
overhead.
In future work, we believe that research should improve
the features supported by the proposed system. For
example, feedback from a mutation found in a patient
gene to a database of mutations such as Ensembl or
Gene Report could be implemented. Another future study
could attempt to deploy this system as an online,
user-friendly solution in the cloud. However, many
issues still need to be discussed, such as graphical
interfaces for using the cloud to persist the gene
analysis of patients.
ACKNOWLEDGEMENTS
This work was supported by several institutions,
including the following: CNPq through grant number
MCT/CNPq Nº 70/2009 – PGAEST- MCT/CNPq;
FAPERGS via grant FAPERGS/CNPq 008/2009. It
was also assisted by the research project “GREEN-
GRID: Sustainable HPC”; Grid5000, a grid platform
developed by the INRIA ALADDIN project, and the
support provided by CNRS, RENATER and other
universities. Some experiments were assisted by the
Microsoft Azure environment from Microsoft Research.
The research was also partly sponsored by CAPES
grant 99999.014966/2013-01 (through the DSE program)
and by CNPQ-PIBITI-UFRGS FUNTTEL - MARE-
MOTO project. Members from the GPPD MapRe-
duce group also assisted in the development of this
project.
ICEIS 2015 - 17th International Conference on Enterprise Information Systems