used in the future. To run simulations for another
family of proteins, the user only needs to change the
sequence data in the gridbean for Clustal and resubmit
the workflow.
During protein sequence analysis, scientists
usually look for motifs. These are small conserved
regions that have functional and structural
significance. Regions with a high number of changes,
in contrast, are responsible for the specificity of
molecules. Shannon entropy, as a measure of uncertainty
in a data set, is a good indicator of variability
(Bui et al., 2007). Entropy can be calculated easily
in the R environment using the aaMI package
(Wollenberg, 2005).
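To make the idea of entropy as a variability indicator concrete, the following is a minimal sketch in Python, not the aaMI package and not the code used in the workflow. It assumes the alignment is given as equal-length strings, treats every character (including a gap) as an ordinary symbol, and measures entropy in bits. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    # H = sum over symbols of p * log2(1 / p); a fully conserved column gives 0.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def alignment_entropy(sequences):
    """Per-column entropy for a list of equal-length aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    # zip(*sequences) iterates over the alignment column by column.
    return [column_entropy(col) for col in zip(*sequences)]


# Toy alignment (made-up data, for illustration only).
alignment = ["MKVL", "MKIL", "MRVL", "MKAL"]
entropies = alignment_entropy(alignment)
```

Columns in which all sequences agree yield an entropy of zero, while columns with many different residues yield higher values, which is what makes the measure useful for separating conserved from variable positions.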
The article (Bui et al., 2007) provides an example
of such an application. Its authors analyze arenavirus
protein sequence variability to identify conserved
regions that could be targeted for the development
of a universal arenaviral vaccine. They also looked for
highly variable regions which could be helpful in
diagnosis. To do this, they performed multiple sequence
alignments of chosen proteins using the ClustalW
program and calculated Shannon entropy. Fig. 2 presents
an example of a workflow performing similar tasks.
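One possible way to turn a per-position entropy profile into candidate conserved regions is sketched below in Python. This is an illustration only and not the procedure of Bui et al. (2007): the threshold, the minimum region length, the function name and the example profile are all assumptions chosen for the sketch.

```python
def conserved_regions(entropies, threshold=0.5, min_length=3):
    """Return (start, end) index pairs, end exclusive, for runs of positions
    whose entropy is at most `threshold` and that span at least `min_length`
    positions. Both parameters are illustrative assumptions."""
    regions = []
    start = None
    for i, h in enumerate(entropies):
        if h <= threshold:
            if start is None:
                start = i  # a new low-entropy run begins here
        else:
            if start is not None and i - start >= min_length:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the profile.
    if start is not None and len(entropies) - start >= min_length:
        regions.append((start, len(entropies)))
    return regions


# Made-up entropy profile, for illustration only.
profile = [0.0, 0.1, 0.0, 1.5, 1.2, 0.0, 0.0, 0.2, 0.0, 0.9]
regions = conserved_regions(profile)
```

Highly variable regions could be located the same way by inverting the comparison, again with a threshold that would have to be chosen for the data at hand.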
Of course, workflows can be much more complicated.
UNICORE can handle workflows with
thousands of elements and dependencies. With the
help of an editor, it is very easy to create even such
complex simulations.
7 CONCLUSIONS
In this paper the authors presented a plugin designed for
the statistical R environment. It makes it possible to
analyze and process data from many scientific
applications, not only molecular ones such as
BLAST, Clustal or NAMD. Used as part of a
workflow, it plays a crucial role in drawing conclusions
from an experiment. Workflow systems can be very useful for
scientists. With the help of special editors, like the
one in the UNICORE middleware, workflow construction
is intuitive and user-friendly. An additional advantage
is the reduced frequency of human errors.
Once designed, a workflow can be used for different
data. This automates the experiment, enabling
scientists to focus on results and conclusions.
ACKNOWLEDGEMENTS
This work was supported by the European Commission
under IST grant Chemomentum (No. 033437) and by
the European Social Fund with the National Budget
of the Republic of Poland under the Integrated
Regional Development Operational Programme,
Objective 2.6 "Regional Innovation Strategies and
transfer of knowledge", project of Kujawsko-Pomorskie
Province "Scholarships for PhD Students 2008/2009
- ZPORR".
REFERENCES
Borcz, M., Kluszczyński, R., and Bała, P. (2007). BLAST
Application on the GPE/UnicoreGS Grid. In et al., L.,
editor, Euro-Par 2006 Workshops: Parallel Process-
ing, volume 4967 of LNCS, pages 245–253. Springer
Berlin / Heidelberg.
Bui, H., Botten, J., Fusseder, N., Pasquetto, V., Mothe,
B., Buchmeier, M., and Sette, A. (2007). Protein
sequence database for pathogenic arenaviruses. Im-
munome Research, 3.
Fox, G. and Gannon, D. (2006). Special issue: Workflow in
grid systems. Concurrency and Computation: Prac-
tice and Experience, 18(10):1009–1019.
Grose, D., Crouchley, R., van Ark, T., Kewley, J., Allan,
R., Braimah, A., and Hayes, M. (2006). sabreR: Grid-
enabling the analysis of multi-process random effect
response data in R. Proc. Second International Con-
ference on e-Social Science.
Huerta, M., Haseltine, F., Liu, Y., Downing, G., and Seto,
B. (2000). NIH working definition of Bioinformatics
and Computational Biology.
Kluszczyński, R. and Bała, P. (2008). Supporting NAMD
Application on the Grid using GPE. In et al., W., ed-
itor, PPAM 2007, volume 4967 of LNCS, pages 762–
769. Springer Berlin / Heidelberg.
Kluszczyński, R. and Bała, P. (2009). Supporting Clustal
Application on the UNICORE Grid. Polish Journal of
Environmental Studies, 18(3B):165–169.
R Development Core Team (2005). R: A Language and
Environment for Statistical Computing. R Founda-
tion for Statistical Computing, Vienna, Austria. ISBN:
3-900051-07-0.
Ratering, R. (2005). Grid Programming Environment
(GPE) Concepts. GPE documentation.
Streit, A. (2009). UNICORE: Getting to the heart of Grid
technologies. eStrategies, Projects, 9th edition, pages
8–9.
Wegener, D., Sengstag, T., Sfakianakis, S., Rüping, S., and
Assi, A. (2009). GridR: An R-based tool for scientific
data analysis in grid environments. Future Generation
Computer Systems, 25:481–488.
Wollenberg, K. (2005). Mutual information for protein se-
quence alignments. Package ’aaMI’ for R environ-
ment.
Yu, J. and Buyya, R. (2005). A Taxonomy of Scientific
Workflow Systems for Grid Computing. SIGMOD
Record, 34(3):44–49.
BIOINFORMATICS 2010 - International Conference on Bioinformatics