Authors:
H. V. Byelas; M. Dijkstra and M. A. Swertz
Affiliation:
University Medical Center Groningen and University of Groningen, Netherlands
Keyword(s):
Bioinformatics, Workflow management system, Data provenance, High performance computing.
Related Ontology Subjects/Areas/Topics:
Algorithms and Software Tools; Artificial Intelligence; Bioinformatics; Biomedical Engineering; Computational Intelligence; Databases and Data Management; Next Generation Sequencing; Soft Computing; Web Services in Bioinformatics
Abstract:
Running bioinformatics analyses in a distributed computational environment and monitoring their executions
has become a huge challenge due to the size of data and complexity of analysis workflows. Some attempts
have been made to combine computational and data management in a single solution using the MOLGENIS
software generator. However, it was not clear how to explicitly specify output data for a particular study,
evaluate its quality or, depending on the results, repeat the analysis. We present here a new version of a
MOLGENIS computational framework for bioinformatics, which reflects lessons learnt and new requirements
from end users. We have improved our initial solution in two ways. First, we propose a new data model,
which describes a workflow as a graph in a relational database, where nodes are analysis operations and edges
are transactions between them. Inputs and outputs of the workflow nodes are explicitly specified. Second,
we have extended the execution logic to trace data, show how final results were created, and handle
errors in the distributed environment. We illustrate system applications on several analysis workflows for next
generation sequencing.
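
The data model described in the abstract (a workflow stored as a graph in a relational database, with explicit inputs and outputs per node) can be illustrated with a minimal sketch. It is not the MOLGENIS schema: the table names (node, node_io, edge), the three-step pipeline and its file names are all hypothetical, and SQLite stands in for whatever database the framework actually uses. The trace_provenance function shows one possible way to answer "how was this final result created" by walking the edges upstream with a recursive query.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding a hypothetical 3-step workflow."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Nodes are analysis operations (names are illustrative only).
        CREATE TABLE node (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        -- Inputs and outputs of every node are stored explicitly.
        CREATE TABLE node_io (
            node_id   INTEGER NOT NULL REFERENCES node(id),
            direction TEXT NOT NULL CHECK (direction IN ('input', 'output')),
            file_name TEXT NOT NULL
        );
        -- Edges connect an upstream operation to a downstream one.
        CREATE TABLE edge (
            source_id INTEGER NOT NULL REFERENCES node(id),
            target_id INTEGER NOT NULL REFERENCES node(id)
        );
    """)
    conn.executemany(
        "INSERT INTO node (id, name) VALUES (?, ?)",
        [(1, "align"), (2, "sort"), (3, "call_variants")],
    )
    conn.executemany(
        "INSERT INTO node_io (node_id, direction, file_name) VALUES (?, ?, ?)",
        [
            (1, "input", "reads.fastq"), (1, "output", "aligned.sam"),
            (2, "input", "aligned.sam"), (2, "output", "sorted.bam"),
            (3, "input", "sorted.bam"), (3, "output", "variants.vcf"),
        ],
    )
    conn.executemany(
        "INSERT INTO edge (source_id, target_id) VALUES (?, ?)",
        [(1, 2), (2, 3)],
    )
    return conn


def trace_provenance(conn, output_file):
    """Return the names of all operations that led to output_file,
    ordered from the most upstream operation to the producing one."""
    rows = conn.execute(
        """
        WITH RECURSIVE upstream(id, depth) AS (
            -- Start at the node that declares the file as an output.
            SELECT node_id, 0 FROM node_io
            WHERE direction = 'output' AND file_name = ?
            UNION
            -- Follow edges backwards to every predecessor.
            SELECT e.source_id, u.depth + 1
            FROM edge e JOIN upstream u ON e.target_id = u.id
        )
        SELECT n.name
        FROM upstream u JOIN node n ON n.id = u.id
        GROUP BY n.id, n.name
        ORDER BY MAX(u.depth) DESC
        """,
        (output_file,),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling trace_provenance(build_demo_db(), "variants.vcf") walks from the variant-calling step back to the alignment step. Keeping the graph in ordinary tables means the same query mechanism that serves the rest of the database can also serve provenance questions, which is the design point the abstract emphasises; error handling in the distributed environment is outside the scope of this sketch.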