of our knowledge, the question of visual workflow
representation was not addressed in that project.
After interviewing the end users of the WMS in our
team, who are researchers and Ph.D. students, we
identified the main "quality" metrics for bioinformatics
workflows that they are interested in:
• the number of parallel executions of workflow
elements, which is relevant when the workflow
is developed to run on a computational cluster or
grid;
• sizes of input and output data;
• the execution time of an analysis;
• tools used in an analysis, their dependencies and
the operating systems used;
• resources required to run an analysis (e.g. CPU,
RAM, storage requirements).
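As a minimal sketch, the metrics listed above could be attached to each workflow element as a small record that a visualization then shows or hides. The class `StepMetrics`, its field names and units, and the helper `node_label` are hypothetical and chosen for illustration only; they are not taken from any WMS discussed here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StepMetrics:
    """Hypothetical per-element record of the "quality" metrics."""
    name: str
    parallel_executions: int = 1      # number of parallel executions
    input_bytes: int = 0              # size of input data
    output_bytes: int = 0             # size of output data
    runtime_seconds: float = 0.0      # analysis execution time
    tools: List[str] = field(default_factory=list)  # tools and dependencies
    operating_system: str = ""        # operating system used
    cpu_cores: int = 1                # resource requirements
    ram_gb: float = 0.0
    storage_gb: float = 0.0


def node_label(step: StepMetrics, show_metrics: bool = True) -> str:
    """Build a node label, optionally annotated with a few metrics."""
    if not show_metrics:
        return step.name
    return (f"{step.name} [x{step.parallel_executions}, "
            f"{step.runtime_seconds:.0f} s, {step.ram_gb:g} GB RAM]")
```

A structural view could call `node_label(step, show_metrics=False)` for a compact overview and switch the metrics on when the user zooms into an element.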
Besides showing these workflow attributes together
with the workflow structure, users want to see
workflow behaviour and evolution. We identified several
views on workflows that are most needed in the
life sciences. These are:
1. a structural overview of workflow elements that
can show or hide "quality" metrics, with zooming
into a particular element or a group thereof and its
dependencies, combined with "quality" metrics;
2. an overview of how different parameters influence
the results of an analysis, since bioinformatics
workflows often have to be re-run many times
with different analysis parameters;
3. a workflow evolution view, showing which elements
were introduced or removed, when, and by
whom;
4. a run-time execution monitor, which shows
workflow progress information, and a statistical
overview of workflow runs, success/failure rates,
use of tools and user statistics.
The first three views are the design views and
the last one is the execution view of the workflow.
All these views should be readily understandable and
scalable.
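The data behind two of these views can be sketched briefly: the evolution view needs the elements added or removed between two workflow versions, and the statistical overview needs success/failure counts over past runs. The function names and the status strings "success" and "failed" are assumptions made for this illustration.

```python
from collections import Counter


def diff_versions(old_steps, new_steps):
    """Elements introduced or removed between two workflow versions."""
    old, new = set(old_steps), set(new_steps)
    return {"added": sorted(new - old), "removed": sorted(old - new)}


def run_statistics(run_statuses):
    """Summarize past runs; statuses are assumed to be plain strings."""
    counts = Counter(run_statuses)
    total = sum(counts.values())
    rate = counts["success"] / total if total else 0.0
    return {"total": total, "success_rate": rate, "failed": counts["failed"]}
```

A complete evolution view would also record when each change was made and by whom; that information would have to come from the version history of the workflow, which this sketch does not model.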
Here, we first give some examples of visualizations
used in WMS specific to bioinformatics
and in generic WMS (Section 2). Then, we discuss
visualization techniques that show software structure
combined with software metrics (Section 3).
Finally, we present our conclusions and outline
potential directions for future work (Section 4).
2 WORKFLOW VISUALIZATION
IN POPULAR WMS
Galaxy (Blankenberg and Taylor, 2007) is a
popular WMS for bioinformatics. It is a web
environment in which users can create workflows by
combining a large variety of bioinformatics tools.
Figure 1 shows an example of a proteomics workflow
created and used by Berend Hoekman.
Figure 1: An example of a workflow visualization in
Galaxy (Blankenberg and Taylor, 2007).
The workflow graph represents the data flow of the
proteomics analysis used in the University Medical
Centre Groningen (UMCG), the Netherlands. The graph
edges connect analysis steps from the workflow input
(in the top-left corner) to the output (in the bottom-left),
showing the data flow. Input and output data files are
listed in the body of the graph node icons. For someone
who is not an expert in this workflow, it is difficult to
estimate the run time of the whole analysis, the sizes of
the data, the complexity of the tool configuration, etc.
There is no visual difference between the workflow
elements, so it is difficult to distinguish data nodes
(i.e. workflow inputs/outputs) from processing
operations. Furthermore, parallelism in
workflow design/execution can be shown in Galaxy
by creating separate workflow nodes (i.e. one node
per parallel execution of the workflow element),
which is a good solution for a few parallel
executions. However, this solution does not scale to
hundreds or thousands of parallel executions of a
workflow element.
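One possible way around the scaling problem, given here only as an illustrative sketch and not as a feature of Galaxy, is to collapse all parallel copies of an element into a single node that carries a multiplicity. The input format (pairs of node identifier and element name, plus edges between node identifiers) is an assumption.

```python
from collections import Counter


def collapse_parallel(nodes, edges):
    """Collapse parallel copies of a workflow element into one node.

    nodes: list of (node_id, element_name) pairs
    edges: list of (source_node_id, target_node_id) pairs
    Returns the multiplicity per element and the de-duplicated edges
    between elements.
    """
    element_of = dict(nodes)
    multiplicity = Counter(element_of.values())
    collapsed_edges = sorted({(element_of[s], element_of[d]) for s, d in edges})
    return dict(multiplicity), collapsed_edges
```

With this representation, a workflow with a thousand parallel searches is drawn with three nodes instead of a thousand and two, and the multiplicity can be rendered as a label on the collapsed node.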
Another popular WMS, and a generic one, is Taverna
(Oinn and Greenwood, 2005). It is a suite of tools for
designing and executing workflows. It allows users to
integrate third-party software tools, which are described
as web services, into workflows. An example of a
simple workflow that retrieves a weather forecast for
a specified city is shown in Figure 2.
A workflow is presented as a graph constructed
using the Taverna visual language. This graph (Fig.
2) also shows the data flow from top to bottom. Here,
colours are used to show the nature of graph elements.
We can clearly distinguish between data elements and