Visualization of Bioinformatics Workflows for Ease of Understanding and Design Activities

H. V. Byelas and M. A. Swertz

Genomics Coordination Center, Department of Genetics, University Medical Center Groningen,

University of Groningen, Groningen, The Netherlands

Keywords:

Bioinformatics: Workflow Management System, Life Science Workflows, Workflow Visualization.

Abstract:

Bioinformatics analyses are growing in size and complexity. They are often described as workflows, and the workflow specifications are also becoming more complex due to the diversity of data, tools, and computational resources involved. A number of workflow management systems (WMS) have been developed recently to help bioinformaticians in their workflow design activities. Many of these WMS visualize workflows as graphs, where the nodes are analysis steps and the edges are interactions and constraints between analysis steps. These graphs usually represent the data flow of the analysis. In software visualization, similar graphs are used to show the data flow in software systems. However, WMS do not use any widely accepted standard for workflow visualization, particularly not in the bioinformatics domain. As a result, workflows describing the same analysis look different in different WMS. Furthermore, the visualization techniques used in WMS for bioinformatics are quite limited. Here, we argue that applying some of the visual analytics methods and techniques used in the software field, such as UML (unified modelling language) diagrams combined with quality metrics, can help to enhance understanding and sharing of workflows, and ease workflow analysis and design activities.

1 INTRODUCTION

Software structure has been depicted with design di-

agrams since the very start of computer program-

ming (Diehl, 2007). Furthermore, control-ﬂow

graphs (Goldstine and von Neumann, 1947) were

among the ﬁrst kinds of software diagrams and are

very similar to the data-ﬂow graphs used in the bioin-

formatics domain. Every diagram typically empha-

sises a particular aspect of a software system, such

as a software static model or the time ordering of

messages between software components. Still, many

diagram elements of a diverse nature can occur in

the same diagram. An effective system understand-

ing requires ways to correlate diagram elements and

software metrics, which represent software quality at-

tributes, in a single view.

Visualizations that combine software structure

and software attributes are arguably among the most

universal types of software visualizations, and among

the ﬁrst that were proposed in the history of software

visualization (Diehl, 2007; Spence, 2007). A good

understanding of the structure of a potentially large,

complex, and relatively unfamiliar software system is

best served by visualizing the structure.

Adding attributes such as "quality" metrics to this picture helps correlate the various quantitative insights with structural and architectural insights (Lanza and Marinescu, 2006). In some cases, software designers also identify and use groups of elements in the system analysis without constructing a separate diagram. These groups of elements can also have their own group-level metrics (Byelas and Telea, 2009).

An increase in the size and complexity of bioinformatics analyses means there is now a need to use advanced visualization techniques to depict analysis structure and workflow "quality" metrics. If the structure of the analysis is represented by a graph, both graph nodes and relations can accommodate several "quality" metrics. These metrics can also be defined at different levels of detail, such as groups of graph elements or sub-members of them. Furthermore, workflow visualization should be unified in some way to improve workflow sharing between people and interchange between different WMS. Some attempts to support workflow interoperability were made in the SHIWA project (SHIWA, 2012); however, to the best of our knowledge, the question of visual workflow representation was not addressed in that project.

DOI: 10.5220/0004195301160121
In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms (BIOINFORMATICS-2013), pages 116-121
ISBN: 978-989-8565-35-8
© 2013 SCITEPRESS (Science and Technology Publications, Lda.)

After interviewing the end users of WMS in our team, who are researchers and Ph.D. students, we identified the main "quality" metrics for bioinformatics workflows that they are interested in:

• the number of parallel executions of workflow elements, which is relevant when the workflow is developed to run on a computational cluster or grid;

• the sizes of input and output data;

• the analysis execution time;

• the tools used in an analysis, their dependencies, and the operating systems used;

• the resources required to run an analysis (e.g. CPU, RAM, storage requirements).
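As an illustration only, the metrics in this list could be collected per workflow step in a small record such as the one sketched below. The class name, field names, and units are our assumptions for the sketch and do not correspond to any existing WMS data model; the cost estimate is likewise a deliberately rough, hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StepMetrics:
    """Hypothetical record holding the "quality" metrics of one workflow step."""
    name: str
    parallel_executions: int = 1      # number of parallel runs of this step
    input_bytes: int = 0              # total size of the input data
    output_bytes: int = 0             # total size of the output data
    run_time_s: float = 0.0           # execution time of one run, in seconds
    tools: List[str] = field(default_factory=list)  # tools and their dependencies
    operating_system: str = ""        # operating system required by the tools
    cpu_cores: int = 1                # CPU cores requested per run
    ram_gb: float = 0.0               # memory requested per run
    storage_gb: float = 0.0           # scratch storage requested per run

    def total_cpu_seconds(self) -> float:
        """Rough cost estimate: run time x cores x parallel executions."""
        return self.run_time_s * self.cpu_cores * self.parallel_executions


# Example values are invented for illustration.
step = StepMetrics(name="alignment", parallel_executions=100,
                   run_time_s=60.0, cpu_cores=4)
print(step.total_cpu_seconds())
```

A record of this kind gives a visualization one value per metric and per step, which is the input that the visual mappings discussed in Section 3 need.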

Besides showing these workflow attributes together with the workflow structure, users want to see workflow behaviour and evolution. We identified several views on workflows that are most needed in the life sciences. These are:

1. a structural overview of workflow elements, with "quality" metrics shown or hidden, and zooming into a particular element or group of elements and its dependencies, combined with "quality" metrics;

2. an overview of how different parameters influence the results of an analysis, since bioinformatics workflows often have to be re-run many times with different analysis parameters;

3. a workflow evolution view, showing which elements were introduced or removed, when, and by whom;

4. a run-time execution monitor, which shows workflow progress information, and a statistical overview of workflow runs, success/failure rates, use of tools, and user statistics.

The ﬁrst three views are the design views and

the last one is the execution view of the workﬂow.

All these views should be readily understandable and

scalable.
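To make the evolution view (view 3) more concrete, the sketch below derives the information such a view would display from a list of workflow versions: which steps were added or removed, when, and by whom. The `WorkflowVersion` record, its fields, and the example data are hypothetical; a real WMS would take this information from its own version history.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class WorkflowVersion:
    """Hypothetical snapshot of a workflow: who saved it, when, and its steps."""
    author: str
    date: str                 # ISO date string, e.g. "2012-05-01"
    steps: FrozenSet[str]     # names of the workflow elements in this version


def evolution_log(versions: List[WorkflowVersion]) -> List[dict]:
    """Compare each version with its predecessor and list added/removed steps."""
    log = []
    for prev, curr in zip(versions, versions[1:]):
        log.append({
            "author": curr.author,
            "date": curr.date,
            "added": sorted(curr.steps - prev.steps),
            "removed": sorted(prev.steps - curr.steps),
        })
    return log


# Invented example history with two versions.
history = [
    WorkflowVersion("alice", "2012-05-01",
                    frozenset({"import", "align", "call_variants"})),
    WorkflowVersion("bob", "2012-06-15",
                    frozenset({"import", "quality_control", "align", "annotate"})),
]
log = evolution_log(history)
print(log)
```

Each log entry corresponds to one point on a timeline, so the same data could feed a timeline-style display of workflow changes.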

Here, we first give some examples of visualizations used in WMS specific to bioinformatics and in generic WMS (Section 2). Then, we discuss visualization techniques that show software structure combined with software metrics (Section 3). Finally, we present our conclusions and outline potential directions for future work (Section 4).

2 WORKFLOW VISUALIZATION IN POPULAR WMS

Galaxy (Blankenberg and Taylor, 2007) is one of the popular WMS for bioinformatics. It is a web environment where users can create workflows by combining a large variety of bioinformatics tools. Figure 1 shows an example of a proteomics workflow created and used by Berend Hoekman.

Figure 1: An example of a workﬂow visualization in

Galaxy (Blankenberg and Taylor, 2007).

The workflow graph represents the data flow of the proteomics analysis used in the University Medical Centre Groningen (UMCG), Netherlands. The graph edges connect analysis steps from the workflow input (in the top-left corner) to the output (in the bottom-left corner), showing the data flow. Input and output data files are listed in the body of the graph node icons. For someone who is not an expert in this workflow, it is difficult to estimate the run time of the whole analysis, the sizes of the data, the complexity of the tool configuration, etc. There is no visual difference between the workflow elements, and it is difficult to distinguish elements that are data nodes (i.e. workflow inputs/outputs) from those that are processing operations. Furthermore, parallelism in workflow design/execution can be shown in Galaxy by creating separate workflow nodes (i.e. one node for every parallel execution of the workflow element), which is a good solution for a few parallel executions. However, this solution does not work with hundreds or thousands of parallel executions of a workflow element.
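One possible way around this scaling problem, sketched below, is to draw a single node per workflow element and to carry the number of parallel executions as a metric on that node. This is our own illustrative assumption, not a feature of Galaxy; the function name and the input format (execution identifiers paired with step names) are hypothetical.

```python
from collections import Counter


def collapse_parallel_nodes(executions, edges):
    """Collapse per-execution nodes into one node per workflow step.

    executions: list of (execution_id, step_name) pairs
    edges:      list of (source_execution_id, target_execution_id) pairs
    Returns the number of executions per step and the step-level edges.
    """
    step_of = dict(executions)
    counts = Counter(step_of.values())
    # Keep one edge per pair of distinct steps, in a stable sorted order.
    step_edges = sorted({(step_of[src], step_of[dst])
                         for src, dst in edges
                         if step_of[src] != step_of[dst]})
    return dict(counts), step_edges


# Invented example: one split step, 1000 parallel alignments, one merge step.
executions = ([("split", "split")]
              + [("align_%d" % i, "align") for i in range(1000)]
              + [("merge", "merge")])
edges = ([("split", "align_%d" % i) for i in range(1000)]
         + [("align_%d" % i, "merge") for i in range(1000)])

counts, step_edges = collapse_parallel_nodes(executions, edges)
print(counts)
print(step_edges)
```

The collapsed graph has three nodes and two edges regardless of how many alignments run in parallel, and the count per node can then be shown with any of the visual attributes discussed in Section 3.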

Another popular generic WMS is Taverna (Oinn

and Greenwood, 2005). This is a suite of tools to de-

sign and execute workﬂows. It allows users to inte-

grate third-party software tools, which are described

as web services, into workﬂows. An example of a

simple workﬂow that retrieves a weather forecast for

a speciﬁed city is shown in Figure 2.

Workflows are presented as graphs constructed using the Taverna visual language. This graph (Fig. 2) also shows the data flow, from top to bottom. Here, colours are used to show the nature of graph elements. We can clearly distinguish between data elements and Taverna services, although the colour pattern is not really intuitive.

Figure 2: An example of a workflow visualization in Taverna (Oinn and Greenwood, 2005).

Figure 3: An example of a nested workflow in Taverna (Oinn and Greenwood, 2005).

Workflows in Taverna can have conditional branches and loops. These are not widely used in bioinformatics, where the analysis scripts themselves can contain conditional statements. In Taverna, workflows can also be nested into other workflows, which makes the workflows re-usable and easier to maintain. An example of a larger workflow with nesting (developed by Eric Vervisch) is shown in Figure 3. Here, the nested workflow is surrounded by a rectangle and an additional operation is shown outside of it. This operation can be, e.g., a special input data preparation. We can treat this rectangle as showing one "quality" metric of the elements in it, but the analysis properties of the workflow elements cannot be seen in such a workflow diagram.

3 TECHNIQUES THAT CAN ENHANCE VISUALIZATION OF WORKFLOW STRUCTURE

Since there are so many structure-and-attribute vi-

sualizations, we outline here the main common fea-

tures that can be re-used in workﬂow visualizations

for bioinformatics, and we present their strengths or

limitations. From our own experience, most such vi-

sualizations share two design elements:

• structure: software structure is typically depicted

by using a node-and-link graph metaphor, where

nodes are software entities, e.g. functions, classes,

components, or packages, and links are the rele-

vant (sub)set of considered relations, e.g. function

calls, data dependencies, associations, or inheri-

tance relations.

• attributes: software attributes are usually depicted

by mapping them to a visual attribute of the cor-

responding nodes or links in the structure visual-

ization. Visual attributes that can be used to show

software attributes are the position, size, shape,

colour, texture, lighting, line size, and annotations

of diagram elements.

The same node-and-link graph metaphor is used

for workﬂow visualizations, particularly in bioinfor-

matics. Below we discuss which visual attributes can

be re-used in bioinformatics workﬂow visualization.

Figure 4: Using visual mapping to visualize ”quality” met-

rics (Lanza and Ducasse, 2002).

In the lightweight software visualization frame-

work, CodeCrawler (Lanza and Ducasse, 2002),

”quality” metrics are visualized by mapping them to

the colour, height, width, and position of the element

box icons (Figure 4). With these few visual attributes,

it is possible to show, for example, analysis run time,

BIOINFORMATICS2013-InternationalConferenceonBioinformaticsModels,MethodsandAlgorithms

118

sizes of input/output data, and CPU/memory require-

ments, combined with the workﬂow structure. Hence,

the user can immediately get an idea about the re-

quirements for running these workﬂow analysis steps.
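A minimal sketch of such a mapping is given below. It is not CodeCrawler code: the choice of which metric drives which visual attribute (data size to width, run time to height, memory to a grey level), the pixel range, and the example values are all assumptions made for illustration.

```python
def scale(value, lo, hi, out_lo, out_hi):
    """Linearly map value from the range [lo, hi] to [out_lo, out_hi]."""
    if hi == lo:
        return out_lo
    return out_lo + (value - lo) / (hi - lo) * (out_hi - out_lo)


def polymetric_boxes(steps, min_px=20, max_px=120):
    """Map three metrics of each step to box width, height, and fill colour."""
    def bounds(key):
        values = [s[key] for s in steps]
        return min(values), max(values)

    data_lo, data_hi = bounds("data_gb")
    time_lo, time_hi = bounds("run_time_s")
    ram_lo, ram_hi = bounds("ram_gb")

    boxes = {}
    for s in steps:
        # Darker grey means a higher memory requirement.
        grey = round(scale(s["ram_gb"], ram_lo, ram_hi, 255, 0))
        boxes[s["name"]] = {
            "width": round(scale(s["data_gb"], data_lo, data_hi, min_px, max_px)),
            "height": round(scale(s["run_time_s"], time_lo, time_hi, min_px, max_px)),
            "fill": "#{0:02x}{0:02x}{0:02x}".format(grey),
        }
    return boxes


# Invented metric values for three workflow steps.
steps = [
    {"name": "import", "data_gb": 1, "run_time_s": 10, "ram_gb": 2},
    {"name": "align", "data_gb": 11, "run_time_s": 110, "ram_gb": 32},
    {"name": "report", "data_gb": 6, "run_time_s": 60, "ram_gb": 17},
]
boxes = polymetric_boxes(steps)
print(boxes["align"])
```

With this mapping the most demanding step appears as the largest and darkest box, which is the kind of outlier that a user should be able to spot at a glance.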

Figure 5: An example of a large software system in Code-

Crawler (Lanza and Ducasse, 2002).

An example of a large software system visualized

in CodeCrawler is shown in Figure 5. The system is

divided into several modules, which are surrounded

by rectangles. Here, a user can immediately see the

division and spot outliers, which are shown as bigger

rectangles with or without a colour. The same method

can be applied to show large workﬂows that consist of

many nested smaller workﬂows, and to emphasise the

most computationally intensive steps. Visualization of both the analysis tool dependencies and the workflow structure can be combined into the same diagram using the method proposed in CodeCrawler.

Another software visualization and exploration tool, MetricView (Termeer et al., 2005), combines traditional UML diagram visualization with metrics visualization. In contrast to the technique discussed above, MetricView uses existing UML diagrams, which are familiar to software engineers, as a basis for software structure visualization (Figure 6). It follows the given diagram layout, and the positions and sizes of the diagram elements are not changed. This is an important advantage, because changing the diagram layout can destroy the user's "mental map" and severely reduce how easily the diagram can be understood; this is a well-known fact in information visualization (see e.g. Spence, 2006). MetricView supports the visualization of metrics defined on UML diagram elements. Metrics can have boolean and numeric values; they are shown as icons, drawn atop the UML elements for which the respective metrics are available.

Reusing this technique in workflow visualization allows a smooth transition from the familiar structural representation of a workflow, as used in Galaxy for example, to a visualization of the workflow enhanced with quality metrics. Metric icons simply take the space given by the Galaxy workflow diagram layout. In other words, metric information is added to diagrams in a non-intrusive way, and users keep their "mental map" of the diagrams they are accustomed to working with.

Figure 6: An example of a 2D UML class diagram visualized with MetricView (Termeer et al., 2005).

The technique proposed in MetricView can usu-

ally be applied to show the attributes that are related

to the whole workﬂow element. Besides visualization

on the workﬂow element level, the metric lens visu-

alization technique (Byelas and Telea, 2008) can be

used to show metrics on members of the workﬂow el-

ements. Let us look at a Galaxy visualization of a

workﬂow element in Figure 7.

Figure 7: An example of workﬂow element visualized with

Galaxy. (Blankenberg and Taylor, 2007).

The example workflow element has two inputs (i.e. APML file and Experiment design file) and two outputs (i.e. log and expressionset). However, the names alone give users no information about the sizes of these inputs/outputs or their nature. The task becomes even more complex if users want to see not a single workflow element but the whole workflow or a part of it, and if they want to see the metric-metric and metric-structure correlations of workflow elements. To achieve this, the metric lens technique (Byelas and Telea, 2008) can be reused. It combines a classical UML viewer with a visualization of method-level metrics using an enhanced version of the well-known table-lens technique (Rao and Card, 1994).

The basis of the metric lens technique is a traditional UML class diagram, which displays all its data members within each class frame (see Figure 8). Atop this image, the metrics are displayed following a table model, where the rows are methods and the columns are metrics.

Figure 8: Metric layout options of the metric lens (Byelas and Telea, 2008).

The metric icon table can be placed within the class frames (Fig. 8 a,b), which yields a compact layout but does not allow users to read the method names, or on the right side of the class frames (Fig. 8 c), which does not occlude the method names but yields a less compact layout. Different zooming mechanisms allow users to focus on a specific diagram subsystem, and to smoothly navigate between seeing the entire contents of each class as a set of coloured bar graphs and seeing the individual signatures and names of methods and members. The same technique can be applied to navigate through workflow graph diagrams to spot the metric distribution of workflow element members and any metric value outliers, and to help in the task of correlating such outliers among themselves and with the workflow structure.
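Transferred to a workflow element, the table model could look like the sketch below: the rows are the members of the element (here the inputs and outputs of the element in Figure 7), the columns are metrics, and each cell is a bar length normalised per column. The metric names and all numeric values are invented for illustration, and the normalisation by column maximum is our assumption rather than the method of the metric lens tool.

```python
def metric_table(members, metric_names):
    """Build a table of bar lengths: one row per member, one column per metric.

    members: dict mapping member name to a dict of metric values
    Each value is divided by the column maximum, so bars lie in [0, 1].
    """
    column_max = {m: max(values.get(m, 0) for values in members.values())
                  for m in metric_names}
    table = {}
    for member, values in members.items():
        table[member] = [
            values.get(m, 0) / column_max[m] if column_max[m] else 0.0
            for m in metric_names
        ]
    return table


# Member names follow Figure 7; the metric values are hypothetical.
members = {
    "APML file": {"size_mb": 400, "read_time_s": 20},
    "Experiment design file": {"size_mb": 1, "read_time_s": 1},
    "log": {"size_mb": 2, "read_time_s": 1},
    "expressionset": {"size_mb": 100, "read_time_s": 5},
}
table = metric_table(members, ["size_mb", "read_time_s"])
print(table["expressionset"])
```

Rows whose bars are close to 1.0 in several columns at once are the outliers that a user would want to correlate with the workflow structure.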

4 CONCLUSIONS

We have presented a number of techniques that can

be used to enhance visual analyses of workﬂows in

WMS for bioinformatics, such as Galaxy or Taverna.

We have reported some of the differences and lim-

itations in these visualization techniques as used in

these two WMS. We have also shown that there is no

uniﬁed visual representation of workﬂows used in the

bioinformatics domain. However, the same data ﬂow

graphs can be used to describe workﬂows visually.

Recently, we added workflow management to the existing data management built with the MOLGENIS system (Swertz and Jansen, 2007; Swertz and Jansen, 2010) to combine computational and data management into a single system (Byelas and Swertz, 2011; Byelas and Swertz, 2012). We use MOLGENIS to auto-generate web user interfaces for biologists and programming interfaces for bioinformaticians from a data model described in XML. Given this database background and a web user interface generated from the model, it is not surprising that we chose to use a simple table to show the workflow structure (Figure 9).

Figure 9: Showing workflow structure in the MOLGENIS framework (Byelas and Swertz, 2011).

In the future, we will investigate ways of using the visualization techniques described above for workflow visualization. We are planning to visualize a workflow as a data flow graph, as in Galaxy, but we want to advance its visualization by adding "quality" metrics. As a result, we expect to achieve a multiscale visualization similar to those used in geographical data visualization systems, such as Google Maps (Google Inc., 2012). In such a visualization, workflow "quality" metrics can be shown in place of the photos and temperature shown in Figure 10.

Figure 10: An example of multiscale visualization from

Google Maps (Google Inc., 2012).

In this paper, we have described techniques that can enhance visual analysis of workflow structure. In addition, we want to enable users to gain insight into workflow behaviour (run workflows, understand how parameters influence their output, and refine the parameter space to achieve desired results) and evolution (detecting workflow changes over time). For behaviour, we will adapt multidimensional scaling (Borg and Groenen, 2005) and parallel coordinates (Inselberg, 2009). For evolution, we will use timeline visualizations (Grafton and Rosenberg, 2010) to show how workflow structure and parameters change over time. Finally, we are planning to validate these visualization approaches by case studies on the real-world workflows that we use in our analyses.
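As a rough sketch of the data preparation behind a parallel coordinates view of workflow behaviour, the code below turns a set of workflow runs into one polyline per run, with one vertex per axis and values normalised to [0, 1]. The parameter names, the result column, and the example values are hypothetical, and this is only one simple way to prepare such data, not the planned implementation.

```python
def parallel_coordinate_polylines(runs, axes):
    """Return, for each run, a list of (axis_index, normalised_value) points.

    runs: list of dicts holding parameter and result values of one run each
    axes: ordered list of keys, one vertical axis per key
    Values are min-max normalised per axis; a constant axis maps to 0.5.
    """
    bounds = {a: (min(r[a] for r in runs), max(r[a] for r in runs)) for a in axes}
    polylines = []
    for run in runs:
        points = []
        for index, axis in enumerate(axes):
            lo, hi = bounds[axis]
            value = 0.5 if hi == lo else (run[axis] - lo) / (hi - lo)
            points.append((index, value))
        polylines.append(points)
    return polylines


# Invented runs of one workflow with two parameters and one result metric.
runs = [
    {"threads": 1, "min_quality": 10, "run_time_s": 400},
    {"threads": 3, "min_quality": 20, "run_time_s": 250},
    {"threads": 5, "min_quality": 30, "run_time_s": 100},
]
lines = parallel_coordinate_polylines(runs, ["threads", "min_quality", "run_time_s"])
print(lines[0])
```

Drawing each polyline across the axes shows how a parameter choice relates to the result: in this invented example, lines that start low on the threads axis end high on the run time axis.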

BIOINFORMATICS2013-InternationalConferenceonBioinformaticsModels,MethodsandAlgorithms

120

REFERENCES

Blankenberg, D. and Taylor, J. (2007). A framework for collaborative analysis of ENCODE data: making large-scale analyses biologist-friendly. Genome Res., 17:6:960 –

Borg, I. and Groenen, P. (2005). Modern Multidimensional Scaling: theory and applications (2nd ed.). New York: Springer-Verlag.

Byelas, H. and Swertz, M. (2011). Towards a MOLGENIS based computational framework. In Proceedings of the 19th EUROMICRO International Conference on Parallel, Distributed and Network-Based Computing, pages 331–339.

Byelas, H. and Swertz, M. (2012). Introducing data provenance and error handling for NGS workflows within the MOLGENIS computational framework. In Proceedings of the BIOSTEC BIOINFORMATICS-2012 conference, pages 42–50.

Byelas, H. and Telea, A. (2008). The metric lens: Visualizing metrics and structure on software diagrams. In Proceedings of the 16th Working Conference on Reverse Engineering, Antwerp, Belgium, pages 339–340.

Byelas, H. and Telea, A. (2009). Visualizing metrics on areas of interest in software architecture diagrams. In Proceedings of the Pacific Visualization Symposium, Beijing, China, pages 33–40.

Diehl, S. (2007). Software Visualization - Visualizing the Structure, Behaviour, and Evolution of Software. Springer.

Goldstine, H. and von Neumann, J. (1947). Planning and coding of problems for an electronic computing instrument. Part II, volume I of a report prepared for the U.S. Army Ord. Dept.

Google Inc. (2012). Google maps. http://maps.google.com/.

Grafton, A. and Rosenberg, D. (2010). Cartographies of Time: A History of the Timeline. Princeton Architectural Press.

Inselberg, A. (2009). Parallel Coordinates: Visual Multidimensional Geometry and its Applications. Springer.

Lanza, M. and Ducasse, S. (2002). Understanding software evolution using a combination of software visualization and software metrics. In Proc. of LMO.

Lanza, M. and Marinescu, R. (2006). Object-Oriented Metrics in Practice - Using Software Metrics to Characterize, Evaluate, and Improve the Design of Object-Oriented Systems. Springer.

Oinn, T. and Greenwood, M. (2005). Taverna: lessons in creating a workflow environment for the life sciences. Concurrency and Computation: Practice and Experience, 18:10:1067–1100.

Rao, R. and Card, S. (1994). The table lens: Merging graphical and symbolic representations in an interactive focus+context visualization for tabular information. In Proc. CHI, pages 222–230. ACM.

SHIWA (2012). Sharing interoperable workflows for large-scale scientific simulations on available DCIs. http://www.shiwa-workflow.eu/.

Spence, R. (2006). Information Visualization. ACM Press.

Spence, R. (2007). Information Visualization: Design for Interaction (2nd ed.). Prentice Hall.

Swertz, M. and Jansen, R. (2007). Beyond standardization: dynamic software infrastructures for systems biology. Nature Reviews Genetics, 8:3:235–43.

Swertz, M. and Jansen, R. (2010). The MOLGENIS toolkit: rapid prototyping of biosoftware at the push of a button. BMC Bioinformatics, 11:12.

Termeer, M., Lange, C., Telea, A., and Chaudron, M. (2005). Visual exploration of combined architectural and metric information. In Proc. VISSOFT, pages 21–26. IEEE Press.

VisualizationofBioinformaticsWorkflowsforEaseofUnderstandingandDesignActivities

121