VISUAL ANALYTICS OF MULTIMODAL BIOLOGICAL DATA

Hendrik Rohn, Christian Klukas

Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany

Falk Schreiber

Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany

Institute of Computer Science, Martin Luther University, Halle-Wittenberg, Germany

Keywords:

Visual analytics, Biological data, Integrative visualization.

Abstract:

Biological data is measured in increasing quantity and quality, resulting in data that describes biological systems from different perspectives. Based on data integration methods, visual data mining and visual analytics can be used to promote the understanding of combined biological data and to facilitate the exploration process. In this paper, a number of view types are presented and integrated into a comprehensive software tool in order to support researchers in visualizing flexible combinations of multimodal biological data and in creating integrated views on comprehensive datasets spanning multiple “omics” areas. A number of interaction techniques accompany these views, enabling efficient exploration of the data.

1 BACKGROUND

1.1 Introduction

Modern data acquisition methods enable researchers to obtain data on biological systems in increasing quantity and quality. This data describes biological systems at different resolutions and from different perspectives, allowing a comprehensive view of the biological system. Of particular importance are the “omics” areas such as the genome, proteome and metabolome, for which data is gathered in exponentially increasing amounts. In addition, modern image acquisition methods make it possible to obtain spatial information, such as volumetric and image-based data. Structural and process information such as metabolic networks is used to describe biological systems from a mechanistic perspective. As all data represents different views of the same object, data integration methods aim to bring all available data on one system together in one application.

Data integration of such diverse data types is an ongoing research area. For example, it can be implemented using the approach described in (Rohn et al., 2009). Powerful tools are needed to understand complex and flexible combinations of systems biological data. These tools are based on advanced visual data mining and analysis methods, which reveal the relations within real-world data of biological systems and are therefore essential for systems biological research. In this paper we present a suitable set of visualization and interaction methods for combined biological data, enabling researchers to visually analyze, explore and navigate through combined omics data, networks, images and volumes.

1.2 Data Integration

The model for representing the biological data, which is used for data integration, was described in (Rohn et al., 2009). It contains four types of biological data (called measurements): “simple measurements” representing numeric measurement data obtained in the areas of genomics, transcriptomics, proteomics and metabolomics, “images” representing two-dimensional spatial information, “volumes” representing three-dimensional spatial data and “networks” describing structural properties of biological systems. The data model allows annotation information to be specified for measurements. This metadata provides further information about each measurement, such as the experiment coordinator, the genotype and species of the investigated organism, developmental stages and spatial attributes. By using the annotation, one


Rohn H., Klukas C. and Schreiber F..

VISUAL ANALYTICS OF MULTIMODAL BIOLOGICAL DATA.

DOI: 10.5220/0003354202560261

In Proceedings of the International Conference on Imaging Theory and Applications and International Conference on Information Visualization Theory

and Applications (IVAPP-2011), pages 256-261

ISBN: 978-989-8425-46-1

© 2011 SCITEPRESS (Science and Technology Publications, Lda.)

[Figure 1: MappingGraph diagram. Node labels: a: simple measurement, b: network, c: image, d: volume; mapping nodes shown: bb, dd, ad, bc.]

Figure 1: The structure of the MappingGraph, adapted from (Rohn et al., 2009). Four nodes contain all integrated measurements. Measurements may be flexibly combined into a mapping, represented as a new node in the MappingGraph. For example, “bb” represents the mapping of networks onto networks and the node “bd” represents a mapping of networks onto volumes. Note that more mappings are possible, e.g. “abc”.

is able to bring measurements from different experiments into context with each other and explore their relations.

The data integration is mainly based on a graph structure called MappingGraph (see also Figure 1). All data integrated in the system is split into the four measurement types, and all measurements of one type are collected in one of four special nodes in the MappingGraph. By selecting any number of nodes in the MappingGraph, the user combines the selected measurements into a so-called mapping. Mappings are combined measurements and are represented as new nodes in the MappingGraph. As such, they may serve as a source for further data mapping procedures. These mappings can be visualized in various ways, including interactions to modify view attributes and manipulate data.
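The MappingGraph idea can be illustrated with a small Python sketch. It is not the HIVE/VANTED implementation: the class and method names (`MappingGraph`, `add_measurement`, `create_mapping`) are hypothetical, the node labels are borrowed from Figure 1, and a mapping is assumed to simply concatenate the measurements of its source nodes.

```python
from dataclasses import dataclass, field

# Base node labels follow Figure 1; all other names are hypothetical.
BASE_TYPES = {"a": "simple measurement", "b": "network", "c": "image", "d": "volume"}


@dataclass
class Node:
    label: str                                        # e.g. "a" or "bd"
    measurements: list = field(default_factory=list)  # integrated or combined data
    sources: tuple = ()                               # labels of the combined nodes


class MappingGraph:
    def __init__(self):
        # One special node per measurement type collects all data of that type.
        self.nodes = {label: Node(label) for label in BASE_TYPES}

    def add_measurement(self, type_label, measurement):
        if type_label not in BASE_TYPES:
            raise ValueError(f"unknown measurement type: {type_label}")
        self.nodes[type_label].measurements.append(measurement)

    def create_mapping(self, *source_labels):
        """Combine the selected nodes into a new mapping node."""
        if not source_labels:
            raise ValueError("select at least one node")
        combined = []
        for label in source_labels:
            combined.extend(self.nodes[label].measurements)
        node = Node("".join(source_labels), combined, tuple(source_labels))
        self.nodes[node.label] = node
        return node


graph = MappingGraph()
graph.add_measurement("b", "glycolysis pathway")
graph.add_measurement("d", "brain volume")
bd = graph.create_mapping("b", "d")     # networks mapped onto volumes
abd = graph.create_mapping("a", "bd")   # a mapping node reused as a source
```

Because a mapping is stored as an ordinary node, it can be selected again, which mirrors the statement that mappings may serve as sources for further mappings.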

2 METHODS

To support the exploration of combined measurements, different views are presented in this section. They are designed to visualize different combinations of multimodal biological data. The views provide several interaction possibilities for altering view properties and manipulating the underlying data.

2.1 3-D View

The 3-D View makes it possible to visualize all four

measurement types in three dimensions.

Figure 2: Screenshot of the 3-D View visualizing a three-dimensional human brain volume, a two-dimensional PET image in the human brain and the human glycolysis pathway in three-dimensional space.

The most computationally demanding visualization task is rendering typical volumetric data sets (< 50 million voxels) at interactive frame rates in three dimensions. This rendering is based on SPECTUS3D (McGonigle, 2006), a slice-based volume renderer (Swan and Yagel, 1993). The rendering algorithm generates a stack of planes through the volume in each of three orthogonal directions and aligns these planes in three-dimensional space. Therefore, instead of visualizing single voxels, three orthogonally aligned pixels represent one voxel. Transparency effects are applied to the planes and can be changed using sliders. Besides the general plane transparency, single planes may be highlighted (by making the plane opaque) and cut-offs created (by making a set of planes fully transparent). In the case of a gray-value volume, a set of color maps can be applied to highlight interesting regions or to generate an appealing appearance (Moodley and Murrell, 2004). Segmented volumes are also supported by highlighting or hiding segments in response to user input. These segments may serve as a backbone for spatial navigation, e.g. selecting a tissue to trigger the visualization of the corresponding tissue-specific pathway. Some planes may be skipped to achieve higher frame rates, or planes may be stretched to account for non-isotropic voxels.
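The slice-based approach can be sketched as follows. The sketch assumes a volume stored as nested lists indexed `volume[z][y][x]`. The function names and the `step` parameter for skipping planes are hypothetical. The back-to-front blending in `composite` is a standard alpha-blending rule used here as an assumption, since the blending used by SPECTUS3D is not specified above.

```python
def slice_stacks(volume, step=1):
    """Cut stacks of 2-D planes through volume[z][y][x] along all three axes.

    step > 1 skips planes, trading image quality for a higher frame rate.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    z_stack = [[row[:] for row in volume[z]] for z in range(0, depth, step)]
    y_stack = [[volume[z][y][:] for z in range(depth)]
               for y in range(0, height, step)]
    x_stack = [[[volume[z][y][x] for y in range(height)] for z in range(depth)]
               for x in range(0, width, step)]
    return {"z": z_stack, "y": y_stack, "x": x_stack}


def composite(values, alpha):
    """Blend gray values along one viewing ray with a uniform plane opacity.

    values are ordered from the farthest to the nearest plane.
    """
    result = 0.0
    for value in values:
        result = alpha * value + (1.0 - alpha) * result
    return result


volume = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]   # tiny 2 x 2 x 2 example
stacks = slice_stacks(volume)
```

With `step=1` every voxel appears once in each of the three stacks, which corresponds to the three orthogonally aligned pixels per voxel. An opacity of 1.0 shows only the nearest plane, and an opacity of 0.0 makes the planes vanish, as in a cut-off.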

Similar to the planes used for rendering volumetric data, images are visualized by applying the image data as a texture onto a plane. Images may be resized on user request and texture transparency can be applied. Segmented images work the same way as segmented volumes: the user is able to select segments and hide or delete them.


Networks can also be represented in the 3-D View. Nodes are implemented as spheres, cuboids or cylinders, whereas edges are represented either by a cone and a cylinder or, at the user’s choice, as a primitive line. Both graph-element types support transparency and color changes. The three-dimensional representation of networks also supports the visualization of omics data, similar to the diagrams in the Graph View. At the moment, omics data is mapped to nodes and visualized using embedded diagrams.

All measurement representations may be rotated

and translated as needed. An example screenshot is

shown in Figure 2.

2.2 Graph View

The Graph View visualizes data of the types network

and simple measurement in two dimensions.

In contrast to image-based pathway visualization systems, such as KEGG (Kanehisa and Goto, 2000) and MAPMAN (Thimm et al., 2004), dynamic editing of networks is supported. It is possible to construct or edit networks manually with integrated editor functions. Experimental data is visualized within the network context by embedding line or bar charts inside the network nodes or by positioning these diagrams on top of the graph edges. The drawing style of the diagrams may be interactively modified with a number of parameters such as series colors, the display of range or category labels, and line widths. Besides networks, the view is able to visualize experiment data as hierarchies (Klukas and Schreiber, 2010; Sharbel et al., 2010) by relating them to functional categories such as Gene Ontology (Ashburner et al., 2000) and the KEGG BRITE hierarchy (Kanehisa et al., 2006). Graphs may be exported as a website containing diagrams and clickable graph-elements, which may link to web entries in databases.

The Graph View supports an interaction technique similar to the one described in (Klukas and Schreiber, 2007). There, KEGG pathways may be collapsed into a pathway overview-node. All edges to and from the collapsed nodes then point to the overview-node instead of to single graph-elements. Expanding such an overview-node replaces the node with the pathway’s graph-elements and resets the edges to the correct elements. In our case, every network may be collapsed into an overview-node and expanded again. To improve clarity, all edges between two networks are bundled together, similar to the methods described in (Gansner and Koren, 2007; Holten and Wijk, 2009). This edge bundling facilitates visual tracking of single edges,

Figure 3: Screenshot of the Graph View visualizing networks in two dimensions. Networks may be expanded and collapsed and omics data may be mapped to the graph-elements. Note the edge bundling caused by expanding overview-nodes.

Figure 4: Screenshot of the Image View visualizing a segmented barley cross-section in two dimensions. The labelfield is blended with the source image and one segment is highlighted in red. The user also selected a region (magenta) for graphical querying.

but at the same time maintains a good overview of

the general trend of network interconnections.
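The collapsing and expanding of networks into overview-nodes can be sketched as a recomputation of the visible edges from the unchanged original edge list. The function name `visible_edges`, the `overview:` prefix and the example node identifiers are assumptions made for illustration; layout and edge bundling are not covered.

```python
from collections import defaultdict


def visible_edges(edges, membership, collapsed):
    """Map each displayed edge to the original edges it stands for.

    edges:      list of (source, target) graph-element ids
    membership: dict mapping each graph-element id to its network name
    collapsed:  set of network names currently shown as overview-nodes
    """
    def endpoint(node):
        network = membership[node]
        return f"overview:{network}" if network in collapsed else node

    shown = defaultdict(list)
    for source, target in edges:
        s, t = endpoint(source), endpoint(target)
        if s == t and membership[source] in collapsed:
            continue  # edge inside a collapsed network is hidden
        shown[(s, t)].append((source, target))
    return dict(shown)


membership = {"glc": "glycolysis", "pyr": "glycolysis", "accoa": "tca", "cit": "tca"}
edges = [("glc", "pyr"), ("pyr", "accoa"), ("accoa", "cit"), ("glc", "cit")]

one_collapsed = visible_edges(edges, membership, {"glycolysis"})
both_collapsed = visible_edges(edges, membership, {"glycolysis", "tca"})
expanded = visible_edges(edges, membership, set())
```

Since the original edges are never modified, expanding an overview-node amounts to removing its network from `collapsed`, after which all edges point to the correct elements again.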

An example screenshot is shown in Figure 3.

2.3 Image View

The Image View is able to visualize data of the types

volumes and images in two dimensions.

Images are displayed by drawing the pixels directly onto the screen and may be scaled to fit different monitor sizes. The display of segmentation information is supported by a blending effect between


the source image and the labelfield image. The user may choose the blending factor in order to observe the real image, the labelfield or both at the same time. This can be used to check the segmentation quality or to look up the corresponding segment for single pixels. The Image View is able to handle a stack of images by providing a slider that determines the displayed image, similar to the approach described by (Abramoff et al., 2004). If the images share, for example, a spatial or temporal relation, dragging the slider helps to perceive these relations in the resulting animation. Volumetric data is represented as a stack of images, which is generated by traversing the volume in the z-direction.
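The blending between source image and labelfield can be illustrated with a per-pixel linear interpolation. This is a sketch under assumptions: the exact blending formula of the Image View is not stated, and the function name `blend`, the gray-value source image and the `palette` that assigns an RGB color to each segment label are hypothetical.

```python
def blend(source, labelfield, palette, factor):
    """Linearly blend a gray-value image with a colored labelfield.

    factor = 0.0 shows only the source image, 1.0 only the labelfield.
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must lie in [0, 1]")
    blended = []
    for gray_row, label_row in zip(source, labelfield):
        row = []
        for gray, label in zip(gray_row, label_row):
            color = palette[label]
            row.append(tuple(round((1.0 - factor) * gray + factor * c)
                             for c in color))
        blended.append(row)
    return blended


source = [[100, 200]]                        # gray values of two pixels
labelfield = [[0, 1]]                        # segment label per pixel
palette = {0: (0, 0, 0), 1: (255, 0, 0)}     # hypothetical segment colors

image_only = blend(source, labelfield, palette, 0.0)
labels_only = blend(source, labelfield, palette, 1.0)
half = blend(source, labelfield, palette, 0.5)
```

Intermediate factors show both layers at once, which is the setting one would use to check the segmentation quality against the real image.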

A special interaction technique is the intuitive graphical triggering of spatial queries based on segmentation information, similar to (Davidson et al., 1997): the user is able to select a spatial region of the image by drawing with the mouse directly onto the image. All segments covered by this selection are highlighted and analyzed in order to trigger a query on the integrated data, resulting in the set of measurements present in these segments.
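Such a graphical query can be sketched as a lookup from the drawn pixels to segment labels and from there to the annotated measurements. The function name `spatial_query` and the dictionary that associates segment labels with measurements are assumptions; in the described system this association would come from the spatial annotation of the integrated data.

```python
def spatial_query(labelfield, selected_pixels, measurements_by_segment):
    """Return the segments covered by a selection and their measurements.

    selected_pixels: iterable of (row, column) positions drawn by the user
    """
    segments = {labelfield[row][column] for row, column in selected_pixels}
    hits = []
    for segment in sorted(segments):
        hits.extend(measurements_by_segment.get(segment, []))
    return segments, hits


labelfield = [[1, 1, 2],
              [1, 3, 2]]
measurements_by_segment = {1: ["m1"], 2: ["m2"], 3: ["m3a", "m3b"]}

segments, hits = spatial_query(labelfield, [(0, 0), (0, 1), (1, 1)],
                               measurements_by_segment)
```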

An example screenshot of the Image View is

shown in Figure 4.

2.4 Additional Views

Besides the three commonly used views presented above, there are a number of other views, which are usually strongly use-case oriented or work only for predefined measurement combinations. In the following we describe three of these view types, but many more are possible.

2.4.1 Brushing View

This view enables users to apply the interaction technique of brushing (Eick and Wills, 1995) in order to explore spatially related experimental datasets. It is divided into two parts: one part visualizes a segmented image, which is used as the navigational backbone. The other part comprises a Graph View, showing a network and associated simple measurements. The user is able to hover the mouse over the image segments of interest. The network visualization reacts to these events by highlighting or displaying only the data that was measured in the corresponding segment. A biological use case for this view is to investigate the two-dimensional distribution of metabolic measurements in an interactive way: if biologists are interested in the state of the metabolism during exposure to different oxygen environments, the two-dimensional oxygen distribution may serve as the navigational backbone for highlighting the corresponding

Figure 5: Screenshot of the Brushing View visualizing a barley cross-section (together with spatial oxygen distribution) and a network with mapped measurements. The user selects oxygen concentrations by hovering the mouse over the image, triggering the highlighting of measurement data in the network that is specific to the selected oxygen level. Note that the spatial concentration was discretized into four specific oxygen levels, relating to the oxygenic conditions of the measurement data.

data. An example screenshot for this view is shown

in Figure 5.
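The oxygen use case can be sketched in two steps: the continuous concentration under the mouse pointer is discretized into a level, and only the measurements annotated with that level are highlighted. The threshold values, the units, the record layout with a `level` key and the function names are hypothetical. Three thresholds are used because they yield four levels, as in Figure 5.

```python
from bisect import bisect_right


def oxygen_level(concentration, thresholds):
    """Map a concentration to a discrete level index 0 .. len(thresholds)."""
    return bisect_right(thresholds, concentration)


def brush(oxygen_image, position, thresholds, measurements):
    """Return the level under the pointer and the measurements to highlight."""
    row, column = position
    level = oxygen_level(oxygen_image[row][column], thresholds)
    return level, [m for m in measurements if m["level"] == level]


thresholds = [5.0, 10.0, 15.0]              # hypothetical level boundaries
oxygen_image = [[2.0, 12.0],
                [20.0, 5.0]]
measurements = [
    {"substance": "ATP", "level": 0},
    {"substance": "ATP", "level": 2},
    {"substance": "lactate", "level": 2},
]

level, highlighted = brush(oxygen_image, (0, 1), thresholds, measurements)
```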

2.4.2 Scatterplot View

This view enables users to observe potentially correlated substances. A matrix is built in which each element contains all measurements of one pair of substances. These elements are displayed as well-known scatterplots, by plotting points for


Figure 6: Screenshot of the Statistics View visualizing human gene expression rate values mapped onto a network as a histogram. The user is able to select parts of the data (red bars) and this selection will also be applied to the underlying network.

pairwise measurement values. These displayed data points may have different colors, indicating measurements of different conditions.
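The construction of the scatterplot matrix can be sketched as follows. The input layout is an assumption: each sample is a dictionary with a `condition` key and one value per substance. The function name and the example values are hypothetical, and only the upper triangle of the matrix is built.

```python
from itertools import combinations


def scatterplot_matrix(samples):
    """Collect (x, y, condition) points for every pair of substances."""
    substances = sorted(key for key in samples[0] if key != "condition")
    matrix = {}
    for x_name, y_name in combinations(substances, 2):
        matrix[(x_name, y_name)] = [
            (sample[x_name], sample[y_name], sample["condition"])
            for sample in samples
        ]
    return matrix


samples = [
    {"condition": "control", "glucose": 1.0, "sucrose": 2.0, "starch": 3.0},
    {"condition": "stress", "glucose": 1.5, "sucrose": 2.5, "starch": 2.0},
]
matrix = scatterplot_matrix(samples)
```

Each matrix element holds the points of one scatterplot, and the condition carried with every point can be used to choose its color.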

2.4.3 Statistics View

The last view is depicted in Figure 6. This view shows the distribution of graph-element attribute values as a histogram. It can be used to visually inspect graph properties or experimental data mapped onto networks. An example is the investigation of comprehensive gene expression data sets in order to perform a quality check by examining the distribution of the data or by selecting and removing outlier values.
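A histogram with a selection that propagates back to the graph elements can be sketched as below. Equal-width bins, the function names and the example values are assumptions. The indices returned by `select_bins` stand for the graph elements whose attribute values fall into the selected bars.

```python
def histogram(values, bins, low, high):
    """Count values in equal-width bins over [low, high].

    Returns the counts and, per bin, the indices of the contributing values.
    """
    width = (high - low) / bins
    members = [[] for _ in range(bins)]
    for index, value in enumerate(values):
        if low <= value <= high:
            bin_index = min(int((value - low) / width), bins - 1)
            members[bin_index].append(index)
    return [len(m) for m in members], members


def select_bins(members, chosen):
    """Indices of graph elements whose values fall into the chosen bars."""
    return sorted(index for b in chosen for index in members[b])


values = [0.1, 0.2, 0.25, 0.9, 5.0]          # e.g. expression values per node
counts, members = histogram(values, bins=5, low=0.0, high=5.0)
outliers = select_bins(members, [4])         # user selects the last bar
kept = [v for i, v in enumerate(values) if i not in outliers]
```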

3 CONCLUSIONS

We described a number of views that enable domain scientists to visualize and analyze integrated and flexibly combined biological data of different types. Interaction techniques were developed to support domain scientists in visually exploring their data. Many of the described techniques and views are already implemented in the HIVE add-on for VANTED. The next version of the add-on will provide users with all described features. A video showing the described views and interaction techniques is available at http://vanted.ipk-gatersleben.de/hive ivapp11.

The set of visualization and interaction tools is currently used in cooperation with domain experts in order to create different integrated views on datasets consisting of large-scale gene-expression data, metabolic time-series data, microscopy images, photographs, volumes derived from NMR spectroscopy and KEGG metabolic pathways. We were not yet able to exploit the full capabilities of the presented approaches, as it is hard to find comprehensive experimental datasets of the same origin, biological material and methods, which would ideally cover all of the supported data domains at the same time. We are putting the tools into the hands of researchers in order to overcome this limitation. Based on their comments and experiences in using the system, we will iteratively improve and extend the system as well as the underlying methods, promoting the realization of complex biological use cases.

ACKNOWLEDGEMENTS

This work was partly supported by grant BMBF 0315044A.

REFERENCES

Abramoff, M., Magalhães, P., and Ram, S. (2004). Image processing with ImageJ. Biophotonics International, 11:36–42.

Ashburner, M., Ball, C., Blake, J., Botstein, D., Butler, H., Cherry, J., Davis, A., Dolinski, K., Dwight, S., Eppig, J., et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics, 25(1):25–29.

Davidson, D., Bard, J., Brune, R., Burger, A., Dubreuil, C., Hill, W., Kaufman, M., Quinn, J., Stark, M., and Baldock, R. (1997). The mouse atlas and graphical gene-expression database. Seminars in Cell & Developmental Biology, 8(5):509–517.

Eick, S. G. and Wills, G. J. (1995). High interaction graphics. European Journal of Operational Research, 81(3):445–459.

Gansner, E. R. and Koren, Y. (2007). Improved circular layouts. Lecture Notes in Computer Science, 4372:386–398.

Holten, D. and Wijk, J. J. V. (2009). Force-directed edge bundling for graph visualization. Computer Graphics Forum, 28(3):983–990.

Kanehisa, M. and Goto, S. (2000). KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Research, 28(1):27–30.

Kanehisa, M., Goto, S., Hattori, M., Aoki-Kinoshita, K. F., Itoh, M., Kawashima, S., Katayama, T., Araki, M., and Hirakawa, M. (2006). From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Research, 34:D354–D357.

Klukas, C. and Schreiber, F. (2007). Dynamic exploration and editing of KEGG pathway diagrams. Bioinformatics, 23(3):344–350.

Klukas, C. and Schreiber, F. (2010). Integration of -omics data and networks for biomedical research. Journal of Integrative Bioinformatics, 7(2):112.1–6.

McGonigle, J. (2006). Java and 3D interactive image display. Master’s thesis, University of Aberdeen.

Moodley, K. and Murrell, H. (2004). A colour-map plugin for the open source, Java based, image processing package, ImageJ. Computers & Geosciences, 30(6):609–618.

Rohn, H., Klukas, C., and Schreiber, F. (2009). Integration and visualisation of multimodal biological data. Lecture Notes in Informatics, 157:105–115.

Sharbel, T. F., Voigt, M. L., Corral, J. M., Galla, G., Kumlehn, J., Klukas, C., Schreiber, F., Vogel, H., and Rotter, B. (2010). Apomictic and sexual ovules of Boechera display heterochronic global gene expression patterns. The Plant Cell, 22(3):655–671.

Swan, E. and Yagel, R. (1993). Slice-based volume rendering. Technical report, The Advanced Computing Center for the Arts and Design, The Ohio State University.

Thimm, O., Bläsing, O., Gibon, Y., Nagel, A., Meyer, S., Krüger, P., Selbig, J., Müller, L. A., Rhee, S. Y., and Stitt, M. (2004). MAPMAN: a user-driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes. The Plant Journal, 37:914–939.
