EVALUATION OF A BUNDLING TECHNIQUE FOR PARALLEL

COORDINATES

Julian Heinrich

, Yuan Luo

, Arthur E. Kirkpatrick

and Daniel Weiskopf

VISUS, University of Stuttgart, Stuttgart, Germany

GrUVi (Graphics, Usability, and Visualization) Lab, School of Computing Science, Simon Fraser University,

Burnaby, Canada

Keywords:

Visualization Techniques, Parallel Coordinates, Cluster Visualization, Evaluation.

Abstract:

We present a controlled user study evaluating the effectiveness of bundled curve representations in parallel-

coordinates plots. Replacing the traditional C

polygonal lines by C

continuous piecewise B´ezier curves

makes it easier to visually trace data points through each coordinate axis. The resulting B´ezier curves can then

be bundled to visualize data with given cluster structures. Our results show that: 1) compared to polygonal

lines, bundled curves are equally capable of revealing correlations between neighboring data attributes; 2) the

geometric cues of bundles can be effective in displaying cluster information.

1 INTRODUCTION

Parallel coordinates are a popular technique for trans-

forming multidimensional data into a 2D image (In-

selberg, 1985; Inselberg and Dimsdale, 1990). The

m-dimensional data items are represented as 2D lines

crossing m parallel axes, each axis corresponding

to one dimension of the original data. This tech-

nique has been incorporated into several data visual-

ization and analysis tools, including XLSTAT

and

GGobi (Cook and Swayne, 2003). However, expe-

rience has shown several problems with the tradi-

tional parallel-coordinates technique. First, the zig-

zagging polygonal lines (or polylines, for short) used

for data representation are only C

continuous. They

generally lose visual continuation across the parallel-

coordinates axes, making it difﬁcult to follow lines

that share a common point along an axis—this is

known as the cross-over problem. Second, when two

or more data points havethe same or similar values for

a subset of the attributes, the corresponding polylines

may overlap and clutter the visualization. This artifact

may occur even for medium-sized datasets with a few

thousand points. Finally, clusters and related internal

structure of the data are not represented in the geom-

etry of the plot, except for implicit visual clustering

based on proximity of polylines at the axes.

Several solutions have been proposed for these pr-

http://www.xlstat.com

oblems. The cross-over problem has been mitigated

by replacing polylines with smooth curves (Theisel,

2000; Graham and Kennedy, 2003; Moustafa and

Wegman, 2006; Yuan et al., 2009; Holten and van

Wijk, 2010) that interpolate the original values at the

axes. Cluster perception in parallel coordinates has

been facilitated using edge bundling (Holten, 2006;

Zhou et al., 2008; McDonnell and Mueller, 2008;

Heinrich et al., 2011b), where curves of the same

cluster are grouped geometrically. In contrast to

the traditional color-coding of clusters, the resulting

curve bundles also reduce visual clutter by freeing up

plot space to provide an overview of the data.

While variants of polylines and curves have been

evaluated (see Table 1), no prior study evaluated the

joint effect of these two features on the perception of

clusters and correlations. To ﬁll this gap, we con-

ducted a controlled user study to compare the effec-

tiveness of polylines and curve bundling with respect

to cluster perception and correlation judgment.

The study showed that curve bundling maintains

the users’ ability to recognize correlation between

data attributes, a traditional strength of parallel co-

ordinates. Furthermore, for revealing clusters to the

user, curve bundling is at least on par with color cod-

ing, the traditional way of representing clusters. Fig-

ure 1 compares the polyline version of parallel coor-

dinates with a version using bundled curves.

594

Heinrich J., Luo Y., E. Kirkpatrick A. and Weiskopf D..

EVALUATION OF A BUNDLING TECHNIQUE FOR PARALLEL COORDINATES.

DOI: 10.5220/0003821205940602

In Proceedings of the International Conference on Computer Graphics Theory and Applications (IVAPP-2012), pages 594-602

ISBN: 978-989-8565-02-0

 2012 SCITEPRESS (Science and Technology Publications, Lda.)

(a) (b)

Figure 1: The cars (Ramos and Donoho, 1983) data displayed as (a) polyline and (b) and bundled plots (Heinrich et al.,

2011b). Data are clustered by number of engine cylinders (4, 6, or 8). In the bundled plot, bundling was β = 0.95 and

cluster centroids were plotted at their projected values on the bundle axis. The adjectives above each value axis indicate the

interpretation of values closer to the axis top.

2 RELATED WORK

In parallel-coordinates visualization, points in m-

dimensional space are represented as lines crossing

m parallel axes in 2D, so there is no inherent limit on

dimensionality. The process of discovering multivari-

ate relations in a dataset is transformed to a 2D pat-

tern recognition problem. Parallel coordinates were

introduced by Inselberg (Inselberg, 1985; Inselberg

and Dimsdale, 1990; Inselberg, 2009), and extended

by Wegman (Wegman, 1990).

Traditional parallel coordinates suffer from sev-

eral problems, especially for large datasets. One is-

sue is the potentially heavy over-plotting of lines, re-

sulting in visual clutter. A proposed remedy is to

replace fully opaque, rasterized lines by a density

representation of the plotted lines (Miller and Weg-

man, 1991; Wegman and Luo, 1997). This idea has

been adopted for frequency plots (Rodrigues et al.,

2003), gray-scale mappings in density plots (Artero

et al., 2004), and high-precision textures in combina-

tion with transfer functions (Johansson et al., 2005).

For continuous data, line density can also be com-

puted analytically using an appropriate reconstruction

kernel (Heinrich and Weiskopf, 2009) or by splat-

ting (Heinrich et al., 2011a).

Table 1: Evaluations of parallel coordinates.

Correlation Cluster

Identiﬁcation

Polylines (Li et al., 2010) (Holten and van

Wijk, 2010)

Curves

This paper (Holten and van

Wijk, 2010)

Bundling

This paper This paper

The cross-over problem for polylines arises when

two or more lines share common points on an axis.

Several authors have solved this by using smooth

curves. Theisel (Theisel, 2000) proposes a cubic

B-splines model, while Graham and Kennedy (Gra-

ham and Kennedy, 2003) choose a quadratic or cu-

bic curve for a particular section depending on the

shape formed by that section and the two adjacent

sections. Moustafa and Wegman (Moustafa and Weg-

man, 2006) build smooth curves by replacing the

piecewise linear interpolation of polylines by inter-

polation via higher-order sinusoidal functions. Oth-

ers (Holten and van Wijk, 2010; Heinrich et al.,

2011b) add a parameter to the spline-based mod-

els (Graham and Kennedy, 2003; Yuan et al., 2009)

to control the amount of smoothing. All these tech-

niques guarantee curve smoothness, alleviating the

cross-over problem by giving different trajectories to

points that intersect at an axis. This allows the an-

alyst to reliably connect the curves on either side of

the axis.

Visual clutter can also be reduced by prepro-

cessing the data with a clustering algorithm (Jain

and Dubes, 1988). The clusters can then be dis-

played by extensions of parallel coordinates (Fua

et al., 1999; Wegman and Luo, 1997; Berthold

and Hall, 2003). Whereas early clustering work

focused on reducing the amount of displayed data

by displaying only markers of entire clusters, re-

cent work has instead focused on displaying all the

data and revealing details of the internal structure of

clusters. Johansson et al. (Johansson et al., 2005)

combine speciﬁc transfer functions for density plots

with feature animation, showing both an overview

of the data and the inner structure of its clusters.

Novotny and Hauser (Novotny and Hauser, 2006)

extend such cluster-based parallel-coordinates visu-

EVALUATION OF A BUNDLING TECHNIQUE FOR PARALLEL COORDINATES

595

alization to additionally display outliers and trends.

Zhou et al. (Zhou et al., 2009) detect clusters by splat-

ting lines and applying a Gaussian weight to proxi-

mate lines.

Other approaches use geometric proximityof lines

or curves to represent clusters in parallel coordi-

nates (Zhou et al., 2008; McDonnell and Mueller,

2008; Heinrich et al., 2011b). Zhou et al. (Zhou

et al., 2008) deform traditional polylines by apply-

ing attracting and repelling forces. By construction,

their method is based on proximity between the ini-

tial polylines and, thus, achieves an implicit, yet ﬁxed

type of clustering. Their method emphasizes the

proximity of the polylines, rather than showing exter-

nally provided clusters. Moreover, their visual clus-

tering is based on a piecewise model: the vicinity

of polylines between two neighboring data axes (or

dimensions) of the parallel-coordinates plot governs

the visual clustering between those two data dimen-

sions; other pairs of neighboring data dimensions are

clustered independently. Therefore, high-dimensional

data is not clustered on a per-data-point level, but

based on pairs of data dimensions. The resulting vi-

sual clustering is thus sensitive to the order of data

dimensions in the parallel-coordinates plot.

Holten introduced edge bundling of tree lay-

outs (Holten, 2006). McDonnell and Mueller (Mc-

Donnell and Mueller, 2008) built on this idea, devel-

oping a geometric, spline-based approach to comput-

ing visual bundling. Their technique targets illustra-

tive parallel-coordinate plots, using visual simpliﬁca-

tion and non-photorealistic rendering techniques such

as silhouette lines, halos, and shadows. Details of

the internal structure of data points within clusters as

well as correlation are not a focus of their research.

Moreover, cluster membership information is based

on color coding.

For the user study conducted in this paper, we use

a complementary, geometry-based visualization of

clusters that we describe in a technical report (Hein-

rich et al., 2011b). This method improves upon

the proximity-based parallel-coordinates techniques

of McDonnell and Mueller (McDonnell and Mueller,

2008) and Zhou et al. (Zhou et al., 2008) in the fol-

lowing ways. First, it makes better use of the avail-

able screen space by re-distributing visually clustered

curves in a uniform way. Therefore, there is much

less overlap in the important parts of the plots—in the

regions between two data axes, where users identify

correlation of data points. In addition, overdraw and

cluttering issues are reduced by this redistribution.

Second, C

continuity of the curved lines across data

axes is guaranteed, addressing the cross-over prob-

lem. For a detailed description of the algorithm and

its parameters to control the visualization process, we

refer to the paper by Heinrich et al. (Heinrich et al.,

2011b).

There have been few previous papers providing

user studies on parallel coordinates. Li et al. (Li et al.,

2010) compare polyline parallel coordinates and scat-

terplots. Lanzenberger et al. (Lanzenberger et al.,

2005) investigate the effectiveness of stardinates and

parallel-coordinates plots applied to an example data

set with psychotherapeuticdata. Henley et al. (Henley

et al., 2007) evaluate scatterplots and parallel coordi-

nates for the task of comparing genomic sequences.

A tiled parallel-coordinates technique for visualizing

time-varying multichannel EEG data is studied by ten

Caat and Roerdink (Caat and Maurits, 2007). Jo-

hansson et al. (Johansson et al., 2008) investigate the

amount of noise that may be present in parallel coor-

dinates such that patterns can still be received. Holten

and van Wijk (Holten and van Wijk, 2010) evaluate

cluster identiﬁcation performance for curved and an-

imated parallel coordinates, among others. While Li

and van Wijk (Li et al., 2010) examined the visualiza-

tion of correlation for linear parallel coordinates and

Holten and van Wijk (Holten and van Wijk, 2010) the

visualization of clusters, our user study aims at eval-

uating the impact of bundling and curves to the judg-

ment of correlation and the detection of clusters.

Finally, there seems to be no literature concerning

the evaluation of bundling at all.

3 USER STUDY

To comparethe effectiveness of polylines and bundled

curves, we performed a user study. Observers were

asked to estimate (a) correlations and (b) the num-

ber of clusters, in datasets represented using polylines

and bundled curves. We expected that bundled curves

would support correlation estimation at least as well

as polylines do, and that bundled curves would sup-

port superior estimation of the number of clusters.

3.1 Overview

In designing the experiment, we were subject to the

constraint that we needed to estimate performance by

analysts skilled both in an underlying domain and at

interpreting parallel coordinates a given type, polyline

or bundled curves. Such users are not merely difﬁcult

to ﬁnd, for the case of bundled curves they do not yet

exist. We addressed this constraint with an approach

often used for visualization user studies. Speciﬁcally

we:

IVAPP 2012 - International Conference on Information Visualization Theory and Applications

596

1. Recruited participants who had little to no experi-

ence with either form of parallel coordinates. We

gave the participants a short tutorial on strategies

for estimating correlationsin parallel plots of each

style. This created a pool of participants equally

skilled at reading both styles, somewhere between

novice and intermediate skill.

2. Used data sets generated solely according to spec-

iﬁed probability distributions, with no underlying

semantics. This ensured that no participant would

be able to apply domain knowledge to interpret

the plots.

We used accuracy as the sole dependent measure

and did not record time. We argue that this untimed

task matches the context in which data analysts typ-

ically use parallel coordinates, taking enough time

to consider their data in depth. This choice empha-

sized that participants take as long as necessary to

make their best estimate. It also minimized fatigue by

allowing participants to rest whenever they wished,

without regard for their score. This choice also elim-

inated the potential confound of different participants

adopting different speed-accuracy tradeoffs, because

accuracy was uniformly emphasized.

Given the limited experience of the participants

with the two styles of plot, we do not believe that

timing data would provide any useful comparison be-

tween the styles. Comparative timing data would

only be informative with testers who were well-

experienced with the methods, working on data sets

for which they had domain expertise.

The curve styles were compared for two tasks, es-

timating correlation and estimating number of clus-

ters. The tasks were performed in a ﬁxed order for

every participant, with participants estimating corre-

lation ﬁrst. This design permits more direct interpre-

tation of the results because all participants performed

each task with a ﬁxed level of prior experience. In

particular, their experience reading plots in the corre-

lation task would carry over to enhance their perfor-

mance in the cluster estimation task.

By contrast, a design that counterbalanced task or-

der would have split participants’ prior experience, in-

creasing the variance and making the results harder to

interpret. Given that a counterbalanced design would

only protect against the case that doing the correla-

tion estimate ﬁrst would differentially advantage one

curve style, a prospect we consider highly unlikely,

we chose a ﬁxed task order for its more straightfor-

ward interpretation.

3.2 Design

The study design was single-factor,two-level, and wi-

thin-subjects. Observers viewed two data series. The

ﬁrst was always the correlation estimation series, the

second the cluster estimation series. Within each se-

ries, line style was a blocked factor, with all trials per-

formed ﬁrst in one line style, then the other. Order

of the two line styles was counterbalanced across par-

ticipants, with participants randomly assigned to the

order. Dependent measures, computed separately for

each series, were the Pearson correlation r between

the actual dataset correlation and the correlation es-

timated by participants, and the Fleiss κ measure of

agreement amongst participants.

The visualization used for the study requires two

parameters to be set. The parameter α deﬁnes the

smoothness of a curve while the bundling strength β

controls the extent to which a curve is pulled towards

the centroid of its cluster (Heinrich et al., 2011b). Be-

fore running the full study, we ran a pilot study with

ﬁve participants to determine the best values of these

parameters. The values β = 0.8 and α = 1/6 achieved

the best balance of correlation detection and cluster

visualization. These values were used for the bundled

plots in both series of trials.

3.2.1 Participants

A convenience sample of 14 participants (9 men,

5 women, ages 23–37) was recruited from graduate

students in computing science and engineering sci-

ence at Simon Fraser University. Of these 14, 2 had

previously used polyline parallel coordinates, 8 had

experience with some form of information visualiza-

tion but had never used parallel coordinates, and 4

had never used any visualization software. Volunteers

were paid CDN$20.

3.2.2 Procedure for the Session

Participants ﬁrst answered a brief series of questions

assessing their level of experience with information

visualization and computers in general. They were

next tested for any color deﬁciencies using a Web-

based test

. All 14 participants had acceptable color

vision. They next read a tutorial on the basic prin-

ciples of parallel coordinates and their instantiation

in polylines and bundled curves. The tutorial deﬁned

correlation and gave examples of positive and nega-

tive correlation using both line types.

Participants then began the ﬁrst series of trials, in

which they estimated correlations. Immediately after

completing that series, participants began the second

series, in which they estimated the number of clusters

http://www.healthcommunities.com/color-vision-

deﬁciency/color-blindness-test.shtml

EVALUATION OF A BUNDLING TECHNIQUE FOR PARALLEL COORDINATES

597

in plots. After completing the second series, they in-

dicated which line style they preferred and answered

a short list of open-ended questions about their expe-

rience during the study.

Participants were allowed to take as long as they

wished on each trial. Total time to complete the ses-

sion varied widely, from 50 to 110 minutes. Most par-

ticipants completed the study in less than 90 minutes.

3.3 First Series: Estimating

Correlations

In the correlation estimation trials, participants

viewed a series of datasets in 2D parallel coordinates

(i.e., with two data dimensions and two main parallel-

coordinates axes), plotted in either polylines or bun-

dled curves. For each plot, the user was asked to

categorize the correlation as “strong negative correla-

tion”, “negative correlation”, “no correlation”, “posi-

tive correlation”, or “strong positive correlation”.

3.3.1 Procedure

Before starting each line style, participants read a

short tutorial on estimating correlations in that style.

For polylines, the tutorial suggested looking for

whether the lines crossed or not, the distribution of

line crossings (whether only in the middle or dis-

tributed throughout the range), and the overall shape

of the plot. For bundled curves, the tutorial suggested

looking at the width of the middle band and the over-

all shape of the plot. For each style, the tutorial pre-

sented example plots of all seven degrees of correla-

tion. To map the seven values of actual correlation to

the ﬁve categories of user response, the tutorial rec-

ommended reporting z values of −1.0 and −0.5 as

both “negatively correlated”, and similarly for +0.5

and +1.0.

Participants began each line style with a training

session. The training session presented one plot of

each correlation level in the given style. Participants

estimated the correlation and were then told which

answers would have been appropriate for the dataset.

Since there were seven levels of correlation but only

ﬁve levels of user response, two possible answers

were suggested for every example. After estimating

all seven practice correlations, a page was displayed

reminding participants of the strategies for estimating

correlation for this plot style. Users pressed a button

to start the ﬁrst experimental trial.

Experimental trials had the same interface as the

practice trials, but provided no feedback about the ac-

tual correlation. When the participant was satisﬁed

with their estimate for the current trial, they pressed a

button to start the next.

3.3.2 Trial Data

Three groups of seven datasets were generated, each

with n = 40 data pairs. The pairs were generated from

normally distributed random series x and y, selected to

ensure that each set of 40 pairs had the given correla-

tion coefﬁcient. Each group of datasets had exactly

one set for each level

z = −1.5, −1.0, −0.5, 0.0, +0.5, +1.0, +1.5,

where z is the Fisher transform of the correlation.

These were the same levels of correlation used in prior

work (Li et al., 2010). One group of datasets was

always used for the training phase. The remaining

14 datasets were each used twice, once for each line

style. Within each line style, order of datasets varied

randomly for each participant.

The bundled curve representationrequired two ad-

ditional parameters for each data point: the directions

of the line leaving each axis. These are not required

for polyline plots, where the direction of the line leav-

ing an axis is independent of the direction the line en-

tered that axis from the other side. However, the exit

direction of bundled curves is affected by the direc-

tion of the line entering from the other side, due to

the C

continuity requirement. Pilot tests showed that

if all curves entered the axes at a constant horizon-

tal direction, observers used the consistent bending

of curves at the axes as a cue to estimate correlation.

Since this cue would not occur in actual use of bun-

dled curves, which would in fact enter their axes at

varying angles, the direction at which each curve en-

tered each axis was randomly perturbed. This random

perturbation likely made correlation detection slightly

more difﬁcult for bundled curves than it would be in

practice, where entry to the axes would vary but not

be random.

Figure 2 illustrates example datasets with all seven

different correlation coefﬁcients used for the exper-

imental trials. The top row shows polyline plots and

the bottom row shows bundled curveplots. The Fisher

transform of correlation is shown below each plot.

3.4 Second Series: Estimating Clusters

In the cluster estimation trials, participants viewed

a series of clustered datasets in parallel coordinates

ranging from two to six dimensions, plotted in either

polylines or bundled curves. Clustering was indicated

by color (for polyline plots) and bundling (for bundled

curve plots). Color Brewer (Harrower and Brewer,

2003) was used to deﬁne effective color maps for the

IVAPP 2012 - International Conference on Information Visualization Theory and Applications

598

Polylines

Curves

−1.5 −1.0 −0.5

+0.5 +1.0 +1.5

Figure 2: One of the three sets of plots used in the correlation estimation series. The correlation, speciﬁed in Fisher z, is

shown below each plot.

Colored

poly-

lines

Bundled

curves

3 5 7

Figure 3: Representative plots used in the cluster estimation series: polyline plots with color coding of clusters (top row), and

bundled curve plots as described in (Heinrich et al., 2011b) (bottom row). From left to right, the data sets are g160 (number

of k-means clusters k = 3), iris (k = 5), g200 (k = 7), and netperf (k = 9).

polyline plots. For each plot, the user was asked to

estimate the number of clusters.

3.4.1 Procedure

Before starting the series, participants read a descrip-

tion of how clusters are represented in both line styles.

They then began working with either polyline plots or

bundled curve plots, depending upon which order had

been assigned. They practiced estimating the num-

ber of clusters in three trial plots, with ﬁve, three, and

eight clusters. Figure 3 shows typical examples of

such plots. After each training trial, the correct num-

ber of clusters was reported. After the three training

trials, a page redisplayed the three datasets and the

number of clusters in each. Users pressed a button to

start the ﬁrst experimental trial.

Experimental trials had the same interface as the

training trials, but provided no feedback about the ac-

tual number of clusters. After entering their estimate

for the clusters, participants pressed a button to move

on to the next trial. Once they had completed a series

in one line style, they did the training and experimen-

tal trials for the next style.

3.4.2 Trial Data

Trial datasets were created from three real-world and

three synthetic datasets (Table 2). The real-world

datasets are popular test datasets, taken from the

Xmdv Web page

. The synthetic datasets were gener-

ated by sampling normally distributed series, selected

to ensure the required correlation across each dimen-

sion. Each of the 6 datasets was then clustered by

the k-means technique into k = 3, 5, 7, and 9 clusters.

This series of 24 datasets was plotted using both line

styles. Within each series, the order of trials varied

randomly for each user.

Table 2: Datasets for the Cluster Estimation Series (d is the

number of dimensions, n is the number of data points).

Name d n Source

iris 4 150 Botany

netperf 6 179 Computer Science

htong 4 365 Earth Science

g40 2 40 Synthetic

g160 3 160 Synthetic

g200 5 200 Synthetic

3.5 Results

Figure 4 shows the distribution of participants’ re-

sponses for the correlation estimation series. There

http://davis.wpi.edu/xmdv/datasets.html

EVALUATION OF A BUNDLING TECHNIQUE FOR PARALLEL COORDINATES

599

was a strong linear correlation between partici-

pants’ estimates and actual correlation for polylines

and bundled curves (both r = 0.90). Considering

the estimates for positive and negative correlations

separately, estimates for negative correlations were

stronger (r = 0.75 for polylines, r = 0.79 for bundled

curves, difference of the equivalent z-scores ∆z =

0.10) than for positive correlations (r = 0.55 for poly-

lines, r = 0.39 for bundled curves, ∆z = −0.21).

Agreement amongst participants was moderate

(κ = 0.43 for polylines, κ = 0.41 for bundled curves).

The results for polylines are comparable to those of

Li and van Wijk (Li et al., 2010).

For all comparisons the one-sided width of a 95%

conﬁdence interval, which is entirely determined by

the sample size of 14, is ∆z

95%

= 0.59. All the ∆z

scores presented above were substantially within this

bound, indicating that none of the differences was sta-

tistically signiﬁcant.

Actual correlation (Fisher z ) Actual correlation (Fisher z )

User estimated z

(a) Polyline plots. (b) Curve plots.

Figure 4: Distribution of responses for the estimated cor-

relations of 2D parallel coordinates for polyline plots and

bundled curve plots. Circle radius represents the frequency

with which participants estimated a correlation strength for

each actual correlation.

Figure 5 shows the distribution of participants’ re-

sponses for the cluster estimation series. The overall

correlation is strong for both line styles (r = 0.92 for

polylines, r = 0.96 for bundled curves, ∆z = 0.36).

The correlations were much stronger for datasets with

three or ﬁve clusters (r = 0.98 for both line styles)

than those with seven or nine clusters (r = 0.41 for

polylines, r = 0.68 for bundled curves, ∆z = 0.39).

As with the correlation estimation series, all ∆z values

were substantially below 0.59, indicating that none of

the differences was statistically signiﬁcant.

Agreement amongst participants for cluster esti-

mation was slightly higher for bundled curves (κ =

0.65) than for polylines (κ = 0.56). Each line style

had higher agreement than their corresponding levels

for correlation estimation.

3.6 Discussion

The results for the two series of plots demonstrate im-

Number of clusters Number of clusters

User estimate of number

(a) Colored polyline plots. (b) Bundled curve plots.

Figure 5: Distribution of responses for the estimated num-

ber of clusters in parallel-coordinates plots in two line

styles. Circle radius represents the frequency with which

participants estimated a dataset to have a given number of

clusters.

portant strengths of the bundled curve representation.

The correlation estimation series demonstrates that

correlation is as readily recognizable when parallel

coordinates are rendered in bundled curves as when

rendered in polylines. This result is not obvious.

Polyline plots provide a clear focal point for estimat-

ing correlations: the width of the center region is an

excellent indicator of correlation, with strong nega-

tive correlations producing a narrow center region and

strong positive correlations producing a wide center

region. In contrast, bundled curve plots by deﬁnition

draw the curves into one or more narrow center re-

gions. The width of those regions is only mildly deter-

mined by the correlation of the dataset. Yet bundled

curves nonetheless provided sufﬁcient cues (width of

center region, shape of lines) that participants could

estimate correlation from bundled curves as readily

as from polylines.

The cluster counting series demonstrates that

viewers could identify clusters through their bundles.

This is not surprising, as bundling provides a strong

cue of cluster identity. Participants likely determined

cluster membership by looking at the bundle axes,

where bundling has its strongest effect. In effect, a

bundled curve plot uses different regions to geomet-

rically represent the spread and the clustering of the

dataset. The spread of values for a cluster is rep-

resented at the value axis. The cluster identity of a

datum is represented at the bundle axis. In contrast,

polylines provide no geometric representation of clus-

ter membership, so it must be represented using a dif-

ferent cue, color. Whereas polylines provide only cor-

relation information in the inter-axis regions, bundled

curves use that region to display correlation, number

of clusters, and cluster membership—a much more

effective use of the space.

The geometric representation of cluster and distri-

IVAPP 2012 - International Conference on Information Visualization Theory and Applications

600

bution must be simultaneous if the analyst is to com-

pare the distributions of the different clusters. The

bundling and C

continuity of bundled curves are es-

sential for this comparison to occur, for these fea-

tures allow the viewer to be aware of both clusters

and distribution simultaneously. Bundling exploits

the Gestalt principle of proximity, visually grouping

the lines of a cluster in the middle of the plot. C

con-

tinuity exploits the Gestalt principle of continuity to

maintain this visual grouping on the value axes, where

the distribution is represented. This allows the viewer

to compare the distributions of different clusters. As a

secondary beneﬁt, the C

continuity allows this clus-

ter identiﬁcation to be maintained across value axes,

the membership reinforced at each bundle axis.

The same bundling strength was used for both

the correlation and the cluster counting series. This

demonstrates that each task can be achieved without

sacriﬁcing the other.

4 CONCLUSIONS AND FUTURE

WORK

The user study conducted in this work supports the

following conclusions: Firstly, curve bundling is ef-

fective in displaying clustering information purely

based on geometry. Secondly, with a properly cho-

sen bundling strength, bundled curve plots retain the

same strength as polyline plots in revealing correla-

tions between visualized variables. Hence one of the

core aspects of analysis using parallel coordinates car-

ries over using bundling.

The high effectiveness of curved plots compared

with polyline plots was not obvious. The results of

our user study might trigger further perceptual in-

vestigations of variants of parallel-coordinates plots.

It could be the case that other forms of parallel-

coordinates plots might be even more effective than

bundled curves—not only for cluster visualization but

other applications.

ACKNOWLEDGEMENTS

In parts, this work was supported by DFG within SFB

716 / D.5.

REFERENCES

Artero, A. O., de Oliveira, M. C. F., and Levkowitz, H.

(2004). Uncovering clusters in crowded parallel coor-

dinates visualizations. In IEEE Symposium on Infor-

mation Visualization, pages 81–88. IEEE Computer

Society.

Berthold, M. R. and Hall, L. O. (2003). Visualizing fuzzy

points in parallel coordinates. IEEE Transactions on

Fuzzy Systems, 11:369–374.

Caat, M. T. and Maurits, N. M. (2007). Design and evalu-

ation of tiled parallel coordinate visualization of mul-

tichannel EEG data. IEEE Transactions on Visualiza-

tion and Computer Graphics, 13(1):70–79.

Cook, D. and Swayne, D. F. (2003). Interactive and Dy-

namic Graphics for Data Analysis: With Examples

Using R and GGobi. Springer.

Fua, Y.-H., Ward, M. O., and Rundensteiner, E. A. (1999).

Hierarchical parallel coordinates for exploration of

large datasets. In IEEE Visualization, pages 43–50.

IEEE Computer Society.

Graham, M. and Kennedy, J. (2003). Using curves to en-

hance parallel coordinate visualisations. In Interna-

tional Conference on Information Visualization (IV),

pages 10–16. IEEE Computer Society.

Harrower, M. A. and Brewer, C. A. (2003). Color-

Brewer.org: an online tool for selecting color schemes

for maps. The Cartographic Journal, 40(1):27–37.

Heinrich, J., Bachthaler, S., and Weiskopf, D. (2011a).

Progressive splatting of continuous scatterplots and

parallel coordinates. Computer Graphics Forum,

30(3):653–662.

Heinrich, J., Luo, Y., Kirkpatrick, A. E., Zhang, H., and

Weiskopf, D. (2011b). Evaluation of a bundling tech-

nique for parallel coordinates. Technical Report Com-

puter Science TR-2011-08, Visualization Research

Center, University of Stuttgart.

Heinrich, J. and Weiskopf, D. (2009). Continuous parallel

coordinates. IEEE Transactions on Visualization and

Computer Graphics, 15(6):1531–1538.

Henley, M., Hagen, M., and Bergeron, R. D. (2007). Eval-

uating two visualization techniques for genome com-

parison. In International Conference on Information

Visualization (IV), pages 551–558. IEEE Computer

Society.

Holten, D. (2006). Hierarchical edge bundles: Visualiza-

tion of adjacency relations in hierarchical data. IEEE

Transactions on Visualization and Computer Graph-

ics, 12(5):741–748.

Holten, D. and van Wijk, J. J. (2010). Evaluation of cluster

identiﬁcation performance for different PCP variants.

Computer Graphics Forum, 29(3):793–802.

Inselberg, A. (1985). The plane with parallel coordinates.

The Visual Computer, 1(2):69–92.

Inselberg, A. (2009). Parallel Coordinates: Visual Multidi-

mensional Geometry and Its Applications. Springer,

New York.

Inselberg, A. and Dimsdale, B. (1990). Parallel coordinates:

A tool for visualizing multi-dimensional geometry. In

IEEE Visualization, pages 361–378. IEEE Computer

Society.

Jain, A. K. and Dubes, R. C. (1988). Algorithms for Clus-

tering Data. Prentice-Hall, Upper Saddle River, NJ.

EVALUATION OF A BUNDLING TECHNIQUE FOR PARALLEL COORDINATES

601

Johansson, J., Forsell, C., Lind, M., and Cooper, M. (2008).

Perceiving patterns in parallel coordinates: determin-

ing thresholds for identiﬁcation of relationships. In-

formation Visualization, 7(2):152–162.

Johansson, J., Ljung, P., Jern, M., and Cooper, M. (2005).

Revealing structure within clustered parallel coordi-

nates displays. In IEEE Symposium on Information

Visualization, pages 125–132. IEEE Computer Soci-

ety.

Lanzenberger, M., Miksch, S., and Pohl, M. (2005). Ex-

ploring highly structured data: a comparative study of

stardinates and parallel coordinates. In International

Conference on Information Visualization (IV), pages

312–320. IEEE Computer Society.

Li, J., Martens, J.-B., and van Wijk, J. J. (2010). Judging

correlation from scatterplots and parallel coordinate

plots. Information Visualization, 9(1):13–30.

McDonnell, K. T. and Mueller, K. (2008). Illustrative

parallel coordinates. Computer Graphics Forum,

27(3):1031–1038.

Miller, J. J. and Wegman, E. J. (1991). Construction of

line densities for parallel coordinate plots. In Buja,

A. and Tukey, P., editors, Computing and Graphics in

Statistics, pages 107–123. Springer, New York.

Moustafa, R. and Wegman, E. (2006). Multivariate con-

tinuous data – parallel coordinates. In Unwin, A.,

Theus, M., and Hofmann, H., editors, Graphics of

Large Datasets: Visualizing a Million, pages 143–

156. Springer, New York.

Novotny, M. and Hauser, H. (2006). Outlier-preserving

focus+context visualization in parallel coordinates.

IEEE Transactions on Visualization and Computer

Graphics, 12(5):893–900.

Ramos, E. and Donoho, D. (1983). 1983 ASA data exposi-

tion dataset. In CMU Dataset Archive. CMU.

Rodrigues, Jr., J. F., Traina, A. J. M., and Traina, Jr., C.

(2003). Frequency plot and relevance plot to enhance

visual data exploration. In Symposium on Computer

Graphics and Image Processing (SIBGRAPI), pages

117–124. IEEE Computer Society.

Theisel, H. (2000). Higher order parallel coordinates.

In Workshop on Vision, Modeling, and Visualization,

pages 415–420.

Wegman, E. (1990). Hyperdimensional data analysis using

parallel coordinates. Journal of the American Statisti-

cal Association, 411(85):664.

Wegman, E. J. and Luo, Q. (1997). High dimensional clus-

tering using parallel coordinates and the grand tour.

In Classiﬁcation and Knowledge Organization, pages

93–102.

Yuan, X., Guo, P., Xiao, H., Zhou, H., and Qu, H.

(2009). Scattering points in parallel coordinates. IEEE

Transactions on Visualization and Computer Graph-

ics, 15(6):1001–1008.

Zhou, H., Cui, W., Qu, H., Wu, Y., Yuan, X., and Zhuo,

W. (2009). Splatting the lines in parallel coordinates.

Computer Graphics Forum, 28(3):759–766.

Zhou, H., Yuan, X., Qu, H., Cui, W., and Chen, B. (2008).

Visual clustering in parallel coordinates. Computer

Graphics Forum, 27(3):1047–1054.

IVAPP 2012 - International Conference on Information Visualization Theory and Applications

602