Leaf Glyph

Visualizing Multi-dimensional Data with Environmental Cues

Johannes Fuchs, Dominik Jäckle, Niklas Weiler and Tobias Schreck

University of Konstanz, Universitätsstr. 10, 78462 Konstanz, Germany

Keywords:

Glyph Visualization and Layout, Nature-inspired Visualization, Leaf Shape, Multi-dimensional Data Analysis, Data Aggregation.

Abstract:

In exploratory data analysis, important analysis tasks include the assessment of similarity of data points, label-

ing of outliers, identifying and relating groups in data, and more generally, the detection of patterns. Speciﬁ-

cally, for large data sets, such tasks may be effectively addressed by glyph-based visualizations. Appropriately

deﬁned glyph designs and layouts may represent collections of data to address these aforementioned tasks. Im-

portant problems in glyph visualization include the design of compact glyph representations, and a similarity-

or structure-preserving 2D layout. Projection-based techniques are commonly used to generate layouts, but

often suffer from over-plotting in 2D display space, which may hinder comparing and relating tasks.

We introduce a novel glyph design for visualizing multi-dimensional data based on an environmental metaphor.

Motivated by the human ability to visually discriminate natural shapes like trees in a forest, single flowers in

a flower bed, or leaves on shrubs, we design a leaf-shaped data glyph, where data controls main leaf properties

including leaf morphology, leaf venation, and leaf boundary shape. We also deﬁne a custom visual aggregation

scheme to scale the glyph for large numbers of data records. We show by example that our design is effectively

interpretable to solve multivariate data analysis tasks, and provides effective data mapping. The design also

provides an aesthetically pleasing appearance, which may help spark interest in data visualization by larger

audiences, making it applicable e.g., in mass media.

1 INTRODUCTION

Glyph-based data visualization has a long tradition

in Information Visualization research and application.

The basic idea in glyph visualization is to map data

properties to visual properties of some appropriately

designed visual structure. By the interplay of the dif-

ferent visual properties, each glyph then represents a

data record. Many data records can be compared by

appropriately laid out glyph displays. Glyph visual-

ization, like other areas in Information Visualization,

can be considered both a science and an art. Speciﬁ-

cally, the design of glyphs may be inspired intuitively

by common, well-known shapes or icons. For example, Chernoff faces were inspired by facial properties, and stick figures by abstraction of human body shapes.

A subset of the designs studied in Information Vi-

sualization to date has been inspired by nature. For

example, tree structures have inspired hierarchical

node-link diagrams. As another example, the notion

of information landscapes or terrains is also borrowed

from nature. There is reason to believe that the hu-

man visual sense, due to long evolutionary processes,

is highly trained in recognizing, distinguishing and

comparing natural forms. These visual recognition

processes typically work well even in low illumina-

tion conditions, or in presence of partial occlusion of

natural objects. By background knowledge and expe-

rience, humans are able to efﬁciently recognize natu-

ral shapes, also often in cases where only parts of the

shape or their boundary are visible.

Based on this motivation, we investigate the de-

sign space for leaf shapes as natural metaphors for

data glyphs. From observing leaves in nature, it is

clear that there is a large variability in the different

types and forms of leaves that exist. Overall leaf

shape, shape boundary, and shape interior all comprise several visual parameters that can, in principle,

be used to map data to generate glyphs. To the best

of our knowledge, this is the ﬁrst work to systemat-

ically study the design space of leaf-based glyph vi-

sualization, and identify an encompassing set of leaf

variables to map data to. In conjunction with appro-

priate glyph layouts (based e.g., on projection), and

visual aggregation techniques, effective and intuitive

Fuchs J., Jäckle D., Weiler N. and Schreck T..

Leaf Glyph - Visualizing Multi-dimensional Data with Environmental Cues.

DOI: 10.5220/0005292801950206

In Proceedings of the 6th International Conference on Information Visualization Theory and Applications (IVAPP-2015), pages 195-206

ISBN: 978-989-758-088-8

 2015 SCITEPRESS (Science and Technology Publications, Lda.)

data displays can be realized. Our rationale for us-

ing leaf-based data visualization is two-fold. First,

the design space is large, giving ample opportunities

for the visualization expert to map data variables to

visual variables. As will be discussed, our variable

space amounts to more than 20 different visual variables that can be controlled. While we have not formally evaluated the effectiveness of these variables or

their combinations, we presume this is a large design

space from which appropriate effective selections can

be found. Second, we propose that nature-inspired de-

signs, by their potential aesthetic appearances and fa-

miliarity, can be suited to spark interest in visual data

analysis for wider audiences, e.g., for use in mass me-

dia. Also, it resonates well with visualization of envi-

ronmental data, as has been previously demonstrated,

e.g., by a respective infographic used by OECD (see

Section 2.2).

The remainder of this paper is structured as fol-

lows. In Section 2, we discuss glyph-based and

nature-inspired data visualization approaches. Sec-

tion 3 deﬁnes the design space for leaf glyphs, based

on identiﬁcation of main visual leaf properties which

are candidates for data mapping. Then, in Section 4,

we deﬁne several visual aggregation schemes to scale

2D glyph layouts for large numbers of data points.

Section 5 then applies our design to several data sets.

By exemplary data analysis cases, we demonstrate the

principal applicability of our approach. Finally, Sec-

tion 6 summarizes our work and outlines future re-

search in the area.

2 RELATED WORK

Our work extends the design space of two existing

branches of research by introducing a compact data

representation making use of environmental cues.

The related work is, therefore, split into two parts.

The ﬁrst part covers the area of space efﬁcient visual-

ization techniques, namely, data glyphs. The second

part addresses research using environmental cues to

convey data. We do not address research in the area of

computer graphics, since this work mainly focuses on

photo-realistic representation of the environment. We

refer the interested reader to a summary work about

this topic by Deussen and Lintermann (Deussen and

Lintermann, 2005).

2.1 Glyphs

In the literature, there exists a large variety of glyph

designs. Elaborate summaries can be found in (Borgo

et al., 2012) (Ward, 2008). To come up with a

comprehensive categorization we make use of Ward’s

classiﬁcation of data glyphs (Ward, 2008). In his re-

search he distinguishes between three different ways

a data point can be mapped to a glyph representation.

First, Many-to-One Mapping: All data dimen-

sions and their respective value are mapped to a com-

mon visual variable. Therefore, these designs can be

systematically created by choosing the most effective

visual variable for a certain task. Additional guidance

is given by Cleveland and McGill with a ranking of visual

variables (Cleveland and McGill, 1984). Well-known

examples making use of a position/length encoding

are star glyphs (Siegel et al., 1972), whisker and fan

plots (Pickett and Grinstein, 1988)(Ware, 2012), or

proﬁle glyphs (Du Toit et al., 1986). The designs just

differ in their layout of the dimensions (i.e., circular

or linear) and some minor variations like the pres-

ence or absence of a surrounding contour line. Other

glyph designs make use of color encodings to repre-

sent the data value. Clock glyphs (Kintzel et al., 2011)

map the dimensions in a radial fashion, whereas pixel-

based glyph designs (Levkowitz and Herman, 1992)

lay out the dimensions linearly. Of course, color cannot convey the data as accurately as a position/length encoding (Fuchs et al., 2013); however, for certain tasks like spotting outliers, color encoding is a reasonable choice. There is even a design that maps the data values to the angles of its rays: stick figures (Pickett and Grinstein, 1988) use the visual variable orientation, which is not very accurate in communicating exact data values. However, when used as an

overview visualization the designs convey individual

shapes, which are perceived as a whole nicely approx-

imating the underlying data point.

Second, One-to-One Mapping: Each dimension

is mapped to a different visual variable. Probably,

the most well-known representations here are Cher-

noff faces (Chernoff, 1973). The single data values

are mapped to face characteristics, like the size of the

nose or the angle of the eyebrows. Other more ex-

otic designs are bugs (Chuah and Eick, 1998) (chang-

ing the shape, length or color of wings, tails and

spikes), or hedgehogs (Klassen and Harrington, 1991)

(manipulating the spikes by changing the orientation,

thickness and taper). The major drawback of these

kinds of glyph representations is that they are often

sensitive to the order by which the data dimensions

are mapped to visual variables. Variation of the order

could signiﬁcantly change the ﬁnal glyph representa-

tion and its visual perception by users. Additionally,

measuring differences between single dimension val-

ues within a data point is typically a difﬁcult task, as

the analyst has to compare different kinds of visual

variables with each other (e.g., compare length with

IVAPP2015-InternationalConferenceonInformationVisualizationTheoryandApplications

196

saturation or angle, etc.).

Third, One-to-Many Mapping: The dimensions

are represented by two or more visual variables. This

redundant mapping can be useful to strengthen the

perception of individual dimensions. For example, in

star or proﬁle glyphs the dimensions can be addition-

ally encoded by coloring the single data rays. Clock

glyphs can make use of an additional length encoding

for the single colored slices to encode the underlying

data values more accurately.

2.2 Environmental Cues

Visualizations making use of environmental cues

need not necessarily be glyph representations. Ste-

faner uses an abstract tree layout to show the edit-

ing history of Wikipedia entries represented as single

branches (Stefaner, 2014a). The branches grow to the

right whenever people decided to delete an article or

to the left otherwise. The resulting tree nicely summarizes the 100 articles with the longest discussions on whether to keep them or not. Another tree-based approach, in combination with leaves, visualizes poems in a more artistic way (Müller, 2014). The branches of the tree are invisible and serve only as anchor points for arranging the glyphs. Each word in the poem is represented by a leaf glyph and attached along the tree structure. The work does not aim to represent the text data accurately, but tries to create a unique picture or fingerprint of the underlying poem.

A more data-driven glyph design is the botanical

tree (Kleiberg et al., 2001), which again uses a 3D

tree layout to represent hierarchical information. The

single nodes are represented as fruits. The authors argue that people can more easily identify single nodes in this visualization compared to a more abstract representation because they are used to detecting fruits or leaves on shrubs or trees. A 2D visualization using a botanical tree metaphor is ContactTrees (Sallaberry et al., 2012), which shows relationships in data, e.g., contacts between persons. The

branches consist of single lines representing an at-

tribute in the data, e.g., a longer line refers to an older

tie between people. Finally, fruits or leaves are added

to the tree according to some data property, e.g., the

kind of relation between people (friends, co-workers, etc.). However, the fruits and leaves are highly abstract representations (mainly colored dots) and their

shape does not change according to some data charac-

teristics. The OECD’s Better Life Index visualization

(Stefaner, 2014b), on the other hand, systematically

changes the appearance of the single ﬂower glyphs

used to represent data. Stefaner uses such environ-

mental cues to visualize multi-dimensional data about

country characteristics. Each country is represented

by one ﬂower. The petals encode the different eco-

nomic branches with varying sizes and lengths for the

corresponding values. The ﬂowers are arranged ac-

cording to their weighted rank across all dimensions.

People can change the layout by changing the weights

of the dimensions or simply focusing on just one di-

mension.

We contribute to this body of existing work with

the deﬁnition of a highly detailed leaf glyph, which

closely follows the main morphological and func-

tional variations among leaves. It is able to effectively

map data variables. We also provide a custom aggregation scheme to scale leaf layouts for large numbers of records.

3 ENVIRONMENTAL GLYPH

According to the biological literature, leaves may be categorized by their function or usage in the environment

(Beck, 2010). For our purposes, we divide leaves ac-

cording to their shape (or morphology). The overall

appearance of a leaf consists of the combination of

(1) the overall shape type, (2) the boundary details,

and (3) the leaf venation. We consider these three as-

pects as the main dimensions for controlling the leaf

glyph by mapping data. As a result we come up with

a design space structured along the overall leaf shape,

which we discuss next.

3.1 Leaf Shape Design Space

Palmer pointed out that “Shape allows a perceiver to predict more facts about an object than any other property” (Palmer, 1999); this visual variable should therefore be used for the most important data dimension. In nature, there exists a nearly endless number of different leaf shapes since each leaf is unique. However, it is possible to distinguish leaves

according to their overall shape (Deussen and Linter-

mann, 2005). A ﬁrst categorization can be done be-

tween conifer and deciduous leaves.

Conifer leaves (e.g., Acicular leaves) can be found, for example, on fir or pine trees and have a thin, long, needle-like shape. Therefore, they do not offer much space for a venation pattern, which we want to use later for mapping additional attributes. Since the differences in shape are quite small within this group and the available area is limited due to the extreme aspect ratio, we do not consider them in our design space.

Deciduous leaves cover a large group of different

LeafGlyph-VisualizingMulti-dimensionalDatawithEnvironmentalCues

197

shapes and can again be further divided into four sub-

categories (Deussen and Lintermann, 2005).

Pinnate and palmate compound leaves are shapes that consist of several smaller leaflets attached to a

shared branch (e.g., Alternate, or Odd and Even Pin-

nate leaves etc.). In order to avoid any misinterpreta-

tion between single leaﬂets at a branch and individual

leaves, we discard this group from our ﬁnal design

space. However, these kinds of leaves seem an appro-

priate representation to visually summarize multiple

data points where one leaﬂet corresponds to a single

leaf.

Lance-like leaves have a parallel venation and are

thin and long, similar to conifer leaves. Therefore,

it is difﬁcult to distinguish different kinds of these

leaves since the differences in the overall shape are

limited. Like the conifer leaves, we do not keep them

in our design space because of the limited area to map

a venation pattern, and because of possible confusion

of different lance-like shapes.

Leaves with net veins or reticulate venation pat-

terns encompass the largest group of deciduous leaves

with a big diversity in shape. We restrict ourselves

to the most common leaf shapes for this category

to avoid misinterpretation of intermediate structures,

which could not clearly be distinguished. Addition-

ally, we focus on leaves with a big surface to show ve-

nation patterns and small stems to save space. Leaves

similar to Flabellate, Unifoliate, etc. will, therefore,

not be considered.

The most important requirement for shapes in visualizations is that they should be easily distinguishable. Therefore, our final design space covers elliptic (e.g., Ovate, Obtuse, Obcordate, etc.), circular (e.g., Orbicular), triangular (e.g., Deltoid), arrow-like (e.g., Hastate, Spear-shaped, etc.), heart-like (e.g., Cordate, Deltoid, etc.), two variations of tear-drop-like (e.g., Acuminate, Cuneate, etc.), wave-like (e.g., Pinnatisect), and star-like (e.g., Palmate, Pedate, etc.) shapes. Figure 1 illustrates the nine different leaf shape categories covered by our design space. In Section 5 we will introduce a heuristic to map data points to leaf shapes, based on the idea of representing outlying points by the more jagged leaf shapes; conversely, non-outlying points will be represented by the more regular or smooth leaf shapes.

Figure 1: Leaf Shapes: Selected from our overall design

space, these are the shapes used in our ﬁnal glyph design.

From left to right: Elliptic, circular, triangular, arrow-like,

heart-like, tear drop up, tear drop down, wave-like, and star-

like shapes.

We take these categories as a starting point and

further extend them by mapping additional attribute

dimensions to the width and the height of the glyph,

scaling the overall shape. Therefore, similar shapes

according to a certain data characteristic can look

different because of the varying aspect ratio. How-

ever, the individual shape categories can still be dis-

tinguished (Figure 2). Because of this decision, we

will deviate from the precise environmental reference,

where leaves typically show a homogeneous aspect

ratio. However, we thereby are able to encode ad-

ditional data dimensions. Note that we do not want

to represent leaves as accurately as possible (or even photo-realistically), but to use their expressiveness to visualize data.

Figure 2: Leaf Scaling: The Palmate leaf shape is scaled

using either the width (middle), or the height (right) of the

glyph. Even after scaling, the glyph can still be recognized

as a star-like leaf, although the precise environmental refer-

ence to the Palmate leaf is reduced.

3.2 Leaf Boundary Design Space

Basically, the boundary (or margin) of a leaf can be

described as either serrated or unserrated. Unserrated

boundaries have a smooth contour adapting to the

overall leaf shape. Serrated boundaries are toothed

with slight variations depending on the size of teeth,

their arrangement along the boundary, and their fre-

quency. Of course, there are more detailed differ-

ences and variations in nature. However, especially

in overview visualizations (the major domain of data

glyphs), distinguishing between small variations of

the contour line of a leaf shape is nearly impossible.

We therefore focus on just the two main boundary categories of toothed or smooth (serrated or unserrated).

For mapping data values to the leaf boundary, we dis-

tinguish between a smooth and a toothed contour line

and vary the width, height, and frequency of the teeth

according to the underlying data value (Figure 3).
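As a rough illustration of this mapping, the following Python sketch generates the outline of an elliptic glyph whose tooth frequency and tooth height are driven by two normalized data values. The function names, the value ranges, and the triangle-wave tooth profile are assumptions made for this example and do not describe the implemented system; tooth width is left out for brevity.

```python
import math


def lerp(lo, hi, t):
    """Linearly interpolate between lo and hi for t in [0, 1]."""
    return lo + (hi - lo) * t


def serrated_ellipse(rx, ry, freq_value, height_value, n_points=360):
    """Return outline points of an elliptic leaf with a toothed margin.

    freq_value and height_value are data values normalized to [0, 1].
    The ranges below (4..40 teeth, 0..15 % tooth height) are arbitrary
    illustrative choices, not values from the paper.
    """
    teeth = round(lerp(4, 40, freq_value))      # number of teeth on the margin
    amplitude = lerp(0.0, 0.15, height_value)   # relative tooth height
    points = []
    for i in range(n_points):
        angle = 2 * math.pi * i / n_points
        # Triangle wave in [0, 1] that repeats `teeth` times per revolution.
        phase = (teeth * i / n_points) % 1.0
        tooth = 1.0 - abs(2.0 * phase - 1.0)
        scale = 1.0 + amplitude * tooth
        points.append((rx * scale * math.cos(angle),
                       ry * scale * math.sin(angle)))
    return points
```

A height value of 0 yields the smooth (unserrated) contour, so the same routine would cover both boundary categories under these assumptions.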

3.3 Leaf Venation Design Space

We also control the leaf venation pattern to map

additional data variables to the glyph. Several main

leaf venation patterns exist, which differ in their over-

all structure within the leaf. A rough distinction can

IVAPP2015-InternationalConferenceonInformationVisualizationTheoryandApplications

198

Figure 3: Leaf Boundary: Modifying the boundary in our

design is realized by changing the frequency, the height, or

the width of the boundary serration (teeth). Combinations

of these three variables are possible and increase the ex-

pressiveness of the glyph. The ﬁgure illustrates all possible

combinations for low, middle, and high data values for an

elliptically shaped leaf glyph.

be made between single, non-intersecting (e.g., Parallel), paired (e.g., Pinnate), or net-like (e.g., Reticulate) veins. The venation is perceived as an additional

texture for the glyph and further increases the glyph

expressiveness. Since it is hard to ﬁnd a natural order

within this texture, we propose to use the venation

type for visualizing qualitative (or categorical) data,

similar to the overall leaf shapes discussed in Section 3.1. Within a given venation type, we may also

encode numeric data. This works as follows. Gen-

erally, the leaf is split in the middle by a main vein,

with small veins growing from there in a given direction (angle). For mapping numerical data, we may either control the angle of the veins branching out from the main vein or control the number of veins shown on the surface (Figure 4). As a result, we obtain a venation texture capable of encoding

categorical and numerical data.
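The numeric part of this venation mapping can be sketched as follows. The example assumes a paired venation type, a vertical main vein, and hypothetical value ranges (2 to 10 vein pairs, 20 to 80 degrees); clipping the child veins against the leaf outline is omitted, so this is an illustration of the idea rather than the rendering used for Figure 4.

```python
import math


def venation_segments(height, count_value, angle_value, vein_length=0.4):
    """Return line segments ((x0, y0), (x1, y1)) for a paired venation.

    count_value and angle_value are data values normalized to [0, 1].
    The mapped ranges are illustrative assumptions.
    """
    pairs = 2 + round(count_value * 8)            # number of child vein pairs
    angle = math.radians(20 + angle_value * 60)   # measured from the main vein
    segments = [((0.0, -height / 2), (0.0, height / 2))]  # main vein
    dx = vein_length * math.sin(angle)
    dy = vein_length * math.cos(angle)
    for k in range(1, pairs + 1):
        # Spread the branching points evenly along the main vein.
        y = -height / 2 + height * k / (pairs + 1)
        segments.append(((0.0, y), (dx, y + dy)))    # right child vein
        segments.append(((0.0, y), (-dx, y + dy)))   # left child vein
    return segments
```

Driving both parameters from the same data value would give a redundant encoding; driving them from different values encodes two numeric dimensions in one texture.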

Figure 4: Leaf Venation: The texture for the venation sys-

tem can either be created by mapping data values to the an-

gle or frequency of the veins separately, or by combining

the two. The ﬁgure illustrates all possible combinations for

low, middle, and high data values for a wave-like leaf shape.

3.4 Summary

Besides modifying the leaf shape given by morphol-

ogy, boundary and venation, further dimensions can

be assigned to the color hue or saturation of the glyph.

Of course, the designer has to pay attention to the

contrast between the venation texture and the back-

ground color. Additionally, orientation of the glyph

in the display can be used to encode further numeric

information. We draw a short stem to each leaf shape,

showing its orientation. Finally, it is also possible to

modify the stem’s width or height as well.

This represents a comprehensive design space for

mapping data to leaf glyphs, controlled by 12 cate-

gorical and 14 numeric parameters, summing up to

26 variables altogether (see Table 1 for an overview

of all variables). We propose this design space as a toolbox from which the designer may select visual variables as appropriate. The number of 26 parameters should be considered a theoretical upper limit of data variables that we can show. We do not expect all visual parameters in this design space to be equally expressive; some variables may be more effective than others, and they may not all be orthogonal to each other. Care should be taken in selecting and prioritizing the variables. It is, of course, always an option to redundantly encode data variables in different glyph variables to emphasize the perception of important data variables. In Section 5, we will illustrate by practical examples how glyph variables can be combined

to form data displays.

4 LEAF GLYPH AGGREGATION

When visualizing large data sets, leaf glyphs are

prone to overlap in the display, reducing the effective-

Figure 5: Main principles of aggregating point data in a scatterplot. In (a), point data is visualized in a scatterplot. The point data is represented in three different sizes: small, medium, and large. Differences and data values can barely be identified since the visualization is cluttered. To overcome this issue, we apply transparency in (b), partially solving the issue of clutter. In (c), grid-based aggregation is applied. All points that fall within the same cell are aggregated. (c.1) shows a prototype-oriented aggregation: Points are stacked in order to be able to distinguish them. (c.2) shows abstraction by visual aggregation: Points are aligned along a line.

LeafGlyph-VisualizingMulti-dimensionalDatawithEnvironmentalCues

199

Table 1: Summary of the parameters of our glyph design. It comprises 14 numeric and 12 categorical variables, which form

the theoretic upper limit for the expressiveness of our glyph. Note that in practice, these variables are expected to not all be

orthogonal, and comprise different perceptional performance, depending also on the data.

Leaf Design | Numeric Variables                                                 | Categorical Variables
Shape       | 2 (x/y scale)                                                     | 9 (selected morphologies)
Boundary    | 3 (frequency, width, height of teeth)                             | –
Venation    | 2 (number, angle of child veins)                                  | 3 (parallel, paired, net)
Other       | 7 (hue, saturation, orientation, x/y position, stem width/height) | –
Sum         | 14                                                                | 12

ness of perceiving data from individual glyphs. Fig-

ure 5 outlines possible solutions by example of a scat-

terplot. The scatterplot (a) visualizes point data us-

ing three different point dimensions: small, medium,

and large. An increasing amount of visualized points

produces signiﬁcant clutter resulting in perceptional

problems – the user is not able to distinguish between

data points properly. We point out three different ag-

gregation techniques: (b) Alpha Compositing, (c.1)

Prototype Generation, and (c.2) Abstraction. First,

we apply transparency in (b) to provide a visually

pleasing representation that also reveals differences

between data points. In some cases, the application

of transparency is not enough. For example, if mul-

tiple data points share the same position, the opac-

ity might sum up until no difference is perceivable.

Therefore, we propose in (c) two different aggrega-

tion techniques that build on top of transparency and

the application of a grid-based aggregation. Speciﬁ-

cally, we place a user-deﬁned grid on top of the visu-

alization. All data points sharing the same cell are ag-

gregated. In (c.1), all included data points are stacked

so that the different dimensions can still be perceived.

In contrast, (c.2) creates a new representation instead:

All included data points are aligned in a clutter-free

manner along a line.
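The grid-based grouping step can be illustrated with a short Python sketch. It only shows how points are assigned to cells of a user-defined size; the function name and the choice of returning point indices per cell are assumptions for this example.

```python
from collections import defaultdict


def aggregate_by_grid(points, cell_width, cell_height):
    """Group (x, y) positions by the grid cell that contains them.

    Returns a dict mapping (column, row) to the list of point indices
    that fall into that cell.
    """
    cells = defaultdict(list)
    for index, (x, y) in enumerate(points):
        # Floor division also handles negative coordinates consistently.
        key = (int(x // cell_width), int(y // cell_height))
        cells[key].append(index)
    return dict(cells)
```

Each resulting list of indices would then be passed to one of the two aggregation strategies, stacking (c.1) or alignment along a line (c.2).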

These effects can at the same time be perceived in

nature: leaves can overlap or coincide with others. We

adapt the proposed aggregation techniques and extend

them in order to ﬁnd a representative aggregate glyph

which summarizes multiple leaf glyphs.

In Figure 6 and Figure 7 we point out the applica-

tion of the aggregation techniques – Alpha Composit-

ing, Prototype Generation, and Abstraction – with re-

spect to nature. We next explain them in terms of their

counterpart in nature, and apply them to our visualiza-

tion of leaf glyphs.

4.1 Alpha Compositing

We use Alpha Compositing (Porter and Duff, 1984)

to reveal details on overlapping glyphs by applying

transparency. This technique describes the process of

Figure 6: Aggregation by Alpha Compositing. When

multiple leaves overlap or coincide, we are not able to dis-

tinguish properly between their shapes and related charac-

teristics. To overcome this issue, we propose to apply alpha

compositing. It reveals details by applying transparency to

the leaves.

combining multiple, separately rendered images in or-

der to provide a transparent appearance. The result of

the application of transparency to the glyphs is shown

in Figure 6.

As mentioned in Section 3, different leaf shapes

and characteristics need to be taken into account. In

nature, leaves have the characteristic that even when multiple leaves overlap, we perceive differences due to their diverse shapes and colors. To support this, we apply transparency to the leaves. Figure 6 presents the first results. The application of transparency works well, in our experience, for a limited number of leaf glyphs. When too many leaves overlap, perceptual problems can arise: since the opacity of overlapping glyphs accumulates, beyond a certain point the glyphs become occluded and can no longer be distinguished.

For this reason, we propose two additional aggre-

gation techniques we observed in nature: Prototype

Generation and Abstraction.
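The saturation effect described above follows directly from the "over" operator of Porter and Duff. The sketch below, in Python, composites non-premultiplied RGBA colors and computes the opacity reached after stacking several glyphs of equal alpha; the function names and the sample alpha of 0.3 are assumptions chosen for illustration.

```python
def over(front, back):
    """Porter-Duff 'over' for non-premultiplied (r, g, b, a) colors."""
    fr, fg, fb, fa = front
    br, bg, bb, ba = back
    out_a = fa + ba * (1.0 - fa)
    if out_a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)

    def blend(f, b):
        # Weight each channel by its alpha contribution, then un-premultiply.
        return (f * fa + b * ba * (1.0 - fa)) / out_a

    return (blend(fr, br), blend(fg, bg), blend(fb, bb), out_a)


def stacked_alpha(alpha, layers):
    """Opacity after stacking `layers` glyphs that all have the same alpha."""
    return 1.0 - (1.0 - alpha) ** layers
```

With an alpha of 0.3, three overlapping glyphs already reach an opacity of about 0.66, and twenty glyphs exceed 0.99, at which point individual leaves can hardly be told apart.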

4.2 Prototype Generation

As mentioned above, transparency might not be

enough when aggregating multiple glyphs. Therefore,

we propose to additionally generate a prototype glyph

that aggregates the characteristics of all considered

glyphs. We apply a grid to the image space and aggregate all leaves whose calculated center point falls

IVAPP2015-InternationalConferenceonInformationVisualizationTheoryandApplications

200

Figure 7: Grid-based Aggregation. We apply a grid to the visualization and calculate the center point of each leaf glyph, and

aggregate all glyphs whose center points coincide within the same cell. Two different aggregations can be used: Prototype

Generation and Abstraction. The ﬁrst determines a representative glyph for the corresponding cell in the form of a median

glyph or a bouquet glyph. The second creates (similar to what we observe in nature), a branch with multiple leaves based on

the attributes of the considered leaves.

into the same grid cell; the cell dimensions are user-defined. The glyph representing such a cell can either

be a representative of a statistical concept such as the

median value of all coinciding glyphs or a bouquet

which combines different leaf glyph types (analogous

to different ﬂower types). Figure 7 shows the result of

both techniques, visualization of the median as well as

the visualization in form of a bouquet. For both tech-

niques, the transparency is preserved in order to be

able to distinguish between different attribute values

that determine the shape of a leaf glyph.

Our ﬁrst proposed prototype is the representation

of the median. We therefore create a new leaf glyph

that has a simple appearance by means of its shape.

We use the median venation, margin, and shape in or-

der to describe a set of leaves that coincide in one cell.
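One possible reading of the median prototype is sketched below in Python. Numeric glyph parameters are summarized by their median; for unordered categorical parameters such as the shape class a median is not defined, so the sketch falls back to the most frequent value. That fallback, the dictionary-based glyph representation, and all key names are assumptions of this example.

```python
from statistics import median, mode


def median_glyph(glyphs, numeric_keys, categorical_keys):
    """Summarize the glyphs of one grid cell as a single prototype glyph.

    Each glyph is a dict of parameter name -> value. Numeric parameters
    use the median; categorical parameters use the most frequent value
    (an assumption, since categories have no natural order).
    """
    prototype = {key: median(g[key] for g in glyphs) for key in numeric_keys}
    for key in categorical_keys:
        prototype[key] = mode([g[key] for g in glyphs])
    return prototype
```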

Similar to a bouquet, we derive our second pro-

posed prototype by combining and aligning all con-

tained leaf glyphs. First, all leaf glyphs sharing the

same shape are stacked using transparency as de-

scribed in Section 4.1. Second, stacked leaf glyphs

are aligned in a radial manner according to their

shape. This means, while in the ﬁrst step glyphs are

stacked according to their shape, in the second step

they are radially moved and aligned according to the

shape classes as pointed out in Section 3. As a result,

we get a representation similar to a bouquet.
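The two steps of the bouquet construction can be sketched as follows. The Python example groups the glyphs of a cell by shape class and assigns each group a position and orientation on a circle around the cell center. The shape identifiers, the even angular spacing, and the fixed radius are hypothetical choices; the sketch does not render the stacked glyphs themselves.

```python
import math
from collections import defaultdict

# Shape classes from Section 3.1, ordered from smooth to jagged.
# The identifiers are made up for this example.
SHAPE_ORDER = ["elliptic", "circular", "triangular", "arrow", "heart",
               "teardrop_up", "teardrop_down", "wave", "star"]


def bouquet_layout(glyph_shapes, center, radius):
    """Return {shape: (stack_position, orientation_deg, member_indices)}.

    Step 1 stacks glyphs of the same shape; step 2 spreads the stacks
    radially around `center`. Shapes not listed in SHAPE_ORDER are ignored.
    """
    groups = defaultdict(list)
    for index, shape in enumerate(glyph_shapes):
        groups[shape].append(index)
    present = [shape for shape in SHAPE_ORDER if shape in groups]
    layout = {}
    for slot, shape in enumerate(present):
        angle = 2 * math.pi * slot / len(present)
        position = (center[0] + radius * math.cos(angle),
                    center[1] + radius * math.sin(angle))
        layout[shape] = (position, math.degrees(angle), groups[shape])
    return layout
```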

4.3 Abstraction by Visual Aggregation

Based on the grid aggregation, we need to address

issues that emerge when too many glyphs fall into

one cell. Prototype generation may fail if too many glyphs of too many different shapes are aggregated, and the visualized prototype may then suffer

from clutter. Therefore, we propose abstraction by

visual aggregation. We describe the new visual rep-

resentation for an aggregated set of glyphs. Similar

to growth characteristics of leaves we observe in na-

ture, this aggregation technique represents an aggre-

gated set of leaf glyphs as a new branch with multiple

leaves on it. All leaf glyphs are aligned side-by-side

along a branch according to Figure 7.

5 ILLUSTRATIVE APPLICATION

We deﬁned an encompassing scheme to generate leaf

glyph-based data visualizations for large data sets.

We implemented the above described designs in an

LeafGlyph-VisualizingMulti-dimensionalDatawithEnvironmentalCues

201

interactive system. We here exemplify results we

obtained with three data sets. These results aim to

show the principal applicability. Note that a thorough

comparison against alternative glyph designs and user

testing remain to be done in future work.

5.1 Forest Fire

The forest fire data set is available in the UCI machine learning repository (Cortez and Morais, 2007). It contains data about burned areas of forests in Portugal on a daily basis for one year. Additionally, weather information is included, e.g., temperature, humidity, rain, and wind conditions at the respective points in time. This data set does not contain any categorical data that could be mapped to the leaf shape. Therefore, we first clustered the data points with the DBSCAN algorithm (Han et al., 2011) and assigned local or global outliers to different glyph shapes (Figure 8). Our idea is to map outliers to the more jagged leaf shapes, while non-outlier points are mapped to more regular or smooth shapes, thereby providing a first visual assessment of the degree of outlyingness in the data. Our analysis task is to find similarities between burned areas in order to predict fires from certain weather conditions.
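The idea of deriving a shape from a clustering result can be sketched with a small, unoptimized DBSCAN. The parameters `eps` and `min_pts`, the shape names, and the function names are assumptions for illustration. The sketch only separates noise points from cluster members, because the text does not specify how local and global outliers are told apart.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may still become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_pts:  # core point: keep growing the cluster
                queue.extend(reach)
    return labels


def shape_for(label):
    """Outliers get a jagged leaf template, clustered points a smooth one."""
    return "jagged" if label == NOISE else "smooth"


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
labels = dbscan(points, eps=1.5, min_pts=3)
shapes = [shape_for(label) for label in labels]
```

In practice the weather attributes would need to be scaled to comparable ranges before a single `eps` is meaningful.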

First, we applied alpha compositing as an aggregation technique to get a rough idea of the data (Figure 9). We used one glyph for each data point and positioned the glyphs according to their temperature (y-axis) and humidity (x-axis) values in a common scatterplot layout. The orientation of the leaves illustrates the wind strength, and color hue/saturation encodes the time (i.e., month) of the data point: green refers to the first half of the year (spring and summer), red to the second half (autumn and winter). The amount of rain is mapped to both the margin and the venation pattern. The overall size of the glyph encodes the area of burned forest land after a logarithmic normalization.
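For readers unfamiliar with alpha compositing, the following sketch shows the standard "over" operator of Porter and Duff (1984) for non-premultiplied RGBA colors. Whether the system uses exactly this operator is an assumption; Section 4.1 describes the transparency scheme actually applied. The function names `over` and `stack` are hypothetical.

```python
from functools import reduce


def over(src, dst):
    """Composite RGBA color src over dst (non-premultiplied, values in [0, 1])."""
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    out_a = sa + da * (1.0 - sa)
    if out_a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)

    def blend(s, d):
        return (s * sa + d * da * (1.0 - sa)) / out_a

    return (blend(sr, dr), blend(sg, dg), blend(sb, db), out_a)


def stack(layers):
    """Composite layers from bottom to top onto a transparent background."""
    return reduce(lambda below, layer: over(layer, below), layers, (0.0, 0.0, 0.0, 0.0))


half_green = (0.0, 1.0, 0.0, 0.5)
two_leaves = stack([half_green, half_green])
```

Opacity grows with every overlapping glyph, which is why dense regions of the plot appear darker than sparse ones.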

Figure 9 clearly shows three clusters of data points separated by color (i.e., month). Most forest fires occur in the summer (May to September), represented by yellow leaves. This cluster ranges from low to high temperature and humidity values, showing a visual correlation between the two. Most leaves appear to point to the left, indicating low wind conditions. A single maple leaf in the upper right corner represents an outlier, which is surrounded by smaller leaves pointing in the opposite direction. A closer look at this data point shows that the margin is smooth (no rain), the wind is strong (oriented to the right), and the temperature is high (y-position). With this understanding of the data, it is plausible that the burned forest area is so large: low rain, high temperature, and strong winds all support the spread of a forest fire. Another interesting finding is the outlier highlighted with the label 1. Compared to the other leaves, this is the only glyph with a highly serrated margin, which encodes a high amount of rain. Interestingly, the area of burned forest is relatively large although it rained a lot. The higher wind strength is a possible reason; in any case, rain does not prevent larger fires from happening.

Figure 8: Shape Categories: Based on the results of the clustering we assign different leaf shape templates according to the data characteristics.

For the other half of the year (red and green colors), the temperature is higher with fewer forest fires, which is surprising. However, these leaves, especially those from winter (colored red), are relatively large and oriented to the right (strong wind conditions). This visual correlation between temperature, wind condition, and the size of burned areas is an expected finding, since wind is most often responsible for spreading fire in a certain direction.

IVAPP2015-InternationalConferenceonInformationVisualizationTheoryandApplications

202

Figure 9: Forest Fire Data Set: We applied alpha compositing for aggregation to get a first overview of the data set. We used the following mapping to represent the multi-dimensional data: shape = local/global outlier, y-position = temperature, x-position = humidity, color hue/saturation = time (i.e., month), size = area of burned forests, venation and margin = rain, orientation = wind.
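The mapping listed in the caption can be written down as a small function. The record keys, the output keys, and the function name are hypothetical, and the `log1p`-based scaling is only one plausible reading of "logarithmic normalization"; the exact formula is not given in the text.

```python
import math


def forest_fire_glyph(record, max_area):
    """Translate one forest fire record into leaf glyph parameters."""
    if max_area <= 0:
        raise ValueError("max_area must be positive")
    return {
        "x": record["humidity"],
        "y": record["temperature"],
        "orientation": record["wind"],
        "hue": record["month"],
        "margin": record["rain"],    # rain drives both margin ...
        "venation": record["rain"],  # ... and venation
        # log1p keeps an area of 0 at size 0 and compresses very large areas.
        "size": math.log1p(record["area"]) / math.log1p(max_area),
    }


record = {"humidity": 40, "temperature": 25.0, "wind": 4.5,
          "month": 8, "rain": 0.0, "area": 100.0}
glyph = forest_fire_glyph(record, max_area=100.0)
```

The shape is left out here because it comes from the clustering step rather than from a data column.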

Since we now understand the overall structure of the data, we switch to an alternative aggregation technique to better understand the highly cluttered area (Figure 10). Because of our prototype generation, we lose the orientation of the glyphs and, therefore, the wind condition. In the highly cluttered area in the middle of the plot, several different maple leaf shapes are now visible. They refer to outliers detected by our previous clustering algorithm. Interestingly, the temperature for these data points is relatively low, with nearly no rain and mixed humidity. Typical indicators for fire, like high temperature, low humidity, and high wind strength, do not seem to be the main reason for the large burned forest areas. Perhaps other factors, e.g., the area or the coverage of fire stations, explain these cases.

Of course, these findings would need to be substantiated by additional data. Further information, e.g., the number of firefighters fighting the fire, the exact kind and number of trees, or the time until the fire was recognized, are important side factors not covered by the data. However, with our new glyph approach we were able to easily identify temporal patterns, outliers, and similar behavior of data points.

5.2 Iris and Seeds

Figure 11 illustrates two well-known data sets (i.e., iris and seeds) from the UCI machine learning repository as an infographic representation. For both data sets, an initial k-means clustering is performed, with the number of clusters set to the number of classes in the data set. The clusters are then mapped to unique leaf shapes and projected to 2D space by Principal Component Analysis (PCA). As a last step, the data dimensions are mapped to leaf glyph properties, providing insights into the data. Due to the projection, some classes can already be distinguished. However, additionally assigning the clusters to different shapes helps to characterize the data more easily.
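The pipeline of clustering followed by projection can be sketched in plain Python. This is a simplified illustration, not the implementation used for Figure 11: the function names are hypothetical, k-means starts from evenly spaced rows instead of a randomized initialization, and PCA is approximated by power iteration with deflation.

```python
import math


def kmeans(rows, k, iterations=50):
    """Lloyd's algorithm; returns one cluster index per row."""
    # Deterministic start for this sketch: k rows spread evenly over the input.
    centres = [rows[i * len(rows) // k] for i in range(k)]
    labels = [0] * len(rows)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(r, centres[c])) for r in rows]
        for c in range(k):
            members = [r for r, label in zip(rows, labels) if label == c]
            if members:  # keep the old centre if a cluster runs empty
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def pca_2d(rows, iterations=200):
    """Project rows onto their first two principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centred = [[v - m for v, m in zip(r, means)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]

    def top_eigenvector(matrix):
        vec = [1.0 + 0.1 * i for i in range(d)]  # arbitrary non-zero start
        for _ in range(iterations):
            nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(v * v for v in nxt))
            if norm == 0.0:
                break
            vec = [v / norm for v in nxt]
        return vec

    axes = []
    for _ in range(2):
        vec = top_eigenvector(cov)
        eigenvalue = sum(vec[i] * cov[i][j] * vec[j] for i in range(d) for j in range(d))
        axes.append(vec)
        # Deflate: remove the found component so the next pass finds the second.
        cov = [[cov[i][j] - eigenvalue * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    return [tuple(sum(c * a for c, a in zip(r, axis)) for axis in axes) for r in centred]


rows = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
        (5.0, 5.0, 0.0), (5.1, 5.0, 0.0), (5.0, 5.1, 0.0)]
labels = kmeans(rows, k=2)   # cluster index -> leaf shape
positions = pca_2d(rows)     # (1st, 2nd principal component) per glyph
```

Here `labels` would select the leaf shape of each glyph and `positions` its place in the plot, while the remaining glyph properties are driven by the original data dimensions.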

By mapping all data dimensions to glyph features, it is possible to extract more detailed information. In the seeds data set, there is a visual correlation between orientation (length of the grain) and venation frequency (width of the grain). The same is true for the color hue (asymmetry coefficient) and the y-position (1st principal component). The size (compactness) seems to slightly reflect the x-position (2nd principal component).

Figure 10: Forest Fire Data Set: We applied a prototype aggregation technique to reveal insights into the highly cluttered areas in the plot. Interesting to note are the relatively big outlier leaf shapes, which were not visible beforehand.

The PCA projection clearly divides the iris data set into two clusters. However, the data contain three classes, which are mapped to the shape by performing a k-means clustering. The visualization clearly shows two classes within the single cluster on the left. There seems to be a high correlation between the sepal height and length, which are mapped to the height and length of the glyph, respectively. Since no leaf shape gets rescaled, the ratio between the two reads similarly across glyphs. Within the three classes, there is an almost equal distribution of the petal length, which is mapped to the color hue. Finally, the orientation represents the petal width, which correlates highly with the x-position (2nd principal component).

6 CONCLUSION AND FUTURE WORK

We introduced Leaf Glyph, a novel glyph design inspired by an environmental metaphor. Due to its natural and pleasing appearance, we expect that users are likely to be able to discriminate data by shape and properties. The glyph is based on a naturally prominent shape, which should connect well to human perception, presumably also under conditions of partial overlap. We systematically structured the leaf glyph design space. Specifically, we mapped data to the main properties of the leaf glyph: leaf morphology, leaf venation, and leaf boundary. Furthermore, we defined a custom visual aggregation, modeled on its counterpart in nature, to scale the glyph to large numbers of data records. Finally, we exemplified the applicability and effectiveness of our approach in a multivariate data analysis task.

This work is only the first step in studying the effectiveness of nature-oriented data visualization. While we believe leaf glyphs can form intuitive and effective data glyphs, a more thorough evaluation is needed. Specifically, we want to compare the leaf glyph against alternative glyphs from the literature, such as Chernoff faces and pixel-oriented glyphs. This should also include user studies of the effectiveness and efficiency of the technique. We also believe our approach is aesthetically pleasing and may spark interest among a wider audience, for use, e.g., in mass media communication. By design, the leaf glyph may fit well to the visualization of environmental survey data. This, too, should be evaluated qualitatively.

Figure 11: Infographic Representation: The well-known iris and seeds data sets from the UCI machine learning repository are visualized using a 2D projection, and an appropriate mapping of data dimensions to leaf shape characteristics.

As a next step, we will combine our multi-dimensional leaf glyph representation with related botanical tree metaphors to extend the design space with a hierarchical layout. We think that, thanks to the environmental reference, the combination of the two will make it easier for people without a computer science background to understand complex data structures. We will further test this in a controlled environment against more abstract representations such as TreeMaps.

ACKNOWLEDGEMENTS

This work has been supported by the Consensus project and has been partly funded by the European Commission’s 7th Framework Programme through theme ICT-2013.5.4 ICT for Governance and Policy Modelling under contract no. 611688.

REFERENCES

Beck, C. B. (2010). An introduction to plant structure and development: plant anatomy for the twenty-first century. Cambridge University Press.

Borgo, R., Kehrer, J., Chung, D. H., Maguire, E., Laramee, R. S., Hauser, H., Ward, M., and Chen, M. (2012). Glyph-based Visualization: Foundations, Design Guidelines, Techniques and Applications. In Proceedings of Eurographics, pages 39–63. Eurographics.

Chernoff, H. (1973). The use of faces to represent points in k-dimensional space graphically. Journal of the American Statistical Association, pages 361–368.

Chuah, M. C. and Eick, S. G. (1998). Information rich glyphs for software management data. Computer Graphics and Applications, IEEE, 18(4):24–29.

Cleveland, W. and McGill, R. (1984). Graphical perception: Theory, experimentation, and application to the development of graphical methods. Journal of the American Statistical Association, pages 531–554.

Cortez, P. and Morais, A. d. J. R. (2007). A data mining approach to predict forest fires using meteorological data.

Deussen, O. and Lintermann, B. (2005). Digital design of nature. Springer.

Du Toit, S. H., Steyn, A. G. W., and Stumpf, R. H. (1986). Graphical Exploratory Data Analysis. Springer-Verlag, New York.

Fuchs, J., Fischer, F., Mansmann, F., Bertini, E., and Isenberg, P. (2013). Evaluation of Alternative Glyph Designs for Time Series Data in a Small Multiple Setting. In Proceedings Human Factors in Computing Systems (CHI), pages 3237–3246. ACM.

Han, J., Kamber, M., and Pei, J. (2011). Data Mining: Concepts and Techniques. Elsevier Ltd, Oxford, 3rd edition.

Kintzel, C., Fuchs, J., and Mansmann, F. (2011). Monitoring Large IP Spaces with Clockview. In Proceedings Symposium on Visualization for Cyber Security, page 2. ACM.

Klassen, R. V. and Harrington, S. J. (1991). Shadowed hedgehogs: A technique for visualizing 2d slices of 3d vector fields. In Proceedings of the 2nd conference on Visualization’91, pages 148–153. IEEE Computer Society Press.

Kleiberg, E., van de Wetering, H., and van Wijk, J. (2001). Botanical visualization of huge hierarchies. In Information Visualization, 2001. INFOVIS 2001. IEEE Symposium on, pages 87–94. IEEE.

Levkowitz, H. and Herman, G. (1992). Color scales for image data. Computer Graphics and Applications, IEEE, 12(1):72–80.

Müller, B. (2014). Poetry on the road. http://www.esono.com/boris/projects/poetry05/. Retrieved July 2014.

Palmer, S. E. (1999). Vision science: Photons to phenomenology, volume 1. MIT press Cambridge, MA.

Pickett, R. M. and Grinstein, G. G. (1988). Iconographic Displays for Visualizing Multidimensional Data. In Proceedings of the Conference on Systems, Man, and Cybernetics, volume 514, page 519. IEEE.

Porter, T. and Duff, T. (1984). Compositing digital images. In Proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’84, pages 253–259, New York, NY, USA. ACM.

Sallaberry, A., Fu, Y.-C., Ho, H.-C., and Ma, K.-L. (2012). Contacttrees: Ego-centered visualization of social relations. Technical report.

Siegel, J., Farrell, E., Goldwyn, R., and Friedman, H. (1972). The Surgical Implications of Physiologic Patterns in Myocardial Infarction Shock. Surgery, 72(1):126.

Stefaner, M. (2014a). The deleted. http://notabilia.net/. Retrieved July 2014.

Stefaner, M. (2014b). Oecd better life index. http://moritz.stefaner.eu/projects/oecd-better-life-index/. Retrieved July 2014.

Ward, M. (2008). Multivariate Data Glyphs: Principles and Practice. Handbook of Data Visualization, pages 179–198.

Ware, C. (2012). Information Visualization: Perception for Design. Morgan Kaufmann, Waltham.