Real-time Intelligent Clustering for Graph Visualization

Lionel Martin

1,2

and Géraldine Bous

EPFL, Lausanne, Switzerland

RTI, SAP Research, Sophia Antipolis, France

Keywords:

Graph Visualization, Social Networks, Graph Clustering, Machine Learning, User Interaction.

Abstract:

We present a tool for the interactive exploration and analysis of large clustered graphs. The tool empowers

users to control the granularity of the graph, either by direct interaction (collapsing/expanding clusters) or

via a slider that automatically computes a clustered graph of the desired size. Moreover, we explore the

use of learning algorithms to capture graph exploration preferences based on a history of user interactions.

The learned parameters are then used to modify the action of the slider in view of mimicking the natural

interaction/exploration behavior of the user.

1 INTRODUCTION

Business Intelligence, (Social) Network Analysis or

Biology are just a few examples of domains where

graphs are used as data structures to capture the rela-

tional structure in the underlying data. To fully take

advantage of graphs, i.e. to allow analysts and sci-

entists to gain insight into the data, it is necessary to

provide user-friendly graph analysis and visualization

tools.

Developing graph visualization tools that are si-

multaneously simple, intuitive and ﬂexible (i.e. con-

ﬁgurable) is a challenge as users have different re-

quirements that translate as constraints in the graph

exploration process. For large graphs, displaying the

entire dataset leads to clutter and information over-

load, which impedes the analysis targeted by the end-

user; on the other hand, displaying a simpliﬁed (ﬁl-

tered or clustered) graph without allowing the user to

interactively conﬁgure the constraints (e.g. in ﬁlters)

or the level of detail in the display (e.g. the num-

ber of clusters) is simply too restrictive to meet the

requirements of most users. Over the years, many

authors have proposed approaches and techniques to

address the challenge of bringing together simplicity

and ﬂexibility in tools for graph analysis and visu-

alization; the reader may refer to (von Landesberger

et al., 2011) for a recent review. Some proposals fo-

cus on the graph representation itself, showing that

node-link diagrams are more convenient for sparse

graphs (Ghoniem et al., 2005), while adjacency ma-

trices (Elmqvist et al., 2008) or hybrid representa-

tions (Henry and Fekete, 2006; Henry et al., 2007) are

more convenient for dense graphs. In parallel, many

proposals focus on interactive exploration techniques

to address the challenge of visual information over-

load. Simple techniques like zooming, distortion or

panning are helpful in visualizing large graphs (Card

et al., 1999). Another possibility is to allow users to

interactively reduce the amount of information displayed. Most interactive information reduction techniques fall along two major axes: clustering, e.g. (Henry and Fekete, 2006; Archanbault et al., 2008; Archanbault et al., 2002), and filtering, e.g. (Heer and Boyd, 2005; Elmqvist and Fekete, 2010).

While the latter removes ‘uninteresting data’ to re-

duce the amount of information displayed, the for-

mer simply aggregates similar entities together into

‘visual containers’, called clusters. In addition, certain authors also allow filtering according to a degree

of interest function (van Ham and Perer, 2009) which

reﬂects the exploration focus of the user.

In this paper, we present a tool for the interactive

exploration and analysis of large clustered graphs.

The tool empowers users to control the granularity

of the graph, either by direct interaction (collaps-

ing/expanding clusters) or via a slider that automati-

cally computes a clustered graph of the desired ‘size’.

More precisely, the slider allows the user to deﬁne and

control the visual entity budget (Elmqvist and Fekete,

2010), which reﬂects the number of edges and nodes

displayed; for a given budget, a clustered graph with

a matching number of edges and nodes is calculated

and displayed. To ensure continuity and coherence


Martin L. and Bous G..

Real-time Intelligent Clustering for Graph Visualization.

DOI: 10.5220/0004305504710480

In Proceedings of the International Conference on Computer Graphics Theory and Applications and International Conference on Information

Visualization Theory and Applications (IVAPP-2013), pages 471-480

ISBN: 978-989-8565-46-4

© 2013 SCITEPRESS (Science and Technology Publications, Lda.)

Figure 1: User Interface of our visualization tool.

in the display, i.e. to ensure that the graph displayed

with budget y is the graph that would be displayed

with budget x < y plus a certain number of expanded

clusters, we automatically calculate an exploration se-

quence for a hierarchically clustered graph. This se-

quence is then mapped to the slider; interacting with

the latter then allows having a quick overview of the

graph at different levels of detail in a user-friendly

manner. To further enhance user-experience and en-

able customization, we explore the use of learning

algorithms to capture graph exploration preferences

based on a history of user interactions. The tool

records user interactions to learn a degree of interest

function. This learned function is then used to mod-

ify the exploration sequence (and hence the action of

the slider) in view of mimicking the natural interac-

tion/exploration behavior of the user.

This paper is structured as follows. After an

overview of related work in section 2, we present our

tool in more detail in section 3. Section 4 addresses

the learning mechanism and details the results. Fi-

nally, we conclude in section 5 with challenges for

the future.

2 RELATED WORK

Over the last decades, graph clustering has been an intensively researched field, which now offers a very large choice of methods for various kinds of graphs

(the reader may refer to (Fortunato, 2010) for a com-

plete presentation of the different methods). Cluster-

ing techniques can be classiﬁed into two main cate-

gories: structure and attribute-based algorithms. Ap-

proaches in the former group use the structural prop-

erties of graphs to decide how vertices and edges

should be clustered. Examples include methods based

on the connectivity of nodes (Kernighan and Lin,

1970; Suaris and Kedem, 1988), the shortest paths

between nodes (Wu et al., 2004) and several other

measures like e.g. edge-betweenness centrality or

modularity (Girvan and Newman, 2002; Radicchi

et al., 2004; Newman, 2004; Duch and Arenas, 2005;

Blondel et al., 2008). On the other hand, attribute-

based clustering algorithms use the attributes of nodes

and edges to deﬁne clusters (Shneiderman and Aris,

2006; Wattenberg, 2006; Elmqvist et al., 2008).

Mixed approaches, taking both the graph structure

and its attributes into account, have also been pro-

posed (Archanbault et al., 2008).

Aside from the ‘algorithmic aspect’ of graph clus-

tering, many contributions propose tools or frame-

works for graph analysis and visualization that enable

users to interact with the graphs, for example (Henry

and Fekete, 2006; Archanbault et al., 2008; Archan-

bault et al., 2002). More recently, tools allowing users

to interactively ‘manage’ the data itself have been pre-

sented. The work of (Heer and Perer, 2011) describes

a system for the modeling, transformation and visual-

ization of multidimensional heterogeneous networks.

IVAPP2013-InternationalConferenceonInformationVisualizationTheoryandApplications

472

In parallel, (Liu et al., 2011) propose a tool to ex-

tract networks from tabular data which also includes

an interactive visual interface supporting operations

like aggregation (binning, grouping, etc.) or projec-

tion to generate different views of the data at different

levels of granularity.

Our work is closest in spirit to (Liu et al., 2011) in

that we propose a tool for graph (network) visualiza-

tion that implements projection and aggregation oper-

ations, including clustering with Louvain’s algorithm

(Blondel et al., 2008). In addition to direct interactive

exploration of the graph through expand / collapse op-

erations, our tool also extends the work of (Liu et al.,

2011) and other (previously cited) work on interactive

graph clustering by allowing the user to manage the

visual budget (Elmqvist and Fekete, 2010), i.e. the

number of nodes and edges displayed. The visual

budget is controlled by a simple slider ‘mapped’ to

a hierarchically clustered graph. Interacting with the slider thus automatically expands/collapses clusters and thereby allows the user to quickly gain an overview of

the graph with little effort. The most innovative as-

pect of our proposal lies in the use of machine learn-

ing algorithms, which, based on a history of user in-

teractions, learn both how the graph should be (hi-

erarchically) clustered and in what sequence vertices

and edges should be expanded / collapsed. Learned

parameters are then used to derive a new graph explo-

ration sequence which is mapped back to the slider.

3 TOOL AND USE-CASE

DESCRIPTION

Traditional structure-based graph clustering algo-

rithms use predeﬁned measures, like e.g. edge-

betweenness centrality, to produce clustered graphs.

The disadvantage of such predeﬁned measures is that

the clustered graphs they generate may not reﬂect or

meet user preferences and requirements. In an in-

teractive framework, users may however explore the

graph and search for the information or subgraphs that

they are speciﬁcally interested in. Our goal is to pro-

vide users with all tools needed to efﬁciently explore

the graph and, in addition, to learn from user interac-

tions to recommend ‘views’ and ‘exploration paths’

of the graph that match user interest.

3.1 Overview

In this section we focus on the graph exploration

alone (learning is addressed later). Our prototype for

graph analysis is shown in ﬁgure 1. It is composed of

an interactive graph visualization panel, a menu for selecting data aggregation operators (clustering, filtering, grouping and projection) in the top right corner and a slider to control the visual budget just below.

Figure 2: Projection (Liu et al., 2011).

The use case considered in this paper is a so-

cial network extracted from an online forum (the

SAP Community Network). More precisely, we an-

alyze the graph that arises from the reply-structure

between messages. Forums have a natural hierar-

chical structure, as every forum contains threads,

which in turn contain the messages; messages in turn

have a unique author. Hence, two distinct hierar-

chical structures (forum-thread-message and author-

message) are linked together by the replies between

messages. From the author-message perspective, the

reply structure induces a bipartite graph. It is there-

fore convenient to calculate projections (Latapy et al.,

2008) of one set (i.e. hierarchy) over the other. The

projection of a bipartite graph is a graph whose ver-

tices belong to one set and are connected by edges

if and only if they shared a neighbor in the original

bipartite graph (see ﬁgure 2). This can expose corre-

lations in the data (e.g. author-author connectivity).

In addition, it also is of interest to exploit the natu-

ral hierarchies contained in the data to group nodes in

a semantically meaningful way. Both approaches are

implemented in our tool.
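As an illustration of the projection operation, the following minimal Python sketch links two vertices of one set whenever they share a neighbor in the other set. The function name `project`, the `(author, message)` edge encoding and the sample data are assumptions made for this example; they are not taken from the tool.

```python
from collections import defaultdict
from itertools import combinations


def project(bipartite_edges):
    """Project (author, message) pairs onto the author set.

    Two authors are linked if and only if they share at least one neighbor.
    """
    authors_by_message = defaultdict(set)
    for author, message in bipartite_edges:
        authors_by_message[message].add(author)

    projected = set()
    for authors in authors_by_message.values():
        # every pair of authors attached to the same message becomes an edge
        for u, v in combinations(sorted(authors), 2):
            projected.add((u, v))
    return projected


# Hypothetical data: "dan" shares no message with anyone and stays isolated.
edges = [("ann", "m1"), ("bob", "m1"), ("bob", "m2"), ("cat", "m2"), ("dan", "m3")]
projection = project(edges)
```

Even this small example shows why projected graphs tend to be dense: a message with n authors contributes n(n-1)/2 edges to the projection.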

The disadvantage of semantic grouping and pro-

jection is that the graphs they generate are usually

highly connected graphs, which are difﬁcult to vi-

sualize due to the high number of edges. There-

fore, the tool also offers the possibility to cluster

the graph. Our tool implements Louvain’s cluster-

ing method (Blondel et al., 2008), which is based on

the graph modularity measure (Newman and Girvan,

2004), as it is a fast and efﬁcient algorithm. This

method has the advantage of producing hierarchically

clustered graphs, which can be used for interactive exploration by expanding/collapsing clusters, a feature

which is also offered in our graph analysis tool.
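To make the modularity measure concrete, the sketch below evaluates it for a given partition of an undirected, unweighted graph. It only computes the quantity that Louvain's method optimizes; it is not an implementation of the method itself, and the function name and the sample graph are assumptions made for this example.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity of a partition (undirected, unweighted graph)."""
    m = len(edges)
    internal = defaultdict(int)  # edges with both endpoints in the community
    degree = defaultdict(int)    # sum of the degrees of the community's vertices
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {v: "A" for v in range(6)}
q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
```

Separating the two triangles scores higher than putting all vertices in one cluster, which is the behavior a modularity-based clustering exploits.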

3.2 Entity Budget Interactions

Interactive graph exploration as described above can

be a tedious and time-consuming process. It is there-

fore convenient to provide users with a simple way

to visualize a hierarchically clustered graph at dif-

ferent levels of granularity. With this goal in mind,

our tool offers the possibility to use a slider to con-

Real-timeIntelligentClusteringforGraphVisualization

473

trol the amount of information displayed using a pre-

cise measure. The maximum of information that can

be displayed is the entire graph; the minimum is a

maximally clustered graph, i.e. a single cluster node

(the root node of the hierarchically clustered graph)

for each connected component of the graph. Consider

a graph $G$ containing a total of $|E|$ edges and $|V|$ vertices. For a clustered graph $G_i$, $|E_i|$ and $|V_i|$ denote the number of visible edges and nodes, respectively. The measure that reflects the amount of entities in graph $G_i$ is calculated as

$$I_i = \frac{|E_i| + \beta |V_i|}{|E| + \beta |V|}, \qquad (1)$$

where the coefficient $\beta$ is defined in terms of $(N_{\max} - 1)$ and $N_{\max}$ denotes the maximum number of neighbors in the graph. Now that

the measure and the notion of ‘maximum’ and ‘min-

imum’ graphs have been deﬁned, the intermediate

steps must be considered: which graph should be associated with the measure $I_i$? Indeed, the slider will simply help the user to choose the proportion of information to display given the measure associated with it; however, there may be more than one way to partially expand a hierarchically clustered graph to match the measure $I_i$. It is thus necessary to define how this

measure should be interpreted, both in terms of which

entities should be displayed and also concerning the

proportion of nodes and edges.
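The measure of equation (1) can be sketched as a one-line Python function. The coefficient is passed in as a plain parameter `beta`; its value below, like the edge and node counts, is an assumption chosen for illustration only.

```python
def entity_measure(visible_edges, visible_nodes, total_edges, total_nodes, beta):
    """Weighted share of the full graph's entities that is currently visible."""
    return (visible_edges + beta * visible_nodes) / (total_edges + beta * total_nodes)


# Hypothetical graph with 120 edges and 40 vertices; beta = 2.0 is illustrative.
full = entity_measure(120, 40, 120, 40, beta=2.0)
partial = entity_measure(30, 10, 120, 40, beta=2.0)
```

By construction the measure equals 1 for the fully expanded graph and decreases as clusters are collapsed, which is what allows it to be mapped onto a slider.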

‘Navigating’ from the minimal to the maximal

graph with expand operations can be done in several

ways. For example, it is possible to expand clusters

in either a purely ‘depth-ﬁrst’ or ‘breadth-ﬁrst’ fash-

ion, but anything in between is also possible. More-

over there are distinct partially expanded graphs that

may have the same number of entities. Since there

are several ways to reach a given quantity of infor-

mation, there are also potentially different intermedi-

ate steps leading to the same conﬁguration (see ﬁgure

3). Hence, there are many distinct sequences of ex-

pand operations that can lead from the minimum to

the maximum graph. The slider should however only

be mapped to one sequence, which we call the default

sequence. Many parameters can be considered to de-

termine the order in which the different elements are

displayed using the slider. For example, parameters

related to the size of the nodes, the strength of the

links, the number of neighbors, and so on.

With a default exploration sequence deﬁned, in-

creasing the proportion of information is the same

as applying several (or sometimes just one) expand

operations on the minimal graph. The continuity of

the measure mapped to the slider is thus an important consideration: the measure is a continuous scale, but expand operations may cause large ‘jumps’ in the number of items displayed. Indeed, if a step in the sequence corresponds to expanding a large cluster containing $\Delta I$ entities, the measure will increase drastically at once. However, when the user interacts with the slider, she may very well move it from value $I_i$ to $I_{i+1} = I_i + \delta \cdot \Delta I$, with $0 < \delta < 1$, i.e. within the gap between the graph corresponding to value $I_i$ and the one corresponding to value $I_i + \Delta I$. When this occurs, our exploration algorithm shows only part of the information contained in a cluster with respect to the value $\delta$. Overall, the process to expand the graph with a change in the slider from position $I_i$ to an entity budget $EB$ (with $EB > I_i$) is the following:

• clusters are opened according to the default sequence $G_i, G_{i+1}, \dots$ as long as $EB$ is not exceeded. In other words, if the $k$-th graph of the sequence satisfies $I_k < EB < I_{k+1}$, then graph $G_k$ is displayed entirely.

• to allocate the missing part of the entity budget $EB - I_k$, part of the $(k+1)$-th graph is shown as well. This is achieved by expanding the next cluster (which differentiates $G_{k+1}$ from $G_k$) partially, i.e. showing as many entities as necessary to reach $EB$ and leaving the rest hidden within the cluster.

On the other hand, when the slider is placed at a value $EB < I_i$, the opposite is done: clusters are closed following the sequence in reverse order until reaching graph $G_{k+1}$ satisfying $I_k < EB < I_{k+1}$ with $k + 1 \le i$; the expanded cluster that differentiates $G_{k+1}$ from $G_k$ is then partially closed to match $EB$.

Figure 3: Two different sequences leading to the same configuration.
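The mapping from a slider position to a point in the default sequence can be sketched as follows. This is a minimal illustration under the assumption that the measures of the graphs in the default sequence are available as a strictly increasing list; the function name `locate_budget` and the numbers are hypothetical.

```python
from bisect import bisect_right


def locate_budget(measures, budget):
    """Return (k, delta) for an entity budget.

    Graph k of the sequence is shown entirely, and a fraction delta of the
    cluster that separates graph k from graph k + 1 is expanded in addition.
    """
    if not measures[0] <= budget <= measures[-1]:
        raise ValueError("budget outside the range covered by the sequence")
    k = bisect_right(measures, budget) - 1  # last graph that fits in the budget
    if k == len(measures) - 1:
        return k, 0.0                       # the full graph is already shown
    gap = measures[k + 1] - measures[k]
    return k, (budget - measures[k]) / gap


# Assumed measures of the graphs in the default sequence (strictly increasing).
measures = [0.10, 0.25, 0.50, 1.00]
k, delta = locate_budget(measures, 0.40)
```

Because the result depends only on the target budget and not on the previous slider position, the same lookup serves both directions: moving the slider up expands clusters until graph k is reached, moving it down collapses them in reverse order.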

The challenge is that the user can also interact di-

rectly with the graph to expand and collapse nodes.

If the user ﬁrst uses the slider to partially expand

the graph and then starts interacting directly with the

graph (performing expand / collapse operations) the

default sequence may not be further applicable, as,

through her interactions, the user is likely to define a sequence different from the default one. For example, assume that she uses the slider to display 44% of the information and then expands the three smallest node clusters displayed, $C_1$, $C_2$ and $C_3$. The new amount of information displayed is 46% (see figure 4(b)). Assume the default sequence’s next operation after reaching 44% was to expand a cluster $C_4$ other than the ones interactively expanded, modifying the measure to 46% as well (see figure 4(a)). If the user now moves the slider to 48%, the heuristic that defines the order of the operations cannot ‘know’ that the cluster $C_4$ is not expanded and that it cannot expand $C_1$, $C_2$ or $C_3$ anymore. We thus need to define a way to smoothly go from the configuration before user interactions to the configuration after them. In this example, we need at least to redefine the default sequence between 44% and 48% (see figure 4(c)). But even if it still seems valid to keep the old part of the sequence between 0% and 44% and beyond 48%, nothing forbids changing it as well (see figure 4(d)).

(a) Default sequence before interaction, next action after 44% is opening $C_4$.

(b) Alternative sequence defined by the user after the expansion of $C_1$, $C_2$ and $C_3$.

(c) […] in gray can be or not be identical to the one of the old default sequence.

(d) Modifications of everything beyond 46% to improve the next explorations for the user.

Figure 4: Modifications of the default sequence according to user interactions.

Our solution to this problem is the following.

When the user explores graph $G_i$ in a different order from that defined in the default sequence, the latter is reordered such that the partial sequence defined by the user is ‘inserted’ after $G_i$. More precisely, if the sequence is $\dots, G_i, G_{i+1}, \dots, G_k, \dots, G_l, \dots$ with $i + 1 < k < l$ and at $G_i$ the user expands the clusters that lead to $G_k$ and then $G_l$, the default sequence becomes $\dots, G_i, G_k, G_l, G_{i+1}, \dots$. Note that this

does not pose inconsistency problems: the sequence

in which clusters are opened is arbitrary provided no

cluster is opened before any of its parent vertices, i.e.

before any of the clusters in which a cluster is itself

contained. Since this principle cannot be violated by

manual exploration, redeﬁning the default sequence

by capturing user interaction is perfectly consistent

with the hierarchically clustered graph.
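The reordering and the consistency condition can be sketched in Python as follows. The sketch assumes that the default sequence is stored as a list of cluster identifiers in expansion order and that the hierarchy is given as a child-to-parent map; both representations, as well as the names `reorder` and `is_consistent`, are assumptions made for this example.

```python
def reorder(sequence, position, user_expanded):
    """Insert the user's expansions right after the first `position` steps."""
    done = sequence[:position]
    moved = [c for c in user_expanded if c not in done]
    rest = [c for c in sequence[position:] if c not in moved]
    return done + moved + rest


def is_consistent(sequence, parent):
    """True if no cluster is expanded before the cluster that contains it."""
    seen = set()
    for cluster in sequence:
        p = parent.get(cluster)
        if p is not None and p not in seen:
            return False
        seen.add(cluster)
    return True


# Hypothetical hierarchy: root contains a and b, a contains a1, b contains b1.
parent = {"root": None, "a": "root", "b": "root", "a1": "a", "b1": "b"}
default = ["root", "a", "a1", "b", "b1"]
# The slider stopped after "root"; the user then opens b and b1 by hand.
updated = reorder(default, 1, ["b", "b1"])
```

Since a cluster can only be opened by hand once its parent is visible, the moved block never violates the parent-before-child condition, which `is_consistent` makes explicit.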

4 INTERACTIVE LEARNING

As we have just shown, a hierarchically clustered

graph can be explored in many distinct ways. To map

one exploration sequence among the many possible to

the ‘exploration slider’ thus involves a default choice

which can be based on several criteria. In the previous

section, this choice was made on the basis of two con-

siderations: ﬁrst, the exploration sequence must be

sufﬁciently ﬁne-grained to be mapped to an (approxi-

mately) continuous measure; second, it must be possi-

ble to modify subsequences of the default sequence in

case the user interacts directly with the graph. Nev-

ertheless, the choice of the default sequence is still

highly arbitrary and it is thus convenient to introduce

more criteria (i.e. constraints) to further reduce the

set of feasible sequences. Most importantly, it is pos-

sible to introduce constraints that reﬂect user prefer-

ences or patterns in the graph exploration behavior.

By keeping track of user interactions with the graph,

it is possible to ‘learn’ the exploration behavior of the

user with machine learning algorithms. The rules ac-

quired through learning can then be used to update

the default exploration sequence in such a way that it

infers the natural behavior of the user.

4.1 Learning Framework

(Clustered) vertices and edges in a graph have differ-

ent characteristics that can be integrated as parameters

in an interest (or preference) function. The character-

istics (i.e. criteria) we consider are structural:

• the number of neighbors of a vertex;

• the number of entities in a cluster (which reﬂects

the size of a node in the clustered graph);

• the depth of the hierarchy in a clustered node and

• ‘edge width’ (which reﬂects its weight).

From a record of user interactions (which evolves in

real-time as the user explores the graph), it is possible

to deduce and quantify which (combination of) cri-

teria best reﬂect user behavior. In what follows, we

detail the model and the learning framework.

We denote the set of possible actions as $A = \{a_1, a_2, \dots, a_n\}$. Each of these actions corresponds to collapsing or expanding a specific node. When the user chooses to perform an action, e.g. $a_k$, we can deduce that action $a_k$ is preferred to all the other available actions, i.e. $a_k \succ a_i$, where $i = 1, \dots, n$ and $\succ$ is the notation for ‘preferred to’. Since action $a_k$ is performed on a specific node, we can deduce that the specific characteristics of the node are what made this node relevant for the user. The interest function (also known as ‘utility’ or ‘value function’ in decision theoretic contexts) is a function $u : A \to \mathbb{R}$ that satisfies $a_k \succ a_i \Leftrightarrow u(a_k) \ge u(a_i)$ for all $a_k, a_i \in A$, $i \ne k$ (Mehta, 1998; Aleskerov et al., 2007). In other terms, $u(a_i)$ is the ‘value’ of action $a_i$ as perceived by the user (Keeney and Raiffa, 1976). The utility function is a parametric function that aggregates (i.e. takes into account) all the criteria listed above. More precisely, $u(a_i)$ is a function of the marginal utilities $u_j(a_i)$, which model the value of action $a_i$ on criterion $j$. The most frequently used aggregation model is the additive model, which leads to

$$u(a_i) = \sum_{j=1}^{J} u_j(a_i) \quad \forall a_i \in A, \qquad (2)$$

where J is the total number of criteria. With this

model, the utility of an action is thus the sum of its

marginal utilities. To deﬁne the marginal utilities,

we use a simple linear model. Before we detail the

equations, we must ﬁrst take into account the fact that

some criteria should be minimized (i.e. the smaller,

the better for the user) or maximized (the larger, the

better). Let

$$s_j = \begin{cases} -1 & \text{if criterion } j \text{ is minimized and} \\ 1 & \text{otherwise.} \end{cases} \qquad (3)$$

Marginal utility functions are then defined as

$$u_j(a_i) = w_j \left( \frac{1 - s_j}{2} + s_j \, \frac{a_{ij} - \min_j}{\max_j - \min_j} \right), \qquad (4)$$

where $w_j$ is the weight of criterion $j$, $a_{ij}$ is the actual (non-subjective) value of action $i$ on criterion $j$ and $[\min_j, \max_j]$ is the domain of criterion $j$ (i.e. $a_{ij} \in [\min_j, \max_j]$, $\forall i, j$). For example, for the criterion ‘number of entities in a cluster’ and a cluster $i$ containing 20 entities, we would have $a_{ij} = 20$.
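A minimal Python sketch of the additive model follows. It assumes that each marginal utility is the weight of the criterion times the value normalized over its domain, with the scale reversed when the criterion is minimized. The domains, the weights and the second criterion are illustrative assumptions; only the 20-entity cluster comes from the text.

```python
def marginal_utility(value, lo, hi, weight, sign):
    """Linear marginal utility; sign is +1 (maximize) or -1 (minimize)."""
    normalized = (value - lo) / (hi - lo)
    return weight * ((1 - sign) / 2 + sign * normalized)


def utility(values, domains, weights, signs):
    """Additive utility: the sum of the marginal utilities over all criteria."""
    return sum(
        marginal_utility(v, lo, hi, w, s)
        for v, (lo, hi), w, s in zip(values, domains, weights, signs)
    )


# Hypothetical action on a cluster with 20 entities (assumed domain 10..50)
# and 3 neighbors (assumed domain 0..10), with equal weights.
domains = [(10, 50), (0, 10)]
weights = [0.5, 0.5]
u_large_preferred = utility([20, 3], domains, weights, signs=[1, 1])
u_small_preferred = utility([20, 3], domains, weights, signs=[-1, 1])
```

The same cluster is worth more when small clusters are preferred (first sign set to -1), because 20 entities lie in the lower part of the assumed domain.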

The weights are the parameters that we seek to

estimate. Every interaction of the user can be translated into a constraint $a_k \succ a_i \Leftrightarrow u(a_k) \ge u(a_i)$ using the model described above; the goal of the learning procedure is to determine the weights in such a manner that as many constraints as possible are satisfied. To learn the weights, we apply the UTA method (Jacquet-Lagrèze and Siskos, 1982; Siskos and Yannacopoulos, 1985), which consists in solving the following linear optimization problem:

$$\begin{aligned} \min \quad & \sum_{a_i \in A} \sigma(a_i) \\ \text{s.t.} \quad & u(a_k) + \sigma(a_k) \ge u(a_i) + \sigma(a_i) \quad \text{if } a_k \succ a_i \\ & \sum_{j} w_j = 1 \\ & w_j \ge 0 \quad \forall j \\ & \sigma(a_i) \ge 0 \quad \forall a_i \in A, \end{aligned} \qquad (5)$$

where the $\sigma(a_i)$ are error variables assigned to each action $a_i \in A$. By minimizing the sum of the error

variables, this linear program determines the weights

that allow most closely matching the preference con-

straints derived from user interactions. For more de-

tails on this and other topics in UTA and other related

methods, the reader may refer to (Siskos et al., 2005;

Bous et al., 2010).
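The idea of fitting weights to preference constraints can be illustrated with a deliberately simplified stand-in for the linear program. The sketch below handles only two criteria, measures the error per constraint (by how much a rejected action outscores the chosen one) rather than per action, and replaces the LP solver by a brute-force grid search over the weights. All names and data are assumptions made for this example; it is not the UTA formulation itself.

```python
def total_error(weights, constraints):
    """Sum over constraints of how much the rejected action outscores the chosen one."""
    def u(features):
        return sum(w * f for w, f in zip(weights, features))
    return sum(max(0.0, u(rejected) - u(chosen)) for chosen, rejected in constraints)


def fit_two_weights(constraints, steps=100):
    """Grid search over (w, 1 - w) with w in [0, 1]; weights are non-negative and sum to 1."""
    candidates = [(i / steps, 1 - i / steps) for i in range(steps + 1)]
    return min(candidates, key=lambda w: total_error(w, constraints))


# Each constraint is (chosen, rejected); features are assumed to be criterion
# values already normalized to [0, 1]. The user consistently favors criterion 1.
constraints = [
    ((0.9, 0.2), (0.3, 0.8)),
    ((0.7, 0.1), (0.2, 0.6)),
    ((0.8, 0.5), (0.4, 0.9)),
]
weights = fit_two_weights(constraints)
```

With more criteria a grid search quickly becomes impractical, which is one reason to prefer a linear programming formulation in practice.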

A ﬁnal remark is due. In decision theoretic con-

texts, it is known in advance whether a criterion

should be maximized or minimized, i.e. the value of $s_j$ is given (the decision maker provides this information ‘orally’ to the analyst). In our context, however,

this information must be interpreted from user inter-

actions directly. To estimate the signs, we deﬁne a set

of points in the neighborhood of the origin (in utility-

function space) and compute the distance to the con-

straints for those points. Then we take the signs of the

point in the set that minimizes the total error (sum of

the errors).

In the following section, we describe the ‘learning

procedure’, i.e. how user interactions are recorded

and interpreted in view of applying the learning algo-

rithm.

4.2 Learning Process

Learning from user interactions implies storing the

preference constraints and solving the optimization

problem discussed in the previous section. Figure 5

gives an overview of the different steps of the real-

time learning process: ﬁrst, every interaction is trans-

lated into preference constraints (on the basis of the

criteria deﬁned in the previous section), which must

be stored. The set of constraints is then used to deter-

mine whether criteria should be minimized or maxi-

mized (sign detection); next, both the constraints and

the signs are used to calculate the weights of the inter-

est function by the resolution of (5). Once the interest

function has been ‘learned’, it is used to deﬁne the

default sequence for the exploration of the graph with

the slider.
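One possible way to turn a learned interest function into an exploration sequence is a priority-driven traversal of the cluster hierarchy, sketched below. It assumes that every cluster has already been given a score by the interest function and that the hierarchy is available as a parent-to-children map; the function name and the scores are hypothetical, and the tool may derive its sequence differently.

```python
import heapq


def exploration_sequence(children, roots, score):
    """Order cluster expansions by decreasing score, never before their parent."""
    heap = [(-score[c], c) for c in roots]
    heapq.heapify(heap)
    sequence = []
    while heap:
        _, cluster = heapq.heappop(heap)
        sequence.append(cluster)
        # sub-clusters become candidates only once their parent is expanded
        for child in children.get(cluster, []):
            heapq.heappush(heap, (-score[child], child))
    return sequence


# Hypothetical hierarchy and interest scores.
children = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
score = {"root": 1.0, "a": 0.4, "b": 0.7, "a1": 0.9, "a2": 0.1, "b1": 0.2}
sequence = exploration_sequence(children, ["root"], score)
```

Purely depth-first or breadth-first orders are special cases of this scheme, obtained by scoring clusters by their depth in the hierarchy.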

In addition to these steps, we emphasize that

learning and updating the ‘interest function’ of the

user in real-time raises several questions. For in-

stance, how frequently should the interest function be

updated? And, how should the record of user inter-

actions (i.e. the set of constraints) be managed and

updated?

With respect to the record of user interactions, there are several possible solutions. The first and simplest one is to keep all constraints generated, but this has the disadvantage of leading to a very high number of constraints, which may pose storage problems. Alternatively, it is possible to reinitialize the set of constraints on a regular basis, e.g. for each graph explored or each use of the software. More advanced update procedures are possible, but, for testing and experimentation purposes, we chose to reinitialize the history at each use of the software.

Figure 5: Diagram of the real-time learning process.

For a given record of user interactions, the next

challenge is to decide which constraints should be

used in the optimization problem (i.e. we here distin-

guish between storage and optimization issues). As

the number of constraints grows, the error variables

in the optimization problem may increase. This is

typically the case if the interaction pattern changes

over time. For example, the user may start exploring

a graph with a few ‘random’ actions before starting

to show a consistent exploration behavior. Large er-

ror variables can thus be considered as indicators re-

ﬂecting that the set of preference constraints may no

longer be relevant and require an update. However,

determining which constraint needs to be removed

from the record in order to maximally reduce the ob-

jective function of (5) is a combinatorial problem and

its exact resolution is difﬁcult to implement in a real-

time learning environment. To solve this problem,

we therefore implemented and tested several heuris-

tic methods.

A fast and simple heuristic to address the prob-

lem is to simply remove the constraints that trigger

the highest error variables. Our experiments show that

this heuristic approach yields good results (see ﬁgure

6(a)) compared to the optimal solution (computed by

exhaustive enumeration). In our implementation, as soon as the objective function of (5) exceeds a certain

threshold, the heuristic method is applied (one or sev-

eral times) to bring the value of the objective function

below the threshold.
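The first heuristic can be sketched in Python as follows. For simplicity the sketch treats the error attached to each constraint as fixed, whereas a full implementation would presumably re-solve the optimization problem after each removal; the function name, the constraint labels and the error values are assumptions made for this example.

```python
def prune_by_error(constraints, errors, threshold):
    """Drop the constraints with the largest errors until the summed error
    no longer exceeds the threshold; return the surviving constraints."""
    ranked = sorted(zip(errors, constraints), key=lambda pair: pair[0])
    total = sum(errors)
    while ranked and total > threshold:
        worst_error, _ = ranked.pop()  # largest remaining error
        total -= worst_error
    return [constraint for _, constraint in ranked]


# Hypothetical record: four constraints with their current error values.
kept = prune_by_error(["c1", "c2", "c3", "c4"], [0.05, 0.40, 0.10, 0.30], threshold=0.2)
```

The appeal of the heuristic is its cost: one sort of the record, instead of an enumeration of all subsets of constraints.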

A second heuristic method tested was to remove

the oldest constraints. The idea here is that, if the in-

teraction pattern has changed, then old constraints are

obsolete and should thus be removed. However, this

technique does not work well as it does not signiﬁ-

cantly reduce the objective function. The reason for

this is that, when the change in the interaction pattern

occurs, it may take many more constraints to exceed

the threshold of the objective function. In our experi-

ments, this method required removing more than half

of the constraints for a given threshold in order to re-

duce the value of the objective function signiﬁcantly

(see ﬁgure 6(b)).

A third approach is to deﬁne a lifetime for each

constraint. As soon as a constraint is generated, it

is included in the optimization problem for a limited

number of steps (counted in user interactions) only.

The intuition behind this technique is that, if the user

is consistent and continues to explore the graph with

the same goals, then similar constraints should ‘naturally’ reappear regularly. This method not only produces good results (see figure 6(c)), but it also solves the storage and memory problem raised earlier in this section. While this solution is the most

technically and conceptually satisfying, ‘preference

constraint management’ is an important component of

systems which learn from user interaction; therefore

it is important to test this and other methods with a

larger pool of users to evaluate which approaches are

most ‘natural’ to users. We leave this question open

for further research.
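The lifetime mechanism can be sketched as a sliding window over the record of interactions. The class name `ConstraintWindow`, its methods and the lifetime of five interactions are assumptions chosen for illustration.

```python
from collections import deque


class ConstraintWindow:
    """Keeps each constraint for a fixed number of user interactions."""

    def __init__(self, lifetime):
        self.lifetime = lifetime
        self.step = 0
        self._items = deque()  # (step at creation, constraint), oldest first

    def record_interaction(self, new_constraints):
        """Advance one interaction, add its constraints, expire the old ones."""
        self.step += 1
        for constraint in new_constraints:
            self._items.append((self.step, constraint))
        while self._items and self.step - self._items[0][0] >= self.lifetime:
            self._items.popleft()

    def active(self):
        """Constraints currently included in the optimization problem."""
        return [constraint for _, constraint in self._items]


# Hypothetical session: one constraint per interaction, lifetime of 5 interactions.
window = ConstraintWindow(lifetime=5)
for step in range(1, 8):
    window.record_interaction([f"c{step}"])
```

Since constraints expire on their own, the record never grows beyond the number of constraints generated during one lifetime, which bounds both storage and the size of the optimization problem.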

5 DISCUSSION

Our main goal in this investigation was to analyze the

feasibility and technical requirements of an ‘intelligent visual analysis system’ for complex data structures on the basis of ‘simple’ controls. The combination of fields like Visual Analytics and Artificial Intelligence is still at a pioneering stage, and many challenges and topics remain to be addressed and investigated. The model, the learning set and

the interpretation of man-machine interactions are the

three ingredients that ultimately deﬁne the ‘behavior’

and quality of a learning-based system. Therefore, it is necessary to experiment with several models and ‘constraint management techniques’ in order to understand how real-time learning systems for visual analytics should be designed to better reflect user behavior and maximize user satisfaction.

(a) Test on constraint suppression that shows that the removal of the constraint with the largest slack variable (in green) is one of the best solutions to minimize the objective function (optimum is in red).

(b) Evolution of the objective function when removing the constraints one by one from the oldest to the most recent.

(c) […] 5 steps that follow their apparition.

Figure 6: Experimental results of constraint management heuristics.

In addition to the ‘conceptual’ and system-design

oriented challenges, it is worthwhile to address the

‘behavioral aspects’ of graph exploration, which also

play a key role in a system designed to learn from

man-machine interactions. For instance, it is worth

investigating whether preferred interactions differ

from graph to graph or between application areas, or

whether they are particular to a user. Moreover, in-depth

experimental evaluations would make it possible to analyze

the efficiency of exploration strategies, as well as to deter-

mine how and when ‘good strategies’ should be rec-

ommended to users in view of avoiding confinement

to systematic routine exploration mechanisms.

Finally, many technical challenges have to be ad-

dressed as well. In addition to a formalization of the

methods and techniques we presented here, many di-

rections for future research exist. For example, the

interest function used in our investigation is based

only on criteria related to the structure of the graph;

a relevant extension is to introduce attribute-based

criteria as well. In addition,

it is worthwhile to further analyze how sequences of

interactions should be interpreted for real-time ma-

chine learning algorithms. Indeed, a single action on

a graph may not necessarily reflect user intention. In

other words, certain goals of the user may require a se-

quence of actions to be met. The challenge is then not

only to deﬁne a model capable of modeling prefer-

ences on such sequences, but ultimately also to detect

or interpret them in what is otherwise nothing but a

long list of interaction events.

6 CONCLUSIONS

In this paper we presented a new tool developed to

understand and improve user experience in the explo-

ration of graphs. The tool empowers users to con-

trol the granularity of the graph, either by direct inter-

action (collapsing/expanding clusters) or via a slider

that automatically computes a clustered graph of the

desired size. Moreover, we explored the use of learn-

ing algorithms to capture graph exploration prefer-

ences based on a history of user interactions. The

learned parameters are then used to modify the action

of the slider in view of mimicking the natural interac-

tion/exploration behavior of the user.

Our work is a ﬁrst step toward the use of machine

learning algorithms to define the actions associated

with simple interactive controls, like sliders, for the ex-

ploration of complex data structures like graphs. We

show that such an approach is technically feasible and

encourage further research in this direction in view

of bringing graphs and graph analysis closer to users.

In general, visual analysis systems designed to learn

from user interactions with the goal of enhancing user

experience deserve more attention and have many fas-

cinating research challenges to offer.

ACKNOWLEDGEMENTS

The authors gratefully acknowledge the valuable

comments of the anonymous referees, which helped

to improve the initial version of this manuscript.
