work origin and destination represent, respectively,
the places of birth and death of notable people in
history. Directionality is represented with color
interpolation (red-blue for origin-destination).
One of the concerns of origin-destination visualiza-
tion is the representation of directionality of edges,
particularly when dealing with bidirectional flows.
Recent work (Holten and van Wijk, 2009) presented
six different ways of representing edge directionality
(tapered, dark-to-light, light-to-dark, arrow, curved,
and green-to-red) and compared the reading performance
of each technique. This study suggests that the tapered
method is advantageous in most situations, whereas the
curved representation performed worst. In any case,
the representation of bidirectional data remains
challenging, because of the additional visual
information added to each edge.
Direct visualization of large volumes of OD data
generates high degrees of visual clutter. In these
cases a reduction strategy known as edge bundling
can be applied, which is characterized not only by
graph simplification, but also by the revelation of
principal streams of flow. Holten introduced edge
bundling for compound graphs. His work consisted of
routing edges through a hierarchical layout using
B-Splines (Holten, 2006). Variations of edge bundling
range from force-directed approaches (Holten and van
Wijk, 2009) to more sophisticated kernel density
estimation strategies (Hurter et al., 2012). Generally,
edge bundling consists of drawing similar edges along
the same path, i.e. edges that are related in geometry
and direction are routed together.
In the geographic context, OD representation usually
refers to flow visualization (also known as flow maps),
which is deeply rooted in the history of information
visualization. Early examples, such as the map of wine
exports from France produced by Minard (Tufte and
Graves-Morris, 1983, page 25), represent both the
quantity and the direction of wine exports, encoded by
the thickness of the corresponding edges, which split
off from the parent edge. The work of Phan et al.
(Phan et al., 2005) describes an automated approach
to the generation of flow maps using a hierarchical
clustering algorithm, given a series of nodes and flow
data. Generally, in the geographic context, flow
visualization refers to the representation of
quantities of any kind that move from one location to
another (e.g. migrations or the transportation of
goods). The advantage of flow maps is that they reduce
visual clutter by merging edges. However, they present
a series of problems, such as the perception of the
directionality of flow, when large amounts of
bidirectional OD data are considered.
3 DATA DESCRIPTION
Our dataset consists of 278 GB of information about
customer purchases in 729 supermarkets and hypermarkets
in Portugal over a time span of 24 months (from May 2012
until April 2014), including the geolocation of 682
supermarkets, as well as the regions of the country
they belong to. The dataset comprises approximately
2.86 billion transactions, where each transaction has
the following attributes: customer card id, amount
spent, product designation, quantity of the purchased
products, and the date and time of the transaction. It
is important to note that several individuals (e.g.
members of a family) may hold the same customer card
with a unique client id. The dataset has a total of
6.6 million unique card ids.
Before extracting transitions among supermarkets, we
first compute their geographical clusters. The reason
is that the majority of supermarkets belong to shopping
centers, each of which is considered a single
geographical location. For this purpose the DBSCAN
algorithm (Ester et al., 1996) was applied with the
parameters of 0 for K and 0.01 for epsilon. As a result
304 clusters were obtained, where the extracted
locations are the centroids of the clusters of
supermarkets (each centroid will be referred to as a
single supermarket for the sake of simplicity).
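The clustering step can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the text: that K denotes the minimum-points threshold of DBSCAN, that coordinates are compared with plain Euclidean distance, and that epsilon is expressed in the same units as the coordinates. Under the first assumption every point is a core point, so DBSCAN reduces to finding connected components of the graph that links points at distance at most epsilon. The function names `cluster_locations` and `centroids` are hypothetical.

```python
from collections import deque
from math import hypot


def cluster_locations(points, eps=0.01):
    """Label points so that chains of eps-neighbours share a cluster.

    Assumption: with a minimum-points threshold of 0 every point is a
    core point, so DBSCAN reduces to connected components of the graph
    linking points at Euclidean distance <= eps.
    """
    labels = [-1] * len(points)  # -1 means "not visited yet"
    cluster = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = cluster
        queue = deque([start])
        while queue:  # breadth-first expansion of the current cluster
            i = queue.popleft()
            xi, yi = points[i]
            for j, (xj, yj) in enumerate(points):
                if labels[j] == -1 and hypot(xj - xi, yj - yi) <= eps:
                    labels[j] = cluster
                    queue.append(j)
        cluster += 1
    return labels


def centroids(points, labels):
    """Return the mean coordinate of each cluster, keyed by cluster label."""
    sums = {}
    for (x, y), label in zip(points, labels):
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}
```

The quadratic neighbour search is adequate for a few hundred locations. A spatial index would be preferable for larger inputs.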
With the clusters computed we proceed to ex-
tract transitions as follows: first the data is aggre-
gated by day (24 hours); then for each client the
sequence of transitions is computed by excluding
subsequences of repeated places. For example, let
$X = (A,A,B,B,B,C)$ be the sequence of supermarkets
where a client made transactions. Then the transition
sequence would be $X_{tr} = (t_1(A,B),\ t_2(B,C))$.
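The extraction rule can be sketched in a few lines of Python. The sketch assumes that the input is the time-ordered list of supermarkets visited by one client within one day. The function name `extract_transitions` is hypothetical.

```python
from itertools import groupby


def extract_transitions(visits):
    """Collapse runs of repeated places, then pair consecutive places."""
    # groupby yields one key per run of equal consecutive items,
    # e.g. A, A, B, B, B, C -> A, B, C.
    places = [place for place, _ in groupby(visits)]
    # Each adjacent pair of distinct places is one transition.
    return list(zip(places, places[1:]))


print(extract_transitions(["A", "A", "B", "B", "B", "C"]))
# prints [('A', 'B'), ('B', 'C')]
```

A client who shops at a single place on a given day yields an empty transition sequence.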
4 ARC REPRESENTATION
Our first approach was based on a direct representation
of the data. The transition sequence is directly encoded
by edges that represent the link between the origin and
destination supermarkets, as well as the number of
clients that transitioned. The directionality of an edge
is represented by a combination of the tapered and
curved methods, due to the bidirectionality of the data.
Since arc-based approaches usually do not represent
directionality, the thickness of the arcs in our
visualization increases as they approach their
destination, resembling the trajectory of a projectile
or a comet. The asymmetrical curve gives a more natural
sense of direction. Arcs were also used because they
reduce visual clutter when compared with straight-line
methods.
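One possible way to generate such an arc is sketched below in Python. This is an illustration and not the curve used in our implementation: the quadratic Bezier form, the linear growth in width, and the parameters `bulge`, `skew`, `w_start` and `w_end` are all assumptions made for the example. The control point is shifted towards the destination to make the curve asymmetrical, and it is offset along the left-hand normal of the chord, so the arc for the reverse flow bends to the opposite side and the two directions do not overlap.

```python
def tapered_arc(origin, dest, bulge=0.3, skew=0.75,
                w_start=0.5, w_end=4.0, steps=20):
    """Sample (x, y, width) points along an asymmetric, tapered arc."""
    (x0, y0), (x1, y1) = origin, dest
    dx, dy = x1 - x0, y1 - y0
    # Control point: `skew` of the way along the chord, pushed sideways
    # along the left-hand normal (-dy, dx) by `bulge` chord lengths.
    cx = x0 + skew * dx - bulge * dy
    cy = y0 + skew * dy + bulge * dx
    samples = []
    for k in range(steps + 1):
        t = k / steps
        # Quadratic Bezier basis weights.
        a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
        x = a * x0 + b * cx + c * x1
        y = a * y0 + b * cy + c * y1
        # Width grows linearly, so the arc is thickest at the destination.
        width = w_start + (w_end - w_start) * t
        samples.append((x, y, width))
    return samples
```

The sampled points can be passed to any drawing routine that supports variable stroke width, for example by rendering each consecutive pair of samples as a short segment.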