workflow, 2) the captured performance data itself can be extremely large, heterogeneous, and collected in a streaming fashion, and 3) a cross-platform framework is needed that manages data at scale, provides data aggregation, and supports real-time visualization, exploration, and interaction.
In this paper, as a step toward satisfying these requirements, we present a proof-of-concept framework for data acquisition and handling, together with several visual representation designs for workflows. Our framework is built around offline analysis use cases, but the design does not depend on the offline mechanism and is therefore extensible to online analysis. Our contributions are as follows:
• An improved TAU instrumentation method for capturing parallel workflows.
• A web-based framework connecting different
types of performance data into one linked display
with a variety of visual representations.
• Several level-of-detail visual methods that enhance data exploration.
The remainder of this paper is structured as follows: Section 2 summarizes related work, Section 3 discusses our use case, Section 4 introduces the proposed framework, and Section 5 concludes the work.
2 RELATED WORK
The general purposes of performance evaluation include global comprehension, problem detection, and diagnosis (Isaacs et al., 2014). Performance visualization is therefore designed to fulfill these goals. At a minimum, the visualization must be able to show the big picture of the program execution. When an area of interest is targeted, users must be able to narrow the region down and mine more detailed information. Moreover, comparative studies looking for correlations or dependencies must be supported. For problem detection, abnormal behaviors should be highlighted in ways that allow users to identify them easily.
Existing visualization work can be grouped by its application into four contexts: hardware, software, tasks, and application (Isaacs et al., 2014). Specifically, this covers a few types of data (sketched below): 1) an event table summarizing the start and end times of all function calls, 2) the message passing among cores, 3) profiles of metrics spent in each part of the code on each computing core, and 4) the call paths. We therefore summarize only the existing work that commonly applies to these data types. Other work, such as visualization of networks, system memory usage, or system logs for multicore clusters, is surveyed in (Isaacs et al., 2014).
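To make the four data types concrete, the following minimal Python sketch shows one possible set of record layouts; the field names are illustrative assumptions, not the schema of any particular measurement library.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Event:              # 1) event table entry: one timed function call
        function: str
        process: int
        start: float          # timestamps, e.g. seconds since job start
        end: float

    @dataclass
    class Message:            # 2) message passing between cores
        src: int
        dst: int
        send_time: float
        recv_time: float

    @dataclass
    class ProfileEntry:       # 3) metric spent in a code region on one core
        function: str
        process: int
        exclusive_time: float

    @dataclass
    class CallPath:           # 4) call path: a chain of nested calls
        frames: List[str]     # e.g. ["main", "solve", "MPI_Send"]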
Figure 1: Trace timeline visualization examples: (a) Vampir timeline showing the execution on all processes (Knüpfer et al., 2008), (b) Vampir timeline for one process with detailed function entry and exit (Knüpfer et al., 2008), (c) the timeline of Jumpshot, and (d) the advanced visualization for focused thread comparison (Karran et al., 2013).
2.1 Trace Visualization
Tracing measurement libraries record a sequence of timestamped events, such as the entry and exit of function calls or code regions, message passing among threads, and the initiation of jobs in an entire run. A common practice is to assign the horizontal axis to time and the vertical axis to the computation processes or threads; different approaches are usually variations of Gantt charts.
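As an illustration of this convention, the following minimal sketch renders a handful of made-up trace events as a Gantt chart with matplotlib; the event tuples, function names, colors, and the single message line are all assumptions for demonstration.

    import matplotlib.pyplot as plt

    # (process, function, start, end); illustrative values only
    events = [
        (0, "compute", 0.0, 3.0), (0, "wait", 3.0, 4.0),
        (1, "compute", 0.0, 2.5), (1, "wait", 2.5, 4.0),
    ]
    colors = {"compute": "tab:blue", "wait": "tab:orange"}

    fig, ax = plt.subplots()
    for proc, func, start, end in events:
        # one horizontal bar per event, one row per process
        ax.broken_barh([(start, end - start)], (proc - 0.4, 0.8),
                       facecolors=colors[func])
    # a single message from process 0 to process 1, drawn as a line
    ax.plot([1.5, 2.0], [0, 1], color="black", linewidth=1)
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Process")
    ax.set_yticks([0, 1])
    plt.show()

The overview-plus-zoom interaction described next builds on exactly this layout.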
Vampir (Knüpfer et al., 2008) and Jumpshot (Jumpshot, 2014) provide two examples of this kind of visualization. Generally, an overview of the whole time period is plotted first; users can then select an area of interest to reveal more detailed events that happened during the selected period. Different functions or regions of code are colorized, and the black lines (yellow for Jumpshot) indicate message passing, as shown in Fig. 1. In addition, advanced visualization tools such as SyncTrace (Karran et al., 2013) provide a focus view showing multiple threads as sectors of a circle, with the relationships between threads shown as aggregated edges similar to a chord diagram. These tools can only handle small-scale data.
2.2 Profile Visualization
Profiling libraries measure the fraction of a metric, e.g. time, spent in each part of the code. Profiles do not typically include temporal information, but they can quickly identify key bottlenecks in a program. Stacked bar charts, histograms, and advanced 3D visualizations are commonly used to give a comparative view of the percentage of time or another metric spent in different functions. ParaProf (ParaProf, 2014) is one example of this kind of visualization, as shown in Fig. 2. It also supports the comparison of particular function calls across different execution runs.
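As an illustration of the stacked-bar profile view, the following minimal sketch plots made-up per-process time percentages with matplotlib; the process labels, function names, and numbers are purely illustrative.

    import matplotlib.pyplot as plt

    processes = ["rank 0", "rank 1", "rank 2"]
    # percentage of total time per function on each process (made up)
    profile = {
        "compute": [60, 55, 70],
        "wait":    [25, 35, 15],
        "io":      [15, 10, 15],
    }

    fig, ax = plt.subplots()
    bottom = [0, 0, 0]
    for func, pct in profile.items():
        # stack each function's share on top of the previous ones
        ax.bar(processes, pct, bottom=bottom, label=func)
        bottom = [b + p for b, p in zip(bottom, pct)]
    ax.set_ylabel("Time (%)")
    ax.legend()
    plt.show()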