has blue-green-red sequence from the low end to the
high end has a positive correlation with the weight
axis. A glance at the display reveals that multiple
variables including base, length, width, engine size,
horsepower, and price show positive correlations
with weight. For instance, heavy cars tend to be
costly and large. Any axis which carries the
red-green-blue sequence from bottom to top is
negatively correlated with the weight axis. The
variables city mpg and highway mpg are negatively
related to weight. As expected, heavy cars tend to
give low mileage per gallon. The colorful axis
stripes for bore, stroke, ratio, and rpm do not
match the color pattern of the reference or any
other axis stripe. So, they appear to vary randomly
with respect to the other variables in the
automobile dataset. We can also compare an axis
with any other non-reference axis (Figure 6). For
instance, the axis stripes for city mpg and highway
mpg show very similar color patterns, confirming
their strong positive relationship. Both mileage
variables are negatively correlated with price and
horsepower. Thus, we are not limited to comparing
only two specific axes at a time. We can compare
many more axes at a glance in the parallel
coordinates display.
Our approach also helps assess the strength of
the correlation. In order to verify the visually
detected correlations in the automobile data, we
calculate the Pearson correlation coefficients for all
axis pairs. The coefficients calculated with respect to
weight are 0.78, 0.88, 0.87, 0.86, 0.76 and 0.84 for
base, length, width, engine size, horsepower and
price, respectively, thus confirming our finding of
positive correlations. The variables city mpg and
highway mpg have coefficients of -0.78 and -0.82,
respectively, with respect to weight, confirming the
observed negative correlations. The coefficient
between the two mpg variables is 0.97. This
strong positive correlation is consistent with the
color similarity between the two axis stripes and
nearly parallel data lines connecting them (Figure 6).
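The verification step above can be sketched as follows. This is a minimal illustration only, not the implementation used to obtain the reported coefficients; the function names pearson and correlations_with_reference are hypothetical, and the toy values are made up for the example (they are not taken from the automobile dataset).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlations_with_reference(table, reference):
    """Correlate every other column of `table` with the reference column."""
    ref = table[reference]
    return {
        name: pearson(ref, column)
        for name, column in table.items()
        if name != reference
    }


# Made-up toy values for illustration only; NOT the automobile dataset.
toy = {
    "weight": [1.0, 2.0, 3.0, 4.0, 5.0],
    "price": [2.0, 4.1, 5.9, 8.2, 9.8],
    "city_mpg": [40.0, 33.0, 29.0, 22.0, 18.0],
}
coeffs = correlations_with_reference(toy, "weight")
```

In this toy table, price rises with weight and city_mpg falls with it, so the sign of each coefficient matches the direction of the color sequence one would expect on the corresponding axis stripe.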
However, a substantial color mix or overlap on
the non-reference axes means that correlations are
either weak or random. It is difficult to correctly
detect such color mix-up because the data point
drawn last determines the final color at a particular
location. For instance, the price axis shows the blue
segment at the lower end, which is heavily
compressed and appears to mix with the green
segment when compared to the reference weight
axis. This means that most price values are low
(blue data points) and some of them are overwritten
by the green data points. In order to reveal such
overlapping, we blend the colors of two or three data
values mapping to the same location on the axis
stripe (Figure 7). For the three-color reference
sequence considered here, we now see more colors
on the non-reference axes. The lower end of the price
axis appears in blue (corresponding to low weight
data points) and then changes to cyan, representing
overlap between low (blue) and mid (green) weight
data points. The price axis stripe shows a nearly
blue-cyan-green-yellow-red sequence (except for some
scattered colors out of the sequence). The reference
color pattern is mostly followed by the price axis,
except for some overlap occurring between successive
color sections. The inference of correlations thus
remains mostly valid.
Figure 7: Color-mapped axis stripes with color blending.
Cyan, yellow and magenta colors appear for overlapping
data points of different reference colors.
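The difference between overwriting and blending can be sketched as follows. This is an illustrative sketch under stated assumptions: the exact blending rule is not specified here, so we assume an additive blend (per-channel maximum of RGB values), which yields cyan, yellow and magenta for pairs of the three reference colors. The function names stripe_overwrite and stripe_blend and the (row, color) input format are hypothetical.

```python
BLUE, GREEN, RED = (0, 0, 255), (0, 255, 0), (255, 0, 0)


def stripe_overwrite(items, height):
    """Color each stripe row with the color of the last item drawn there."""
    stripe = [None] * height
    for row, color in items:
        stripe[row] = color
    return stripe


def stripe_blend(items, height):
    """Blend all colors landing on the same row (assumed: per-channel max)."""
    stripe = [None] * height
    for row, color in items:
        if stripe[row] is None:
            stripe[row] = color
        else:
            stripe[row] = tuple(max(a, b) for a, b in zip(stripe[row], color))
    return stripe


# Hypothetical items: (row on a non-reference axis stripe, reference color).
items = [(0, BLUE), (0, GREEN), (1, GREEN), (1, RED), (2, RED)]

overwritten = stripe_overwrite(items, 3)  # row 0 keeps only the last color
blended = stripe_blend(items, 3)          # row 0 becomes cyan, row 1 yellow
```

With overwriting, row 0 shows only green and hides the blue item beneath it; with the assumed blend, the same row shows cyan, which signals that blue and green items share that location.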
5 COMBINING AXIS STRIPES AND DISTRIBUTION PLOTS
In the color-mapped axis enhancement scheme, we
focus on how the data items from multiple subsets
defined with respect to the reference variable appear
on all numerical axes. Each data item is tagged with
its subset color. Since many data points may fall into
the same location, displaying the color of the last data
item or blending the colors of all data items at that
location does not show information on data frequency or
density for that location. It is important to explore
how these data subsets are distributed along each axis
and how this distribution influences the assessment of
inter-dimensional relationships. For this, we combine
histograms with the colorful axes layout.
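One way such a combination could be computed is to count, for each bin along an axis, how many data items from each reference subset fall into it. The sketch below is an assumption-laden illustration rather than the implementation used here: the function name stacked_bin_counts, the equal-width binning and the subset labels are all hypothetical choices.

```python
def stacked_bin_counts(axis_values, subset_labels, n_bins,
                       subsets=("blue", "green", "red")):
    """Count, for each bin along one axis, the items from each subset.

    Bins are equal-width over the value range of the axis (an assumption).
    """
    lo, hi = min(axis_values), max(axis_values)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant axis
    counts = [{s: 0 for s in subsets} for _ in range(n_bins)]
    for value, label in zip(axis_values, subset_labels):
        # The maximum value is clamped into the last bin.
        b = min(int((value - lo) / span * n_bins), n_bins - 1)
        counts[b][label] += 1
    return counts


# Made-up values on one non-reference axis and their reference subsets.
values = [1.0, 1.5, 2.0, 8.0, 9.0, 10.0]
labels = ["blue", "blue", "green", "green", "red", "red"]
bins = stacked_bin_counts(values, labels, n_bins=3)
```

Each entry of bins then gives the segment lengths of one stacked bar, so that both the frequency at a location and its subset composition can be read together.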
We first make the axis stripes wide in order to
accommodate histogram bars. Each bar is divided
into the same number of sections (with the same color
sequence assigned) as done for the reference axis.
Thus, we have color-stacked bars within the axis
stripe (Figure 8). For the automobile example, each