of statistics in fields of science.
When the hypotheses on which the statistical tests
are based are not verified, no statistical test can be
naively applied to the data in order to perform a quan-
titative analysis; one must then resort to a qualitative
approach to the problem. The method for data presentation
described in this paper is thus based on graphical
representations of the data, especially in the form
of histograms. Readers are thus expected to exercise
their best judgement when comparing several such
judiciously presented figures in order to draw the
appropriate conclusions regarding the performance of
the algorithms. This method also aims at conveying
as much information as both the usual average and
standard deviation table and the statistical test table
without making any assumption regarding the distri-
bution of the data, while occupying about the same
amount of space. Finally, it is believed to be readable
at a glance.
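The idea of comparing several small histograms can be sketched in a few lines of Python. The sketch below is not the implementation used to produce the tables in this paper: the function names, the number of bins and the Unicode block characters are assumptions made for illustration. Its only essential point is that all methods share the same bin edges for a given test function, so that the tiles can be compared by eye.

```python
from typing import Dict, List, Sequence

# Hypothetical glyph set: a blank for an empty bin plus eight bar heights.
BLOCKS = " ▁▂▃▄▅▆▇█"


def shared_edges(samples: Dict[str, Sequence[float]], bins: int) -> List[float]:
    """Return bin edges spanning the pooled range of all methods."""
    lo = min(min(v) for v in samples.values())
    hi = max(max(v) for v in samples.values())
    if hi == lo:  # degenerate case: every value is identical
        hi = lo + 1.0
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


def counts(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count the values falling in each equal-width bin defined by edges."""
    bins = len(edges) - 1
    lo, hi = edges[0], edges[-1]
    out = [0] * bins
    for v in values:
        idx = int((v - lo) / (hi - lo) * bins)
        out[min(idx, bins - 1)] += 1  # the pooled maximum goes in the last bin
    return out


def sparkline(values: Sequence[float], edges: Sequence[float]) -> str:
    """Render one histogram tile, scaled to the tallest bin of this tile."""
    c = counts(values, edges)
    peak = max(c)
    levels = len(BLOCKS) - 1
    # Non-empty bins always get at least the lowest visible bar.
    return "".join(
        BLOCKS[0] if n == 0 else BLOCKS[max(1, round(n / peak * levels))]
        for n in c
    )


if __name__ == "__main__":
    # Made-up final fitness values, for illustration only.
    runs = {
        "Method A": [0.0, 1.0, 2.0, 3.0],
        "Method B": [2.0, 4.0],
    }
    edges = shared_edges(runs, bins=4)
    for name, values in runs.items():
        print(f"{name} |{sparkline(values, edges)}|")
```

Scaling each tile to its own tallest bin is one possible design choice; it keeps the shape of every distribution visible but hides differences in sample size between methods.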
5 COMPARISONS TO OTHER DATA PRESENTATIONS
To illustrate the effectiveness of our visualization
method we present a comparison of four evolution-
ary algorithms that was computed for an earlier
work (Weber et al., 2010). The data consists of four
stochastic optimization methods, with two baseline
algorithms (1 and 2) and two proposed improvements
(3 and 4). The tests consist of a set of ten typical
functions, commonly used in the field, in 500 dimensions.
To evaluate our visualisation method, we
present the same data in three formats: as the aver-
age and standard deviation in Table 4, as a statistical
comparison in Table 6, and in our preferred format in
Table 5. The first comparison is between the averages
in Table 4 and the histograms in Table 5. A cursory
comparison between Table 4 and Table 5 reveals that
the print areas required by the two tables are more
or less equal, leading to the conclusion that replacing
the numerical table with a graphical one is feasible
within the strict page limits imposed by many publishers.
Moreover, the histogram table is composed
of self-sufficient tiles and can, unlike the numerical
table, be laid out more flexibly. The data can for ex-
ample be presented as a square table, as a long column
on the side of the page or even as separate blocks near
the explanatory text of the article.
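For reference, the conventional numerical format can be sketched as follows. This is a hypothetical helper, not the code that produced Table 4: it assumes a minimisation problem, so that the lowest average is marked as the best, and it uses asterisks as a plain-text stand-in for bold type.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def summary_row(runs: Dict[str, Sequence[float]]) -> Dict[str, str]:
    """Format 'mean ± std' per method; the lowest mean is wrapped in asterisks.

    Each method needs at least two runs, since the sample standard
    deviation is undefined for a single value.
    """
    stats = {name: (mean(v), stdev(v)) for name, v in runs.items()}
    # Assumption: minimisation, so the lowest average counts as the best.
    best = min(stats, key=lambda name: stats[name][0])
    row = {}
    for name, (m, s) in stats.items():
        cell = f"{m:.2e} ± {s:.2e}"
        row[name] = f"*{cell}*" if name == best else cell
    return row
```

Each cell reduces all runs of a method to two numbers, which is exactly the information loss that the histogram tiles are meant to avoid.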
One claim could however be made in favor of
average and standard deviation tables: they present
the numerical data precisely and in an absolute way,
which is not accomplished by the histogram repre-
sentation. This is naturally true, but what is the im-
portance of knowing the exact value of the average?
Reasonably, this level of precision could be necessary
only when making a comparative study but, as argued
before, averages and standard deviations are not sufficient
for this purpose. Since fitting all the numerical
data in a printed article is infeasible and distracting,
the only reasonable recourse is to rely on the reproducibility
of science and to re-compute the numbers
for the tests. Alternatively, one can publish the gath-
ered data in its entirety outside of the article.
To evaluate the work, the reader is instructed to
first study Table 4. Casual study reveals, mostly
due to the bold font, that Method 4 is likely to be the best
candidate. At this point we make a claim: there are
four functions for which this might not be the case.
How long does it take to see which ones they are?
This simple test clearly illustrates the fact that reading
this table is difficult.
In contrast we observe Table 5. We instantly see
that in many cases Method 4 has produced results
closer to the optimum than other methods, with the
closest competitor being Method 3. Method 2 seems
to be in general not competitive compared to the other
methods and Method 1 is in the competition but losing.
In four cases (Functions 4, 6, 8 and 10), we
see significant overlap, which confirms the result of
the Mann-Whitney U test in Table 6, indicating that
for Functions 8 and 10, Method 4 is not performing
significantly better than Method 1. The same
test indicates however that on Function 4, Method 4
is outperforming Method 1 even though the distributions
are mostly overlapping. This might be caused by a
long-tailed and skewed histogram for Method 4, which
causes the Mann-Whitney U test to give a counterintuitive
result. These examples therefore illustrate the
fact that our visualisation effectively conveys at least
the same information as the Mann-Whitney U test, as
well as information complementary to the test and its
limits.
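The interplay between overlapping histograms and a significant rank test can be reproduced on synthetic data. The sketch below is a simplified, pure-Python Mann-Whitney U test that uses the tie-corrected normal approximation without continuity correction; it is not the procedure used to compute Table 6, and the function names are assumptions. In practice a library routine such as `scipy.stats.mannwhitneyu` would normally be preferred.

```python
import math
from typing import List, Sequence, Tuple


def midranks(pooled: Sequence[float]) -> List[float]:
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(pooled)), key=lambda i: pooled[i])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic of x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    # U counts the pairs in which an x value exceeds a y value (ties count 1/2).
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction for the variance of U.
    tally = {}
    for v in pooled:
        tally[v] = tally.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in tally.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:  # all values identical: no evidence of a difference
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    return u, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Made-up data: a narrow sample and a skewed sample with a long tail.
    narrow = [float(v) for v in range(20, 32)]
    skewed = [14, 15, 16, 17, 18, 19, 19.5, 20.5, 21.5, 22.5, 40, 70]
    u, p = mann_whitney_u(skewed, narrow)
    print(f"U = {u}, p = {p:.4f}")
```

In this made-up example the range of the narrow sample lies entirely inside that of the skewed one, yet the rank-based test still reports a difference at the 5% level, because most of the skewed values rank below the narrow ones regardless of how far the tail extends.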
The visualisation shows several other points of interest
that are not evident in either the standard deviation
table or the statistical test table. Method 1 seems to have
a rather robust behaviour. Although it rarely competes
for the best solution quality, it seems to reliably
achieve a certain level of fitness, which is most evi-
dent in Functions 2, 4, and 8. Method 4 works the
opposite way, having a wide distribution and some-
times finding excellent results and yet at times failing
badly. When considering repeated experiments, there
is little use in running Method 1 again to improve the
result, but running Method 4 several times could be
very beneficial. In some cases, some algorithms have
their data entirely in the “dump bin”. This is the authors’
way of visually claiming that those algorithms