from which 2 are randomly selected and presented in
the table.
In order to explain the difference that appears between the common approach and the two versions of the DSC, one example in which the results of the common approach and of both versions of the DSC differ is randomly selected and presented in detail. The first combination from Table 1 is selected, which is a comparison between the algorithms GP5-CMAES, Sifeg, and BSif. For the analysis, the Friedman test was selected. In the case of the common approach, the null hypothesis is rejected, while when the DSC approach is used, the null hypothesis is not rejected. The rankings obtained by the Friedman test using the common approach with averages and both versions of the DSC ranking scheme are presented in Table 2. Comparing the rankings obtained by the common approach and the DSC ranking scheme, the difference between them can be clearly observed. To explain this difference, the individual problems are discussed in detail.
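To illustrate how such a multiple-problem comparison can be carried out, the following is a minimal sketch of the Friedman test applied to a matrix of per-function scores using SciPy; the score values below are hypothetical placeholders, not the data behind Table 2.

```python
# A minimal sketch of the Friedman test over multiple problems.
# The score matrix is hypothetical; rows are benchmark functions,
# columns are the algorithms GP5-CMAES, Sifeg, and BSif.
import numpy as np
from scipy import stats

scores = np.array([[0.10, 0.30, 0.32],
                   [0.20, 0.25, 0.27],
                   [0.15, 0.40, 0.38],
                   [0.05, 0.22, 0.21],
                   [0.12, 0.35, 0.33]])

# Friedman test: each column is one algorithm's scores across functions
stat, p = stats.friedmanchisquare(*scores.T)
print(f"Friedman statistic = {stat:.2f}, p-value = {p:.3f}")

# Average rankings per algorithm (lower score -> better rank)
avg_ranks = stats.rankdata(scores, axis=1).mean(axis=0)
print("average rankings:", avg_ranks)
```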
In Figure 2, the cumulative distributions (the step functions) and the average values (the horizontal lines) obtained from the multiple runs of the three algorithms are presented for different functions. First, the details for the function $f_7$ are presented. The rankings obtained using the common approach with averages are 1.00, 2.00, and 3.00; they differ because all three algorithms have different averages. The rankings obtained using the two versions of the DSC ranking scheme, with the KS test and the AD test, are 1.00, 2.50, and 2.50. The DSC ranking scheme uses the cumulative distributions to assign the rankings of the algorithms. From the figure, one may assume that there is no significant difference between the cumulative distributions of Sifeg and BSif, but that both differ from the cumulative distribution of GP5-CMAES. This result is also obtained by using the two-sample KS and AD tests. The p-values obtained for the pairs of algorithms using the KS test are 0.00 (GP5-CMAES, Sifeg), 0.00 (GP5-CMAES, BSif), and 0.07 (Sifeg, BSif), while the p-values obtained for the same pairs using the AD test are 0.00 (GP5-CMAES, Sifeg), 0.00 (GP5-CMAES, BSif), and 0.02 (Sifeg, BSif). Because multiple pairwise comparisons are made, these p-values are further corrected using the Bonferroni correction. In this case, the transitivity of the matrix $M'_7$ is satisfied, so the set of all algorithms is split into two disjoint sets, {GP5-CMAES} and {Sifeg, BSif}, and the rankings are defined using Equations 4 and 5.
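The pairwise testing step can be sketched as follows, using SciPy's two-sample KS test and k-sample AD test together with the Bonferroni correction; the run data, sample sizes, and the significance level of 0.05 are assumptions for illustration, not the experimental data.

```python
# A minimal sketch of the pairwise comparisons behind the DSC ranking
# scheme. The run data below are hypothetical placeholders.
from itertools import combinations
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
runs = {                       # hypothetical multi-run results per algorithm
    "GP5-CMAES": rng.normal(0.0, 1.0, 30),
    "Sifeg":     rng.normal(1.0, 1.0, 30),
    "BSif":      rng.normal(1.1, 1.0, 30),
}

alpha = 0.05                   # assumed significance level
pairs = list(combinations(runs, 2))
m = len(pairs)                 # number of pairwise comparisons

for a, b in pairs:
    p_ks = stats.ks_2samp(runs[a], runs[b]).pvalue
    # anderson_ksamp reports an approximate p-value capped to [0.001, 0.25]
    p_ad = stats.anderson_ksamp([runs[a], runs[b]]).significance_level
    # Bonferroni correction: multiply each p-value by the number of tests
    p_ks_corr = min(1.0, p_ks * m)
    p_ad_corr = min(1.0, p_ad * m)
    same = p_ks_corr > alpha   # entry 1 in M' if not significantly different
    print(f"({a}, {b}): KS p={p_ks_corr:.2f}, AD p={p_ad_corr:.2f}, same={same}")
```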
In Figure 2(c), the results for the function $f_{21}$ are presented. The rankings obtained using the common approach with averages are 1.00, 2.00, and 3.00; they differ because all three algorithms have different averages. The rankings obtained using the DSC ranking scheme with both the KS test and the AD test are 2.00, 2.00, and 2.00. From the figure, it is not clear whether there is a significant difference between the cumulative distributions of GP5-CMAES, Sifeg, and BSif. To check this, the two-sample KS and AD tests are used. The p-values obtained for the pairs of algorithms are 0.38 (GP5-CMAES, Sifeg), 0.07 (GP5-CMAES, BSif), and 0.38 (Sifeg, BSif) with the KS test, and 0.41 (GP5-CMAES, Sifeg), 0.02 (GP5-CMAES, BSif), and 0.29 (Sifeg, BSif) with the AD test. Because multiple pairwise comparisons are made, these p-values are further corrected using the Bonferroni correction. In this case, the transitivity of the matrix $M'_{21}$ is satisfied, but the set of all algorithms is not split into disjoint sets because all algorithms belong to one set, {GP5-CMAES, Sifeg, BSif}.
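A minimal sketch of how the fractional rankings could then be assigned from the ordered disjoint sets is given below. The exact rule is defined by Equations 4 and 5, which are not reproduced here; the averaging below is an assumption consistent with the rankings reported above, and the function name dsc_rankings is hypothetical.

```python
# A sketch of fractional ranking over disjoint sets of algorithms,
# assuming each set shares the mean of the rank positions it occupies.
def dsc_rankings(sets_in_order):
    """sets_in_order: disjoint sets of algorithm names, ordered from
    best to worst by their average performance."""
    rankings, position = {}, 1
    for group in sets_in_order:
        # each member shares the mean of the positions the set occupies
        shared = sum(range(position, position + len(group))) / len(group)
        for alg in group:
            rankings[alg] = shared
        position += len(group)
    return rankings

# f7-like case: {GP5-CMAES} beats {Sifeg, BSif} -> 1.00, 2.50, 2.50
print(dsc_rankings([{"GP5-CMAES"}, {"Sifeg", "BSif"}]))
# f21-like case: a single set of all three algorithms -> 2.00 each
print(dsc_rankings([{"GP5-CMAES", "Sifeg", "BSif"}]))
```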
In Figure 2(b), the results for the function $f_{18}$ are presented. This example is interesting because the two versions of the DSC ranking scheme, which use different criteria for comparing distributions (the KS test and the AD test), give different results. For the function $f_{18}$, the rankings obtained by the common approach are 1.00, 2.00, and 3.00. The rankings obtained by the DSC ranking scheme with the KS test are 1.00, 2.50, and 2.50, while those obtained with the AD test are 1.00, 2.00, and 3.00. The p-values obtained by using the KS test for the pairs of algorithms are 0.00 (GP5-CMAES, Sifeg), 0.00 (GP5-CMAES, BSif), and 0.03 (Sifeg, BSif). Because multiple pairwise comparisons are made, these p-values are further corrected using the Bonferroni correction. In this case, the transitivity of the matrix $M'_{18}$ is satisfied, so the set of all algorithms is split into two disjoint sets, {GP5-CMAES} and {Sifeg, BSif}. The p-values obtained by using the AD test for the same pairs are 0.00 (GP5-CMAES, Sifeg), 0.00 (GP5-CMAES, BSif), and 0.01 (Sifeg, BSif). After the Bonferroni correction, the transitivity of the matrix $M'_{18}$ is not satisfied, so the algorithms obtain their rankings according to their averages. So, for this function, the two criteria lead to different rankings.
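The transitivity check itself can be sketched as follows; the construction of $M'$ (entry 1 when a pair of algorithms is not significantly different after the correction, 0 otherwise) and the example matrix are assumptions for illustration, not the matrices computed in the experiments.

```python
# A minimal sketch of the transitivity check on the matrix M'.
def is_transitive(M):
    n = len(M)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # transitivity: i ~ j and j ~ k must imply i ~ k
                if M[i][j] and M[j][k] and not M[i][k]:
                    return False
    return True

# Illustrative matrix: A ~ B and B ~ C, but A is different from C,
# so transitivity fails and the rankings fall back to the averages.
M_example = [[1, 1, 0],
             [1, 1, 1],
             [0, 1, 1]]
print(is_transitive(M_example))  # False -> rank by averages
```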
This disagreement between the two criteria is important when algorithms are compared on a single problem (function), while it does not influence the result when a multiple-problem analysis is performed. Moreover, when comparing algorithms on a single problem, it is better to use the AD test because it is more powerful and can better detect differences than the KS test between distributions that vary in shift only, in scale only, in symmetry only, or that have the same mean and standard deviation but differ in the tails.