A Boltzmann Multivariate Estimation of Distribution Algorithm for
Continuous Optimization
Ignacio Segovia-Domínguez, S. Ivvan Valdez and Arturo Hernández-Aguirre
Department of Computer Science, Center for Research in Mathematics, Guanajuato, Mexico
Keywords:
Boltzmann distribution, Estimation of Distribution Algorithm, Optimization.
Abstract:
This paper introduces an approach to continuous optimization using an Estimation of Distribution Algorithm
(EDA) based on the Boltzmann distribution. When the objective function is used as the energy function, the
Boltzmann distribution favors the most promising regions, making the probability exponentially proportional
to the objective function. Sampling directly from the Boltzmann distribution is not possible because it requires
computing the objective function values over the complete search space. This work presents an approximation
of the Boltzmann function by a multivariate Normal distribution. Formulae for computing the mean and
covariance matrix are derived by minimizing the Kullback-Leibler divergence. The proposed EDA is
competitive and often superior to similar algorithms, as shown by the statistical results reported here.
1 INTRODUCTION
The Estimation of Distribution Algorithms (EDAs)
(Mühlenbein et al., 1996) are optimization methods
based on estimating and sampling a probability
distribution. The aim is to favor the most promising
regions by assigning them the highest probability values.
In fact, the most promising regions are unknown
and have to be discovered during the optimization
process. The main goal of the EDA is to pose the
probability mass around the optima. The strategy,
without loss of generality for a maximization pro-
cess, is to reinforce the sampling in regions with
the maximum objective function or fitness function
values, and disregard the regions with the minimum
values. The most common scheme for continuous
optimization using EDAs is to employ a multivariate
or univariate Normal distribution (Larrañaga,
2002; Larrañaga et al., 2000; Dong and Yao, 2008;
Segovia-Dominguez et al., 2013). The parameters of
the Normal density are estimated by using maximum
likelihood estimators (MLEs) over the selected
set, which is determined by the truncation method;
usually half of the population with the worst objective
value is truncated. Although these approaches have
shown competitive performance, some evident
issues can be noticed in the strategy just mentioned:
- The truncation selection hides the fitness landscape, assigning the same importance to every
selected individual in the parameter estimation. In consequence, the search distribution parameters
are estimated as if the regions represented by the selected set were equally good in fitness values.
- It is a well-known issue that the variance in estimation of distribution algorithms is often smaller
than required; hence, the MLE variance estimator is not the most adequate for searching for the
optimum (Shapiro, 2006; Grahl et al., 2007).
The Boltzmann distribution has been widely used
in optimization; in the Estimation of Distribution Algorithms
(EDAs) context, researchers have proposed
different approaches such as the BEDA (Mühlenbein,
2012; Mahnig and Mühlenbein, 2001; Mühlenbein
et al., 1999), a general framework for Boltzmann-distribution-based
estimation of distribution algorithms from which practical
EDAs have been derived. For example, the FDA
(Mühlenbein and Mahnig, 1999), the proposal of Yunpeng et al.
(Yunpeng et al., 2006) and the proposal of Valdez et al.
(Valdez et al., 2013) demonstrate this. The
unifying characteristic of these approaches is that they
intend to equip the stochastic search algorithm with
an engine which favors the most promising regions:
the better the objective function is in a region, the
more intense the sampling must be. The Boltzmann
distribution is used to achieve this purpose.
The Gibbs or Boltzmann distribution of a fitness
function g(x) is defined by:

    p(\vec{x}) := \frac{\exp(\beta g(\vec{x}))}{Z},    (1)

where Z = \int_X \exp(\beta g(\vec{x})) \, d\vec{x} is the partition function over the search space X.
As can be noticed in Equation (1), the objective
function is used directly as an energy function. In
practical approaches, the Boltzmann distribution can-
not be used directly for sampling because it is nec-
essary to know the objective function in the whole
domain. That is the reason why the parameters of
a density function are computed by minimizing a
measure between the parametric distribution and the
Boltzmann distribution, for instance, the Kullback-
Leibler divergence (Ochoa, 2010; Yunpeng et al.,
2006; Valdez et al., 2013).
There are remarkable challenges to consider in
the design of EDAs based on the Boltzmann distribution:
- To choose an adequate β parameter in Equation (1). Usually β depends on time or is adjusted
dynamically during the optimization process. The process which controls the β update at each
generation is called the annealing schedule; it can be used to manage the exploration and
convergence of the algorithm.
- To derive robust parameter estimators for the approximating distribution. Some approaches (Hu
et al., 2012; Yunpeng et al., 2006) have derived formulae for estimating the parameters of a
distribution which approximates the Boltzmann density, by weighting the population or selected
set with exponential functions similar to Equation (1). Even though competitive results are
obtained, these proposals often suffer from premature convergence because the exponential
function drastically leads the probability mass to suboptimal positions. This behavior can be
avoided by manipulating the β value, but it is not simple to determine how to do so; a second
option is to obtain formulae which do not impact the estimators as drastically. Both issues are
also related to the aforementioned variance reduction, a common problem in EDAs (Shapiro, 2006).
Our proposal intends to tackle the challenges just
mentioned. Accordingly, this paper presents a novel
algorithm, based on the approximation of the Boltzmann
function by a multivariate Normal distribution,
which introduces the following features:
- Two proposals of annealing schedules to update the β value.
- Formulae which are robust, or at least not impacted as drastically as the exponential functions
used in other approaches.
- Novel annealing schedules that tackle the variance reduction problem and hence serve as
mechanisms to avoid premature convergence of the algorithm.
The organization of the paper is as follows: Section 2 presents the derivation of the formulae for
computing the parameters of the multivariate Normal distribution; Section 3 introduces the Boltzmann
Estimation of Multivariate Normal Algorithm (BEMNA), as well as the two annealing schedules used
in it; Section 4 is devoted to testing the proposed EDA on well-known test functions, including a
comparison with another algorithm from the literature; finally, Section 5 provides some concluding
remarks.
2 APPROXIMATING THE BOLTZMANN DISTRIBUTION BY THE NORMAL MULTIVARIATE DISTRIBUTION
This section introduces the formulae to estimate the mean vector and covariance matrix, that is, ~µ and Σ,
of a multivariate Normal density which approximates the multivariate Boltzmann density P_x, given a set
of samples ~x^(1), ~x^(2), ..., ~x^(N) which are observations of a random vector ~X. Let ~X be a random
vector such that ~X ∼ Q_x, where Q_x = Q(~x, ~µ, Σ) is the multivariate Normal density shown in
Equation (2). The corresponding Boltzmann density is given in Equation (3).

    Q_x = \frac{1}{\sqrt{(2\pi)^d |\Sigma|}} \exp\left( -\frac{1}{2} (\vec{x}-\vec{\mu})^t \Sigma^{-1} (\vec{x}-\vec{\mu}) \right)    (2)

    P_x = \frac{\exp(\beta g(\vec{x}))}{Z}    (3)
The procedure for finding the parameters of Q_x which best approximate P_x consists of minimizing a
measure of dissimilarity between density functions, as in previous works (Yunpeng et al., 2006;
Valdez et al., 2013). Here, the Kullback-Leibler divergence presented in Equation (4),
K_{QP} = D_{KL}(Q_x || P_x), is used as the statistical measure between probability distributions.

    K_{QP} = \int Q_x \log \frac{Q_x}{P_x} \, d\vec{x}    (4)

The minimization of K_{QP} for finding the optimal parameters [~µ*, Σ*] can be stated as shown in
Equation (5):

    [\vec{\mu}^*, \Sigma^*] = \arg\min \{ K_{QP} \}    (5)

Notice that K_{QP} can be rewritten as
ECTA2014-InternationalConferenceonEvolutionaryComputationTheoryandApplications
252
    K_{QP} = \int Q_x \log Q_x \, d\vec{x} - \int Q_x \log P_x \, d\vec{x}
           = -H(Q_x) - \int Q_x \log P_x \, d\vec{x}
           = -\frac{1}{2} \log\left( (2\pi e)^d |\Sigma| \right) - \int Q_x \log P_x \, d\vec{x},    (6)
where H(Q_x) denotes the entropy of the multivariate Normal density (Cover and Thomas, 2006).
In order to find the parameters which minimize the Kullback-Leibler divergence, the partial
derivatives are calculated; see Equations (7) and (8).
    \frac{\partial K_{QP}}{\partial \vec{\mu}} = -\int \frac{\partial Q_x}{\partial \vec{\mu}} \log P_x \, d\vec{x}
      = -\int Q_x \left[ \Sigma^{-1} (\vec{x}-\vec{\mu}) \right] \log P_x \, d\vec{x}    (7)
    \frac{\partial K_{QP}}{\partial \Sigma} = -\frac{1}{2} \frac{\partial \log(|\Sigma|)}{\partial \Sigma} - \int \frac{\partial Q_x}{\partial \Sigma} \log P_x \, d\vec{x}
      = -\frac{1}{2} \int Q_x \left[ \Sigma^{-1} (\vec{x}-\vec{\mu})(\vec{x}-\vec{\mu})^t \Sigma^{-1} \right] \log P_x \, d\vec{x}
        + \frac{1}{2} \int Q_x \Sigma^{-1} \log P_x \, d\vec{x} - \frac{1}{2} \Sigma^{-1}    (8)
The optimal estimates of the mean vector and covariance matrix are obtained by setting the derivatives
equal to 0, as in Equations (9) and (10), and solving for ~µ and Σ, respectively.
    0 = \frac{\partial K_{QP}}{\partial \vec{\mu}}
    0 = \vec{\mu} \int Q_x \log P_x \, d\vec{x} - \int Q_x \vec{x} \log P_x \, d\vec{x}
    0 = \vec{\mu} \beta E_Q[g(\vec{X})] - \vec{\mu} \log Z - \beta E_Q[g(\vec{X})\vec{X}] + E_Q[\vec{X}] \log Z
    \vec{\mu} = \frac{E_Q[g(\vec{X})\vec{X}]}{E_Q[g(\vec{X})]}    (9)
The following equivalences were used to obtain Equations (9) and (10):
\log P_x = \beta g(\vec{x}) - \log Z;
\int Q_x \vec{x} \, d\vec{x} = E_Q[\vec{X}];
\int Q_x (\vec{x}-\vec{\mu})(\vec{x}-\vec{\mu})^t \, d\vec{x} = E_Q[(\vec{X}-\vec{\mu})(\vec{X}-\vec{\mu})^t];
and other similar identities. As ~X ∼ Q_x, then E_Q[\vec{X}] = \vec{\mu} and
E_Q[(\vec{X}-\vec{\mu})(\vec{X}-\vec{\mu})^t] = \Sigma.
    0 = \frac{\partial K_{QP}}{\partial \Sigma}
    0 = \int Q_x (\vec{x}-\vec{\mu})(\vec{x}-\vec{\mu})^t \log P_x \, d\vec{x} - \left( \int Q_x \log P_x \, d\vec{x} \right) \Sigma + \Sigma
    \Sigma = \frac{E_Q[g(\vec{X})(\vec{X}-\vec{\mu})(\vec{X}-\vec{\mu})^t]}{E_Q[g(\vec{X})] - 1/\beta}    (10)
Finally, to estimate the parameters from the observations ~x^(1), ~x^(2), ..., ~x^(N) of the random
vector ~X, a numerical stochastic approximation by the Monte Carlo method is computed, as shown in
Equations (11) and (12). These two equations are the estimators that approximate the parameters of
the search distribution.

    \vec{\mu}^* = \frac{\sum_{i=1}^{N} g(\vec{x}^{(i)}) \, \vec{x}^{(i)}}{\sum_{i=1}^{N} g(\vec{x}^{(i)})}    (11)

    \Sigma^* = r_e \cdot \sum_{i=1}^{N} g(\vec{x}^{(i)}) (\vec{x}^{(i)} - \vec{\mu})(\vec{x}^{(i)} - \vec{\mu})^t,    (12)

where

    r_e = \left( \sum_{i=1}^{N} g(\vec{x}^{(i)}) - \frac{N}{\beta} \right)^{-1}    (13)
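To make the computation concrete, here is a minimal Python sketch of the estimators in Equations
(11)-(13). The function name and the array-based interface are assumptions for illustration, not
part of the original algorithm description.

```python
import numpy as np

def estimate_parameters(X, g, beta):
    """Monte Carlo estimators of Equations (11)-(13).

    X    : (N, d) array of observations of the random vector X.
    g    : (N,) array of non-negative energy values g(x_i).
    beta : Boltzmann parameter; must satisfy N/beta < sum(g).
    """
    N = X.shape[0]
    g_sum = g.sum()
    mu = (g @ X) / g_sum                 # Equation (11): weighted mean
    r_e = 1.0 / (g_sum - N / beta)       # Equation (13): scaling factor
    diff = X - mu
    # Equation (12): weighted scatter matrix scaled by r_e.
    Sigma = r_e * (diff.T @ (g[:, None] * diff))
    return mu, Sigma
```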
2.1 A Note About the Derived Formulae

In Equations (11) and (12) the estimators use weights defined by
g(\vec{x}^{(i)}) / \sum_{i=1}^{N} g(\vec{x}^{(i)}). In other words, the weighted estimators are
computed using weights proportional to the objective function value of each individual in the
selected set. In contrast with similar approaches (Hu et al., 2012; Yunpeng et al., 2006), these
derivations have some advantages:
- Proportional weights in the estimators avoid drastic changes when the individuals differ
considerably in objective function value. That is, if a new individual with a large objective value
is sampled, exponential weights can concentrate the probability mass around this single individual,
leading the algorithm to premature convergence; this collapse is far less pronounced with
proportional weights (a toy numerical contrast is sketched after this list).
- A second advantage is that the minimum variance, attained as β → ∞, is not 0 for our approach.
This is significant considering that EDAs naturally suffer from premature convergence and variance
reduction (Hu et al., 2012; Yunpeng et al., 2006).
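The toy numerical contrast announced above; the β = 5 used here is an arbitrary choice for
illustration:

```python
import numpy as np

g = np.array([1.0, 1.2, 1.4, 10.0])             # one outstanding individual
prop = g / g.sum()                              # proportional weights (Eq. 11)
expw = np.exp(5.0 * g) / np.exp(5.0 * g).sum()  # exponential weights
print(prop)  # [0.074 0.088 0.103 0.735]: mass is spread out
print(expw)  # [~0 ~0 ~0 ~1]: mass collapses onto one sample
```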
ABoltzmannMultivariateEstimationofDistributionAlgorithmforContinuousOptimization
253
The energy function g(~x) must be positive or equal to zero over the whole domain. However, the
objective function F(~x) might be negative, and the problems considered here are minimization
problems that must be recast as maximization. In order to construct a valid energy function for the
continuous minimization problem, throughout this paper the g(~x_i) value is computed as

    g(\vec{x}_i) = -F(\vec{x}_i) - \min(-F(\vec{x}_i)) + 10^{-12},

where the negation converts minimization into maximization and the shift, taken over the current set
of evaluated individuals, guarantees a strictly positive energy.
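A minimal sketch of this transformation, assuming the shift min(−F) is taken over the current set of
evaluated individuals; the helper name energy is an assumption:

```python
import numpy as np

def energy(F_vals, eps=1e-12):
    # For a minimization problem: negate F so that better (lower)
    # objective values map to larger energies, then shift so the
    # worst individual receives exactly eps > 0.
    neg = -np.asarray(F_vals, dtype=float)
    return neg - neg.min() + eps
```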
2.2 Annealing Schedule 1

As seen in Equations (11), (12) and (13), the β value only affects the covariance matrix
computation. The degree of impact of β on the covariance is highly related to
\sum_{i=1}^{N} g(\vec{x}^{(i)}). Accordingly, N/\beta < \sum_{i=1}^{N} g(\vec{x}^{(i)}) must hold
in order to maintain positive variances in the diagonal of the covariance matrix; on the other hand,
if N/\beta \ll \sum_{i=1}^{N} g(\vec{x}^{(i)}), its effect is diminished. As a consequence, the
estimator reaches the minimum variance when β → ∞, because N/β → 0. An interesting remark about this
last setting is that the Normal distribution with such a minimum variance is not similar to a
Dirac δ, while the corresponding Boltzmann distribution actually is. Considering the arguments
stated above, assume a β value as shown in Equation (14):

    \beta = N / \left( (1 - \gamma) \sum_{i=1}^{N} g(\vec{x}^{(i)}) \right),    (14)

where 0 < γ < 1 in order to fulfill the requirements discussed above. Hence, Equation (12) is
rewritten as Equation (15):

    \Sigma^* = \alpha \, \frac{\sum_{i=1}^{N} g(\vec{x}^{(i)}) (\vec{x}^{(i)} - \vec{\mu})(\vec{x}^{(i)} - \vec{\mu})^t}{\sum_{i=1}^{N} g(\vec{x}^{(i)})}    (15)
Here α = 1/γ; recalling that 0 < γ < 1, in consequence 1 < α < ∞. The first proposed schedule sets α
according to historical improvements, for the minimization case, as follows (a code sketch is given
below):

    if (F_best^(t) < F_best^(t-1)) then α = 1.1α, else α = 0.9α
    if (α > 2) then α = 2
    if (α < 1) then α = 1

This schedule increases the covariance matrix values if an improvement in the objective function of
the elite individual F_best is detected. Otherwise, the covariance matrix at least keeps its minimum
values (β = ∞). On the other hand, scaling factors greater than 2 are prevented in order to control
the exploration. The reader can notice that α = 1, which is equivalent to β = ∞, yields a weighted
estimator of the covariance matrix with weights proportional to the objective value; hence it
captures the structure of the population while favoring the most promising solutions.
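The announced sketch of this update in Python (the function name update_alpha is an assumption):

```python
def update_alpha(alpha, f_best, f_best_prev):
    # Annealing schedule 1 (minimization): grow the covariance scaling
    # factor by 10% after an improvement of the elite individual,
    # shrink it by 10% otherwise, and clamp it to [1, 2].
    alpha = 1.1 * alpha if f_best < f_best_prev else 0.9 * alpha
    return min(max(alpha, 1.0), 2.0)
```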
2.3 Annealing Schedule 2

The second annealing schedule also uses α = 1/γ, but here γ is modified linearly with the
improvements; this differs from the first annealing schedule, which increases/decreases the α value
exponentially. Another difference between the two schedules is that the first uses improvements of
the best objective value found so far, while the second verifies improvements over the selected set.
Notice that updating β in Equation (13) is equivalent to updating α in Equation (15), and to
updating γ, considering that α = 1/γ. According to the arguments in the Subsection above,
0 < γ ≤ 1, with γ = 1 corresponding to the minimum variance (β = ∞). The updating of γ proceeds as
follows (a code sketch is given after this list):
- Let M_s be the number of selected individuals that are preserved from the current generation to
the next one, known throughout this paper as the survivor individuals. Note that the maximum number
of survivor individuals is the sample size S_c.
- Define n_p as the number of partitions of the interval [0, 1]. For our experiments, n_p = 30.
- If M_s / S_c > 0.5 then γ = γ − 1/n_p; otherwise γ = γ + 1/n_p.
- If γ < 1/n_p then γ = 1/n_p. If γ > 1 then γ = 1.
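The sketch announced above, with the helper name update_gamma assumed for illustration:

```python
def update_gamma(gamma, m_survivors, sample_size, n_p=30):
    # Annealing schedule 2: if more than half of the sampled
    # individuals survive selection, decrease gamma (i.e. increase
    # alpha = 1/gamma, enlarging the covariance); otherwise increase
    # gamma.  gamma is kept inside [1/n_p, 1].
    step = 1.0 / n_p
    if m_survivors / sample_size > 0.5:
        gamma -= step
    else:
        gamma += step
    return min(max(gamma, step), 1.0)
```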
3 THE BOLTZMANN ESTIMATION OF MULTIVARIATE NORMAL ALGORITHM
The Boltzmann Estimation of Multivariate Normal Algorithm (BEMNA) is presented in Figure 1. The
BEMNA starts with a random population, which is evaluated; then the half of the population containing
the best individuals is selected. The objective function values of the selected set are used to
compute the weights for the parameter estimation; ~µ and Σ are then computed using the selected set
variable values and the weights. Finally, a set of individuals is simulated using the computed
parameters. From the second generation onward, the new selected set is computed from the current
selected set and the simulated one.
ECTA2014-InternationalConferenceonEvolutionaryComputationTheoryandApplications
254
BEMNA
1. t ← 0. Initialize N_pop, N_sel, S_c and α/γ according to the annealing schedule.
2. P^(0) ← uniformly distributed population, where P^(0) = {~x_1, ..., ~x_{N_sel}}.
3. Evaluate F(~x_i) and g(~x_i).
4. Compute the estimates ~µ* and Σ* of P^(t), by Equations (11) and (15).
5. P_S^(t) ← sample S_c individuals from Q(x; ~µ*, Σ*).
6. Evaluate F(~x_j) and g(~x_j) of P_S^(t).
7. Update α/γ according to the annealing schedule.
8. P^(t+1) ← the best N_sel individuals from P^(t) ∪ P_S^(t).
9. t ← t + 1.
10. If the stopping criterion is not reached, return to Step 4.
11. Return the elite individual in P^(t) as the best approximation to the optimum.
Figure 1: Pseudo-code of the BEMNA.
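For readers who prefer code, the following is a compact, self-contained Python sketch of the loop in
Figure 1 instantiated with annealing schedule 1 for minimization. Helper names, the stopping test
(which assumes an optimum value of 0) and the re-use of stored fitness values are implementation
assumptions, not a literal transcription of the authors' code.

```python
import numpy as np

def bemna(F, d, lo, hi, max_evals=300000, tol=1e-6, seed=0):
    # F maps a length-d vector to a scalar objective value (minimized).
    rng = np.random.default_rng(seed)
    n_pop = 15 * d                                  # schedule-1 setting
    n_sel = n_pop // 2
    alpha, f_best_prev = 1.0, np.inf
    P = rng.uniform(lo, hi, size=(n_pop, d))        # Step 2
    f = np.array([F(x) for x in P])                 # Step 3
    evals = n_pop
    while evals < max_evals and f.min() > tol:      # stop near optimum 0
        keep = np.argsort(f)[:n_sel]                # Step 8 (selection)
        P, f = P[keep], f[keep]
        g = -f - (-f).min() + 1e-12                 # energy function
        mu = (g @ P) / g.sum()                      # Equation (11)
        diff = P - mu
        Sigma = alpha * (diff.T @ (g[:, None] * diff)) / g.sum()  # Eq. (15)
        S = rng.multivariate_normal(mu, Sigma, n_pop)   # Step 5
        fS = np.array([F(x) for x in S])            # Step 6
        evals += n_pop
        f_best = min(f.min(), fS.min())             # Step 7 (schedule 1)
        alpha = 1.1 * alpha if f_best < f_best_prev else 0.9 * alpha
        alpha = min(max(alpha, 1.0), 2.0)
        f_best_prev = f_best
        P, f = np.vstack([P, S]), np.concatenate([f, fS])
    i = f.argmin()
    return P[i], f[i]                               # Step 11 (elite)
```

For instance, bemna(lambda x: float(np.sum(x**2)), d=10, lo=-10, hi=5) would minimize the Sphere
function.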
3.1 Detailed Steps for the First Annealing Schedule

The recommended population size for this schedule is N_pop = 15d, where d is the number of
dimensions, with initial α = 1. The number of selected individuals in Steps 2 and 8 is
N_sel = N_pop/2. The parameter estimation for the multivariate Normal in Step 4 is done as explained
in Section 2. In Step 5 the new sample P_S^(t) is of size N_pop (this differs in the second
schedule). To update the α value, the annealing schedule is applied as explained in Subsection 2.2;
remember that for each α there is a corresponding β and vice versa.
A well-known possible issue in multivariate Normal EDAs is that the covariance matrix can present
negative eigenvalues due to numerical errors (Dong and Yao, 2008). In such a case we apply the
following repairing scheme. Let L be the matrix of eigenvectors of Σ by columns, Λ a diagonal matrix
with the corresponding eigenvalues in decreasing order, and d the number of dimensions.

    while Λ_{d,d} < 0
        λ = −Λ_{d,d}
        for i = 1..d: Λ_{i,i} = Λ_{i,i} + λ
        Σ = L Λ L^t
        decompose Σ into L and Λ

The repairing method is not needed most of the time, and it is quite rare that it performs more than
one iteration to fix the covariance matrix.
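A sketch of the repairing scheme using numpy's symmetric eigendecomposition (note that
np.linalg.eigh returns eigenvalues in ascending order, so w[0] is the smallest):

```python
import numpy as np

def repair_covariance(Sigma):
    # Shift every eigenvalue upward by the magnitude of the most
    # negative one, rebuild Sigma = L diag(w) L^t, and repeat until
    # no negative eigenvalue remains (usually at most one pass).
    w, L = np.linalg.eigh(Sigma)
    while w[0] < 0:
        w = w - w[0]                 # lambda = -smallest eigenvalue
        Sigma = (L * w) @ L.T
        w, L = np.linalg.eigh(Sigma)
    return Sigma
```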
3.2 Detailed Steps for the Second Annealing Schedule

For the second annealing schedule the population size is N_pop = (d + 3)(1 + d^{0.7}), while the
sample size is S_c = 2(1 + d^{0.7}); these values were taken from empirical tests. For this schedule
the γ parameter is updated; remember that there is a direct relation among β, α and γ, so any one of
them defines the others. The initial gamma value is γ_0 = 0.5 − 1/n_p. In this schedule the number
of selected individuals equals the population size, N_sel = N_pop; in the first generation the
complete population becomes the selected set, as there are no other individuals yet. In Step 5 a new
sample P_S^(t) of size S_c is simulated. In Step 7 the annealing schedule is applied as explained in
Subsection 2.3 to update γ. Finally, in Step 8, a new selected set P^(t+1) of size N_sel is obtained.
Table 1: Test problems (Larrañaga, 2002; Valdez et al., 2013). All of them are minimization
problems. For applying the BEMNA they are converted to maximization and translated to positive
values as follows: g(~x) = −F(~x) − min(−F(~x)) + 1×10^{−12}, where F(~x) is the objective function
and g(~x) is the energy function used in Figure 1. The minimum fitness value of all problems is 0,
except for F_12, where F*_12 = −d(d+4)(d−1)/6.

F      Name               Domain
F_1    Sphere             [−10, 5]^d
F_2    Tablet             [−10, 5]^d
F_3    Ellipsoid          [−10, 5]^d
F_4    Cigar              [−10, 5]^d
F_5    Cigar Tablet       [−10, 5]^d
F_6    Different Powers   [−10, 5]^d
F_7    Parabolic Ridge    [−10, 5]^d
F_8    Sharp Ridge        [−10, 5]^d
F_9    Griewank           [−600, 600]^d
F_10   Ackley             [−32.768, 16.384]^d
F_11   Rosenbrock         [−10, 5]^d
F_12   Trid               [−d^2, d^2]^d
F_13   Brown              [−1, 4]^d
F_14   Levy Montalvo 1    [−20, 10]^d
F_15   Levy Montalvo 2    [−20, 10]^d
F_16   Levy 8             [−20, 10]^d
F_17   Pinter             [−20, 10]^d
4 EXPERIMENTS
This section provides statistical results on the well-known test problems of Table 1. The BEMNA,
described in Figure 1, is tested in three different ways: 1) using annealing schedule 1; 2) using
annealing schedule 2; and 3) comparing our proposal against a previous version with uncorrelated
variables, taken
ABoltzmannMultivariateEstimationofDistributionAlgorithmforContinuousOptimization
255
from literature. The following subsections discuss
each of these experiments.
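For reference, two of the benchmark problems of Table 1 in their standard forms (the paper does not
spell out its exact variants, so these definitions are assumptions), written so they can be plugged
into the bemna sketch of Section 3:

```python
import numpy as np

def sphere(x):
    # F_1: sum of squares, global minimum 0 at the origin.
    return float(np.sum(x**2))

def rosenbrock(x):
    # F_11 (generalized Rosenbrock), global minimum 0 at (1, ..., 1).
    return float(np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (1.0 - x[:-1])**2))
```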
Table 2: Statistical results in 30 dimensions of 11 test problems, from 15 independent executions.
For each problem, the first row reports the objective values reached and the second row the number
of function evaluations. If the obtained fitness value reaches the desired precision, i.e.
F − F* < 1×10^{−6}, the cell is boldfaced. (a) Schedule 1. (b) Schedule 2.

(a)
F      Best     Worst    Mean     Median   SD
F_1    6.32e-7  9.97e-7  8.73e-7  8.69e-7  1.03e-7
       7.77e4   9.92e4   8.59e4   8.53e4   6.01e3
F_2    7.23e-7  9.95e-7  8.98e-7  9.55e-7  9.57e-8
       7.63e4   1.05e5   9.01e4   9.20e4   7.92e3
F_3    5.58e-7  9.95e-7  8.59e-7  9.13e-7  1.25e-7
       1.08e5   1.24e5   1.18e5   1.20e5   5.29e3
F_4    3.96e-7  9.71e-7  8.28e-7  8.67e-7  1.61e-7
       1.28e5   1.65e5   1.48e5   1.48e5   1.07e4
F_5    7.28e-7  9.87e-7  8.82e-7  8.99e-7  7.29e-8
       1.20e5   1.44e5   1.33e5   1.33e5   7.24e3
F_6    2.84e-7  9.84e-7  7.51e-7  8.28e-7  2.04e-7
       3.64e4   5.21e4   4.27e4   4.31e4   4.07e3
F_7    0.00     0.00     0.00     0.00     2.58e-7
       1.06e5   1.30e5   1.17e5   1.14e5   6.74e3
F_8    0.00     0.00     0.00     0.00     0.00
       1.81e5   2.20e5   2.00e5   2.01e5   1.03e4
F_9    7.10e-7  9.96e-7  8.46e-7  8.33e-7  9.64e-8
       9.56e4   1.23e5   1.11e5   1.12e5   7.63e3
F_10   7.28e-7  9.80e-7  9.13e-7  9.39e-7  6.57e-8
       1.40e5   1.81e5   1.58e5   1.58e5   1.00e4
F_11   8.97e-4  1.35e-1  2.71e-2  9.13e-3  3.94e-2
       3.00e5   3.00e5   3.00e5   3.00e5   0.00
F*_11  4.62e-7  9.95e-7  8.82e-7  9.53e-7  1.47e-7
       3.19e5   3.60e5   3.41e5   3.38e5   1.34e4

(b)
F      Best     Worst    Mean     Median   SD
F_1    7.44e-7  9.97e-7  9.31e-7  9.51e-7  7.81e-8
       9.97e4   1.02e5   1.01e5   1.01e5   6.21e2
F_2    8.25e-7  9.88e-7  9.01e-7  8.96e-7  5.15e-8
       7.15e4   7.34e4   7.26e4   7.28e4   6.61e2
F_3    8.14e-7  9.99e-7  9.32e-7  9.40e-7  5.44e-8
       6.09e4   6.31e4   6.19e4   6.20e4   6.08e2
F_4    3.42e-7  9.89e-7  8.40e-7  8.73e-7  1.80e-7
       2.78e4   2.95e4   2.88e4   2.87e4   5.30e2
F_5    8.68e-7  9.89e-7  9.26e-7  9.29e-7  3.45e-8
       8.20e4   8.41e4   8.28e4   8.27e4   4.79e2
F_6    7.05e-7  9.85e-7  9.17e-7  9.35e-7  7.49e-8
       9.55e4   9.72e4   9.64e4   9.64e4   5.84e2
F_7    6.72e-7  9.95e-7  8.87e-7  9.13e-7  1.09e-7
       8.48e4   8.77e4   8.62e4   8.62e4   8.43e2
F_8    6.36e-7  9.98e-7  8.72e-7  9.27e-7  1.28e-7
       2.20e5   2.59e5   2.43e5   2.43e5   1.06e4
F_9    7.86e-7  9.97e-7  8.92e-7  9.10e-7  7.30e-8
       8.58e4   8.74e4   8.64e4   8.63e4   4.63e2
F_10   6.89e-7  9.94e-7  9.13e-7  9.56e-7  9.62e-8
       5.66e4   5.85e4   5.77e4   5.77e4   5.45e2
F_11   8.43e-7  9.90e-7  9.54e-7  9.71e-7  4.44e-8
       1.25e5   1.28e5   1.26e5   1.26e5   9.30e2
4.1 Testing the BEMNA with Both Annealing Schedules

This subsection presents the results of the BEMNA in 30 dimensions. The set of problems is taken
from the literature: 8 unimodal functions, 2 multimodal functions and the generalized Rosenbrock
function. The population size, number of samples and related settings are discussed in Subsections
2.2 and 2.3. In addition, the algorithm is stopped when either a maximum number of 3×10^5
evaluations is reached or the precision to the optimum value is F − F* < 1×10^{−6}; for the extra
experiment reported as F*_11 the maximum number is 6×10^5 evaluations. The results are presented in
Table 2. The value 0 is reported if F − F* < 1×10^{−16}. For each problem the statistics are
presented in two rows: the first summarizes the fitness values reached, whilst the second shows the
needed number of evaluations. If the obtained fitness value reaches the desired precision, i.e.
F − F* < 1×10^{−6}, the cell is boldfaced. Here, F* is the optimum value.
4.1.1 Results Using the Annealing Schedule 1

Table 2-(a) presents the results of 15 independent executions for the set of test functions
F_1–F_10 and the Rosenbrock problem (F_11). All of these problems, except the Rosenbrock function,
are successfully solved by the BEMNA with a precision better than 1×10^{−6}; indeed, the column of
worst values shows a perfect success rate, except for the Rosenbrock problem. Therefore, for most of
the problems the important result is the number of function evaluations.

Also, note that all the problems, with the exception of the Rosenbrock function, are successfully
solved with a similar computational cost. Even though they have different characteristics, for
example the Ackley and Griewank functions are multimodal, the algorithm does not significantly
increase the number of function evaluations with respect to the other problems. Also, the covariance
matrix is adapted according to the function, as seen in all the convex problems.

In the case of F_11 it is worth noting that the algorithm is not trapped in a local minimum, because
the reached fitness values are below 0.1. This means that the algorithm is capable of displacing the
search distribution to the optimum region. An extra experiment on this problem (reported as F*_11 in
Table 2-(a)) shows that the BEMNA, using the first annealing schedule, simply needs more
computational effort to solve the Rosenbrock problem.
ECTA2014-InternationalConferenceonEvolutionaryComputationTheoryandApplications
256
4.1.2 Results Using the Annealing Schedule 2

Similarly, Table 2-(b) shows the results of our proposal using the second annealing schedule. The
extra computational effort of tracking the surviving individuals delivers an excellent payoff: a 63%
reduction in the number of evaluations for the Rosenbrock problem, whilst the rest of the test
problems may or may not be improved. An inspection of the mean number of function evaluations shows
some differences in comparison with the BEMNA using the first annealing schedule: our proposal using
the second annealing schedule needed less computational effort to solve 7 out of 10 test problems,
namely F_2–F_5, F_7, F_9 and F_10.

These results show that this schedule is more convenient than the previous one. Besides reducing the
number of function evaluations, the BEMNA is quite effective and can solve problems that similar
approaches cannot (Yunpeng et al., 2006), such as the Rosenbrock function. Despite the differences
between both schedules, we can conclude that the BEMNA adequately adapts the covariance matrix as
demanded by the problem.
4.2 Comparison Versus the BUMDA

Since the Boltzmann distribution has been widely used in evolutionary computation, a comparison
against a related method from the literature is desirable. To compare against the state of the art
in evolutionary computing, a successful algorithm in this branch is selected: the Boltzmann
Univariate Marginal Distribution Algorithm (BUMDA), see (Valdez et al., 2013). Our proposal is
capable of modeling dependencies between variables; according to this property, a suitable set of
problems is chosen: F_11–F_17. Here, we use the BEMNA with the second annealing schedule.

Table 3 contrasts the error F − F* reached by each algorithm. For each problem there are three
measures: 1) the first row is the percentage success rate; 2) the second row is the mean and
standard deviation of the reached fitness values; 3) the third row is the mean and standard
deviation of the needed number of function evaluations. In addition, the performance of the two
algorithms is compared by a non-parametric bootstrap test on the mean value µ: if the null
hypothesis H_0: µ_1 = µ_2 is rejected, there is statistical evidence of differences between both
algorithms; this case is marked in bold.
Table 3: Percentage of success rate, reached fitness values and needed number of evaluations (mean
and standard deviation) for each algorithm. The last column shows a non-parametric bootstrap test;
if ρ is boldfaced there is statistical evidence of a difference between both algorithms. (a) 10
dimensions. (b) 30 dimensions.

(a)
F      BEMNA              BUMDA              ρ
F_11   100.00             0.00
       7.29e-7±2.10e-7    8.08e+0±7.13e-2    1.00e-4
       1.59e+4±1.20e+3    3.00e+5±0.00e+0    1.00e-4
F_12   100.00             0.00
       8.17e-7±1.29e-7    6.58e+1±1.56e+1    1.00e-4
       8.04e+3±1.72e+2    3.00e+5±0.00e+0    1.00e-4
F_13   100.00             100.00
       8.17e-7±1.52e-7    7.03e-7±1.92e-7    7.63e-2
       6.04e+3±1.89e+2    1.17e+4±2.99e+2    1.00e-4
F_14   100.00             100.00
       8.54e-7±1.04e-7    7.43e-7±1.87e-7    5.26e-2
       5.36e+3±1.68e+2    1.12e+4±2.49e+2    1.00e-4
F_15   100.00             100.00
       7.27e-7±1.85e-7    7.14e-7±1.94e-7    8.45e-1
       5.90e+3±2.48e+2    1.28e+4±3.61e+2    1.00e-4
F_16   100.00             100.00
       7.80e-7±1.81e-7    6.57e-7±2.07e-7    8.34e-2
       5.71e+3±1.47e+2    1.15e+4±2.22e+2    1.00e-4
F_17   80.00              100.00
       7.31e+0±1.52e+1    7.95e-7±1.90e-7    7.39e-2
       2.71e+4±3.78e+4    1.91e+4±6.98e+2    3.73e-1

(b)
F      BEMNA              BUMDA              ρ
F_11   100.00             0.00
       9.27e-7±7.11e-8    2.80e+1±7.56e-2    1.00e-4
       2.52e+5±9.92e+3    3.00e+5±0.00e+0    1.00e-4
F_12   100.00             0.00
       9.12e-7±7.13e-8    5.00e+3±1.08e+3    1.00e-4
       8.40e+4±7.91e+2    3.00e+5±0.00e+0    1.00e-4
F_13   100.00             93.33
       8.86e-7±9.44e-8    1.08e-6±8.72e-7    4.67e-1
       5.62e+4±1.18e+3    4.25e+4±7.13e+4    4.42e-1
F_14   100.00             100.00
       8.85e-7±8.55e-8    8.47e-7±1.29e-7    3.35e-1
       4.51e+4±6.21e+2    2.05e+4±2.20e+2    1.00e-4
F_15   100.00             100.00
       9.23e-7±6.00e-8    7.99e-7±9.51e-8    4.00e-4
       5.47e+4±6.58e+2    2.33e+4±3.29e+2    1.00e-4
F_16   100.00             100.00
       9.46e-7±4.87e-8    8.60e-7±1.04e-7    7.60e-3
       5.11e+4±6.23e+2    2.21e+4±3.29e+2    1.00e-4
F_17   40.00              0.00
       4.48e+1±5.38e+1    1.57e-2±1.66e-2    5.40e-3
       2.14e+5±1.10e+5    3.00e+5±0.00e+0    6.20e-3
The BEMNA shows a better performance than the BUMDA in most of the test problems in 10 dimensions.
Also, notice that the BEMNA solves all of these problems using fewer evaluations than the BUMDA. On
the other hand, for the 30-dimensional problems the BUMDA displays a better performance than our
proposal on F_14–F_16. This is an expected result because these problems have weakly correlated
variables. However, our approach effectively solved the hardest problems in 30 dimensions: F_11,
F_12 and F_17.
ABoltzmannMultivariateEstimationofDistributionAlgorithmforContinuousOptimization
257
5 CONCLUSIONS
The main contributions of this proposal are the derivation of formulae for computing the parameters
of a multivariate Normal distribution adequate for locating the optimum, and the introduction of
simple annealing schedules for updating the β value.

The derived formulae for computing the search distribution use the objective function value as a
linear factor for estimating weighted parameters. The linear weights avoid prematurely collapsing
the probability mass around a single solution, preventing premature convergence. In addition, this
fashion of parameter estimation produces a softer change in the structure of the covariance matrix
between consecutive generations, in contrast to the exponential weights used in similar approaches
(Yunpeng et al., 2006). The advantage of using linear weights, even with a fixed β value, is well
documented in (Valdez et al., 2013), where similar formulae are used for the univariate case. Our
proposal combines the convenience of the linear weights with simple annealing schedules to regulate
the exploration of the algorithm.

The advantage of using the linear weights and the annealing schedules is evidenced by the
statistical results presented in Section 4. Furthermore, the results demonstrate that the BEMNA
effectively solves the Rosenbrock problem, which is not solved by similar algorithms (Yunpeng et
al., 2006; Valdez et al., 2013), as well as the other problems, at a lower computational cost.

Future work intends to propose additional enhancement techniques to be applied to the current BEMNA
to reduce the population size as well as the number of function evaluations. Moreover, we will
explore new ways to use this approach in evolutionary computation.
REFERENCES
Cover, T. M. and Thomas, J. A. (2006). Elements of Information Theory (Wiley Series in
Telecommunications and Signal Processing). Wiley-Interscience.

Dong, W. and Yao, X. (2008). Unified eigen analysis on multivariate Gaussian based estimation of
distribution algorithms. Information Sciences, 178(15):3000–3023.

Grahl, J., Bosman, P. A. N., and Minner, S. (2007). Convergence phases, variance trajectories, and
runtime analysis of continuous EDAs. In GECCO '07: Proceedings of the 9th Annual Conference on
Genetic and Evolutionary Computation, pages 516–522. ACM.

Hu, J., Wang, Y., Zhou, E., Fu, M. C., and Marcus, S. I. (2012). A survey of some model-based
methods for global optimization. In Optimization, Control, and Applications of Stochastic Systems,
pages 157–179. Springer.

Larrañaga, P. (2002). A review on estimation of distribution algorithms. In Larrañaga, P. and
Lozano, J., editors, Estimation of Distribution Algorithms, volume 2 of Genetic Algorithms and
Evolutionary Computation, pages 57–100. Springer US.

Larrañaga, P., Etxeberria, R., Lozano, J. A., and Peña, J. M. (2000). Optimization in continuous
domains by learning and simulation of Gaussian networks. Technical Report EHU-KZAA-IK-4/99,
University of the Basque Country.

Mahnig, T. and Mühlenbein, H. (2001). A new adaptive Boltzmann selection schedule SDS. In
Proceedings of the 2001 Congress on Evolutionary Computation, volume 1, pages 183–190. IEEE.

Mühlenbein, H., Bendisch, J., and Voigt, H.-M. (1996). From recombination of genes to the estimation
of distributions II. Continuous parameters.

Mühlenbein, H. (2012). Convergence theorems of estimation of distribution algorithms. In Shakya, S.
and Santana, R., editors, Markov Networks in Evolutionary Computation, volume 14 of Adaptation,
Learning, and Optimization, pages 91–108. Springer Berlin Heidelberg.

Mühlenbein, H. and Mahnig, T. (1999). The factorized distribution algorithm for additively
decomposed functions. In Proceedings of the 1999 Congress on Evolutionary Computation (CEC 99),
volume 1. IEEE.

Mühlenbein, H., Mahnig, T., and Rodriguez, A. O. (1999). Schemata, distributions and graphical
models in evolutionary optimization. Journal of Heuristics, 5(2):215–247.

Ochoa, A. (2010). Opportunities for expensive optimization with estimation of distribution
algorithms. In Computational Intelligence in Expensive Optimization Problems, volume 2, pages
193–218. Springer.

Segovia-Dominguez, I., Hernandez-Aguirre, A., and Diharce, E. V. (2013). The Gaussian polytree EDA
with copula functions and mutations. In EVOLVE, volume 447 of Studies in Computational Intelligence,
pages 123–153. Springer Berlin Heidelberg.

Shapiro, J. L. (2006). Diversity loss in general estimation of distribution algorithms. In Parallel
Problem Solving from Nature – PPSN IX, pages 92–101. Springer.

Valdez, S. I., Hernández, A., and Botello, S. (2013). A Boltzmann based estimation of distribution
algorithm. Information Sciences, 236:126–137.

Yunpeng, C., Xiaomin, S., and Peifa, J. (2006). Probabilistic modeling for continuous EDA with
Boltzmann selection and Kullback-Leibler divergence. In Proceedings of the 8th Annual Conference on
Genetic and Evolutionary Computation, pages 389–396. ACM.
ECTA2014-InternationalConferenceonEvolutionaryComputationTheoryandApplications
258