A Boltzmann Multivariate Estimation of Distribution Algorithm for
Continuous Optimization
Ignacio Segovia-Domínguez, S. Ivvan Valdez and Arturo Hernández-Aguirre
Department of Computer Science, Center for Research in Mathematics, Guanajuato, Mexico
Keywords:
Boltzmann distribution, Estimation of Distribution Algorithm, Optimization.
Abstract:
This paper introduces an approach to continuous optimization using an Estimation of Distribution Algorithm
(EDA) based on the Boltzmann distribution. When the objective function is used as the energy function, the
Boltzmann distribution favors the most promising regions, making the probability exponentially proportional
to the objective function. Sampling directly from the Boltzmann distribution is not possible because it requires
computing the objective function values over the complete search space. This work presents an approximation
of the Boltzmann function by a multivariate Normal distribution. Formulae for computing the mean and
covariance matrix are derived by minimizing the Kullback-Leibler divergence. The proposed EDA is
competitive and often superior to similar algorithms, as shown by the statistical results reported here.
1 INTRODUCTION
The Estimation of Distribution Algorithms (EDAs)
(Mühlenbein et al., 1996) are optimization methods
based on estimating and sampling a probability
distribution. The aim is to favor the most promising
regions by assigning them the highest probability values.
In fact, the most promising regions are unknown
and have to be discovered during the optimization
process. The main goal of the EDA is to pose the
probability mass around the optima. The strategy,
without loss of generality for a maximization pro-
cess, is to reinforce the sampling in regions with
the maximum objective function or fitness function
values, and disregard the regions with the minimum
values. The most common scheme for continuous
optimization using EDAs is to employ a multivariate
or univariate Normal distribution (Larrañaga,
2002; Larrañaga et al., 2000; Dong and Yao, 2008;
Segovia-Dominguez et al., 2013). The parameters of
the Normal density are estimated by using maximum
likelihood estimators (MLEs) over the selected
set, which is determined by the truncation method;
usually half of the population with the worst objective
value is truncated. Although these approaches have
shown competitive performance, some evident
issues can be noticed in the strategy just mentioned:
- The truncation selection hides the fitness landscape, assigning the same importance to every
selected individual in the parameter estimation. In consequence, the search distribution parameters
are estimated as if the regions represented by the selected set were equally good in fitness values.
- It is a well-known issue that the variance in estimation of distribution algorithms is often smaller
than required; hence, the MLE variance estimator is not the most adequate for searching for the
optimum (Shapiro, 2006; Grahl et al., 2007).
The Boltzmann distribution has been widely used
in optimization; in the Estimation of Distribution Algorithms
(EDAs) context, researchers have proposed
different approaches such as the BEDA (Mühlenbein,
2012; Mahnig and Mühlenbein, 2001; Mühlenbein
et al., 1999), a general framework for Boltzmann-distribution-based
estimation of distribution algorithms from which practical
EDAs have been derived. For example, the FDA
(Mühlenbein and Mahnig, 1999), the proposal of Yunpeng et al.
(Yunpeng et al., 2006) and the proposal of Valdez et al.
(Valdez et al., 2013) demonstrate this. The
unifying characteristic of these approaches is that they
intend to equip the stochastic search algorithm with
an engine which favors the most promising regions:
the better the objective function is in a region, the
more intense the sampling must be. The Boltzmann
distribution is used to achieve this purpose.
The Gibbs or Boltzmann distribution of a fitness
function g(x) is defined by:

    p(\vec{x}) := \frac{\exp(\beta g(\vec{x}))}{Z},    (1)

where Z = \int_X \exp(\beta g(\vec{x})) \, d\vec{x} is the partition function over the search space X.
As can be noticed in Equation (1), the objective
function is used directly as an energy function. In
practical approaches, the Boltzmann distribution can-
not be used directly for sampling because it is nec-
essary to know the objective function in the whole
domain. That is the reason why the parameters of
a density function are computed by minimizing a
measure between the parametric distribution and the
Boltzmann distribution, for instance, the Kullback-
Leibler divergence (Ochoa, 2010; Yunpeng et al.,
2006; Valdez et al., 2013).
There are remarkable challenges to consider in
the design of EDAs based on the Boltzmann distribution:
- To choose an adequate β parameter in Equation (1). Usually β depends on time or is adjusted
dynamically during the optimization process. The process which controls the β update at each
generation is called the annealing schedule; it can be used to manage the exploration and
convergence of the algorithm.
- To derive robust parameter estimators for the approximating distribution. Some approaches (Hu
et al., 2012; Yunpeng et al., 2006) have derived formulae for estimating the parameters of a
distribution which approximates the Boltzmann density, by weighting the population or selected
set with exponential functions similar to Equation (1). Even though competitive results are
obtained, these proposals often suffer from premature convergence because the exponential
function drastically leads the probability mass to suboptimal positions. This behavior can be
avoided by manipulating the β value, but it is not simple to determine how to do so; a second
option is to obtain formulae which do not impact the estimators as drastically. Both issues are
also related to the aforementioned variance reduction, a common problem in EDAs (Shapiro, 2006).
Our proposal intends to tackle the challenges just
mentioned. Accordingly, this paper presents a novel
algorithm, based on the approximation of the Boltzmann
function by a multivariate Normal distribution,
which introduces the following features:
- Two proposals of annealing schedules to update the β value.
- Formulae which are robust, or at least not impacted as drastically as the exponential functions
used in other approaches.
- Novel annealing schedules that tackle the variance reduction problem and hence serve as
mechanisms to avoid premature convergence of the algorithm.
The organization of the paper is as follows: Section 2 presents the derivation of the formulae for
computing the parameters of the multivariate Normal distribution; Section 3 introduces the Boltzmann
Estimation of Multivariate Normal Algorithm (BEMNA), as well as the two annealing schedules used
in it; Section 4 is devoted to testing the proposed EDA on well-known test functions, including a
comparison with another algorithm from the literature; finally, Section 5 provides some concluding
remarks.
2 APPROXIMATING THE BOLTZMANN DISTRIBUTION BY THE NORMAL MULTIVARIATE DISTRIBUTION
This section introduces the formulae to estimate the mean vector and covariance matrix, that is, ~µ and Σ,
of a multivariate Normal density which approximates the multivariate Boltzmann density P_x, given a set
of samples ~x^(1), ~x^(2), ..., ~x^(N) which are observations of a random vector ~X. Let ~X be a random
vector such that ~X ∼ Q_x, where Q_x = Q(~x, ~µ, Σ) is the multivariate Normal density shown in
Equation (2). The corresponding Boltzmann density is given in Equation (3).

    Q_x = \frac{1}{\sqrt{(2\pi)^d |\Sigma|}} \exp\left( -\frac{1}{2} (\vec{x}-\vec{\mu})^t \Sigma^{-1} (\vec{x}-\vec{\mu}) \right)    (2)

    P_x = \frac{\exp(\beta g(\vec{x}))}{Z}    (3)
The procedure for finding the parameters of Q_x which best approximate P_x consists of minimizing a
measure of dissimilarity between density functions, as in previous works (Yunpeng et al., 2006;
Valdez et al., 2013). Here, the Kullback-Leibler divergence presented in Equation (4),
K_{QP} = D_{KL}(Q_x || P_x), is used as the statistical measure between probability distributions.

    K_{QP} = \int Q_x \log \frac{Q_x}{P_x} \, d\vec{x}    (4)

The minimization of K_{QP} for finding the optimal parameters [~µ*, Σ*] can be stated as shown in
Equation (5):

    [\vec{\mu}^*, \Sigma^*] = \arg\min \{ K_{QP} \}    (5)

Notice that K_{QP} can be rewritten as
ECTA2014-InternationalConferenceonEvolutionaryComputationTheoryandApplications
252
    K_{QP} = \int Q_x \log Q_x \, d\vec{x} - \int Q_x \log P_x \, d\vec{x}
           = -H(Q_x) - \int Q_x \log P_x \, d\vec{x}
           = -\frac{1}{2} \log\left( (2\pi e)^d |\Sigma| \right) - \int Q_x \log P_x \, d\vec{x},    (6)
where H(Q_x) denotes the entropy of the multivariate Normal density (Cover and Thomas, 2006).
In order to find the parameters which minimize the Kullback-Leibler divergence, the partial
derivatives are calculated; see Equations (7) and (8).
    \frac{\partial K_{QP}}{\partial \vec{\mu}} = -\int \frac{\partial Q_x}{\partial \vec{\mu}} \log P_x \, d\vec{x}
      = -\int Q_x \left[ \Sigma^{-1} (\vec{x}-\vec{\mu}) \right] \log P_x \, d\vec{x}    (7)
    \frac{\partial K_{QP}}{\partial \Sigma} = -\frac{1}{2} \frac{\partial \log(|\Sigma|)}{\partial \Sigma} - \int \frac{\partial Q_x}{\partial \Sigma} \log P_x \, d\vec{x}
      = -\frac{1}{2} \int Q_x \left[ \Sigma^{-1} (\vec{x}-\vec{\mu})(\vec{x}-\vec{\mu})^t \Sigma^{-1} \right] \log P_x \, d\vec{x}
        + \frac{1}{2} \int Q_x \Sigma^{-1} \log P_x \, d\vec{x} - \frac{1}{2} \Sigma^{-1}    (8)
The optimal estimates of the mean vector and covariance matrix are obtained by setting the derivatives
equal to 0, as in Equations (9) and (10), and solving for ~µ and Σ, respectively.
    0 = \frac{\partial K_{QP}}{\partial \vec{\mu}}
    0 = \vec{\mu} \int Q_x \log P_x \, d\vec{x} - \int Q_x \vec{x} \log P_x \, d\vec{x}
    0 = \vec{\mu} \beta E_Q[g(\vec{X})] - \vec{\mu} \log Z - \beta E_Q[g(\vec{X})\vec{X}] + E_Q[\vec{X}] \log Z
    \vec{\mu} = \frac{E_Q[g(\vec{X})\vec{X}]}{E_Q[g(\vec{X})]}    (9)
The following equivalences were used to obtain Equations (9) and (10):
\log P_x = \beta g(\vec{x}) - \log Z;
\int Q_x \vec{x} \, d\vec{x} = E_Q[\vec{X}];
\int Q_x (\vec{x}-\vec{\mu})(\vec{x}-\vec{\mu})^t \, d\vec{x} = E_Q[(\vec{X}-\vec{\mu})(\vec{X}-\vec{\mu})^t];
and other similar identities. As ~X ∼ Q_x, then E_Q[\vec{X}] = \vec{\mu} and
E_Q[(\vec{X}-\vec{\mu})(\vec{X}-\vec{\mu})^t] = \Sigma.
    0 = \frac{\partial K_{QP}}{\partial \Sigma}
    0 = \int Q_x (\vec{x}-\vec{\mu})(\vec{x}-\vec{\mu})^t \log P_x \, d\vec{x} - \left( \int Q_x \log P_x \, d\vec{x} \right) \Sigma + \Sigma
    \Sigma = \frac{E_Q[g(\vec{X})(\vec{X}-\vec{\mu})(\vec{X}-\vec{\mu})^t]}{E_Q[g(\vec{X})] - 1/\beta}    (10)
Finally, to estimate the parameters from the observations ~x^(1), ~x^(2), ..., ~x^(N) of the random
vector ~X, a numerical stochastic approximation by the Monte Carlo method is computed, as shown in
Equations (11) and (12). These two equations are the estimators that approximate the parameters of
the search distribution.

    \vec{\mu}^* = \frac{\sum_{i=1}^{N} g(\vec{x}^{(i)}) \, \vec{x}^{(i)}}{\sum_{i=1}^{N} g(\vec{x}^{(i)})}    (11)

    \Sigma^* = r_e \cdot \sum_{i=1}^{N} g(\vec{x}^{(i)}) (\vec{x}^{(i)} - \vec{\mu})(\vec{x}^{(i)} - \vec{\mu})^t,    (12)

where

    r_e = \left( \sum_{i=1}^{N} g(\vec{x}^{(i)}) - \frac{N}{\beta} \right)^{-1}    (13)
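To make the computation concrete, here is a minimal Python sketch of the estimators in Equations
(11)-(13). The function name and the array-based interface are assumptions for illustration, not
part of the original algorithm description.

```python
import numpy as np

def estimate_parameters(X, g, beta):
    """Monte Carlo estimators of Equations (11)-(13).

    X    : (N, d) array of observations of the random vector X.
    g    : (N,) array of non-negative energy values g(x_i).
    beta : Boltzmann parameter; must satisfy N/beta < sum(g).
    """
    N = X.shape[0]
    g_sum = g.sum()
    mu = (g @ X) / g_sum                 # Equation (11): weighted mean
    r_e = 1.0 / (g_sum - N / beta)       # Equation (13): scaling factor
    diff = X - mu
    # Equation (12): weighted scatter matrix scaled by r_e.
    Sigma = r_e * (diff.T @ (g[:, None] * diff))
    return mu, Sigma
```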
2.1 A Note About the Derived Formulae

In Equations (11) and (12) the estimators use weights defined by
g(\vec{x}^{(i)}) / \sum_{i=1}^{N} g(\vec{x}^{(i)}). In other words, the weighted estimators are
computed using weights proportional to the objective function value of each individual in the
selected set. In contrast with similar approaches (Hu et al., 2012; Yunpeng et al., 2006), these
derivations have some advantages:
- Proportional weights in the estimators avoid drastic changes when the individuals differ
considerably in objective function value. That is, if a new individual with a large objective value
is sampled, exponential weights can concentrate the probability mass around this single individual,
leading the algorithm to premature convergence; this collapse is far less pronounced with
proportional weights (a toy numerical contrast is sketched after this list).
- A second advantage is that the minimum variance, attained as β → ∞, is not 0 for our approach.
This is significant considering that EDAs naturally suffer from premature convergence and variance
reduction (Hu et al., 2012; Yunpeng et al., 2006).
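The toy numerical contrast announced above; the β = 5 used here is an arbitrary choice for
illustration:

```python
import numpy as np

g = np.array([1.0, 1.2, 1.4, 10.0])             # one outstanding individual
prop = g / g.sum()                              # proportional weights (Eq. 11)
expw = np.exp(5.0 * g) / np.exp(5.0 * g).sum()  # exponential weights
print(prop)  # [0.074 0.088 0.103 0.735]: mass is spread out
print(expw)  # [~0 ~0 ~0 ~1]: mass collapses onto one sample
```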
ABoltzmannMultivariateEstimationofDistributionAlgorithmforContinuousOptimization
253
The energy function g(~x) must be positive or equal to zero over the whole domain. However, the
objective function F(~x) might be negative, and the problems considered here are minimization
problems that must be recast as maximization. In order to construct a valid energy function for the
continuous minimization problem, throughout this paper the g(~x_i) value is computed as

    g(\vec{x}_i) = -F(\vec{x}_i) - \min(-F(\vec{x}_i)) + 10^{-12},

where the negation converts minimization into maximization and the shift, taken over the current set
of evaluated individuals, guarantees a strictly positive energy.
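A minimal sketch of this transformation, assuming the shift min(−F) is taken over the current set of
evaluated individuals; the helper name energy is an assumption:

```python
import numpy as np

def energy(F_vals, eps=1e-12):
    # For a minimization problem: negate F so that better (lower)
    # objective values map to larger energies, then shift so the
    # worst individual receives exactly eps > 0.
    neg = -np.asarray(F_vals, dtype=float)
    return neg - neg.min() + eps
```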
2.2 Annealing Schedule 1

As seen in Equations (11), (12) and (13), the β value only affects the covariance matrix
computation. The degree of impact of β on the covariance is highly related to
\sum_{i=1}^{N} g(\vec{x}^{(i)}). Accordingly, N/\beta < \sum_{i=1}^{N} g(\vec{x}^{(i)}) must hold
in order to maintain positive variances in the diagonal of the covariance matrix; on the other hand,
if N/\beta \ll \sum_{i=1}^{N} g(\vec{x}^{(i)}), its effect is diminished. As a consequence, the
estimator reaches the minimum variance when β → ∞, because N/β → 0. An interesting remark about this
last setting is that the Normal distribution with such a minimum variance is not similar to a
Dirac δ, while the corresponding Boltzmann distribution actually is. Considering the arguments
stated above, assume a β value as shown in Equation (14):

    \beta = N / \left( (1 - \gamma) \sum_{i=1}^{N} g(\vec{x}^{(i)}) \right),    (14)

where 0 < γ < 1 in order to fulfill the requirements discussed above. Hence, Equation (12) is
rewritten as Equation (15):

    \Sigma^* = \alpha \, \frac{\sum_{i=1}^{N} g(\vec{x}^{(i)}) (\vec{x}^{(i)} - \vec{\mu})(\vec{x}^{(i)} - \vec{\mu})^t}{\sum_{i=1}^{N} g(\vec{x}^{(i)})}    (15)
Here α = 1/γ; recalling that 0 < γ < 1, in consequence 1 < α < ∞. The first proposed schedule sets α
according to historical improvements, for the minimization case, as follows (a code sketch is given
below):

    if (F_best^(t) < F_best^(t-1)) then α = 1.1α, else α = 0.9α
    if (α > 2) then α = 2
    if (α < 1) then α = 1

This schedule increases the covariance matrix values if an improvement in the objective function of
the elite individual F_best is detected. Otherwise, the covariance matrix at least keeps its minimum
values (β = ∞). On the other hand, scaling factors greater than 2 are prevented in order to control
the exploration. The reader can notice that α = 1, which is equivalent to β = ∞, yields a weighted
estimator of the covariance matrix with weights proportional to the objective value; hence it
captures the structure of the population while favoring the most promising solutions.
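The announced sketch of this update in Python (the function name update_alpha is an assumption):

```python
def update_alpha(alpha, f_best, f_best_prev):
    # Annealing schedule 1 (minimization): grow the covariance scaling
    # factor by 10% after an improvement of the elite individual,
    # shrink it by 10% otherwise, and clamp it to [1, 2].
    alpha = 1.1 * alpha if f_best < f_best_prev else 0.9 * alpha
    return min(max(alpha, 1.0), 2.0)
```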
2.3 Annealing Schedule 2

The second annealing schedule also uses α = 1/γ, but here γ is modified linearly with the
improvements; this differs from the first annealing schedule, which increases/decreases the α value
exponentially. Another difference between the two schedules is that the first uses improvements of
the best objective value found so far, while the second verifies improvements over the selected set.
Notice that updating β in Equation (13) is equivalent to updating α in Equation (15), and to
updating γ, considering that α = 1/γ. According to the arguments in the Subsection above,
0 < γ ≤ 1, with γ = 1 corresponding to the minimum variance (β = ∞). The updating of γ proceeds as
follows (a code sketch is given after this list):
- Let M_s be the number of selected individuals that are preserved from the current generation to
the next one, known throughout this paper as the survivor individuals. Note that the maximum number
of survivor individuals is the sample size S_c.
- Define n_p as the number of partitions of the interval [0, 1]. For our experiments, n_p = 30.
- If M_s / S_c > 0.5 then γ = γ − 1/n_p; otherwise γ = γ + 1/n_p.
- If γ < 1/n_p then γ = 1/n_p. If γ > 1 then γ = 1.
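The sketch announced above, with the helper name update_gamma assumed for illustration:

```python
def update_gamma(gamma, m_survivors, sample_size, n_p=30):
    # Annealing schedule 2: if more than half of the sampled
    # individuals survive selection, decrease gamma (i.e. increase
    # alpha = 1/gamma, enlarging the covariance); otherwise increase
    # gamma.  gamma is kept inside [1/n_p, 1].
    step = 1.0 / n_p
    if m_survivors / sample_size > 0.5:
        gamma -= step
    else:
        gamma += step
    return min(max(gamma, step), 1.0)
```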
3 THE BOLTZMANN ESTIMATION OF MULTIVARIATE NORMAL ALGORITHM
The Boltzmann Estimation of Multivariate Normal Algorithm (BEMNA) is presented in Figure 1. The
BEMNA starts with a random population, which is evaluated; then the half of the population containing
the best individuals is selected. The objective function values of the selected set are used to
compute the weights for the parameter estimation; ~µ and Σ are then computed using the selected set
variable values and the weights. Finally, a set of individuals is simulated using the computed
parameters. From the second generation onward, the new selected set is computed from the current
selected set and the simulated one.
ECTA2014-InternationalConferenceonEvolutionaryComputationTheoryandApplications
254
BEMNA
1. t ← 0. Initialize N_pop, N_sel, S_c and α/γ according to the annealing schedule.
2. P^(0) ← uniformly distributed population, where P^(0) = {~x_1, ..., ~x_{N_sel}}.
3. Evaluate F(~x_i) and g(~x_i).
4. Compute the estimates ~µ* and Σ* of P^(t), by Equations (11) and (15).
5. P_S^(t) ← sample S_c individuals from Q(x; ~µ*, Σ*).
6. Evaluate F(~x_j) and g(~x_j) of P_S^(t).
7. Update α/γ according to the annealing schedule.
8. P^(t+1) ← the best N_sel individuals from P^(t) ∪ P_S^(t).
9. t ← t + 1.
10. If the stopping criterion is not reached, return to Step 4.
11. Return the elite individual in P^(t) as the best approximation to the optimum.
Figure 1: Pseudo-code of the BEMNA.
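For readers who prefer code, the following is a compact, self-contained Python sketch of the loop in
Figure 1 instantiated with annealing schedule 1 for minimization. Helper names, the stopping test
(which assumes an optimum value of 0) and the re-use of stored fitness values are implementation
assumptions, not a literal transcription of the authors' code.

```python
import numpy as np

def bemna(F, d, lo, hi, max_evals=300000, tol=1e-6, seed=0):
    # F maps a length-d vector to a scalar objective value (minimized).
    rng = np.random.default_rng(seed)
    n_pop = 15 * d                                  # schedule-1 setting
    n_sel = n_pop // 2
    alpha, f_best_prev = 1.0, np.inf
    P = rng.uniform(lo, hi, size=(n_pop, d))        # Step 2
    f = np.array([F(x) for x in P])                 # Step 3
    evals = n_pop
    while evals < max_evals and f.min() > tol:      # stop near optimum 0
        keep = np.argsort(f)[:n_sel]                # Step 8 (selection)
        P, f = P[keep], f[keep]
        g = -f - (-f).min() + 1e-12                 # energy function
        mu = (g @ P) / g.sum()                      # Equation (11)
        diff = P - mu
        Sigma = alpha * (diff.T @ (g[:, None] * diff)) / g.sum()  # Eq. (15)
        S = rng.multivariate_normal(mu, Sigma, n_pop)   # Step 5
        fS = np.array([F(x) for x in S])            # Step 6
        evals += n_pop
        f_best = min(f.min(), fS.min())             # Step 7 (schedule 1)
        alpha = 1.1 * alpha if f_best < f_best_prev else 0.9 * alpha
        alpha = min(max(alpha, 1.0), 2.0)
        f_best_prev = f_best
        P, f = np.vstack([P, S]), np.concatenate([f, fS])
    i = f.argmin()
    return P[i], f[i]                               # Step 11 (elite)
```

For instance, bemna(lambda x: float(np.sum(x**2)), d=10, lo=-10, hi=5) would minimize the Sphere
function.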
3.1 Detailed Steps for the First Annealing Schedule

The recommended population size for this schedule is N_pop = 15d, where d is the number of
dimensions, with initial α = 1. The number of selected individuals in Steps 2 and 8 is
N_sel = N_pop/2. The parameter estimation for the multivariate Normal in Step 4 is done as explained
in Section 2. In Step 5 the new sample P_S^(t) is of size N_pop (this differs in the second
schedule). To update the α value, the annealing schedule is applied as explained in Subsection 2.2;
remember that for each α there is a corresponding β and vice versa.
A well-known possible issue in multivariate Normal EDAs is that the covariance matrix can present
negative eigenvalues due to numerical errors (Dong and Yao, 2008). In such a case we apply the
following repairing scheme. Let L be the matrix of eigenvectors of Σ by columns, Λ a diagonal matrix
with the corresponding eigenvalues in decreasing order, and d the number of dimensions.

    while Λ_{d,d} < 0
        λ = −Λ_{d,d}
        for i = 1..d: Λ_{i,i} = Λ_{i,i} + λ
        Σ = L Λ L^t
        decompose Σ into L and Λ

The repairing method is not needed most of the time, and it is quite rare that it performs more than
one iteration to fix the covariance matrix.
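A sketch of the repairing scheme using numpy's symmetric eigendecomposition (note that
np.linalg.eigh returns eigenvalues in ascending order, so w[0] is the smallest):

```python
import numpy as np

def repair_covariance(Sigma):
    # Shift every eigenvalue upward by the magnitude of the most
    # negative one, rebuild Sigma = L diag(w) L^t, and repeat until
    # no negative eigenvalue remains (usually at most one pass).
    w, L = np.linalg.eigh(Sigma)
    while w[0] < 0:
        w = w - w[0]                 # lambda = -smallest eigenvalue
        Sigma = (L * w) @ L.T
        w, L = np.linalg.eigh(Sigma)
    return Sigma
```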
3.2 Detailed Steps for the Second Annealing Schedule

For the second annealing schedule the population size is N_pop = (d + 3)(1 + d^{0.7}), while the
sample size is S_c = 2(1 + d^{0.7}); these values were taken from empirical tests. For this schedule
the γ parameter is updated; remember that there is a direct relation among β, α and γ, so any one of
them defines the others. The initial gamma value is γ_0 = 0.5 − 1/n_p. In this schedule the number
of selected individuals equals the population size, N_sel = N_pop; in the first generation the
complete population becomes the selected set, as there are no other individuals yet. In Step 5 a new
sample P_S^(t) of size S_c is simulated. In Step 7 the annealing schedule is applied as explained in
Subsection 2.3 to update γ. Finally, in Step 8, a new selected set P^(t+1) of size N_sel is obtained.
Table 1: Test problems (Larrañaga, 2002; Valdez et al., 2013). All of them are minimization
problems. For applying the BEMNA they are converted to maximization and translated to positive
values as follows: g(~x) = −F(~x) − min(−F(~x)) + 1×10^{−12}, where F(~x) is the objective function
and g(~x) is the energy function used in Figure 1. The minimum fitness value of all problems is 0,
except for F_12, where F*_12 = −d(d+4)(d−1)/6.

F      Name               Domain
F_1    Sphere             [−10, 5]^d
F_2    Tablet             [−10, 5]^d
F_3    Ellipsoid          [−10, 5]^d
F_4    Cigar              [−10, 5]^d
F_5    Cigar Tablet       [−10, 5]^d
F_6    Different Powers   [−10, 5]^d
F_7    Parabolic Ridge    [−10, 5]^d
F_8    Sharp Ridge        [−10, 5]^d
F_9    Griewank           [−600, 600]^d
F_10   Ackley             [−32.768, 16.384]^d
F_11   Rosenbrock         [−10, 5]^d
F_12   Trid               [−d^2, d^2]^d
F_13   Brown              [−1, 4]^d
F_14   Levy Montalvo 1    [−20, 10]^d
F_15   Levy Montalvo 2    [−20, 10]^d
F_16   Levy 8             [−20, 10]^d
F_17   Pinter             [−20, 10]^d
4 EXPERIMENTS
This section provides statistical results on the well-known test problems of Table 1. The BEMNA,
described in Figure 1, is tested in three different ways: 1) using annealing schedule 1; 2) using
annealing schedule 2; and 3) comparing our proposal against a previous version with uncorrelated
variables, taken
ABoltzmannMultivariateEstimationofDistributionAlgorithmforContinuousOptimization
255
from literature. The following subsections discuss
each of these experiments.
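For reference, two of the benchmark problems of Table 1 in their standard forms (the paper does not
spell out its exact variants, so these definitions are assumptions), written so they can be plugged
into the bemna sketch of Section 3:

```python
import numpy as np

def sphere(x):
    # F_1: sum of squares, global minimum 0 at the origin.
    return float(np.sum(x**2))

def rosenbrock(x):
    # F_11 (generalized Rosenbrock), global minimum 0 at (1, ..., 1).
    return float(np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (1.0 - x[:-1])**2))
```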
Table 2: Statistical results in 30 dimensions of 11 test problems, from 15 independent executions.
For each problem, the first row reports the objective values reached and the second row the number
of function evaluations. If the obtained fitness value reaches the desired precision, i.e.
F − F* < 1×10^{−6}, the cell is boldfaced. (a) Schedule 1. (b) Schedule 2.

(a)
F      Best     Worst    Mean     Median   SD
F_1    6.32e-7  9.97e-7  8.73e-7  8.69e-7  1.03e-7
       7.77e4   9.92e4   8.59e4   8.53e4   6.01e3
F_2    7.23e-7  9.95e-7  8.98e-7  9.55e-7  9.57e-8
       7.63e4   1.05e5   9.01e4   9.20e4   7.92e3
F_3    5.58e-7  9.95e-7  8.59e-7  9.13e-7  1.25e-7
       1.08e5   1.24e5   1.18e5   1.20e5   5.29e3
F_4    3.96e-7  9.71e-7  8.28e-7  8.67e-7  1.61e-7
       1.28e5   1.65e5   1.48e5   1.48e5   1.07e4
F_5    7.28e-7  9.87e-7  8.82e-7  8.99e-7  7.29e-8
       1.20e5   1.44e5   1.33e5   1.33e5   7.24e3
F_6    2.84e-7  9.84e-7  7.51e-7  8.28e-7  2.04e-7
       3.64e4   5.21e4   4.27e4   4.31e4   4.07e3
F_7    0.00     0.00     0.00     0.00     2.58e-7
       1.06e5   1.30e5   1.17e5   1.14e5   6.74e3
F_8    0.00     0.00     0.00     0.00     0.00
       1.81e5   2.20e5   2.00e5   2.01e5   1.03e4
F_9    7.10e-7  9.96e-7  8.46e-7  8.33e-7  9.64e-8
       9.56e4   1.23e5   1.11e5   1.12e5   7.63e3
F_10   7.28e-7  9.80e-7  9.13e-7  9.39e-7  6.57e-8
       1.40e5   1.81e5   1.58e5   1.58e5   1.00e4
F_11   8.97e-4  1.35e-1  2.71e-2  9.13e-3  3.94e-2
       3.00e5   3.00e5   3.00e5   3.00e5   0.00
F*_11  4.62e-7  9.95e-7  8.82e-7  9.53e-7  1.47e-7
       3.19e5   3.60e5   3.41e5   3.38e5   1.34e4

(b)
F      Best     Worst    Mean     Median   SD
F_1    7.44e-7  9.97e-7  9.31e-7  9.51e-7  7.81e-8
       9.97e4   1.02e5   1.01e5   1.01e5   6.21e2
F_2    8.25e-7  9.88e-7  9.01e-7  8.96e-7  5.15e-8
       7.15e4   7.34e4   7.26e4   7.28e4   6.61e2
F_3    8.14e-7  9.99e-7  9.32e-7  9.40e-7  5.44e-8
       6.09e4   6.31e4   6.19e4   6.20e4   6.08e2
F_4    3.42e-7  9.89e-7  8.40e-7  8.73e-7  1.80e-7
       2.78e4   2.95e4   2.88e4   2.87e4   5.30e2
F_5    8.68e-7  9.89e-7  9.26e-7  9.29e-7  3.45e-8
       8.20e4   8.41e4   8.28e4   8.27e4   4.79e2
F_6    7.05e-7  9.85e-7  9.17e-7  9.35e-7  7.49e-8
       9.55e4   9.72e4   9.64e4   9.64e4   5.84e2
F_7    6.72e-7  9.95e-7  8.87e-7  9.13e-7  1.09e-7
       8.48e4   8.77e4   8.62e4   8.62e4   8.43e2
F_8    6.36e-7  9.98e-7  8.72e-7  9.27e-7  1.28e-7
       2.20e5   2.59e5   2.43e5   2.43e5   1.06e4
F_9    7.86e-7  9.97e-7  8.92e-7  9.10e-7  7.30e-8
       8.58e4   8.74e4   8.64e4   8.63e4   4.63e2
F_10   6.89e-7  9.94e-7  9.13e-7  9.56e-7  9.62e-8
       5.66e4   5.85e4   5.77e4   5.77e4   5.45e2
F_11   8.43e-7  9.90e-7  9.54e-7  9.71e-7  4.44e-8
       1.25e5   1.28e5   1.26e5   1.26e5   9.30e2
4.1 Testing the BEMNA with Both Annealing Schedules

This subsection presents the results of the BEMNA in 30 dimensions. The set of problems is taken
from the literature: 8 unimodal functions, 2 multimodal functions and the generalized Rosenbrock
function. The population size, number of samples and related settings are discussed in Subsections
2.2 and 2.3. In addition, the algorithm is stopped when either a maximum number of 3×10^5
evaluations is reached or the precision to the optimum value is F − F* < 1×10^{−6}; for the extra
experiment reported as F*_11 the maximum number is 6×10^5 evaluations. The results are presented in
Table 2. The value 0 is reported if F − F* < 1×10^{−16}. For each problem the statistics are
presented in two rows: the first summarizes the fitness values reached, whilst the second shows the
needed number of evaluations. If the obtained fitness value reaches the desired precision, i.e.
F − F* < 1×10^{−6}, the cell is boldfaced. Here, F* is the optimum value.
4.1.1 Results Using the Annealing Schedule 1

Table 2-(a) presents the results of 15 independent executions for the set of test functions
F_1–F_10 and the Rosenbrock problem (F_11). All of these problems, except the Rosenbrock function,
are successfully solved by the BEMNA with a precision better than 1×10^{−6}; indeed, the column of
worst values shows a perfect success rate, except for the Rosenbrock problem. Therefore, for most of
the problems the important result is the number of function evaluations.

Also, note that all the problems, with the exception of the Rosenbrock function, are successfully
solved with a similar computational cost. Even though they have different characteristics, for
example the Ackley and Griewank functions are multimodal, the algorithm does not significantly
increase the number of function evaluations with respect to the other problems. Also, the covariance
matrix is adapted according to the function, as seen in all the convex problems.

In the case of F_11 it is worth noting that the algorithm is not trapped in a local minimum, because
the reached fitness values are below 0.1. This means that the algorithm is capable of displacing the
search distribution to the optimum region. An extra experiment on this problem (reported as F*_11 in
Table 2-(a)) shows that the BEMNA, using the first annealing schedule, simply needs more
computational effort to solve the Rosenbrock problem.
ECTA2014-InternationalConferenceonEvolutionaryComputationTheoryandApplications
256
4.1.2 Results Using the Annealing Schedule 2

Similarly, Table 2-(b) shows the results of our proposal using the second annealing schedule. The
extra computational effort of tracking the surviving individuals delivers an excellent payoff: a 63%
reduction in the number of evaluations for the Rosenbrock problem, whilst the rest of the test
problems may or may not be improved. An inspection of the mean number of function evaluations shows
some differences in comparison with the BEMNA using the first annealing schedule: our proposal using
the second annealing schedule needed less computational effort to solve 7 out of 10 test problems,
namely F_2–F_5, F_7, F_9 and F_10.

These results show that this schedule is more convenient than the previous one. Besides reducing the
number of function evaluations, the BEMNA is quite effective and can solve problems that similar
approaches cannot (Yunpeng et al., 2006), such as the Rosenbrock function. Despite the differences
between both schedules, we can conclude that the BEMNA adequately adapts the covariance matrix as
demanded by the problem.
4.2 Comparison Versus the BUMDA

Since the Boltzmann distribution has been widely used in evolutionary computation, a comparison
against a related method from the literature is desirable. To compare against the state of the art
in evolutionary computing, a successful algorithm in this branch is selected: the Boltzmann
Univariate Marginal Distribution Algorithm (BUMDA), see (Valdez et al., 2013). Our proposal is
capable of modeling dependencies between variables; according to this property, a suitable set of
problems is chosen: F_11–F_17. Here, we use the BEMNA with the second annealing schedule.

Table 3 contrasts the error F − F* reached by each algorithm. For each problem there are three
measures: 1) the first row is the percentage success rate; 2) the second row is the mean and
standard deviation of the reached fitness values; 3) the third row is the mean and standard
deviation of the needed number of function evaluations. In addition, the performance of the two
algorithms is compared by a non-parametric bootstrap test on the mean value µ: if the null
hypothesis H_0: µ_1 = µ_2 is rejected, there is statistical evidence of differences between both
algorithms; this case is marked in bold.
Table 3: Percentage of success rate, reached fitness values and needed number of evaluations (mean
and standard deviation) for each algorithm. The last column shows a non-parametric bootstrap test;
if ρ is boldfaced there is statistical evidence of a difference between both algorithms. (a) 10
dimensions. (b) 30 dimensions.

(a)
F      BEMNA              BUMDA              ρ
F_11   100.00             0.00
       7.29e-7±2.10e-7    8.08e+0±7.13e-2    1.00e-4
       1.59e+4±1.20e+3    3.00e+5±0.00e+0    1.00e-4
F_12   100.00             0.00
       8.17e-7±1.29e-7    6.58e+1±1.56e+1    1.00e-4
       8.04e+3±1.72e+2    3.00e+5±0.00e+0    1.00e-4
F_13   100.00             100.00
       8.17e-7±1.52e-7    7.03e-7±1.92e-7    7.63e-2
       6.04e+3±1.89e+2    1.17e+4±2.99e+2    1.00e-4
F_14   100.00             100.00
       8.54e-7±1.04e-7    7.43e-7±1.87e-7    5.26e-2
       5.36e+3±1.68e+2    1.12e+4±2.49e+2    1.00e-4
F_15   100.00             100.00
       7.27e-7±1.85e-7    7.14e-7±1.94e-7    8.45e-1
       5.90e+3±2.48e+2    1.28e+4±3.61e+2    1.00e-4
F_16   100.00             100.00
       7.80e-7±1.81e-7    6.57e-7±2.07e-7    8.34e-2
       5.71e+3±1.47e+2    1.15e+4±2.22e+2    1.00e-4
F_17   80.00              100.00
       7.31e+0±1.52e+1    7.95e-7±1.90e-7    7.39e-2
       2.71e+4±3.78e+4    1.91e+4±6.98e+2    3.73e-1

(b)
F      BEMNA              BUMDA              ρ
F_11   100.00             0.00
       9.27e-7±7.11e-8    2.80e+1±7.56e-2    1.00e-4
       2.52e+5±9.92e+3    3.00e+5±0.00e+0    1.00e-4
F_12   100.00             0.00
       9.12e-7±7.13e-8    5.00e+3±1.08e+3    1.00e-4
       8.40e+4±7.91e+2    3.00e+5±0.00e+0    1.00e-4
F_13   100.00             93.33
       8.86e-7±9.44e-8    1.08e-6±8.72e-7    4.67e-1
       5.62e+4±1.18e+3    4.25e+4±7.13e+4    4.42e-1
F_14   100.00             100.00
       8.85e-7±8.55e-8    8.47e-7±1.29e-7    3.35e-1
       4.51e+4±6.21e+2    2.05e+4±2.20e+2    1.00e-4
F_15   100.00             100.00
       9.23e-7±6.00e-8    7.99e-7±9.51e-8    4.00e-4
       5.47e+4±6.58e+2    2.33e+4±3.29e+2    1.00e-4
F_16   100.00             100.00
       9.46e-7±4.87e-8    8.60e-7±1.04e-7    7.60e-3
       5.11e+4±6.23e+2    2.21e+4±3.29e+2    1.00e-4
F_17   40.00              0.00
       4.48e+1±5.38e+1    1.57e-2±1.66e-2    5.40e-3
       2.14e+5±1.10e+5    3.00e+5±0.00e+0    6.20e-3
The BEMNA shows a better performance than the BUMDA in most of the test problems in 10 dimensions.
Also, notice that the BEMNA solves all of these problems using fewer evaluations than the BUMDA. On
the other hand, for the 30-dimensional problems the BUMDA displays a better performance than our
proposal on F_14–F_16. This is an expected result because these problems have weakly correlated
variables. However, our approach effectively solved the hardest problems in 30 dimensions: F_11,
F_12 and F_17.
ABoltzmannMultivariateEstimationofDistributionAlgorithmforContinuousOptimization
257
5 CONCLUSIONS
The main contributions of this proposal are the derivation of formulae for computing the parameters
of a multivariate Normal distribution adequate for locating the optimum, and the introduction of
simple annealing schedules for updating the β value.

The derived formulae for computing the search distribution use the objective function value as a
linear factor for estimating weighted parameters. The linear weights avoid prematurely collapsing
the probability mass around a single solution, preventing premature convergence. In addition, this
fashion of parameter estimation produces a softer change in the structure of the covariance matrix
between consecutive generations, in contrast to the exponential weights used in similar approaches
(Yunpeng et al., 2006). The advantage of using linear weights, even with a fixed β value, is well
documented in (Valdez et al., 2013), where similar formulae are used for the univariate case. Our
proposal combines the convenience of the linear weights with simple annealing schedules to regulate
the exploration of the algorithm.

The advantage of using the linear weights and the annealing schedules is evidenced by the
statistical results presented in Section 4. Furthermore, the results demonstrate that the BEMNA
effectively solves the Rosenbrock problem, which is not solved by similar algorithms (Yunpeng et
al., 2006; Valdez et al., 2013), as well as the other problems, at a lower computational cost.

Future work intends to propose additional enhancement techniques to be applied to the current BEMNA
to reduce the population size as well as the number of function evaluations. Moreover, we will
explore new ways to use this approach in evolutionary computation.
REFERENCES
Cover, T. M. and Thomas, J. A. (2006). Elements of Information Theory (Wiley Series in
Telecommunications and Signal Processing). Wiley-Interscience.

Dong, W. and Yao, X. (2008). Unified eigen analysis on multivariate Gaussian based estimation of
distribution algorithms. Information Sciences, 178(15):3000–3023.

Grahl, J., Bosman, P. A. N., and Minner, S. (2007). Convergence phases, variance trajectories, and
runtime analysis of continuous EDAs. In GECCO '07: Proceedings of the 9th Annual Conference on
Genetic and Evolutionary Computation, pages 516–522. ACM.

Hu, J., Wang, Y., Zhou, E., Fu, M. C., and Marcus, S. I. (2012). A survey of some model-based
methods for global optimization. In Optimization, Control, and Applications of Stochastic Systems,
pages 157–179. Springer.

Larrañaga, P. (2002). A review on estimation of distribution algorithms. In Larrañaga, P. and
Lozano, J., editors, Estimation of Distribution Algorithms, volume 2 of Genetic Algorithms and
Evolutionary Computation, pages 57–100. Springer US.

Larrañaga, P., Etxeberria, R., Lozano, J. A., and Peña, J. M. (2000). Optimization in continuous
domains by learning and simulation of Gaussian networks. Technical Report EHU-KZAA-IK-4/99,
University of the Basque Country.

Mahnig, T. and Mühlenbein, H. (2001). A new adaptive Boltzmann selection schedule SDS. In
Proceedings of the 2001 Congress on Evolutionary Computation, volume 1, pages 183–190. IEEE.

Mühlenbein, H., Bendisch, J., and Voigt, H.-M. (1996). From recombination of genes to the estimation
of distributions II. Continuous parameters.

Mühlenbein, H. (2012). Convergence theorems of estimation of distribution algorithms. In Shakya, S.
and Santana, R., editors, Markov Networks in Evolutionary Computation, volume 14 of Adaptation,
Learning, and Optimization, pages 91–108. Springer Berlin Heidelberg.

Mühlenbein, H. and Mahnig, T. (1999). The factorized distribution algorithm for additively
decomposed functions. In Proceedings of the 1999 Congress on Evolutionary Computation (CEC 99),
volume 1. IEEE.

Mühlenbein, H., Mahnig, T., and Rodriguez, A. O. (1999). Schemata, distributions and graphical
models in evolutionary optimization. Journal of Heuristics, 5(2):215–247.

Ochoa, A. (2010). Opportunities for expensive optimization with estimation of distribution
algorithms. In Computational Intelligence in Expensive Optimization Problems, volume 2, pages
193–218. Springer.

Segovia-Dominguez, I., Hernandez-Aguirre, A., and Diharce, E. V. (2013). The Gaussian polytree EDA
with copula functions and mutations. In EVOLVE, volume 447 of Studies in Computational Intelligence,
pages 123–153. Springer Berlin Heidelberg.

Shapiro, J. L. (2006). Diversity loss in general estimation of distribution algorithms. In Parallel
Problem Solving from Nature – PPSN IX, pages 92–101. Springer.

Valdez, S. I., Hernández, A., and Botello, S. (2013). A Boltzmann based estimation of distribution
algorithm. Information Sciences, 236:126–137.

Yunpeng, C., Xiaomin, S., and Peifa, J. (2006). Probabilistic modeling for continuous EDA with
Boltzmann selection and Kullback-Leibler divergence. In Proceedings of the 8th Annual Conference on
Genetic and Evolutionary Computation, pages 389–396. ACM.
ECTA2014-InternationalConferenceonEvolutionaryComputationTheoryandApplications
258