because $2^N$ sums over all possible configurations in $\sum_{\bm{s}}(\cdots)$ are needed to evaluate $E(\{J\} : \beta)$, where we defined the set of interactions by $\{J\} \equiv \{J_{ij}\,|\,i,j = 1,\cdots,N\}$. To overcome this difficulty, we usually use the so-called Markov chain Monte Carlo (MCMC) method to calculate the expectation (7) by importance sampling from the Gibbs distribution at temperature $T = \beta^{-1}$.
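For concreteness, the following is a minimal sketch of such an MCMC estimate in Python (NumPy assumed); the function name, sweep counts, and the interpretation of $\sum_{ij}$ as a sum over pairs $i<j$ with symmetric couplings and $J_{ii}=0$ are our illustrative choices, not part of the original formulation:

```python
import numpy as np

def gibbs_energy_mcmc(J, beta, n_sweeps=5000, burn_in=1000, rng=None):
    """Estimate U({J} : beta), the Gibbs average of
    H(s) = -sum_{i<j} J_ij s_i s_j, by single-spin-flip Metropolis sampling.
    J is assumed symmetric with zero diagonal."""
    rng = np.random.default_rng() if rng is None else rng
    N = J.shape[0]
    s = rng.choice([-1, 1], size=N)          # random initial configuration
    field = J @ s                            # local fields h_i = sum_j J_ij s_j
    energies = []
    for sweep in range(n_sweeps):
        for i in rng.permutation(N):
            dE = 2.0 * s[i] * field[i]       # energy change of flipping spin i
            if dE <= 0.0 or rng.random() < np.exp(-beta * dE):
                s[i] = -s[i]
                field += 2.0 * s[i] * J[:, i]  # incremental field update
        if sweep >= burn_in:
            energies.append(-0.5 * s @ field)  # H(s) = -(1/2) s.(J s)
    return float(np.mean(energies))
```

The single-spin-flip Metropolis rule satisfies detailed balance with respect to the Gibbs distribution, so the energies recorded after burn-in average to $U(\{J\} : \beta)$.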
On the other hand, for the first term appearing on the right-hand side of (5), we evaluate the expectation by making use of
\[
U_{\mathrm{GA}}(\{J\}) \equiv -\sum_{\bm{s}} P_{\mathrm{GA}}(\bm{s}) \Biggl( \sum_{ij} J_{ij} s_i s_j \Biggr)
= -\lim_{L\to\infty} \frac{1}{L} \sum_{l=1}^{L} \Biggl( \sum_{ij} J_{ij}\, s_i(t,l)\, s_j(t,l) \Biggr) \tag{8}
\]
where $s_i(t,l)$ is the $l$-th sampling point at time $t$ from the empirical distribution of GA. Namely, we shall replace the expectation of the cost function $H(\bm{s}) = -\sum_{ij} J_{ij} s_i s_j$ over the distribution $P_{\mathrm{GA}}(\bm{s})$ by sampling from the empirical distribution of GA.
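Since the paper does not specify the GA's internals, the corresponding estimator of (8) can be sketched as an empirical average over $L$ configurations sampled from the GA population at a fixed time $t$; the array name `population` and the pair convention $i<j$ are illustrative assumptions:

```python
import numpy as np

def u_ga_estimate(J, population):
    """Estimate U_GA({J}) in (8): the empirical average of
    H(s) = -sum_{i<j} J_ij s_i s_j over L configurations sampled
    from the GA at a fixed time t.

    population: array of shape (L, N) with entries +/-1.
    """
    # Energy of each sampled configuration (J symmetric, zero diagonal)
    energies = -0.5 * np.einsum('li,ij,lj->l', population, J, population)
    return float(energies.mean())
```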
By a simple transformation $\beta \to T^{-1}$ in equation (5), we obtain the Boltzmann-machine-type learning equation with respect to the effective temperature $T$ as follows:
\[
\frac{dT}{dt} = -T^2 \Bigl[ U(\{J\} : T^{-1}) - U_{\mathrm{GA}}(\{J\}) \Bigr] \tag{9}
\]
From this learning equation, we find that the time evolution of the effective temperature depends on the difference between the expectations of the cost function over the Gibbs distribution at temperature $T$ and over the empirical distribution of GA.
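As a sketch of how (9) could be integrated numerically, an explicit Euler discretization using the two estimators sketched above might read as follows; the step size `dt`, step count, and the caller-supplied `population_at(t)` stand-in for the GA dynamics are illustrative assumptions:

```python
def evolve_temperature(J, T0, population_at, dt=0.01, n_steps=1000):
    """Euler integration of dT/dt = -T^2 [ U({J} : 1/T) - U_GA({J}) ].

    population_at(t): returns an (L, N) array of +/-1 configurations
    sampled from the GA at time t (a stand-in for the GA dynamics).
    """
    T, history = T0, [T0]
    for step in range(n_steps):
        u_gibbs = gibbs_energy_mcmc(J, beta=1.0 / T)        # sketched above
        u_ga = u_ga_estimate(J, population_at(step * dt))   # sketched above
        T += dt * (-T**2) * (u_gibbs - u_ga)
        history.append(T)
    return history
```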
3.3 Average-case Performance
We should evaluate the ‘average-case performance’ of the learning equation, which is independent of the realization of the ‘problem’ $\{J\}$. Namely, one should evaluate the ‘data-averaged’ learning equation
\[
\frac{dT}{dt} = -T^2 \Bigl[ E_{\{J\}}\bigl( U(\{J\} : T^{-1}) \bigr) - E_{\{J\}}\bigl( U_{\mathrm{GA}}(\{J\}) \bigr) \Bigr] \tag{10}
\]
to discuss the average-case performance, where we defined the average $E_{\{J\}}(\cdots)$ by $E_{\{J\}}(\cdots) \equiv \prod_{ij} \int dJ_{ij}\, (\cdots)\, P(J_{ij})$. We should keep in mind that in this paper we deal with the problem in which each interaction $J_{ij}$ has no correlation with the others, namely, $E_{\{J\}}(J_{ij} J_{kl}) = J^2 \delta_{i,k} \delta_{j,l}$, where we defined $J^2$ as the variance of $P(J_{ij})$ and $\delta_{x,y}$ stands for the Kronecker delta.
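Because the couplings are independent, the disorder average in (10) can be approximated numerically by drawing many realizations of $\{J\}$. Below is a minimal sketch under the assumptions already stated (mean-zero Gaussian couplings of variance $J^2$); the realization count and the callable `ga_sampler`, a hypothetical stand-in for whatever GA run produces the sampled configurations, are our own choices:

```python
import numpy as np

def averaged_drift(T, ga_sampler, n_realizations=50, N=64, J2=1.0, rng=None):
    """Approximate the RHS of (10) at temperature T by averaging
    U({J} : 1/T) - U_GA({J}) over independent couplings J_ij ~ N(0, J2).

    ga_sampler(J): must return an (L, N) array of +/-1 configurations
    drawn from the GA run on realization J (hypothetical stand-in).
    """
    rng = np.random.default_rng() if rng is None else rng
    diffs = []
    for _ in range(n_realizations):
        J = np.triu(rng.normal(0.0, np.sqrt(J2), size=(N, N)), k=1)
        J = J + J.T                                  # symmetric, J_ii = 0
        diffs.append(gibbs_energy_mcmc(J, beta=1.0 / T)   # sketched above
                     - u_ga_estimate(J, ga_sampler(J)))   # sketched above
    return -T**2 * float(np.mean(diffs))
```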
4 MATHEMATICALLY TRACTABLE MODEL
In this section, we introduce a spin glass model which will be used as a benchmark cost function to be minimized by GA. The model is called the spin glass chain. It is a one-dimensional spin glass model having only nearest-neighbor interactions. It is possible for us to investigate the temperature dependence of the internal energy and, moreover, one can obtain the lowest energy exactly. The energy function (the Hamiltonian in the literature of statistical physics) is given by
\[
H = -\sum_{i=1}^{N} J_i s_i s_{i+1}, \qquad J_i \sim \mathcal{N}(0, 1) \tag{11}
\]
where $J_i$ stands for the interaction between spins $s_i$ and $s_{i+1}$, and $\mathcal{N}(a, b)$ denotes a Gaussian distribution with mean $a$ and variance $b$.
Figure 1: Typical energy landscape $H(\bm{s}) = -\sum_i J_i s_i s_{i+1}$ with $P(J_i) = \mathcal{N}(0,1)$, $E(J_i J_j) = \delta_{i,j}$, of the spin glass chain. The number of spins is $N = 10$. It should be noted that the horizontal axis $S$ denotes the label of states, that is, $S = 1, 2, \cdots, 2^N$ ($= 1024$). For instance, $S = 1$ stands for a state, say, $\bm{s}(S{=}1) = (+1,+1,\cdots,+1)$, and $S = 2^N$ denotes $\bm{s}(S{=}2^N) = (-1,-1,\cdots,-1)$. The right panel shows the internal energy of the spin glass chain as a function of temperature. The solid line is the exact result $U = -\beta \int_{-\infty}^{\infty} Dx / \cosh^2 \beta x$, whereas the dots denote the internal energy calculated by the MCMC for $N = 3000$, MCS $= 20000$. The error bars are calculated from 10 independent runs for different choices of $\{J\} \equiv \{J_i\,|\,i = 1,\cdots,N\}$. The inset shows $U_{\min}$ as a function of $J_0$. We set $J = 1$.
We plot a typical energy landscape in Figure 1 (left). From this figure, we find that the structure of the energy surface is complicated, and it seems difficult to find the lowest energy state.
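For small $N$, the landscape of Figure 1 (left) can be reproduced by brute-force enumeration of all $2^N$ states. A minimal sketch follows, assuming free boundary conditions so that the chain has $N-1$ bonds (the seed and enumeration order are arbitrary illustrative choices):

```python
import numpy as np
from itertools import product

def chain_energy(J, s):
    """H = -sum_i J_i s_i s_{i+1} for an open chain (N spins, N-1 bonds)."""
    return -np.sum(J * s[:-1] * s[1:])

N = 10
rng = np.random.default_rng(0)
J = rng.normal(0.0, 1.0, size=N - 1)      # J_i ~ N(0, 1)

# Enumerate all 2^N spin configurations, labelled S = 1, ..., 2^N
landscape = [chain_energy(J, np.array(s)) / N
             for s in product([1, -1], repeat=N)]
print(min(landscape))                     # lowest energy per spin
```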
However, we should notice that in (11) $s_i$ takes $\pm 1$ and the product $s_i s_{i+1}$ also takes a value $\pm 1$. Hence, we introduce the new variable $\tau_i$ defined by $\tau_i = s_i s_{i+1}$; then $\tau_i \in \{1, -1\}$. Therefore, in order to minimize $H(\bm{\tau}) = -\sum_i J_i \tau_i$, we should determine $\tau_i = \mathrm{sgn}(J_i)$ for each $i$, and then we have the lowest energy $U_{\min} = -\sum_i J_i\, \mathrm{sgn}(J_i) = -\sum_i |J_i|$.
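In other words, the exact ground state follows directly from the signs of the couplings. A minimal sketch, again assuming free boundaries so that every bond can be satisfied independently (the spin configuration is recovered by fixing $s_1 = +1$ and unfolding $\tau_i = s_i s_{i+1}$):

```python
import numpy as np

def exact_ground_state(J):
    """Ground state of H = -sum_i J_i s_i s_{i+1} via tau_i = sgn(J_i).

    Returns (s, U_min) with U_min = -sum_i |J_i|.
    """
    tau = np.sign(J)
    tau[tau == 0] = 1                    # break ties when J_i = 0 exactly
    s = np.concatenate(([1], np.cumprod(tau)))  # s_{i+1} = tau_i * s_i
    return s, -np.abs(J).sum()
```

One can check that `chain_energy(J, s)` from the previous sketch reproduces the returned $U_{\min}$ exactly.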
Namely, when $J_i$ obeys a Gaussian with mean $J_0$ and variance $J^2$, the lowest energy for a single spin is obtained in