Using a free variable q, we can rewrite the above as follows:

    w^{(J)}_{1} = q \hat{w}^{(J-1)}_{m-1},    w^{(J)}_{m} = (1 - q) \hat{w}^{(J-1)}_{m-1}        (10)
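As a small illustration of eq.(10), the sketch below computes the pair of split weights for a given q; the function name and the example value of the optimal weight are ours, and q = 0.5, 1.0, 1.5 are the values used later in the experiments.

    def split_weight(w_hat: complex, q: float):
        """Split a weight of C-MLP(J-1) into two weights of C-MLP(J) as in eq.(10).

        w_hat : the weight \hat{w}^{(J-1)}_{m-1} at the optimum of C-MLP(J-1)
        q     : free variable parameterizing the singular region
        """
        w_1 = q * w_hat           # w^{(J)}_{1}  = q * w_hat         (eq. 10)
        w_m = (1.0 - q) * w_hat   # w^{(J)}_{m}  = (1 - q) * w_hat   (eq. 10)
        return w_1, w_m

    # example: the values q = 0.5, 1.0, 1.5 used in the experiments
    pairs = [split_weight(0.3 - 0.7j, q) for q in (0.5, 1.0, 1.5)]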
3 LEARNING METHODS
3.1 Existing Learning Methods
As existing learning methods, we focus on two: a basic one and a state-of-the-art one.
Complex-valued backpropagation (C-BP) (Nitta,
1997) is the most basic learning method of C-MLP.
It carries out the search using only the complex gradient. The step length can be handled in two ways: fixed or adaptive. When an unbounded activation function such as eq.(3) is used, the step length should be adaptive, since a fixed step length may guide the search in undesirable directions. Since we employ eq.(3) as the activation function, our C-BP always adapts the step length by line search.
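As a rough illustration only, the sketch below performs one such update with a backtracking line search; loss and grad are hypothetical callables returning the training error and its gradient with respect to the conjugate weights (Wirtinger convention), and the backtracking rule is our simplification of the line search.

    def cbp_step(w, loss, grad, step0=1.0, shrink=0.5, min_step=1e-8):
        """One C-BP update with an adaptive step length (a sketch, not the
        authors' exact procedure).

        w    : complex weight vector
        loss : callable returning the (real) training error at w
        grad : callable returning the gradient w.r.t. the conjugate weights
               (Wirtinger convention), so that -grad(w) is a descent direction
        """
        g = grad(w)
        f0 = loss(w)
        step = step0
        while step >= min_step:
            w_new = w - step * g          # trial point along the descent direction
            if loss(w_new) < f0:          # accept the first step that reduces the error
                return w_new, step
            step *= shrink                # otherwise shrink the step length and retry
        return w, 0.0                     # no acceptable step found; the search stalls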
The quasi-Newton method requires only the gradient at each iteration, but by measuring the changes in gradients it builds an approximation of the inverse Hessian, which makes it much faster than steepest descent and sometimes more efficient than Newton's method (Nocedal and Wright, 2006). Although there are several ways of approximating the inverse Hessian, the BFGS update is considered to work best. Complex-valued BFGS (C-BFGS) (Popa, 2015) is a complex-valued version of the quasi-Newton method with the BFGS update. The performance of C-BFGS was reported to exceed that of other existing learning methods.
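For reference, one way to carry the BFGS update over to complex weights is sketched below, replacing transposes by conjugate transposes; this is a common formulation but not necessarily the exact one used in the C-BFGS paper.

    import numpy as np

    def bfgs_update(H, s, y):
        """One BFGS update of the inverse-Hessian approximation H (a sketch
        for complex weight vectors, using the conjugate transpose).

        s : step taken,       s = w_{k+1} - w_k
        y : gradient change,  y = g_{k+1} - g_k
        """
        s = s.reshape(-1, 1)
        y = y.reshape(-1, 1)
        rho = 1.0 / (y.conj().T @ s).real.item()   # curvature term, assumed positive
        I = np.eye(H.shape[0])
        V = I - rho * (s @ y.conj().T)
        return V @ H @ V.conj().T + rho * (s @ s.conj().T)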
3.2 New Learning Method: C-SSF
A new learning method called Complex-valued Singularity Stairs Following (C-SSF) was recently proposed, and two kinds of modifications have since been made to improve its performance significantly. The latest version is described in (Satoh and Nakano, 2015b). C-SSF starts its search from C-MLP(J=1) and then gradually increases the number J of hidden units one by one until it reaches the specified maximum J_max.
When searching C-MLP(J), the method begins by applying reducibility mapping to the optimum of C-MLP(J−1) to obtain two kinds of singular regions, \hat{\Theta}^{(J)}_{\alpha\beta} and \hat{\Theta}^{(J)}_{\gamma}. Since the gradient is zero all over a singular region, C-SSF calculates the eigenvalues of the Hessian to find descending directions. Following the direction of the eigenvector corresponding to a negative eigenvalue, the method can descend in the search space. After leaving the singular regions, the method employs C-BFGS as its search engine from then on.
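A sketch of this step is given below: the eigendecomposition of the Hessian at a point on the singular region yields the descending directions as the eigenvectors of negative eigenvalues. The Hessian is assumed here to be symmetric/Hermitian so that np.linalg.eigh applies; the precise Hessian used by C-SSF is defined in the original paper.

    import numpy as np

    def descending_directions(H, tol=1e-12):
        """Return the negative eigenvalues of the Hessian H and the
        corresponding eigenvectors, i.e. the directions along which the
        search can descend from a point on the singular region."""
        eigvals, eigvecs = np.linalg.eigh(H)   # eigh returns eigenvalues in ascending order
        neg = eigvals < -tol                   # keep only clearly negative eigenvalues
        return eigvals[neg], eigvecs[:, neg]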
The processing time grows as the number J of hidden units increases. This is natural because the number of search routes increases with J. To make C-SSF much faster without deteriorating solution quality, the following speeding-up techniques were introduced (Satoh and Nakano, 2015b).
One is search pruning. During the search we often obtain duplicate solutions. Considering that duplicates are obtained via much the same search routes, we introduced search pruning, which speeds up the method by monitoring the redundancy of search routes. In the search of C-MLP(J), search points are stored at a certain interval (100 steps in our experiments), and at the same interval the current search line segment is checked to see whether it is close enough to any of the previous search line segments. If so, the current search route is immediately pruned.
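A minimal sketch of the pruning test is shown below, under the assumption that closeness of two line segments is judged by the distances between their endpoints; the exact criterion in the original paper may differ.

    import numpy as np

    def is_redundant(current_seg, stored_segs, eps=1e-3):
        """Check whether the current search line segment is close enough to
        any previously stored segment; if so, the route can be pruned.
        A segment is a pair (start, end) of weight vectors; closeness is
        judged here by endpoint distances (an assumption)."""
        a0, a1 = current_seg
        for b0, b1 in stored_segs:
            if np.linalg.norm(a0 - b0) < eps and np.linalg.norm(a1 - b1) < eps:
                return True
        return False

    # In the search of C-MLP(J), segments would be stored every 100 steps and
    # this test applied at the same interval; on True the route is pruned.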
The other is to set an upper bound S_max on the number of search routes; that is, the number of search routes for each C-MLP(J) is limited to S_max. To implement this, we calculate the eigenvalues at all expected initial points on the singular regions. Then we pick at most S_max negative eigenvalues in ascending order and perform search using the corresponding eigenvectors. We assume that stronger (more negative) curvature at a starting point may suggest better solution quality at the end of the search.
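The selection can be sketched as follows; candidates is assumed to collect one (eigenvalue, eigenvector, starting point) triple per negative eigenvalue found over all expected initial points (the names are ours).

    def select_routes(candidates, s_max):
        """Keep at most s_max search routes, preferring the most negative
        eigenvalues, i.e. the strongest curvature at the starting points.

        candidates : list of (eigenvalue, eigenvector, start_point) triples
                     with eigenvalue < 0
        """
        ranked = sorted(candidates, key=lambda c: c[0])   # ascending: most negative first
        return ranked[:s_max]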
4 EXPERIMENTS
We performed experiments to evaluate how the performance of C-MLPs depends on the learning method, using three quite different learning methods and five different types of datasets.
As learning methods, we employed C-BP, C-BFGS, and C-SSF. As described previously, they perform search in quite different paradigms. Our C-BP always calculates a reasonable step length in the direction of the gradient. Both C-BP and C-BFGS were run 100 times independently with different initial weights for each J. As for C-SSF, the upper bound S_max of search routes was set to 100 for each J, and the free parameters of the singular regions were set as follows: w^{(J)}_{1,0} = −1, 0, 1 and q = 0.5, 1.0, 1.5.
The common learning conditions are as follows. The number of hidden units was changed as J = 1, ..., 20. As for initial weights, the real and imaginary parts of each weight were randomly generated from the range (0, 1). Each method was terminated when the number of sweeps exceeded 1,000 or the step length of line search became smaller than 10^{-8}.
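For concreteness, these common conditions can be written down as below (a sketch; the function names are ours, and the uniform draw from (0, 1) is approximated by NumPy's half-open [0, 1) generator).

    import numpy as np

    def init_weights(n, rng=None):
        """Initial weights: real and imaginary parts drawn randomly from (0, 1)."""
        rng = np.random.default_rng() if rng is None else rng
        return rng.random(n) + 1j * rng.random(n)

    def should_terminate(n_sweeps, step_length):
        """Stop when sweeps exceed 1,000 or the line-search step length
        drops below 10^-8."""
        return n_sweeps > 1000 or step_length < 1e-8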