using test data of 1,000 data points without noise, generated independently of the training data. Test error for real data was evaluated using one of 10 segments; the remaining nine segments were used for training.
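The evaluation protocol above (one held-out segment out of 10, the remaining nine used for training) can be sketched as follows; the function name and the seed are illustrative, not from the paper:

```python
# Sketch of the 10-segment evaluation described above: one segment is
# held out for testing and the remaining nine are used for training.
import numpy as np

def ten_segment_split(n_samples, test_segment, n_segments=10, seed=0):
    """Return (train_idx, test_idx) for one of `n_segments` folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    segments = np.array_split(idx, n_segments)
    test_idx = segments[test_segment]
    train_idx = np.concatenate(
        [s for i, s in enumerate(segments) if i != test_segment])
    return train_idx, test_idx

train_idx, test_idx = ten_segment_split(1000, test_segment=0)
print(len(train_idx), len(test_idx))  # 900 100
```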
4.1 Experiment using Artificial Data
Our artificial data set was generated using an MLP having the weights shown in the following equations. Values of the variables $x_1, \cdots, x_{10}$ were randomly selected from the range $(0,1)$. Note that five variables $x_1, \cdots, x_5$ contribute to $y$, while the other five variables $x_6, \cdots, x_{10}$ are irrelevant. We included the irrelevant variables to make the learning harder. Values of $y$ were generated by adding small Gaussian noise $\mathcal{N}(0, 0.05^2)$ to the MLP outputs. The sample size was 1,000 ($N=1{,}000$). We set $J_{\min}=15$ and $J_{\max}=24$, while the original $J=22$.
\[
[w_0, w_1, \cdots, w_{22}] = [-11, -12, -10, -6, 4, 20, 18, -3, 12, -18, -17, 17, -1, 13, -9, 9, -3, 14, 1, -18, 15, 12, 13],
\]
\[
[\mathbf{w}_1, \cdots, \mathbf{w}_{11}] =
\left[\begin{array}{rrrrrrrrrrr}
3 & -7 & -2 & 6 & -8 & 7 & -3 & 10 & -5 & 10 & -3 \\
-3 & 3 & -6 & 2 & -7 & -8 & 9 & 2 & 3 & 1 & -5 \\
9 & 1 & 4 & -1 & -4 & -3 & -8 & 2 & -4 & 1 & -9 \\
-8 & -8 & 3 & -6 & -7 & -10 & 1 & 5 & 0 & 2 & -4 \\
3 & -4 & 5 & 7 & -9 & -4 & -4 & 0 & 4 & 6 & -8 \\
-10 & -6 & -1 & 2 & 4 & 5 & -10 & -9 & -5 & -3 & 2
\end{array}\right],
\]
\[
[\mathbf{w}_{12}, \cdots, \mathbf{w}_{22}] =
\left[\begin{array}{rrrrrrrrrrr}
-1 & 1 & 10 & -6 & -7 & 1 & -1 & 3 & -3 & 7 & -6 \\
0 & 0 & -8 & 2 & -7 & -4 & 3 & -3 & -7 & 3 & 10 \\
-5 & -10 & -8 & 9 & -7 & -9 & -2 & -2 & 1 & 4 & -1 \\
5 & 8 & 6 & -1 & 9 & -6 & 3 & -1 & 5 & -2 & 0 \\
6 & 2 & -2 & 2 & 10 & 4 & -10 & 8 & -9 & -5 & 3 \\
2 & -2 & -5 & 9 & -8 & 5 & 4 & -3 & 7 & 6 & 5
\end{array}\right]
\]
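The generation process above can be sketched as follows. This is a minimal sketch: it assumes a tanh hidden activation (the excerpt does not restate the activation) and that the input-to-hidden weights form a 6×22 matrix (a bias plus weights for $x_1,\cdots,x_5$ for each of the 22 hidden units); a placeholder random matrix stands in for the full weight matrix:

```python
# Sketch of the data-generation process described above: an MLP with
# J=22 hidden units maps x_1,...,x_5 to y, x_6,...,x_10 are unused, and
# N(0, 0.05^2) Gaussian noise is added. The tanh activation and the
# placeholder weight matrix W are assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Hidden-to-output weights [w_0, w_1, ..., w_22] (w_0 is the output bias).
v = np.array([-11, -12, -10, -6, 4, 20, 18, -3, 12, -18, -17, 17, -1,
              13, -9, 9, -3, 14, 1, -18, 15, 12, 13], dtype=float)

# Input-to-hidden weights: column j holds the bias and the weights for
# x_1,...,x_5 of hidden unit j. Placeholder values for brevity.
W = rng.integers(-10, 11, size=(6, 22)).astype(float)

N = 1000
X = rng.uniform(0.0, 1.0, size=(N, 10))      # x_1,...,x_10 in (0,1)
X1 = np.hstack([np.ones((N, 1)), X[:, :5]])  # bias + relevant inputs only
hidden = np.tanh(X1 @ W)                     # (N, 22) hidden outputs
y = v[0] + hidden @ v[1:] + rng.normal(0.0, 0.05, size=N)
print(X.shape, y.shape)  # (1000, 10) (1000,)
```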
In our experiments using artificial data, we
examine how the performance of SSF2 is influenced
by our decrement method. Below we examined in
two ways 1a and 1b.
(1) Experiment 1a
First, in the decrement method, only the singular region $\widehat{\Theta}_J^{\alpha\beta}$ was considered, and every hidden unit was tested as a candidate for deletion. In eq. (9), we changed the number of guiding steps in three ways: one step, three steps ($a = 2/3, 1/3, 0$), and ten steps ($a = 9/10, 8/10, \cdots, 1/10, 0$). These are referred to as SSF2(1 step), SSF2(3 steps), and SSF2(10 steps), respectively. Note that SSF2(1 step) is nothing but simple deletion.
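Since eq. (9) is not reproduced in this excerpt, the following is only a hedged sketch of what a multi-step guiding schedule might look like: we assume each step scales the output weight of the unit being deleted by the factor $a$ and retrains briefly, so that $a=0$ completes the deletion. The helper `retrain` and the weight layout are hypothetical:

```python
# Hedged sketch of multi-step guided deletion. Eq. (9) is not shown in
# this excerpt, so we *assume* each guiding step scales the output
# weight of the unit being deleted by a, retraining in between, until
# a=0 places the model on the singular region. `retrain` is hypothetical.
def guided_delete(weights, unit, schedule, retrain):
    """weights: dict with 'v' (output weights; index 0 is the bias).

    schedule: e.g. [2/3, 1/3, 0.0] for SSF2(3 steps), [0.0] for 1 step.
    """
    v_start = weights['v'][unit]
    for a in schedule:
        weights['v'][unit] = a * v_start  # guide toward the singular region
        weights = retrain(weights)        # short retraining at each step
    return weights                        # the unit now contributes nothing

# SSF2(1 step), i.e. simple deletion, corresponds to schedule [0.0];
# SSF2(10 steps) corresponds to [9/10, 8/10, ..., 1/10, 0.0].
```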
Figure 1 shows the solution quality of each method: the best training errors and the corresponding test errors. Each SSF2 method made a round trip, first decreasing J from 24 down to 15, and
then increasing J back up to 24, while SSF1.3 kept increasing J. SSF1.3 outperformed BPQ in both training and test.

[Figure 2: CPU time for Experiment 1a. Panel (a): CPU time (sec.) vs. J for BPQ and SSF1.3. Panel (b): CPU time (sec.) vs. t for SSF2(1 step), SSF2(3 steps), and SSF2(10 steps).]

In each SSF2 method, the second half (increase phase) worked better than the first half (decrease phase) in both training and test. SSF2(1 step)
worked rather poorly, especially in test. However, SSF2(3 steps) and SSF2(10 steps) worked in much the same manner, and their increase phases were almost equivalent to that of SSF1.3. SSF2(3 steps), SSF2(10 steps), and SSF1.3 indicate that J=18 is the best model, while BPQ indicates that J=21 is the best.
Figure 2 shows the CPU time required by each method. The horizontal axis t in Fig. 2(b) indicates how many times J was changed; thus, t=1,···,10 corresponds to the decrease phase from J=24 down to 15, and t=10,···,19 corresponds to the increase phase from J=15 up to 24. Each method tends to require longer CPU time as J increases. Among the SSF2 methods, SSF2(1 step) took the longest because it required the largest number of search routes in its increase phase. The total CPU times of BPQ, SSF1.3, SSF2(1 step), SSF2(3 steps), and SSF2(10 steps) were 5h28m, 6h30m, 7h20m, 5h1m, and 6h1m, respectively. Hence, both SSF2(3 steps) and SSF2(10 steps) were faster than SSF1.3.
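The correspondence between t and J described above (a round trip from J=24 down to 15 and back up to 24, with t=10 shared between the phases) can be written out directly; this is a small illustrative helper, not code from the paper:

```python
# Sketch of the round-trip schedule of J indexed by t, as described
# above: t=1,...,10 is the decrease phase (J=24,...,15) and t=10,...,19
# is the increase phase (J=15,...,24); t=10 belongs to both phases.
def j_at(t, j_max=24, j_min=15):
    if 1 <= t <= 10:                 # decrease phase
        return j_max - (t - 1)
    if 10 < t <= 19:                 # increase phase
        return j_min + (t - 10)
    raise ValueError("t out of range")

print([j_at(t) for t in range(1, 20)])
```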
Based on the results of Experiment 1a, we considered SSF2(3 steps) the most promising when using only the singular region $\widehat{\Theta}_J^{\alpha\beta}$.