neurons in the input layer (one neuron for each value
of the event), while the output layer had a single neuron,
representing a benign or malignant event. The charac-
teristics of the hidden layer were defined in the previous
section.
To determine the learning rate and the target mean
square error used in the tests, some ANNs were trained
using only traditional BP. From these tests, the best
values for the learning rate and the mean square error
were 0.0001 and 0.0145, respectively. These values
yielded a success rate of around 98%, i.e., the percent-
age of times the trained network correctly classified
an event as benign or malignant.
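The preliminary sweep described above can be sketched as follows. Note that `train_and_evaluate`, the candidate grids, and the fallback hit rate are hypothetical stand-ins; only the reported best pair, 0.0001 and 0.0145, comes from the text.

```python
def pick_hyperparameters(train_and_evaluate,
                         learning_rates=(1e-2, 1e-3, 1e-4),
                         target_errors=(0.05, 0.0145, 0.005)):
    """Sketch of the preliminary tests: train plain BP once per
    candidate setting and keep the pair with the best hit rate.
    The grids above are assumptions, not the paper's values."""
    best = None
    for lr in learning_rates:
        for mse in target_errors:
            hit_rate = train_and_evaluate(lr, mse)
            if best is None or hit_rate > best[0]:
                best = (hit_rate, lr, mse)
    return best
```

With a training routine that, as reported, performs best at a learning rate of 0.0001 and a target error of 0.0145, this sweep would return that pair with its roughly 98% hit rate.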
Concerning the size of the interval used for the cor-
relation coefficient calculation, many tests were per-
formed to define empirically an adequate number of it-
erations. For instance, with an interval of just 10 itera-
tions, a good reduction in training iterations was ob-
tained, but the running time was as long as that of tra-
ditional BP. With 50 iterations, not all local minima
could be avoided, and some tests failed to acquire the
desired knowledge. An interval of 20 iterations gave
the most suitable results on all criteria: a good reduc-
tion in the number of iterations, a short running time,
and successful knowledge acquisition. Therefore, the
following results were obtained using this interval
size.
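The selective rule itself was defined in the previous section; as an illustrative sketch only, the 20-iteration window could feed a Pearson correlation between the iteration index and the recent error values, with the result modulating the momentum term. The gating function and the scaling by |r| below are our assumptions, not the paper's exact rule.

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient of two equal-length sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

WINDOW = 20  # interval size chosen empirically in the text

def selective_alpha(error_history, alpha_max=0.9):
    """Hypothetical gating rule: scale the momentum term by how
    strongly the last WINDOW errors correlate with the iteration
    index (a steady monotone decrease gives r close to -1)."""
    if len(error_history) < WINDOW:
        return 0.0  # not enough history yet: plain BP step
    window = error_history[-WINDOW:]
    r = pearson(range(WINDOW), window)
    return alpha_max * abs(r)  # assumption: |r| modulates alpha
```

With a steadily decreasing error series, this sketch returns the full momentum; with an erratic series, the momentum is damped.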
It is worth noting that only five tests using BP
with a momentum term and α = 1 (constant momentum
term) reached the desired error; those cases were
therefore omitted from the next two tables.
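A toy example suggests why a constant, maximal momentum term can fail to converge: on a simple quadratic, the accumulated velocity never decays with α = 1, so the iterate oscillates instead of settling. This quadratic illustration is ours, not the paper's network.

```python
def momentum_descent(alpha, eta=0.1, steps=100, w=1.0):
    """Gradient descent on f(w) = w**2 with a momentum term.
    Returns the trajectory of w (toy illustration only)."""
    v, traj = 0.0, []
    for _ in range(steps):
        v = alpha * v - eta * 2.0 * w  # accumulate velocity
        w += v
        traj.append(w)
    return traj
```

With α = 0.5 the trajectory decays to the minimum; with α = 1.0 the oscillation keeps a constant amplitude and the desired error is never reached.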
5.2 Results
The results of the first stage are presented in Ta-
ble 1, which shows the average number of iterations
over the 20 trainings.
According to Table 1, with the methodology pro-
posed in this paper, the largest number of iterations
occurred with only seven neurons in the hidden layer:
1,523 iterations to achieve the desired error. Con-
versely, the fewest iterations were obtained with 105
neurons in the hidden layer, when just 195 iterations
were necessary to reach this error. Also according to
the same table, with 35 neurons in the hidden layer
there was a great reduction in the number of itera-
tions: from 5,814 using traditional BP to just 363 with
the proposed methodology. For comparison, using BP
with a momentum term and α = 0.5, the number of
iterations dropped only to 2,988.
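As a consistency check, the reduction implied by the 35-neuron row can be computed directly from the counts quoted above (the script is ours):

```python
# Reduction implied by the 35-hidden-neuron case: 5,814
# iterations for plain BP, 2,988 with momentum and alpha = 0.5,
# and 363 with the proposed method.
def reduction(before, after):
    """Percentage reduction in iteration count."""
    return (1 - after / before) * 100

vs_plain_bp = reduction(5814, 363)  # about 93.8%
vs_momentum = reduction(2988, 363)  # about 87.9%
```

Both single-case figures are consistent with the average reductions reported for Figure 1.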
From the values in Table 1, Figure 1 was gener-
ated, showing the percentage reduction in the number
of iterations provided by the proposed method com-
pared to the other methods tested, for each number of
neurons in the hidden layer. The graph demonstrates
the superior performance of BP with Selective Mo-
mentum Term: compared to every other approach,
there was a reduction in the number of training itera-
tions in all cases.
Also in Figure 1, the average reduction achieved
in this research was 92% compared to traditional BP
and 62% compared to BP with a momentum term and
α = 0.9. The reduction compared to BP with the max-
imum momentum term cannot be calculated, because
some trainings with that approach were trapped in a
local minimum and therefore produced no results.
It is worth highlighting, however, that BP with Se-
lective Momentum Term avoided all existing local
minima, since it was successful in every training.
Still comparing speed, Table 2 shows the aver-
age training time in seconds. In this table, the low-
est average time with the methodology proposed in
this paper occurred when the hidden layer had 14
neurons; in this case the desired error was reached
in just 2.36 s. With the same number of hidden neu-
rons, BP with a momentum term and α = 0.1 had an
average time of 22.79 s. In percentage terms, the pro-
posed method reduced the time of traditional BP by
79% and performed slightly better than BP with a
momentum term and α = 0.9.
Table 3 was also produced in the first stage. It
shows the results of the generalization tests for each
approach, presenting the average hit rate calculated
over 20 trainings (20 different weight initializations).
The variation in hit rates is very small, except for BP
with the maximum momentum term, whose trainings
were trapped in a local minimum. It is worth men-
tioning that the first-stage results for BP with Selec-
tive Momentum Term were achieved using the Pear-
son correlation coefficient.
The second stage generated Tables 4 and 5. All
data were obtained with the same ANN characteris-
tics used to produce Tables 1 and 2, but the compari-
son now concerns the correlation coefficients used,
not the training methods.
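The relationship between the two coefficients compared in this stage can be sketched briefly: Spearman's coefficient is the Pearson coefficient applied to the ranks of the values, so it rewards any monotone trend, not only a linear one. The minimal implementation below omits tie handling.

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient of two equal-length sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) *
                           sum((y - my) ** 2 for y in ys))

def spearman(xs, ys):
    # Spearman = Pearson applied to the ranks of the values
    # (average ranks for ties are omitted in this sketch).
    rank = lambda v: [sorted(v).index(x) for x in v]
    return pearson(rank(xs), rank(ys))
```

For a monotone but nonlinear error curve, Spearman reports a perfect correlation while Pearson does not, which is one reason the choice of coefficient can change how often the momentum term is engaged.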
According to Table 4, the highest number of iter-
ations with this proposal occurred with seven neu-
rons in the hidden layer and the Pearson correlation
coefficient; in this case 4,524 iterations were neces-
sary to achieve the desired error. The lowest number
of iterations was obtained with 98 neurons in the hid-
den layer using the Spearman correlation coefficient,
when just 201 iterations were neces-
Acceleration of Backpropagation Training with Selective Momentum Term