was stopped if: (i) the GL5 criterion was satisfied twice (to avoid initial oscillations in the validation errors); (ii) the training progress criterion was met, with P5(t) < 0.1; or (iii) a maximum number of iterations was reached.
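The GL5 and P5(t) quantities follow Prechelt's standard early-stopping definitions: GL(t) is the relative increase of the validation error over the minimum observed so far, and P5(t) measures how much the training error still fluctuates within the last five epochs. A minimal sketch of these checks, assuming those standard formulas (the function names are illustrative):

```python
def generalization_loss(val_errors):
    """GL(t) = 100 * (E_va(t) / min E_va - 1): relative increase of the
    current validation error over the best value seen so far."""
    return 100.0 * (val_errors[-1] / min(val_errors) - 1.0)

def training_progress(train_errors, k=5):
    """P_k(t): how much larger the mean training error of the last k
    epochs is than the minimum in that strip (parts per thousand)."""
    strip = train_errors[-k:]
    return 1000.0 * (sum(strip) / (k * min(strip)) - 1.0)

def should_stop(val_errors, train_errors, iteration,
                max_iter=1000, alpha=5.0, gl_hits=0):
    """Returns (stop, gl_hits): stop when GL exceeds alpha for the
    second time, when P_5(t) < 0.1, or when max_iter is reached."""
    if generalization_loss(val_errors) > alpha:
        gl_hits += 1
    stop = (gl_hits >= 2
            or training_progress(train_errors) < 0.1
            or iteration >= max_iter)
    return stop, gl_hits
```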
Subset Selection - Classical Methods. In the experiments, we use different search strategies (Forward, Backward and Random), chosen according to the characteristics of each search technique. The Forward strategy starts with the empty set and adds features. The Backward strategy starts with the full set and deletes features. The Random approach starts from a random set and randomly performs the addition and removal of features. Unlike the others, the Random Bit Climber method has attributes both removed and added throughout the search process. Thus, in order to carry out the search in different directions, we used different initial states: an initial solution without features, with all features, and with randomly selected features. The Las Vegas and the proposed GaTSa methods implement their own search strategies.
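As an illustration of how a wrapper-style search of this family proceeds, below is a minimal sketch of the Forward strategy; `evaluate` stands for any wrapper score (for instance, the k-NN accuracy described next) and is an assumption of this sketch:

```python
def forward_selection(all_features, evaluate):
    """Greedy forward search: start from the empty set and, at each step,
    add the single feature that most improves the wrapper score."""
    selected, best_score = [], float("-inf")
    improved = True
    while improved:
        improved = False
        best_feature = None
        for f in set(all_features) - set(selected):
            score = evaluate(selected + [f])
            if score > best_score:
                best_score, best_feature, improved = score, f, True
        if improved:
            selected.append(best_feature)
    return selected, best_score
```

The Backward strategy is the mirror image (start from the full set, delete the least useful feature per step), and the Random strategies replace the greedy step by random additions and removals.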
To determine the classification accuracy for the classical methods (Hill-Climbing, Best-first, Random Bit Climber and Las Vegas), a K-Nearest Neighbor (k-NN) classification algorithm is used. In the k-NN algorithm, the value of k was set to 7, defined empirically.
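A minimal sketch of this wrapper evaluation, assuming a scikit-learn k-NN classifier (the function name, data splits and column-indexing convention are illustrative):

```python
from sklearn.neighbors import KNeighborsClassifier

def knn_subset_accuracy(X_train, y_train, X_test, y_test, subset, k=7):
    """Scores a candidate feature subset by the accuracy of a k-NN
    classifier restricted to those columns (k = 7, chosen empirically)."""
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X_train[:, subset], y_train)
    return clf.score(X_test[:, subset], y_test)
```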
4 RESULTS AND DISCUSSION
For SA, TS and GA, the maximal topology in the Artificial Nose data set (A) contains six input units, ten hidden units and three output units (N1 = 6, N2 = 10 and N3 = 3; the maximum number of connections (Nmax) is equal to 90). In the Iris data set (B), N1 = 4, N2 = 5, N3 = 3 and Nmax = 32. For the Thyroid data set (C), N1 = 21, N2 = 10, N3 = 3 and Nmax = 240. In the Diabetes data set (D), N1 = 8, N2 = 10, N3 = 2 and Nmax = 100. In the Mackey-Glass (E) experiments, N1 = 4, N2 = 4, N3 = 1 and Nmax = 50. In
all neural network topologies, the N1 and N3 values are problem-dependent, while N2 was taken from the experiments in (Zanchettin and Ludermir, 2006). For
GaTSa, the same values for N1 and N3 are used, but
the value of N2 is optimized, together with the net-
work weights and connections, in a constructive man-
ner.
Figure 1 displays the average performance of each optimization technique investigated. These results were obtained for each technique in the optimization of the number of connections and of the connection weight values of an MLP artificial neural network. The measures evaluated were: (1) the Squared Error Percentage (SEP) and the classification error (Class) on the test set; and (2) the percentage of network connections. The figure displays the average results of 10 simulations, each containing 30 different runs of the algorithms.
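The SEP measure is commonly defined as in Prechelt's PROBEN1 benchmark rules; assuming that definition, the two evaluation measures can be sketched as follows (the function names are illustrative):

```python
import numpy as np

def squared_error_percentage(outputs, targets, o_min=0.0, o_max=1.0):
    """SEP = 100 * (o_max - o_min) / (N * P) * sum (o - t)^2, where N is
    the number of output units and P the number of patterns (PROBEN1)."""
    P, N = outputs.shape
    return 100.0 * (o_max - o_min) / (N * P) * np.sum((outputs - targets) ** 2)

def classification_error(outputs, targets):
    """Percentage of patterns whose winner-takes-all output class
    differs from the target class."""
    wrong = np.argmax(outputs, axis=1) != np.argmax(targets, axis=1)
    return 100.0 * np.mean(wrong)
```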
Genetic algorithms, tabu search and simulated annealing incorporate domain-specific knowledge in their search heuristics. They also tolerate some elements of non-determinism, which helps the search escape from local minima. The proposed integration combines these advantages in order to exploit a larger amount of problem-domain information and to apply this information to practically all search phases. The initial solution is coded with a minimal valid network topology, and hidden nodes are inserted into the network topology during algorithm execution. This process is similar to constructive neural network training and allows better topology selection. Moreover, the proposed methodology has two well-defined stages: a global search phase, which makes use of the capacity of genetic algorithms for generating new solutions, the cooling process and cost function of simulated annealing, and the memory characteristics of the tabu search technique; and a local search phase, which makes use of characteristics such as gradient descent for a more precise adjustment of the solution.
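A high-level sketch of this two-stage loop is given below. It is illustrative only: the `perturb`, `cost` and `finetune` callables stand in for the paper's GA generation operators, SA cost function and gradient-descent training, respectively, and the geometric cooling schedule and fixed-size tabu list are assumptions of the sketch, not the paper's exact operators.

```python
import math
import random

def gatsa_sketch(initial, cost, perturb, finetune,
                 max_iter=100, t0=1.0, cooling=0.95, tabu_size=20):
    """Illustrative two-stage loop: a global phase that generates new
    solutions GA-style, filters them with a tabu list and accepts them
    with the SA rule under a cooling schedule; then a local phase
    (e.g. gradient descent) for precise adjustment of the best solution."""
    best, current, tabu, temp = initial, initial, [], t0
    for _ in range(max_iter):
        candidate = perturb(current)      # GA-style generation of a new solution
        if candidate in tabu:             # tabu memory: skip recently visited states
            temp *= cooling
            continue
        delta = cost(candidate) - cost(current)
        if delta < 0 or random.random() < math.exp(-delta / temp):
            current = candidate           # SA acceptance rule with cost function
            tabu = (tabu + [candidate])[-tabu_size:]
            if cost(current) < cost(best):
                best = current
        temp *= cooling                   # SA cooling process
    return finetune(best)                 # local gradient-descent refinement
```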
For all data sets, the optimized neural networks obtain a lower classification error than those obtained by MLP networks without topology optimization (Zanchettin and Ludermir, 2006), and the mean number of connections is much lower than the maximum number allowed. In the greatest number of simulations, the best performance in optimizing the MLP architecture was obtained by the GaTSa method.
It is important to note that, in the experiments with GaTSa, the average number of connections was computed in relation to the maximum network topology generated, rather than to the maximum fixed topology (as in the other models). This seemed to be the fairest approach; however, in some ways it harmed the model, because most of the time the proposed method generated topologies with fewer connections than the maximum allowed.
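The effect of the two normalizations can be made concrete with a small illustrative helper: the same number of used connections yields a higher (worse-looking) percentage when divided by the smaller, generated maximum than when divided by the fixed maximal topology.

```python
def connection_percentages(used, generated_max, fixed_max):
    """Percentage of connections in use under both normalizations:
    relative to the maximum topology actually generated in the run
    (as reported for GaTSa) and relative to the fixed maximal
    topology (as reported for the other models)."""
    return 100.0 * used / generated_max, 100.0 * used / fixed_max

# E.g. 40 connections in use, a generated maximum of 60 and a fixed
# maximum of 90 (hypothetical numbers): roughly 66.7% versus 44.4%.
print(connection_percentages(40, 60, 90))
```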
Statistically, the GaTSa method achieves a better optimization of the architecture's input nodes. The MLP performance obtained with the optimized neural networks was statistically equivalent for the Thyroid, Diabetes and Mackey-Glass data sets. The GaTSa method obtained better results on the Artificial Nose data set, whereas GA had the best performance on the Iris data set.
GaTSa - The Cost Function Influence. Table 2 displays the experimental results; these values are the average performance over 10 simulations. Each