classification problem. The classification efficiency
did not improve when the ANN contained more
than one hidden layer: the ANN memorized the
training examples more successfully but lost the
ability to generalize from them. Therefore, it was
decided to set the maximum number of hidden layers
to 1 and the maximum number of neurons in a hidden
layer to 32, keeping a margin for safety.
With this encoding we obtained optimization tasks of
different dimensionality. The dimensionality depends
on the block sizes in the hidden and output layers.
These sizes were chosen as follows: all aliquot parts
(divisors) of 32 and of 20 were sorted in ascending
order and then grouped position by position into
pairs, giving 6 different pairs in total.
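A minimal sketch of this pairing procedure, assuming nothing beyond the description above:

    # Derive the six block-size pairs: sort the divisors of 32 and
    # of 20 in ascending order and pair them position by position.

    def divisors(n):
        # Return all divisors of n in ascending order.
        return [d for d in range(1, n + 1) if n % d == 0]

    hidden_sizes = divisors(32)   # [1, 2, 4, 8, 16, 32]
    output_sizes = divisors(20)   # [1, 2, 4, 5, 10, 20]

    block_pairs = list(zip(hidden_sizes, output_sizes))
    print(block_pairs)
    # [(1, 1), (2, 2), (4, 4), (8, 5), (16, 10), (32, 20)]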
In each case, the GA resource was proportional to the
length of the binary string. The maximum length of
the binary string, with the blocks (1,1), equals 136; in
this case we used 110 generations and a population
size of 110. The minimum length of the binary string,
with the blocks (32,20), equals 5; in this case we used
exhaustive search instead of the GA because the
search space is small (only 2^5 = 32 candidate strings).
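The per-block bit budget is not restated here; one hypothetical budget that reproduces both reported lengths is 3 bits per hidden-layer block and 2 bits per output-layer block, as in the sketch below (these bit counts are assumptions, not the authors' exact encoding):

    def baseline_length(hidden_block, output_block,
                        max_hidden=32, outputs=20,
                        bits_hidden=3, bits_output=2):
        # ASSUMPTION: 3 bits per hidden-layer block and 2 bits per
        # output-layer block; this reproduces the reported lengths
        # of 136 for blocks (1, 1) and 5 for blocks (32, 20).
        return ((max_hidden // hidden_block) * bits_hidden
                + (outputs // output_block) * bits_output)

    for pair in [(1, 1), (2, 2), (4, 4), (8, 5), (16, 10), (32, 20)]:
        print(pair, baseline_length(*pair))
    # (1, 1) 136, (2, 2) 68, (4, 4) 34, (8, 5) 20, (16, 10) 10, (32, 20) 5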
The solutions obtained contain one hidden layer with
30 neurons (for the block pairs whose first block size
is an aliquot part of 30) or with 32 neurons (for the
other block pairs). A t-test with the confidence
probability 0.95 was performed for all pairs of
solutions. In general, the classification effectiveness
depends on the block sizes: the ANN with the blocks
(2,2) works significantly worse than the others
(F-score = 0.670), the ANN with the blocks (8,5)
works significantly better than the others (F-score =
0.684), and there is no significant difference between
the ANNs with the blocks (1,1), (4,4), (16,10), and
(32,20). All these implementations of ANN structural
optimization work significantly better than the error
backpropagation algorithm for the fixed ANN
structure. However, they are more time-consuming:
the simplest version requires approximately
10 minutes, and the most complicated one about
11 hours.
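A minimal sketch of such pairwise significance testing, assuming per-run F-scores are collected for each block configuration (the scores below are placeholders, not the paper's data):

    from itertools import combinations
    from scipy.stats import ttest_ind

    f_scores = {                      # hypothetical per-run F-scores
        "(2,2)": [0.668, 0.671, 0.670, 0.669],
        "(8,5)": [0.683, 0.685, 0.684, 0.686],
        "(1,1)": [0.676, 0.678, 0.677, 0.675],
    }

    ALPHA = 0.05                      # 0.95 confidence probability
    for a, b in combinations(f_scores, 2):
        t_stat, p_value = ttest_ind(f_scores[a], f_scores[b])
        verdict = "significant" if p_value < ALPHA else "not significant"
        print(f"{a} vs {b}: p = {p_value:.4f} -> {verdict}")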
4.4 The Novel Approach to ANN
Structure Optimization with GA
We found the baseline encoding excessive and
proposed a novel approach to ANN structure
representation. The novel encoding is, in fact, a
simplified version of the original one: it deals with
whole ANN layers rather than separate blocks of
neurons, and all neurons in one layer share the same
activation function. However, a new tuning tool was
added: every activation function was parametrized
with a parameter a, so that different shapes of each
activation function can be obtained by changing this
parameter.
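The concrete activation functions are not listed here, so the ones below are illustrative assumptions of how a single parameter a can reshape a function:

    import numpy as np

    def sigmoid(x, a):
        # Logistic sigmoid; a controls the slope at the origin.
        return 1.0 / (1.0 + np.exp(-a * x))

    def scaled_tanh(x, a):
        # Hyperbolic tangent; a stretches or compresses the input.
        return np.tanh(a * x)

    x = np.linspace(-3.0, 3.0, 7)
    print(sigmoid(x, a=0.5))   # shallow, almost linear response
    print(sigmoid(x, a=4.0))   # steep, almost step-like response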
Thus, it is necessary to set the maximum number of
hidden layers, the maximum number of neurons in
each hidden layer, and the number of activation
functions, and to discretize the parameter a. The
number of bits required to encode each of these
entities can then be determined. The total length of
the binary string is calculated as follows:
$l = n_1 (m + k + k_a) + k + k_a$,

where $n_1$ is the maximum number of hidden layers,
$m$ is the number of bits for coding the number of
neurons in each hidden layer, $k$ is the number of bits
for coding the kind of activation function, and $k_a$ is
the number of bits for coding the parameter $a$; the
final $k + k_a$ term codes the activation function and
its parameter for the output layer. We
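Plugging in one bit-width assignment consistent with the reported string length of 19 (the concrete widths m = 5, k = 2, and k_a = 5 are assumptions here, chosen to match a 32-neuron hidden layer and the reported total):

    def novel_length(n1, m, k, k_a):
        # One (m + k + k_a) group per hidden layer, plus k + k_a
        # bits for the output layer's activation function and its
        # parameter a.
        return n1 * (m + k + k_a) + (k + k_a)

    print(novel_length(n1=1, m=5, k=2, k_a=5))  # 19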
obtained a less flexible model than the original one;
however, it has reasonable dimensionality, fewer
parameters to tune, and no transposition sensitivity.
It requires much less GA resource and works much
faster. In our case, the length of the binary string
equals 19, which is 7 times smaller than the maximum
binary string length of the baseline approach with the
blocks (1,1). Moreover, the difference between the
dimensionalities grows dramatically as the maximum
ANN structure is enlarged: the baseline string length
grows linearly with the number of neurons, while the
novel string length grows only logarithmically with it.
Other GA settings, such as the fitness function
calculation method and the genetic operators, remain
the same as in the baseline approach.
We obtained a GA implementation with effective
convergence and reasonable resource consumption
(a population size of 50 individuals and 50
generations; this resource is 5 times less than for
the baseline approach with the blocks (1,1)).
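The sketch below is a compact, generic binary GA with the reported budget (50 individuals, 50 generations, 19-bit strings); the fitness function and the concrete genetic operators are stand-ins, since the approach reuses the baseline's fitness (the F-score of the decoded ANN) and operators:

    import random

    LENGTH, POP_SIZE, GENERATIONS = 19, 50, 50

    def fitness(bits):
        # Placeholder: decode bits into an ANN structure, train it,
        # and return its F-score. Counting ones stands in here.
        return sum(bits)

    def tournament(pop, scores, k=2):
        # Pick the better of k randomly chosen individuals.
        best = max(random.sample(range(len(pop)), k),
                   key=lambda i: scores[i])
        return pop[best][:]

    def crossover(p1, p2):
        cut = random.randrange(1, LENGTH)     # one-point crossover
        return p1[:cut] + p2[cut:]

    def mutate(bits, rate=1.0 / LENGTH):
        # Flip each bit independently with a small probability.
        return [b ^ (random.random() < rate) for b in bits]

    pop = [[random.randint(0, 1) for _ in range(LENGTH)]
           for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        scores = [fitness(ind) for ind in pop]
        pop = [mutate(crossover(tournament(pop, scores),
                                tournament(pop, scores)))
               for _ in range(POP_SIZE)]

    best = max(pop, key=fitness)
    print(best, fitness(best))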
The results obtained show that the novel GA for
ANN structural optimization performs well on the
call routing task: the mean F-score with TRR equals
0.684, and, according to the t-test with the confidence
probability 0.95, it is significantly better than the
RapidMiner implementation of a simple ANN trained
with the error backpropagation algorithm. Moreover,
it is also significantly better than the baseline
GA-based ANN structural optimization. There is only
one case in which the novel and the baseline
encodings show no significant difference: when the
baseline approach uses the blocks (8,5). In general,
the structure of the solution obtained (1 hidden layer
with 30 neurons) remains the same as with the
baseline approach. At the same time, the novel
approach requires much less resource and
computational time than the baseline one.