A modular architecture should generalize better than a monolithic network because of the difference between local and global generalization. Modular architectures perform local generalization in the sense that each module learns patterns only from a limited region of the input space. Ideally, training a modular architecture on a pattern from one of these regions should therefore not affect the architecture's performance on patterns from the other regions. Modular architectures also tend to develop representations that are more easily interpreted than the representations developed by single networks. As a result of learning, the hidden units of the separate networks assigned to the individual tasks contribute to the solutions of these tasks in more understandable ways than the hidden units of a single network applied to both tasks. In modular networks, a different set of hidden units is used to represent information about the different tasks; see figure 3.
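To make the contrast concrete, the following sketch illustrates the idea; the layer sizes, activations, and two-task setting are our illustrative assumptions, not taken from this paper. A monolithic hidden layer is shared by both tasks, so any weight update affects both, whereas two expert subnetworks keep their hidden units disjoint.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, W1, W2):
    """One hidden layer with tanh activation."""
    return np.tanh(x @ W1) @ W2

# Monolithic network: one shared hidden layer serves both tasks,
# so training on task A perturbs weights that task B also relies on.
W1_shared = rng.normal(size=(4, 8))
W2_shared = rng.normal(size=(8, 2))

# Modular alternative: each task gets its own expert with private
# hidden units, so updating expert_a leaves expert_b untouched.
expert_a = (rng.normal(size=(4, 4)), rng.normal(size=(4, 2)))
expert_b = (rng.normal(size=(4, 4)), rng.normal(size=(4, 2)))

x = rng.normal(size=(1, 4))
out_shared = mlp_forward(x, W1_shared, W2_shared)  # entangled representation
out_a = mlp_forward(x, *expert_a)  # only expert A's hidden units contribute
out_b = mlp_forward(x, *expert_b)  # only expert B's hidden units contribute
```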
5 Experiments
In our experimental work we carried out a simple comparative study. To test the efficiency of the described algorithms, we applied them to Fisher's Iris data set [7], a benchmark data set from the area of machine learning. Fisher's Iris data set is a multivariate data set introduced by Sir Ronald Aylmer Fisher. It consists of 150 samples, 50 from each of three species of Iris flowers (Iris setosa, Iris virginica, and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and the petals. Based on the combination of these four features, Fisher developed a linear discriminant model to determine the species of a sample. We used 100 examples for training and the remaining 50 for testing.
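For reference, a split of this kind can be reproduced with scikit-learn; the stratified sampling and the random seed below are our assumptions, since the paper does not state how the 100 training examples were drawn.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target  # 150 samples, 4 features, 3 species

# 100 examples for training, the remaining 50 for testing,
# stratified so each species is represented in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=100, stratify=y, random_state=0
)
```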
Neural Network Tree. The expert neural networks are multilayer perceptrons of the same size, 4-2-2: four neurons in the input layer, two neurons in the hidden layer, and two neurons in the output layer. A given example is assigned to the i-th subset if the i-th output neuron produces the largest value when the example is presented as input. All nets are fully connected. To find the expert neural network for each node, we adopt a genetic algorithm with the following parameters: the number of generations is 1000, the population size is 30, the selection rate is 0.9 (i.e., the 90% of individuals with the lowest fitness values are replaced in each generation), the crossover rate is 0.7, and the mutation rate is 0.01. The number of bits per weight is 16. The fitness is defined directly as the gain ratio; the desired fitness is 0.9, and the maximum fitness is 1.0 by definition. All individuals are sorted according to their fitness ranks, and the worst p × N individuals are simply deleted from the population, where p is the selection rate and N is the population size. In each experiment, we first extract a subset from the whole training set and use it for designing (training) a neural network tree. We also count the number of neurons contained in the whole tree.
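The following sketch shows a genetic search of this kind under the stated parameters. Several details the paper leaves open are our assumptions: biases are omitted, weights are decoded from 16-bit groups into the range [-5, 5], the hidden layer uses tanh, crossover is one-point, and parents are drawn from the surviving top individuals.

```python
import numpy as np

rng = np.random.default_rng(0)

N_BITS = 16                # bits per weight (as stated in the text)
N_W = 4 * 2 + 2 * 2        # 4-2-2 net; biases omitted (our simplification)
W_MAX = 5.0                # weight range is an assumption

def decode(bits):
    """Map each group of 16 bits to a real weight in [-W_MAX, W_MAX]."""
    ints = bits.reshape(N_W, N_BITS) @ (1 << np.arange(N_BITS))
    return (ints / (2**N_BITS - 1) * 2 - 1) * W_MAX

def assign(bits, X):
    """Assign each example to the subset of its largest output neuron."""
    w = decode(bits)
    W1, W2 = w[:8].reshape(4, 2), w[8:].reshape(2, 2)
    return np.argmax(np.tanh(X @ W1) @ W2, axis=1)

def entropy(v):
    _, c = np.unique(v, return_counts=True)
    p = c / c.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(y, a):
    """Information gain of the induced partition, normalized by split info."""
    n = len(y)
    gain = entropy(y) - sum((a == k).sum() / n * entropy(y[a == k])
                            for k in np.unique(a))
    split = entropy(a)
    return gain / split if split > 0 else 0.0

def evolve(X, y, n_gen=1000, pop=30, p_sel=0.9, p_cx=0.7, p_mut=0.01,
           target=0.9):
    genomes = rng.integers(0, 2, size=(pop, N_W * N_BITS))
    for _ in range(n_gen):
        fit = np.array([gain_ratio(y, assign(g, X)) for g in genomes])
        order = np.argsort(fit)[::-1]                  # best first
        genomes, fit = genomes[order], fit[order]
        if fit[0] >= target:                           # desired fitness reached
            break
        # Delete the worst p_sel * pop individuals, keep the rest as parents.
        survivors = genomes[: max(2, int(round((1 - p_sel) * pop)))]
        children = []
        while len(children) < pop - len(survivors):
            a, b = survivors[rng.integers(len(survivors), size=2)]
            if rng.random() < p_cx:                    # one-point crossover
                cut = rng.integers(1, a.size)
                a = np.concatenate([a[:cut], b[cut:]])
            flips = rng.random(a.size) < p_mut         # bitwise mutation
            children.append(np.where(flips, 1 - a, a))
        genomes = np.vstack([survivors, children])
    return genomes[0], fit[0]
```

Applied at each node of the tree, the best genome decodes to the expert network whose output split of the node's examples maximizes the gain ratio.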