The external activation of the input neurons causes, via the different functions, the
"spread of information", which is determined by the respective weight values. In the case of
simple feed-forward networks the final activation values of an output layer are
generated immediately; in the case of feedback or recurrent networks the output
is generated in a more complex manner; yet in the end a certain output
vector is generated in all cases, i.e., each neuron of the output layer, if there is one, has obtained a
certain activation value. If there is no distinction between different layers, as is the case, for
example, with a Hopfield network or an interactive network, the output
vector consists of the final activation values of all neurons. Note that except in the
case of feed-forward networks the output vector may be an attractor with a period p >
1. The network will then oscillate between different vectors, i.e. between the different
states of the attractor. For theoretical and practical purposes, neural networks are
mainly analyzed with respect to the input-output relation. Therefore, we define the
final state S_f of a neural network as S_f = ((A_i), (A_f)), if (A_i) is again the input vector
and (A_f) the final output vector. If (A_f) is an attractor with period p > 1, then the
components of (A_f) consist of ordered sets, i.e. the sets of all different activation
values the output neurons obtain in the attractor.
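A minimal sketch may make these definitions concrete: it iterates a small recurrent network from an external input vector and returns the final state as the pair of input vector and final output, where the output becomes the ordered set of states if a periodic attractor with p > 1 is reached. The synchronous threshold update, the function names, and the example weight matrix are illustrative assumptions, not taken from the text.

```python
import numpy as np

def final_state(weights, input_vector, max_steps=100):
    """Iterate a recurrent network synchronously from an external input and
    return the final state S_f = (input vector, final output).

    If the dynamics settle into a periodic attractor with p > 1, the output
    component is the ordered list of all states forming the cycle.
    """
    state = np.asarray(input_vector, dtype=float)
    history = [tuple(state)]
    for _ in range(max_steps):
        # Spread of activation: weighted sums passed through a threshold function
        state = np.where(weights @ state >= 0.0, 1.0, -1.0)
        key = tuple(state)
        if key in history:               # state revisited: an attractor is reached
            start = history.index(key)
            cycle = history[start:]      # period p = len(cycle); p == 1 is a fixed point
            return input_vector, cycle
        history.append(key)
    return input_vector, [tuple(state)]  # no attractor detected within max_steps

# Example: a small 3-neuron network with symmetric weights (hypothetical values)
W = np.array([[ 0.0, 1.0, -1.0],
              [ 1.0, 0.0,  1.0],
              [-1.0, 1.0,  0.0]])
print(final_state(W, [1.0, -1.0, 1.0]))
```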
Because in the experiments described below we investigate only the behavior of
feed-forward networks with respect to different MC-values, for practical purposes we
simply define the final state as the values of the output vector after the external activation
via the input vector. Hence we speak of a large basin of attraction if many different
input vectors generate the same output vector, and vice versa. The limiting case MC =
1, for example, defines a network in which each different input vector generates a
different output vector. Accordingly, the case MC = 1/n defines a network in which
practically all n different input vectors generate the same output vector.
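These limiting cases can be illustrated by comparing the number of different output vectors with the number of different input vectors. The sketch below uses exactly this ratio; the ratio itself, the one-layer threshold network, and the example weights are assumptions consistent with the two limiting cases above, not necessarily the authors' exact definition of MC.

```python
import numpy as np

def forward(weights, input_vector):
    """Final output of a simple feed-forward network: one weighted layer
    followed by a threshold function (an illustrative choice)."""
    return tuple(np.where(weights @ np.asarray(input_vector, dtype=float) >= 0.0, 1.0, 0.0))

def mc_ratio(weights, input_vectors):
    """Ratio of distinct output vectors to distinct input vectors.

    Equals 1 if every different input yields a different output, and 1/n if
    all n different inputs yield the same output, matching the two limiting
    cases described in the text (assumed formula).
    """
    inputs = {tuple(v) for v in input_vectors}
    outputs = {forward(weights, v) for v in inputs}
    return len(outputs) / len(inputs)

# Example: all 2^3 binary inputs fed into a feed-forward net with 2 output neurons
W = np.array([[ 0.5, -0.3, 0.2],
              [-0.4,  0.6, 0.1]])
inputs = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
print(mc_ratio(W, inputs))
```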
With these definitions it is easy to explain and measure in a formal manner the
characteristics of neural networks with respect to robustness. A robust network, i.e. a
network that is tolerant of faulty inputs, necessarily has an MC-value significantly
smaller than 1. Robustness means that different inputs, i.e. inputs that differ from the
correct one, will still generate the "correct" output, i.e. the output that is generated by
the correct input. That is possible only if some faulty inputs belong to the same basin
of attraction as the correct input; these and only these inputs from this basin of
attraction will generate the correct output. All other faulty inputs transcend the limits
of tolerance with respect to the correct output and will accordingly generate another
output. If MC = 1 or close to 1, then for these reasons the network will not be robust at
all.
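One simple way to quantify this tolerance of faulty inputs is to perturb a correct input repeatedly and check whether the output stays the same, i.e. whether the perturbed input still lies in the basin of attraction of the correct input. The following sketch assumes binary input components, a component-flip noise model, and a user-supplied network function; none of these details is specified in the text.

```python
import numpy as np

def robustness(net, correct_input, n_trials=100, flip_prob=0.1, seed=0):
    """Fraction of randomly perturbed ("faulty") binary inputs that still
    produce the same output as the correct input, i.e. that still belong to
    its basin of attraction (an illustrative measure, assumed here).

    `net` is any function mapping an input vector to an output vector,
    e.g. the forward() helper from the previous sketch.
    """
    rng = np.random.default_rng(seed)
    correct_output = net(correct_input)
    hits = 0
    for _ in range(n_trials):
        faulty = np.asarray(correct_input, dtype=float)
        flips = rng.random(faulty.size) < flip_prob   # flip each component with probability flip_prob
        faulty[flips] = 1.0 - faulty[flips]
        if net(faulty) == correct_output:
            hits += 1
    return hits / n_trials
```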
The same explanation can be given for the frequently cited capability of
neural networks to "generalize": in a formal sense the capability to generalize is just
the same as robustness, only looked at from another perspective. A new input can
be perceived as "similar" to or as "nearly the same" as an input that the net has already
learned if and only if the similar input belongs to the same basin of attraction as the
input the network has been trained to remember. In other words, the training process
with respect to a certain vector is automatically also a training process with respect to
the elements of the corresponding basin of attraction. The capability of generalization,
hence, can be understood as the result of the construction of a certain basin of
attraction. Accordingly the generalization capability again depends on the MC-
values: if these are small, i.e. if the basins of attraction are rather large, then the