variables or dependent variables.
To be sure, this double usage of the activation values involves no logical circle. Yet it
seems rather redundant to use the activation values twice. Hence it would be
desirable to construct learning rules in which the activation or output values are used
only as performance variables. Mathematically the rules would become simpler,
which would be an advantage for teaching, programming, and application purposes.
In addition, from a theoretical point of view it is rather unsatisfactory that the
specific learning rules for different types of learning, particularly supervised learning
and self-organized learning, have little in common apart from the general aspects of
Hebb’s principle. The Backpropagation rule, for example, as a paradigm for supervised
learning, has at first and second sight little to do with the well-known “winner-
takes-all” rule used in Kohonen Feature Maps for self-organized learning.
Hence it would be desirable to construct a general learning schema that serves as the
basis for learning rules applied to different types of learning, and that rests on
Hebb’s principle, which is well established in neuroinformatics and biology (cf. e.g. [6]).
In a formal sense its essence is the increase or decrease, respectively, of
the weight values between sending neurons and a receiving artificial neuron, which
many learning rules take into account. By leaving out the output values of the
respective neurons we obtain a general learning schema in its simplest form as

Δw_ij = c and 0 ≤ c ≤ 1.   (2)
The constant c has the same function as the learning rate in different standard
learning rules. If one uses, for example, the linear activation function
A_j = Σ_i w_ij * A_i,   (3)
then in large networks the activation values frequently become too large. In order to
avoid such an increase, equation (2) can be extended by introducing a “dampening
factor”. Then the schema becomes
Δw_ij = c * (1 – w_ij(t)),   (4)
where w_ij(t) is the corresponding weight value at time t, i.e. before the change. The
dampening factor is used in order to keep the weight values in the interval (-1, 1).
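The bounded behaviour of schema (4) can be illustrated with a few lines of code; the function name and the choice of c = 0.1 are only assumptions for this sketch, not part of the original formulation:

```python
# Sketch: repeated application of the dampened schema (4),
# delta_w_ij = c * (1 - w_ij(t)); c = 0.1 is an arbitrary choice.
def dampened_update(w, c=0.1):
    """One application of schema (4): w(t+1) = w(t) + c * (1 - w(t))."""
    return w + c * (1.0 - w)

w = 0.0
for step in range(50):
    w = dampened_update(w)

# The weight grows towards 1 but never reaches it,
# so it stays inside the interval (-1, 1).
print(0.0 < w < 1.0)
```

Each step closes a constant fraction c of the remaining distance to 1, which is exactly the dampening effect described above.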
Equations (2) and (4) are just schemas, namely the General Enforcing Rule
Schemas (GERS). Their application to the two types of learning investigated by us,
namely supervised learning and self-organized learning, requires additional
components. We chose these types of learning because they are by far the
most important ones in the usage of neural networks. Incidentally, when dealing with
reinforcement learning, equation (4) can be used directly as the corresponding learning
rule.
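A minimal sketch of this direct use of equation (4) as a reinforcement learning rule might look as follows; the list-based weight representation, the reinforcement flags, and c = 0.2 are assumptions of this sketch:

```python
# Sketch: equation (4) applied only to connections that received
# positive reinforcement; c is again the learning rate.
def reinforce(weights, reinforced, c=0.2):
    """Apply delta_w_ij = c * (1 - w_ij(t)) to reinforced connections,
    leaving all other weights unchanged."""
    return [
        w + c * (1.0 - w) if hit else w
        for w, hit in zip(weights, reinforced)
    ]

weights = [0.1, 0.5, 0.9]
weights = reinforce(weights, reinforced=[True, False, True])
```

Note that weights already close to 1 are changed only slightly, so repeated reinforcement cannot push them out of the interval (-1, 1).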
For example, a corresponding learning rule for self-organized learning, which we
developed for a self-organized learning network (Self Enforcing Network, SEN), is

w(t+1) = w(t) + Δw and Δw = c * v_sm,   (5)
where c is again a learning rate and v_sm is the corresponding value in a so-called semantical
matrix, i.e. the database for the learning process (cf. [2]; [3]). Obviously this is a
direct application of GERS to this type of learning without taking into account the