
The input x_0 = 1 is the bias for the neuron. The training of
the network is realized through the Back-Propagation
algorithm.
To handle temporal information (time
patterns) in time-series data, a neural network
must have a "short-term memory", whose primary role is
to remember the past signal. There are two attributes
of memory structure: depth and resolution. The
memory depth defines the length of the time window
available in the memory structure to store past
information. The deeper the memory, the further it
holds information from the past. Memory resolution
defines how much information will be remembered
in a given time window. The so-called “tapped delay
line” is the simplest and most commonly used form
of “short-term memory”.
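As a rough sketch of this idea (our own illustration, not code from any toolbox, and the class name is hypothetical), a tapped delay line can be realised as a fixed-length buffer whose length equals the memory depth:

```python
from collections import deque

class TappedDelayLine:
    """Short-term memory: keeps the last `depth` samples of a signal."""

    def __init__(self, depth, initial=0.0):
        # The memory depth is the length of the time window of past values kept.
        self.buffer = deque([initial] * depth, maxlen=depth)

    def push(self, sample):
        # The newest sample enters; the oldest one falls off the end of the line.
        self.buffer.append(sample)

    def window(self):
        # Current contents of the delay line, oldest first.
        return list(self.buffer)

# Example: a depth-3 delay line fed a short time series.
tdl = TappedDelayLine(depth=3)
for x in [0.1, 0.4, 0.9, 1.6]:
    tdl.push(x)
print(tdl.window())   # [0.4, 0.9, 1.6]
```

A deeper buffer holds information from further in the past; the resolution is fixed by how often samples are pushed.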
A popular neural network that uses ordinary time
delays to perform temporal processing is the so-called
time delay neural network (TDNN), which was first
described in Lang and Hinton (1988) and Waibel et
al. (1989). The TDNN is a multilayer feedforward
network whose hidden neurons and output neurons
are replicated across time. It was devised to capture
explicitly the concept of time symmetry as
encountered in the recognition of an isolated word
(phoneme) using a spectrogram.
The main goal in the development of TD neural
networks was to have a neural network architecture
for non-linear feature classification that is invariant
under translation in time or space. TD-networks use
built-in time-delay steps to represent temporal
relationships. Translation-invariant classification
is realized by sharing the connection weights of the
time-delay steps. The activation of each TD-neuron
is computed by taking the weighted summation of all
activations of predecessor neurons in an input
window over time and applying a non-linear
function (i.e. a sigmoid function) to the sum.
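The following minimal sketch (our own illustration; the function names are hypothetical) shows this computation for a single TD-neuron: a weighted summation over an input window in time, followed by a sigmoid.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def td_neuron(window, weights, bias):
    """Activation of one TD-neuron.

    window  -- activations of predecessor neurons over the input window
               in time, shape (delays, n_inputs)
    weights -- one weight per delayed input, same shape as `window`;
               shared across time shifts to give translation invariance
    bias    -- scalar bias term
    """
    s = np.sum(weights * window) + bias   # weighted summation over the window
    return sigmoid(s)                     # non-linear squashing of the sum

# Example: 3 time steps of 2 predecessor activations.
window = np.array([[0.2, 0.5],
                   [0.1, 0.7],
                   [0.9, 0.3]])
weights = np.full_like(window, 0.25)
print(td_neuron(window, weights, bias=0.1))
```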
The time-delay neural network (TDNN) is a
dynamic neural network structure constructed
by embedding local memory (tapped-delay-line
memory) in both the input and output layers of a
multilayer feed-forward neural network. Both its
input and its output are time-series data.
3 GRADIENT DESCENT:
LEVENBERG-MARQUARDT
Like the quasi-Newton methods, the Levenberg-
Marquardt algorithm was designed to approach
second-order training speed without having to
compute the Hessian matrix. When the performance
function has the form of a sum of squares (as is
typical in training feedforward networks), then the
Hessian matrix can be approximated as H = J^T J
and the gradient can be computed as g = J^T e,
where J is the Jacobian matrix that contains the first
derivatives of the network errors with respect to the
weights and biases, and e is a vector of network
errors. The Jacobian matrix can be computed
through a standard Backpropagation technique that
is much less complex than computing the Hessian
matrix.
The Levenberg-Marquardt algorithm uses this
approximation to the Hessian matrix in the following
Newton-like update: x_{k+1} = x_k - [J^T J + µI]^{-1} J^T e.
When the scalar µ is zero, this is just Newton's
method, using the approximate Hessian matrix.
When µ is large, this becomes gradient descent with
a small step size. Newton's method is faster and
more accurate near an error minimum, so the aim is
to shift towards Newton's method as quickly as
possible. Thus, µ is decreased after each successful
step (reduction in performance function) and is
increased only when a tentative step would increase
the performance function. In this way, the
performance function will always be reduced at each
iteration of the algorithm.
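The sketch below illustrates this update rule and the adaptation of µ on a generic least-squares problem; it is a simplified illustration with a finite-difference Jacobian, not the TRAINLM implementation, and all names and step sizes are our own choices.

```python
import numpy as np

def levenberg_marquardt(residuals, x0, mu=1e-3, mu_up=10.0, mu_down=0.1,
                        n_iter=50, eps=1e-6):
    """Minimise 0.5 * ||e(x)||^2 with the Newton-like update
    x_new = x - (J^T J + mu*I)^(-1) J^T e."""
    x = np.asarray(x0, dtype=float)

    def jacobian(x):
        # Forward-difference Jacobian of the residual vector e(x).
        e0 = residuals(x)
        J = np.zeros((e0.size, x.size))
        for j in range(x.size):
            xp = x.copy()
            xp[j] += eps
            J[:, j] = (residuals(xp) - e0) / eps
        return J, e0

    for _ in range(n_iter):
        J, e = jacobian(x)
        H = J.T @ J                      # Gauss-Newton approximation of the Hessian
        g = J.T @ e                      # gradient of the sum-of-squares error
        step = np.linalg.solve(H + mu * np.eye(x.size), g)
        x_new = x - step
        if np.sum(residuals(x_new) ** 2) < np.sum(e ** 2):
            x, mu = x_new, mu * mu_down  # successful step: shift towards Newton
        else:
            mu = mu * mu_up              # failed step: shift towards gradient descent
    return x

# Toy usage: fit y = a*t + b to noisy data.
t = np.linspace(0, 1, 20)
y = 2.0 * t + 1.0 + 0.01 * np.random.randn(20)
res = lambda p: p[0] * t + p[1] - y
print(levenberg_marquardt(res, x0=[0.0, 0.0]))
```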
We use the algorithm as implemented in
MATLAB. TRAINLM can train any network as
long as its weight, net input, and transfer functions
have derivative functions. We used MATLAB version 6.5
and the Neural Network Toolbox, under a
campus licence available at the Universidad de
Granada.
4 RESULTS
Next, we apply this neural network model
to the problem of forecasting the economic
indebtedness of the Spanish autonomous communities
over the period from 1986 to 2000. We have a
set of values for each community; these values
reflect the indebtedness ratio. The TDNN used
is formed by 20 input neurons and one output neuron.
Moreover, we used a three-step delay in the input
layer.
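The sketch below only illustrates how a yearly series can be arranged into delayed input windows and targets with a three-step delay; the ratio values shown are placeholders for illustration, not the data used in this study.

```python
import numpy as np

def make_delay_patterns(series, delay=3):
    """Turn a yearly series into (input window, target) pairs:
    inputs are the `delay` previous values, target is the next value."""
    X, y = [], []
    for t in range(delay, len(series)):
        X.append(series[t - delay:t])   # values at t-3, t-2, t-1
        y.append(series[t])             # value to predict at t
    return np.array(X), np.array(y)

# Placeholder indebtedness ratios for one community, 1986-2000
# (15 invented values for illustration only).
ratios = np.array([0.8, 0.9, 1.1, 1.3, 1.6, 1.9, 2.3, 2.8,
                   3.1, 3.4, 3.6, 3.7, 3.8, 3.9, 4.0])
X, y = make_delay_patterns(ratios, delay=3)
print(X.shape, y.shape)   # (12, 3) (12,)
```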
The following figure shows the real
values (blue) and the predicted values (red). The
sixteenth value is the prediction for the year 2001. The axis