Derivative Free Training of Recurrent Neural Networks
A Comparison of Algorithms and Architectures
Branimir Todorović 1,4, Miomir Stanković 2,4 and Claudio Moraga 3,4
1 Faculty of Natural Sciences and Mathematics, University of Niš, Niš, Serbia
2 Faculty of Occupational Safety, University of Niš, Niš, Serbia
3 European Centre for Soft Computing, 33600 Mieres, Spain
4 Technical University of Dortmund, 44221 Dortmund, Germany
Keywords: Recurrent Neural Networks, Bayesian Estimation, Nonlinear Derivative Free Estimation, Chaotic Time
Series Prediction.
Abstract: The problem of recurrent neural network training is considered here as approximate joint Bayesian
estimation of the neuron outputs and the unknown synaptic weights. We have implemented recursive estimators
using nonlinear derivative-free approximations of the neural network dynamics. The computational efficiency
and performance of the proposed training algorithms are compared for different recurrent neural network
architectures on the problem of long-term chaotic time series prediction.
1 INTRODUCTION
In this paper we consider the training of Recurrent Neural Networks (RNNs) as derivative-free approximate Bayesian estimation. RNNs form a wide class of neural networks with feedback connections among processing units (artificial neurons). Neural networks with feed-forward connections implement a static input-output mapping, while recurrent networks map both the input and the internal state (represented by the outputs of the recurrent neurons) into the future internal state.
In general, RNNs can be classified as locally recurrent, where feedback connections exist only from a processing unit to itself, and globally recurrent, where feedback connections exist among distinct processing units. The modeling capabilities of globally recurrent neural networks are much richer than those of simple locally recurrent networks.
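As a minimal illustration of this distinction (our own sketch, not the paper's notation), the update below maps the current input and internal state into the next state; a globally recurrent network uses a full recurrent weight matrix, while a locally recurrent one restricts the feedback to a diagonal matrix:

import numpy as np

def rnn_step(x, u, W_rec, W_in, b):
    """One recurrent state update: next internal state from current state x and input u."""
    return np.tanh(W_rec @ x + W_in @ u + b)

n_state, n_in = 4, 2
rng = np.random.default_rng(0)

# Globally recurrent: feedback connections among distinct neurons (full matrix).
W_global = 0.1 * rng.standard_normal((n_state, n_state))

# Locally recurrent: feedback only from each neuron to itself (diagonal matrix).
W_local = np.diag(0.1 * rng.standard_normal(n_state))

W_in = 0.1 * rng.standard_normal((n_state, n_in))
b = np.zeros(n_state)

x = np.zeros(n_state)
u = rng.standard_normal(n_in)
x_next_global = rnn_step(x, u, W_global, W_in, b)
x_next_local = rnn_step(x, u, W_local, W_in, b)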
A group of algorithms for training the synaptic weights of recurrent neural networks is based on the exact or approximate computation of the gradient of an error measure in the weight space. Well-known approaches that use exact gradient computation are back-propagation through time (BPTT) and real-time recurrent learning (RTRL) (Williams and Zipser, 1989; Williams and Zipser, 1990). Since BPTT and RTRL use only first-order derivative information, they exhibit slow convergence. To improve the speed of RNN training, a technique known as teacher forcing has been introduced (Williams and Zipser, 1989). The idea is to use the desired outputs of the neurons, rather than the outputs actually obtained, when computing the future outputs. In this way the training algorithm focuses on the current time step, under the assumption that performance was correct at all earlier time steps.
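A schematic sketch of the idea (ours, assuming every recurrent neuron is an output neuron whose target has the same dimension as the state; rnn_step is the update defined above):

import numpy as np

def rnn_step(x, u, W_rec, W_in, b):
    return np.tanh(W_rec @ x + W_in @ u + b)

def run_teacher_forced(inputs, targets, W_rec, W_in, b):
    """Roll the network forward, feeding back the desired outputs (teacher forcing)
    instead of the outputs the network actually produced."""
    x = np.zeros(W_rec.shape[0])
    outputs = []
    for u, d in zip(inputs, targets):
        x = rnn_step(x, u, W_rec, W_in, b)   # prediction for the current step
        outputs.append(x.copy())
        x = d                                # feed back the desired output, not the prediction
    return np.array(outputs)

The error at each step can then be computed between outputs and targets, and the gradient never propagates through earlier, possibly wrong, predictions.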
However, in its basic form teacher forcing is not always applicable. It clearly cannot be applied in networks whose feedback connections originate only from hidden units, for which the target outputs are not explicitly given. The second important case is training on noisy data, where the target outputs are corrupted by noise. Therefore, to apply teacher forcing in such cases, the true target outputs of the neurons have to be estimated somehow.
The well-known extended Kalman filter (Anderson and Moore, 1979), as a second-order sequential training algorithm and state estimator, offers a solution to both of the stated problems. It improves the learning rate by exploiting second-order information about the criterion function and generalizes the teacher forcing technique by estimating the true outputs of the neurons.
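As a rough sketch of how an EKF-based trainer of this kind can be set up (our own simplification of the standard parameter-estimation formulation, not the authors' exact algorithm), the weight vector is treated as the hidden state and corrected with each new target using a linearization of the network output; h and jacobian_h below are assumed to be user-supplied callables for the network output map and its Jacobian with respect to the weights:

import numpy as np

def ekf_weight_update(w, P, u, d, h, jacobian_h, R):
    """One EKF measurement update treating the synaptic weights w as the state.
    P is the weight covariance, u the current input, d the desired output,
    R the observation noise covariance."""
    y = h(w, u)                     # predicted network output for input u
    H = jacobian_h(w, u)            # dh/dw evaluated at the current weights
    S = H @ P @ H.T + R             # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
    w = w + K @ (d - y)             # correct the weights with the output error
    P = P - K @ H @ P               # update the weight covariance
    return w, P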
The extended Kalman filter can be considered an approximate solution of the recursive Bayesian state estimation problem. The problem of estimating the hidden state of a dynamic system using observations which arrive sequentially in time is