The combinations that yielded the highest mean accuracy on the test set were chosen for the experiment: α = 0.1, η = 0.4 for eRTRL and α = 0.6, η = 0.5 for eBPTT.
Training was stopped when the mean squared error of an epoch fell below 0.01, at which point the network was considered to have successfully learned the task. Otherwise, training was cancelled after 1000 epochs. Table 1 shows the results for eRTRL and eBPTT for sequences of length T from 60 to 130.
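As a concrete illustration of the stopping rule described above, the training loop can be organised as follows. This is a minimal sketch in Python; run_epoch and network are hypothetical placeholders, not the authors' implementation (which was written in Matlab, see footnote 1):

def train_until_learned(network, train_set, run_epoch):
    """Train until the epoch MSE falls below 0.01 (success),
    or give up after 1000 epochs (training cancelled)."""
    MSE_THRESHOLD = 0.01
    MAX_EPOCHS = 1000
    for epoch in range(1, MAX_EPOCHS + 1):
        # run_epoch performs one pass over the training set with
        # eBPTT or eRTRL and returns the epoch's mean squared error
        mse = run_epoch(network, train_set)
        if mse < MSE_THRESHOLD:
            return epoch, True    # task considered successfully learned
    return MAX_EPOCHS, False      # cancelled after 1000 epochs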
The column with the number of successfully trained networks (#suc) in Tab. 1 clearly shows a decrease for eBPTT as the sequence length T grows. On the other hand, nearly all networks were trained successfully with eRTRL. Therefore, we can state that eRTRL is generally better able to cope with longer ranges of output dependencies than eBPTT.
The mean number of epochs (#eps) needed for training is, taken by itself, somewhat misleading. Over the whole experiment, eBPTT needed an average of 243.3 epochs for successful training, while eRTRL needed only 67.1 epochs. It is important to note that this does not indicate that eRTRL training takes less time than eBPTT. The high computational complexity of Real-Time Recurrent Learning (O(n⁴)), and therefore also of eRTRL, results in a much longer computation time for a single epoch compared to eBPTT. This becomes more and more evident with increasing network size.
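For readers unfamiliar with where this complexity comes from, the standard counting argument for RTRL is the following (the notation is generic and not taken from this paper): the algorithm carries a sensitivity p^k_{ij} = ∂y_k/∂w_{ij} for every pair of a unit k and a weight w_{ij}, and updates all of them at every time step via

\[
p^{k}_{ij}(t+1) = f'\big(s_k(t+1)\big)\Big[\sum_{l=1}^{n} w_{kl}\, p^{l}_{ij}(t) + \delta_{ik}\, z_j(t)\Big].
\]

With n units and O(n²) weights there are n³ such sensitivities, each requiring a sum over n terms, which yields O(n⁴) operations per time step. BPTT avoids this by storing the activation history and propagating errors backwards through it, at a per-step cost of only O(n²).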
Figure 4 shows the time needed to train an SMRNN for 100 epochs (T = 60, set size 50) depending on the number of neurons in the hidden layers¹. For a considerably large network with n_x = n_y = 100, training took about 3 minutes with eBPTT and 21.65 hours with eRTRL.
[Figure 4 plot omitted: training time in hours versus the number of neurons in the hidden layers (n_x = n_y), with one curve each for eBPTT and eRTRL.]
Figure 4: Computation time for training depending on the number of neurons in the hidden layers of the network. Training lasted 100 epochs of 50 sequences of length T = 60.
¹ Both algorithms were implemented in Matlab. Training was done on an AMD Opteron 8222 (3 GHz), 8 GB RAM, CentOS, Matlab R2011b (7.13.0.564) 64-bit.
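The timing protocol behind Figure 4 amounts to the following loop, sketched here in Python for illustration (the original measurements were made in Matlab; build_smrnn and run_epoch are hypothetical placeholders, and the range of hidden-layer sizes is illustrative):

import time

def benchmark_training(build_smrnn, run_epoch, train_set,
                       sizes=range(100, 1001, 100), epochs=100):
    """Measure wall-clock training time (in hours) per hidden-layer size."""
    hours = {}
    for n in sizes:                        # hidden layers with n_x = n_y = n
        net = build_smrnn(n_x=n, n_y=n)    # hypothetical SMRNN constructor
        start = time.perf_counter()
        for _ in range(epochs):
            run_epoch(net, train_set)      # one eBPTT or eRTRL epoch
        hours[n] = (time.perf_counter() - start) / 3600.0
    return hours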
The third column in Tab. 1 shows the performance of successfully trained networks on the test set (Acc.). For eBPTT, we observed higher accuracies than for eRTRL. This is also reflected in the overall accuracy of 96.8% for eBPTT compared to 89.2% for eRTRL. This implies that successful learning with eBPTT guaranteed better generalisation.
4 DISCUSSION
Even though eRTRL was generally better able to cope with the latching of information over longer periods of time, the networks that did learn the task with eBPTT showed higher accuracies on the test set.
Altogether, which learning algorithm to use for a specific task strongly depends on the character of the problem at hand. For small networks, as used for the experiment in Tab. 1, the choice depends on the timespan that has to be bridged. If the output is expected to depend on inputs that occurred a comparatively short time ago (T = 60, …, 100), eBPTT is the better choice: there is a high chance of training the network successfully, with good generalisation. When the outputs depend on inputs that appeared long ago (T > 130), the eRTRL algorithm provides the better solution, as it guarantees successful network training where eBPTT could hardly train the network.
In real-world problems, such as speech recognition, handwriting recognition, or protein secondary structure prediction, the data to be classified does not have such a compact representation as the strings in the information latching task. To be able to learn from such data, the network size, that is, the number of processing units, has to be increased. As shown in Fig. 4, eRTRL simply becomes impractical for large networks (training time: 3 minutes with eBPTT in contrast to 21.65 hours with eRTRL for n_x = n_y = 100). In these cases, eBPTT becomes the only viable choice of training algorithm.
In the future, a combination of both learning algorithms might be a way to overcome the drawbacks of each method. It could reduce the computational complexity of eRTRL and increase eBPTT's ability to learn long-term dependencies.
ACKNOWLEDGEMENTS
The authors acknowledge the support provided by the Transregional Collaborative Research Centre SFB/TRR 62 “Companion-Technology for Cognitive Technical Systems”.