attention mechanism, such as the prominent Transformer (Vaswani et al., 2017).
6 CONCLUSION
We defined the Random Memorization Task for RNNs. Although the task contradicts the intuition behind the use of recurrent neural networks, classical RNNs, LSTM, and GRU networks were able to memorize a random input sequence to a certain extent, depending on configuration and architecture. There is a discernible borderline between past positions in the sequence that could be memorized and those that could not, and this borderline correlates with the MC formulated by Jaeger. We therefore conclude that Jaeger's MC formula is applicable for calculating the memory limit with respect to the RNN's type and architecture.
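For the reader's convenience, Jaeger's memory capacity can be restated as follows (notation ours; we assume a scalar input u(t), one trained linear readout y_k(t) per delay k, and N cells):

\[
\mathrm{MC}_k = \frac{\operatorname{cov}^2\bigl(u(t-k),\, y_k(t)\bigr)}{\operatorname{var}\bigl(u(t)\bigr)\,\operatorname{var}\bigl(y_k(t)\bigr)},
\qquad
\mathrm{MC} = \sum_{k=1}^{\infty} \mathrm{MC}_k \le N .
\]

The bound MC <= N, which Jaeger derived for echo state networks driven by i.i.d. input with linear readouts, is the memory limit we refer to above.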
While our experiments are limited in scale (they are already computationally expensive), we observe the trend that more cells increase the memory limit. The limiting factor that we observed, at least for vanilla RNNs, is the VEGP. However, it is important to note that in current RNN applications the ratio of input sequence length to the number of cells inside the RNN is usually in favor of the number of cells, so we would not expect the memory limit to play an important role in typical RNN applications. Still, we hope this research represents an additional step towards a theoretical framework concerned with the learnability of problems using specific machine learning techniques.
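As an illustration of how such a memory-limit estimate can be obtained, the following sketch (not the code used in our experiments; all hyperparameters and names are illustrative assumptions) computes a Jaeger-style memory capacity for a randomly initialized vanilla RNN by fitting one linear readout per delay and summing the resulting determination coefficients:

# Minimal sketch (not the experimental code of this paper): estimating a
# Jaeger-style memory capacity for a randomly initialized vanilla RNN.
# All hyperparameters below are illustrative.
import numpy as np

rng = np.random.default_rng(0)

N_CELLS = 20      # number of hidden cells
T = 5000          # length of the random input sequence
MAX_DELAY = 40    # largest past position k we try to reconstruct
WASHOUT = 100     # initial steps discarded before fitting the readouts

# Random i.i.d. input sequence, as in the memorization setting.
u = rng.uniform(-1.0, 1.0, size=T)

# Randomly initialized vanilla RNN (echo-state style; W is not trained).
W_in = rng.uniform(-0.1, 0.1, size=N_CELLS)
W = rng.normal(0.0, 1.0, size=(N_CELLS, N_CELLS))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius < 1

states = np.zeros((T, N_CELLS))
h = np.zeros(N_CELLS)
for t in range(T):
    h = np.tanh(W_in * u[t] + W @ h)
    states[t] = h

# For each delay k, fit a ridge readout reconstructing u(t - k) from the
# state h(t) and accumulate the squared correlation (determination
# coefficient); the sum approximates the memory capacity MC.
X = states[WASHOUT:]
mc = 0.0
for k in range(1, MAX_DELAY + 1):
    target = u[WASHOUT - k:T - k]
    w = np.linalg.solve(X.T @ X + 1e-6 * np.eye(N_CELLS), X.T @ target)
    r = np.corrcoef(X @ w, target)[0, 1]
    mc += r ** 2

print(f"estimated memory capacity: {mc:.2f} (upper bound: {N_CELLS})")

The ridge term only keeps the readout fit numerically stable; for trained RNN, LSTM, or GRU cells the same readout procedure can be applied to their recorded hidden states.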
REFERENCES
Bengio, Y., Simard, P., and Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166.
Buonomano, D. V. and Merzenich, M. M. (1995). Tem-
poral information transformed into a spatial code by
a neural network with realistic properties. Science,
267(5200):1028–1030.
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D.,
Bougares, F., Schwenk, H., and Bengio, Y. (2014).
Learning phrase representations using RNN encoder-
decoder for statistical machine translation. arXiv
preprint arXiv:1406.1078.
Chung, J., Gulcehre, C., Cho, K., and Bengio, Y.
(2014). Empirical evaluation of gated recurrent neu-
ral networks on sequence modeling. arXiv preprint
arXiv:1412.3555.
Cleeremans, A., Servan-Schreiber, D., and McClelland,
J. L. (1989). Finite state automata and simple recur-
rent networks. Neural Computation, 1(3):372–381.
Dey, R. and Salemt, F. M. (2017). Gate-variants of gated
recurrent unit (GRU) neural networks. In 2017 IEEE
60th International Midwest Symposium on Circuits
and Systems (MWSCAS), pages 1597–1600. IEEE.
Engelbrecht, A. P. (2007). Computational intelligence: an
introduction. John Wiley & Sons.
Gallicchio, C. and Micheli, A. (2017). Deep echo state network (DeepESN): A brief survey. arXiv preprint arXiv:1712.04323.
Giles, C. L., Miller, C. B., Chen, D., Chen, H.-H., Sun,
G.-Z., and Lee, Y.-C. (1992). Learning and extract-
ing finite state automata with second-order recurrent
neural networks. Neural Computation, 4(3):393–405.
Gonon, L., Grigoryeva, L., and Ortega, J.-P. (2020). Mem-
ory and forecasting capacities of nonlinear recur-
rent networks. Physica D: Nonlinear Phenomena,
414:132721.
Graves, A., Wayne, G., and Danihelka, I. (2014). Neural Turing machines. arXiv preprint arXiv:1410.5401.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term
memory. Neural Computation, 9(8):1735–1780.
Jaeger, H. (2001). The “echo state” approach to analysing
and training recurrent neural networks-with an erra-
tum note. Bonn, Germany: German National Re-
search Center for Information Technology GMD Tech-
nical Report, 148(34):13.
LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learn-
ing. Nature, 521(7553):436.
Lu, Y. and Salem, F. M. (2017). Simplified gating in
long short-term memory (LSTM) recurrent neural
networks. In 2017 IEEE 60th International Mid-
west Symposium on Circuits and Systems (MWSCAS),
pages 1601–1604. IEEE.
Maass, W., Natschläger, T., and Markram, H. (2002). Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation, 14(11):2531–2560.
Mayer, N. M. (2017). Echo state condition at the critical
point. Entropy, 19(1):3.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986).
Learning representations by back-propagating errors.
Nature, 323(6088):533.
Schmidhuber, J. (2015). Deep learning in neural networks:
An overview. Neural Networks, 61:85–117.
Stanley, K. O. and Miikkulainen, R. (2002). Evolving neu-
ral networks through augmenting topologies. Evolu-
tionary Computation, 10(2):99–127.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, L., and Polosukhin, I.
(2017). Attention is all you need. In Advances in Neural Information Processing Systems (NIPS).
Yildiz, I. B., Jaeger, H., and Kiebel, S. J. (2012). Re-visiting
the echo state property. Neural Networks, 35:1–9.
Zhang, C., Bengio, S., Hardt, M., Recht, B., and Vinyals, O.
(2016). Understanding deep learning requires rethink-
ing generalization. arXiv preprint arXiv:1611.03530.