components. This may cause network instability and weight overflow. The problem can be mitigated by clipping gradients whenever their norm exceeds a given threshold (Goodfellow et al., 2016), or by weight regularization, i.e., applying a penalty to the network's loss function for large weight values (Pascanu et al., 2013).
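For illustration, the following minimal sketch shows how both remedies can be applied in a training loop; PyTorch is assumed, and the model, clipping threshold, and weight-decay coefficient are hypothetical choices rather than values used in this paper.

```python
import torch

# Hypothetical recurrent model; weight_decay adds an L2 penalty on the
# weights (weight regularization).
model = torch.nn.RNN(input_size=8, hidden_size=16, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

def training_step(x, targets, loss_fn):
    """One schematic training step with gradient clipping."""
    optimizer.zero_grad()
    outputs, _ = model(x)
    loss = loss_fn(outputs, targets)
    loss.backward()
    # Gradient clipping: rescale the gradients whenever their global
    # norm exceeds the chosen threshold.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```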
On the other hand, the vanishing gradient problem
occurs when the values of a gradient are too small. As
a consequence, the model slows down or stops
learning. Thus, the range of contextual information
that standard RNNs can access is in practice quite
limited.
Figure 1: (a) An RNN; (b) the equivalent NN unfolded in time.
Long Short-Term Memory (LSTM, Graves et al., 2009) is an RNN specifically designed to address the exploding and vanishing gradient problems. An LSTM hidden layer consists of recurrently connected subnets, called memory blocks. Each block contains a set of internal units, or cells, whose activation is controlled by three multiplicative gates: the input gate, the forget gate, and the output gate. An LSTM network can remember information over arbitrary time intervals. The cell decides whether to store (via the input gate), delete (via the forget gate), or provide (via the output gate) information, based on the importance assigned to it. The assignment of importance happens through weights, which are learned by the algorithm. Since the gates in an LSTM are analog, in the form of sigmoids, the network is differentiable and can be trained by backpropagation (BP).
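The gating mechanism can be made concrete with the following sketch of a single LSTM step in plain NumPy; the stacked parameter layout is one common convention, not necessarily the one used by the cited works.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b hold the stacked parameters of the
    input (i), forget (f), output (o) gates and the cell candidate (g)."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b          # pre-activations, shape (4*H,)
    i = sigmoid(z[0:H])                 # input gate: what to store
    f = sigmoid(z[H:2*H])               # forget gate: what to delete
    o = sigmoid(z[2*H:3*H])             # output gate: what to provide
    g = np.tanh(z[3*H:4*H])             # candidate cell content
    c = f * c_prev + i * g              # cell state update
    h = o * np.tanh(c)                  # hidden output
    return h, c
```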
In recent years, LSTM networks have become the
state-of-the-art models for many machine learning
problems (Greff et al., 2017). This has drawn the interest of researchers to the computational components of LSTM variants.
This paper focuses on a novel concept of
computational memory in RNNs, based on stigmergy.
Stigmergy is defined as an emergent mechanism for
self-coordinating actions within complex systems, in
which the trace left by a unit’s action on some
medium stimulates the performance of a subsequent
unit’s action (Heylighen, 2016). To our knowledge,
this is the first study that proposes and lays down a basic design for the derivation of a Stigmergic Memory RNN (SM-RNN). In the literature, stigmergy is a well-known mechanism for swarm intelligence and multi-agent systems. Despite its high potential, demonstrated by the use of stigmergy in biological systems at diverse scales, the use of stigmergy for pattern recognition and data classification is still poorly investigated (Heylighen, 2016). As an example, in (Cimino et al., 2015) a stigmergic architecture has been proposed to perform adaptive context-aware aggregation. In (Alfeo et al., 2017) a multi-layer architecture of stigmergic receptive fields has been experimented with for pattern recognition in human behavioral analysis. In (Galatolo et al., 2018), the temporal dynamics of stigmergy is applied to the weights, biases, and activation thresholds of a classical neural perceptron, to derive a non-recurrent architecture called Stigmergic NN (S-NN). However, due to the large NN produced by the unfolding process, the scalability of the S-NN is limited by the vanishing gradient problem. In contrast, the SM-RNN proposed in this paper employs FF-NNs as store and forget cells operating on a Multi-mono-dimensional SM, in order to reduce the network complexity.
To appreciate the computational power achieved by the SM-RNN, in this paper a conventional FF-NN, an S-NN (Galatolo et al., 2018), an RNN, and an LSTM-NN have been trained to solve the MNIST digit recognition benchmark (LeCun et al., 2018). Specifically, two MNIST variants have been considered: spatial, i.e., digits as sequences of bitmap rows, and temporal, i.e., digits as sequences of pen strokes (De Jong, 2018).
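As an illustration of the spatial variant, the following sketch (assuming a digit is available as a 28x28 NumPy array) shows how a bitmap can be fed to a recurrent model row by row; the helper name is hypothetical.

```python
import numpy as np

def image_to_row_sequence(image):
    """Spatial MNIST variant: treat a 28x28 bitmap as a sequence of
    28 time steps, each being one 28-dimensional row vector."""
    assert image.shape == (28, 28)
    return [image[t, :] for t in range(28)]

# Example: a dummy digit image consumed row by row by a recurrent model.
digit = np.zeros((28, 28), dtype=np.float32)
sequence = image_to_row_sequence(digit)   # 28 steps of length-28 inputs
```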
The remainder of the paper is organized as
follows. Section 2 discusses the architectural design
of SM-RNNs. Experiments are covered in Section 3.
Finally, Section 4 summarizes conclusions and future
work.
2 ARCHITECTURAL DESIGN
Let us consider the phenomenon of selective forgetting that, in neuroscience, characterizes memory in the
brain: information pieces that are no longer reinforced
will gradually be lost with respect to recently
reinforced ones. This behavior can be modeled by
using stigmergy. Figure 2 shows the ontology of an SM, made of four concepts: Stimulus, Deposit,
Removal, and Mark. In essence, the Stimulus is the
input of a stigmergic memory. The past dynamics of
the Stimulus are indirectly propagated and stored in
the Mark. This propagation is mediated by Deposit
and Removal: the Stimulus affects Deposit and Removal which, respectively, reinforce and weaken the Mark.
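The following toy sketch illustrates these dynamics; the linear deposit and removal forms, the class name, and the rate parameters are illustrative assumptions, not the formulation adopted in this paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class StigmergicMemory:
    """Toy stigmergic memory: the Mark is reinforced by a Deposit driven
    by the Stimulus and weakened by a Removal term (selective forgetting)."""
    def __init__(self, size, deposit_rate=1.0, removal_rate=0.1):
        self.mark = np.zeros(size)
        self.deposit_rate = deposit_rate    # hypothetical rate parameters
        self.removal_rate = removal_rate

    def step(self, stimulus):
        deposit = self.deposit_rate * sigmoid(stimulus)   # reinforcement
        removal = self.removal_rate * self.mark           # gradual decay
        self.mark = self.mark + deposit - removal
        return self.mark
```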