can be trained in neural networks: these are weights
and biases. Weights are involved in learning and training more directly than biases, and so we choose weights to represent the string we wish to rewrite.
Thus, the second major assumption is: adjusting the weights of a network is analogous to rewriting terms.
So, given a string s, we construct one layer of neu-
rons, with the weight vector w equal to s, and the lin-
ear transfer function F(x) = x. We will work with
input signals equal to 1, so as to preserve the exact
value of w at the first step. Next, we wish the process of training this weight vector to correspond to steps of parallel rewriting. How close is conventional unsu-
pervised learning implemented in neural networks to
the term rewriting known in computational logic?
Consider a simple form of Hebbian learning:
given an input x = 1 to the layer, and having received
an output y, the rate of change ∆w for w is com-
puted as follows: ∆w = L(y, x), with L some cho-
sen function. In a special case, it may be ∆w = ηyx,
where η is a positive constant called the rate of learn-
ing. We take, for example, η = 2. At the first it-
eration, the output will be equal to w, and so the
network will compute ∆w = 2w. At the next itera-
tion, the network will modify its weight as follows:
w_new = w + ∆w = w + 2w = 3w. And this value will
be sent as the output, see also Section 3.
Interestingly enough, the conventional Hebbian network we have just described performs rewriting as we know it in computer science. In terms of term rewriting, it takes any string and rewrites it according to the rewriting rule ρ : x → 3x, although, as we will see in Section 2, we can use only ground instances of ρ.
Given a string [1, 2, 3, 1, 2, 3, 3, 1, 2] the network will
transform it into [3, 6, 9, 3, 6, 9, 9, 3, 6].
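To make this concrete, the following Python fragment (our own illustrative sketch; the experiments reported later use the MATLAB Neural Network Simulator) implements the layer just described: the weight vector holds the string, the inputs are fixed to 1, the transfer function is the identity, and one Hebbian update ∆w = ηyx with η = 2 realises the rule x → 3x in parallel over the whole string.

import numpy as np

# Sketch of the Hebbian layer described above: the weights store the string,
# the inputs are fixed to 1, the transfer function is the identity F(x) = x.
def hebbian_rewrite_step(w, eta=2.0):
    x = np.ones_like(w)        # input signals equal to 1
    y = w * x                  # linear transfer: the output equals the weights
    delta_w = eta * y * x      # Hebbian update: delta_w = eta * y * x = 2w
    return w + delta_w         # new weights: w + 2w = 3w

s = np.array([1, 2, 3, 1, 2, 3, 3, 1, 2], dtype=float)
print(hebbian_rewrite_step(s))  # [3. 6. 9. 3. 6. 9. 9. 3. 6.]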
This justifies the third main assumption we use
throughout: unsupervised (Hebbian) learning pro-
vides a natural and elegant framework for imple-
menting parallel rewriting in neural networks.
These three assumptions lay the basis for the main
definitions of Section 3. Additionally, in Sections 3 and 4, we show how to formalise more complex cases of term rewriting by means of unsupervised learning. These cases arise when there is more than one rewriting step and these steps are not instances of a single rewriting rule, when the length of a given string changes in the process of rewriting, and when one uses first-order terms instead of
abstract strings. In Section 3, we define the archi-
tecture and a simple unsupervised learning rule for
neural networks that can perform abstract rewriting,
with some restrictions on the shape and the number of
rewriting steps. In Section 4, we refine the architec-
ture of these neural networks and adapt them for the
purpose of first-order term rewriting. We prove that
for an arbitrary Term Rewriting System, these neural
networks perform exactly parallel term rewriting.
When moving from simple examples of rewriting
systems to more specific and complex ones, all we
have to do is to re-define the function L used in the
definition of the learning rule ∆w = L(y, x). While
for some examples, such as the one we have just considered, L is completely conventional, for other examples we define and test new functions (rewrite, rewrite mult), using the MATLAB Neural Network Simulator. The most complex of these functions, rewrite mult, can support rewriting by unsupervised learning for any given Abstract or Term Rewriting System.
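The concrete definitions of rewrite and rewrite mult are given in Sections 3 and 4; as a purely illustrative sketch of the general shape of such a redefinition (the helper make_L and the rule table below are our own, not the paper's), a learning function L can be built from a table of ground rewrite instances so that w + ∆w equals the contractum of each weight.

import numpy as np

def make_L(ground_rules):
    # Build a learning function L(y, x) from a table of ground instances,
    # e.g. {1: 3, 2: 6, 3: 9} for the ground instances of x -> 3x.
    def L(y, x):
        # The weight change is the difference between the contractum and the
        # current output, so w + delta_w is the rewritten value; values with
        # no matching rule are left unchanged (delta_w = 0).
        return np.array([ground_rules.get(v, v) - v for v in y]) * x
    return L

L = make_L({1: 3, 2: 6, 3: 9})
w = np.array([1, 2, 3, 1, 2, 3, 3, 1, 2], dtype=float)
x = np.ones_like(w)
y = w * x                       # identity transfer with inputs equal to 1
w = w + L(y, x)                 # [3. 6. 9. 3. 6. 9. 9. 3. 6.]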
Finally, in Section 5, we conclude the paper.
2 REWRITING SYSTEMS
In this section, we outline some basic notions used in
the theory of Term-Rewriting, see (Terese, 2003).
The most basic and fundamental notion we en-
counter is the notion of an abstract reduction (or
rewriting) system.
Definition 1. An abstract rewriting system (ARS) is a structure A = (A, {→_α | α ∈ I}) consisting of a set A and a set of binary relations →_α on A, indexed by a set I. We write (A, →_1, →_2) instead of (A, {→_α | α ∈ {1, 2}}).
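For instance (our own illustration, not taken from (Terese, 2003)), the natural numbers with the two relations n →_1 n + 1 and n →_2 2n form an ARS (N, →_1, →_2); the short sketch below encodes the indexed relations as functions returning the one-step reducts of an element.

# A small illustrative ARS: A is the set of natural numbers, with two indexed
# relations ->_1 (successor) and ->_2 (doubling).
reductions = {
    1: lambda n: {n + 1},      # n ->_1 n + 1
    2: lambda n: {2 * n},      # n ->_2 2 * n
}

def reducts(a, alpha):
    # One-step reducts of a under ->_alpha.
    return reductions[alpha](a)

print(reducts(3, 1))   # {4}
print(reducts(3, 2))   # {6}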
A term rewriting system (TRS) consists of terms
and rules for rewriting these terms. So we first need
the terms. Briefly, they will be just the terms over a
given first-order signature, as in the first-order pred-
icate logic. Substitution is the operation of filling in
terms for variables. See (Terese, 2003) for more de-
tails. Given terms, we define rewriting rules:
Definition 2. A reduction rule (or rewrite rule) for a signature Σ is a pair ⟨l, r⟩ of terms of Ter(Σ). It will be written l → r, often with a name: ρ : l → r. Two
restrictions on reduction rules are imposed:
• the left-hand side l is not a variable;
• every variable occurring in the right-hand side r
occurs in the left-hand side l as well.
A reduction rule ρ : l → r can be viewed as a scheme. An instance of ρ is obtained by applying a substitution σ. The result is an atomic reduction step l^σ →_ρ r^σ. The left-hand side l^σ is called a redex and the right-hand side r^σ is called its contractum.
Given a term, it may contain one or more occur-
rences of redexes. A rewriting step consists of con-
tracting one of these, i.e., replacing the redex by its
contractum.
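As an illustration of these notions (the term encoding and helper functions below are our own, not part of (Terese, 2003)), the following sketch represents a term either as a variable name or as a tuple (symbol, arguments), and contracts every outermost occurrence of a redex of a given rule, i.e., performs one parallel rewriting step.

# Terms are encoded as: a variable is a string, and a compound term is a
# tuple (function_symbol, arg_1, ..., arg_n); constants are 1-tuples.
def match(pattern, term, subst=None):
    # Try to match the left-hand side `pattern` against `term`,
    # returning a substitution (dict) or None if there is no match.
    subst = dict(subst or {})
    if isinstance(pattern, str):                       # pattern is a variable
        if pattern in subst and subst[pattern] != term:
            return None
        subst[pattern] = term
        return subst
    if isinstance(term, str) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst

def apply_subst(term, subst):
    # Substitution: fill in terms for variables.
    if isinstance(term, str):
        return subst.get(term, term)
    return (term[0],) + tuple(apply_subst(t, subst) for t in term[1:])

def contract_outermost(term, lhs, rhs):
    # Replace every outermost redex of the rule lhs -> rhs by its contractum
    # (one parallel rewriting step).
    subst = match(lhs, term)
    if subst is not None:
        return apply_subst(rhs, subst)
    if isinstance(term, str):
        return term
    return (term[0],) + tuple(contract_outermost(t, lhs, rhs) for t in term[1:])

# Rule rho : f(x) -> g(x, x); in h(f(a)), the subterm f(a) is a redex and
# g(a, a) is its contractum.
lhs, rhs = ('f', 'x'), ('g', 'x', 'x')
print(contract_outermost(('h', ('f', ('a',))), lhs, rhs))   # ('h', ('g', ('a',), ('a',)))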