PARALLEL REWRITING IN NEURAL NETWORKS
Ekaterina Komendantskaya
School of Computer Science, University of St Andrews, U.K.
Keywords:
Computational logic in neural networks, Neuro-symbolic networks, Abstract rewriting, Parallel term-
rewriting, Unsupervised learning, Computer simulation of neural networks.
Abstract:
Rewriting systems are used in various areas of computer science, and especially in lambda-calculus, higher-
order logics and functional programming. We show that unsupervised learning networks can implement
parallel rewriting. We show how this general correspondence can be refined in order to perform parallel
term rewriting in neural networks, for any given first-order term. We simulate these neural networks in the
MATLAB Neural Network Toolbox and present a complete library of the functions involved.
1 INTRODUCTION
Term rewriting (Terese, 2003) is a major area of re-
search in theoretical computer science, and has found
numerous applications in lambda calculus, higher-
order logics and functional programming. Different
forms of term-rewriting techniques underlie various
areas of automated reasoning.
A simple example of an abstract rewriting system
would be a string together with a rule for rewriting
the elements of the string. In more complex cases,
the string can be given by some first-order term, there
can be a system of rewriting rules rather than one
rule, and, of course, the rewriting rules can be such
that the initial string would be shortened or extended
through the rewriting process. Certain rewriting sys-
tems always lead to normal forms, while some do not,
and the process of reducing to a normal form can be
finite or infinite. We will give formal definitions and
explanations in Section 2.
If we are to build neural networks capable of automated reasoning, we need to implement term-rewriting techniques in them (Komendantskaya, 2009a). These methods can be further
used in hybrid systems research.
There are several obstacles in our way. The first problem is that, according to a general convention, neu-
ral networks do not process strings, or ordered se-
quences. Every neuron can accept only a scalar as
a signal, and output a scalar in its turn. This gen-
eral convention has been developed through decades
of discussion, and different views on it are best sum-
marised in (Aleksander and Morton, 1993; Smolen-
sky and Legendre, 2006). However, some order is innate to neural networks: this order is imposed by the positions of neurons in a given layer, and by the positions of layers in a network. So, al-
though each neuron accepts only a scalar number as
an input, a layer of neurons accepts a vector of such
numbers, and the whole network can accept a matrix
of numbers.
This gives us the first basic assumption of the pa-
per: a vector of neurons in a layer mirrors the
structure of a string. This is why we will use one-layer networks throughout.
Related literature on structure processing with neural networks falls within three areas of research: the core method for dealing with symbolic formulae and Prolog terms (Bader et al., 2008); recursive networks which can deal with string trees (Strickert et al., 2005); and kernel methods for structures (Gärtner, 2003). The approach we pursue here does
not follow any of the mentioned mainstream direc-
tions, but, as a pay-off, it is very direct and simple.
Having made the first assumption above, we still
need to determine which of the parameters of a neural
network will hold information about the elements of a
given string. One easy solution could be to send a vec-
tor consisting of the elements of a string as an input
to a chosen network. However, in this case the task of
rewriting this string would be delegated to a process-
ing function of the layer, whereas we wish to realise
the process by means of learning. This reduces our
options: conventionally, there are two parameters that
can be trained in neural networks: these are weights
and biases. Weights are used more often in learning
and training, and so we choose weights to represent
the string we wish to rewrite.
Thus, the second major assumption is: adjusting
weights of a network is similar to rewriting terms.
So, given a string s, we construct one layer of neu-
rons, with the weight vector w equal to s, and the lin-
ear transfer function F(x) = x. We will work with
input signals equal to 1, so as to preserve the exact
value of w at the first step. Next, we wish the pro-
cess of training of this weight to correspond to steps
of parallel rewriting. How close is conventional unsu-
pervised learning implemented in neural networks to
the term rewriting known in computational logic?
Consider a simple form of Hebbian learning: given an input x = 1 to the layer, and having received an output y, the rate of change ∆w for w is computed as follows: ∆w = L(y, x), with L some chosen function. In a special case, it may be ∆w = ηyx, where η is a positive constant called the rate of learning. We take, for example, η = 2. At the first iteration, the output will be equal to w, and so the network will compute ∆w = 2w. At the next iteration, the network will modify its weight as follows: w_new = w + ∆w = w + 2w = 3w. And this value will be sent as the output, see also Section 3.
Interestingly enough, the conventional Hebbian
network we have just described above does rewriting
as we know it in computer science. In terms of term
rewriting, it takes any string, and rewrites it according
to the rewriting rule ρ : x → 3x, albeit, as we will see
in Section 2, we can use only ground instances of ρ.
Given a string [1, 2, 3, 1, 2, 3, 3, 1, 2] the network will
transform it into [3, 6, 9, 3, 6, 9, 9, 3, 6].
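To make this concrete, here is a minimal MATLAB sketch of the Hebbian step just described; it is our own illustration with our own variable names, not the library code of (Komendantskaya, 2009b).

  % Hebbian step with rate of learning eta = 2 (illustration only).
  w   = [1 2 3 1 2 3 3 1 2];   % weight vector = the string to be rewritten
  x   = 1;                     % constant input signal
  eta = 2;                     % rate of learning
  y   = w * x;                 % output of the linear layer: y = w
  dw  = eta * y * x;           % Hebbian change vector: dw = 2w
  w   = w + dw;                % new weight: w + 2w = 3w
  disp(w)                      % prints 3 6 9 3 6 9 9 3 6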
This justifies the third main assumption we use
throughout: unsupervised (Hebbian) learning pro-
vides a natural and elegant framework for imple-
menting parallel rewriting in neural networks.
These three assumptions lay the basis for the main
definitions of Section 3. Additionally, in Sections
3 and 4, we show the ways to formalise the more
complex cases of term-rewriting by means of unsu-
pervised learning. These cases arise when one has
more than one rewriting step, and these steps are
not instances of one rewriting rule, when the length
of a given string changes in the process of rewriting,
and also, when one uses first-order terms instead of
abstract strings. In Section 3, we define the archi-
tecture and a simple unsupervised learning rule for
neural networks that can perform abstract rewriting,
with some restrictions on the shape and the number of
rewriting steps. In Section 4, we refine the architec-
ture of these neural networks and adapt them for the
purpose of first-order term rewriting. We prove that
for an arbitrary Term Rewriting System, these neural
networks perform exactly the parallel term rewriting.
When moving from simple examples of rewriting
systems to more specific and complex ones, all we
have to do is to re-define the function L used in the
definition of the learning rule ∆w = L(y, x). While for some examples, such as the one we have just considered, L is completely conventional, for other examples we define and test new functions (rewrite, rewrite_mult), using the MATLAB Neural Network Toolbox. The most complex of these functions, rewrite_mult, can support rewriting by unsupervised learning for any given Abstract or Term Rewriting System.
Finally, in Section 5, we conclude the paper.
2 REWRITING SYSTEMS
In this section, we outline some basic notions used in
the theory of Term-Rewriting, see (Terese, 2003).
The most basic and fundamental notion we en-
counter is the notion of an abstract reduction (or
rewriting) system.
Definition 1. An abstract rewriting system (ARS) is a
structure A = (A, {→_α | α ∈ I}) consisting of a set A and a set of binary relations →_α on A, indexed by a set I. We write (A, →_1, →_2) instead of (A, {→_α | α ∈ {1, 2}}).
A term rewriting system (TRS) consists of terms
and rules for rewriting these terms. So we first need
the terms. Briefly, they will be just the terms over a
given first-order signature, as in the first-order pred-
icate logic. Substitution is the operation of filling in
terms for variables. See (Terese, 2003) for more de-
tails. Given terms, we define rewriting rules:
Definition 2. A reduction rule (or rewrite rule) for a
signature Σ is a pair ⟨l, r⟩ of terms of Ter(Σ). It will be written l → r, often with a name: ρ : l → r. Two restrictions on reduction rules are imposed: the left-hand side l is not a variable, and every variable occurring in the right-hand side r occurs in the left-hand side l as well.
A reduction rule ρ : l → r can be viewed as a scheme. An instance of ρ is obtained by applying a substitution σ. The result is an atomic reduction step l^σ →_ρ r^σ. The left-hand side l^σ is called a redex and the right-hand side r^σ is called its contractum.
Given a term, it may contain one or more occur-
rences of redexes. A rewriting step consists of con-
tracting one of these, i.e., replacing the redex by its
contractum.
Definition 3. A rewriting step according to the rewriting rule ρ : l → r consists of contracting a redex within an arbitrary context:
C[l^σ] →_ρ C[r^σ].
We call →_ρ the one-step rewriting relation generated by ρ.
Definition 4. A term rewriting system is a pair R = (Σ, R) of a signature Σ and a set of rewriting rules R for Σ.
The one-step rewriting relation of R, denoted by →_R, is defined as the union ∪{→_ρ | ρ ∈ R}. So we have t →_R s when t →_ρ s for one of the rewriting rules ρ ∈ R.
Example 1. Consider a rewrite rule ρ : F(G(x), y) → F(x, x). Then a substitution σ, with σ(x) = 0 and σ(y) = G(x), yields the atomic reduction step
ρ^σ : F(G(0), G(x)) →_ρ F(0, 0)
with redex F(G(0), G(x)) and contractum F(0, 0). The rule gives rise to (e.g.) the rewriting step
F(z, G(F(G(0), G(x)))) →_ρ F(z, G(F(0, 0))).
Here the context is F(z, G(□)).
Example 2. Consider the TRS with rewriting rules
ρ_1 : F(a, x) → G(x, x)   (1)
ρ_2 : b → F(b, b)   (2)
The substitution [x := b] yields the atomic rewriting step F(a, b) →_{ρ_1} G(b, b).
A corresponding one-step rewriting is G(F(a, b), b) →_{ρ_1} G(G(b, b), b). Another one-step rewriting is G(F(a, b), b) →_{ρ_2} G(F(a, b), F(b, b)).
The notion of parallel rewriting is central for establishing confluence (Terese, 2003).
Definition 5. Let a term t contain some disjoint redexes s_1, s_2, . . . , s_n; that is, suppose we have t ≡ C[s_1, s_2, . . . , s_n], for some context C. Obviously, these redexes can be contracted in any order. If their contracta are respectively s′_1, s′_2, . . . , s′_n, in n steps the reduct t′ ≡ C[s′_1, s′_2, . . . , s′_n] can be reached. These n steps together are called a parallel step.
Performing disjoint reductions in parallel brings
significant speed-up to computations. However, very
often the parallel steps are conceived or implemented
as a sequence of disjoint rewriting steps. As we show
in the next sections, term-rewriting implemented in
neural networks does the parallel step not as a se-
quence, but truly in parallel.
3 UNSUPERVISED LEARNING
AND ABSTRACT REWRITING
In this section, we define neural networks, following
(Hecht-Nielsen, 1990; Haykin, 1994).
An artificial neural network (also called a neu-
ral network) is a directed graph. A unit k in this
graph is characterised, at time t, by its input vector
(v_{i_1}(t), . . . , v_{i_n}(t)), its potential p_k(t), its bias b_k, and its value v_k(t). In what follows, we will use integers.
Units are connected via a set of directed and weighted connections. If there is a connection from unit j to unit k, then w_kj denotes the weight associated with this connection, and i_k(t) = w_kj v_j(t) is the input received by k from j at time t. At each update, the potential and the value of a unit are computed with respect to an input (activation) function and an output (transfer) function, respectively. The units considered here compute their potential as the weighted sum of their inputs:
p_k(t) = Σ_{j=1}^{n_k} w_kj v_j(t).
The units are updated synchronously, time becomes t + ∆t, and the output value for k, v_k(t + ∆t), is calculated from p_k(t) by means of a given transfer function F, that is, v_k(t + ∆t) = F(p_k(t)).
A unit is said to be a linear unit if its transfer function is the identity. In this case, v_k(t + ∆t) = p_k(t).
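As a small illustration of this update, the following MATLAB fragment computes one synchronous step of a single linear unit; it is a sketch in our own notation, not part of the library of (Komendantskaya, 2009b).

  % One update of a linear unit k (illustration; integer data as in the text).
  w_k = [2 -1 3];            % weights w_kj of the incoming connections
  v   = [1 2 1];             % values v_j(t) of the units feeding into k
  F   = @(p) p;              % identity transfer function (linear unit)
  p_k = sum(w_k .* v);       % potential: weighted sum of the inputs
  v_k = F(p_k);              % value v_k(t + dt) = p_k(t)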
We will consider networks where the units can be
organised in layers. A layer is a vector of units.
In the rest of the paper, we will normally work
with layers of neurons rather than with single neu-
rons, and hence we will manipulate vectors of
weights, output signals, and other parameters. In this
case, we can drop the subscripts and write simply w
for the vector of weights.
There are two major kinds of learning distin-
guished in Neurocomputing: supervised and unsuper-
vised learning. In this paper, we focus only on unsu-
pervised learning.
Unsupervised learning in its different forms has
the following common features. A network is given a
learning rule, according to which it trains its weights.
Adaptation is achieved by means of processing exter-
nal signals, and applying the learning rule L. To train
the weight w_kj(t), we apply a learning function L to the input and output signals v_j(t) and v_k(t), and get ∆w_kj(t) = L(v_k(t), v_j(t)). We will call the vector ∆w the change vector for the weight vector w. As a particular case of this formula, one can have ∆w_kj(t) = η v_k(t) v_j(t), where η is a positive constant called the rate of learning. At the next time step t + 1, the weight is changed to w_kj(t + 1) = w_kj(t) + ∆w_kj(t).
Figure 1: ARNN net at training steps 1 and 2.
We could perceive this learning function L as a rewriting rule for the weight w_kj, and the process of training would then be the process of rewriting. A suitable architecture for a network capable of performing abstract rewriting by unsupervised learning is given in the next definition, under the name abstract rewriting neural network (ARNN). For simplicity, we will first cover only ARSs with one rewriting rule.
We adopt the following notation. For a given vector v, we denote its length by l_v. For a given string s, the vector that corresponds to it is denoted v_s, and the length of this vector is denoted by l_{v_s}.
Definition 6. Given an ARS A = (A, {→_1}), and a sequence s of elements of A, an architecture for the abstract rewriting neural network (ARNN) net for s is defined as follows. Let v_s be the vector of elements of s. Let l_{v_s} be the length of v_s. Then net is constructed from one layer k of l_{v_s} neurons. Its weight vector w_{k1} is equal to v_s. The transfer function is taken to be the identity. The network receives input signal 1.
This definition realises the first two basic assumptions we outlined in the Introduction. In the future, we will freely transform sequences of symbols into vectors, in the way we have done in Definition 6. Because the input signal is equal to 1, the network built as in Definition 6 will always output v_s, as we further illustrate in the next example.
Example 3. Given a set A = {1, 2, 3}, and a sequence s = 1, 2, 3, 1, 2, 3, 3, 1, 2, the corresponding ARNN is constructed as follows. We take one layer k of 9 neurons, and define the weight w_{k1} to be the vector v_s = [1;2;3;1;2;3;3;1;2]. Once initialised, the network will output the same vector: if we look at the equation for the potential p_k, and put j = 1 (there is only one input) and v_j = 1, then the potential p_k will be equal to w_k. See Figure 1. This example and many more are also available in the file experiments_s in (Komendantskaya, 2009b).
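Stripped of the toolbox machinery, the behaviour of this ARNN can be sketched in a few lines of plain MATLAB; the code below is our own illustration, not the toolbox-based version in the file mentioned above.

  % The ARNN of Example 3 as a plain computation (illustration only).
  v_s = [1; 2; 3; 1; 2; 3; 3; 1; 2];   % the string s as a column vector
  w   = v_s;                           % weight vector of the single layer
  x   = 1;                             % constant input signal
  F   = @(p) p;                        % identity transfer function
  y   = F(w * x);                      % the layer reproduces v_s as output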
It remains to add a learning rule, in order to enable the network to rewrite. Suppose we have a rewriting rule ρ_1 : [a] → [b], with vectors a and b of equal length, and we want to apply this rewriting rule. Following the usual conventions, and taking the input signal to be 1, the learning rule will take the output vector v_k and apply some learning function L to v_k, to form the change vector ∆w_{1k}, and compute w^{new}_{1k} = w^{old}_{1k} + ∆w_{1k}. The only thing left is to define L.
As we mentioned in the introduction, in some cases we can use conventional Hebbian learning. For example, taking the rate of learning to be equal to 2, we can obtain the difference vector ∆w_{1k} = 2v_s for the network from Example 3. This will amend the weight: w^{new}_{1k} = w_{1k} + ∆w_{1k} = v_s + 2v_s = 3v_s. Such a network would perform rewriting for ground instantiations of the rule x → 3x. Applied to Example 3, it would give the result [3;6;9;3;6;9;9;3;6], see Figure 1. But note that Definition 2 prohibits the use of rewriting rules which contain a variable as a redex, and so we use three ground instances of this rule, substituting 1, 2, 3 for x.
However, a transformation of a rewriting rule into a linear function is not normally given, and not always possible. Therefore, we need to develop a more general approach. We define L, and call it rewrite; in the MATLAB library (Komendantskaya, 2009b) this function is called rewrite2.
Definition 7. (Function Rewrite). Let A = (A, {→_1}) be an ARS, s_A be the given string, v_s its corresponding vector, and ρ_1 : [a] → [b] be the rewriting rule, where a and b are vectors of the same length m. Take the zero vector Z of length l_s. Compute ∆ρ_1 = [b] − [a]. For every n ∈ {1, . . . , l_s}, do the following: if the n, n+1, . . . , (n+l_a−1)th elements of the vector v_k are equal to a, put ∆ρ_1 on the n, . . . , (n+l_a−1)th places of Z.
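The computation of Definition 7 can be sketched in MATLAB as follows; this is our own illustration under the hypothetical name rewrite_sketch, not the rewrite2 code of the library, and it assumes row vectors and a redex and contractum of equal length.

  function dw = rewrite_sketch(v, a, b)
  % Sketch of Definition 7 (illustration only).
  % v : output vector of the layer (equal to the current weight vector)
  % a : redex, b : contractum, with length(a) == length(b)
    la  = length(a);
    dro = b - a;                      % delta rho_1 = [b] - [a]
    dw  = zeros(size(v));             % change vector, initially zero
    for n = 1:(length(v) - la + 1)
      if isequal(v(n:n+la-1), a)      % redex occurrence found at position n
        dw(n:n+la-1) = dro;           % place delta rho_1 at that position
      end
    end
  end

On the data of Example 4 below, rewrite_sketch([1 2 3 1 2 3 3 1 2], [2 3], [2 1]) returns [0 0 -2 0 0 -2 0 0 0], and adding this change vector to the weight vector produces the rewritten string.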
The function rewrite takes three arguments: the output vector v_k, and two vectors [a] and [b] that correspond to the left-hand side and the right-hand side of the rewriting rule ρ. It outputs the change vector that contains ∆ρ_1 at precisely those positions where [a] appears in v_s, and zeros at all other positions. To simulate this in MATLAB, one has to choose a training mode; in the standard library the unsupervised training function is called trainbuwb. Then we define a new learning function learnr that is used by the training function. The learning function implements the function rewrite.
Note these subtle interconnections between the functions participating in training. The unsupervised training function (trainbuwb) activates the learning function (learnr) that computes ∆w, and the latter is given by implementing the function rewrite. This
hierarchy is imposed by the MATLAB Neural Network
Toolbox, and we respect it throughout the paper.
Example 4. We continue Example 3, and introduce a rewriting rule ρ_1 : [2 3] → [2 1]. Now we can compute rewrite(v_s, [2 3], [2 1]) = [0;0;−2;0;0;−2;0;0;0]. After one iteration, net performs a parallel rewriting step, computing [1;2;1;1;2;1;3;1;2]; see the file experiments_s.mat in (Komendantskaya, 2009b).
Lemma 1. Given an ARS A = (A, {→_1}), such that the rewriting rule's redex and contractum are of the same size, and given a sequence s_A of elements of A, there exists an Abstract Rewriting Neural Network (ARNN) that performs the parallel rewriting step for s_A in A.
Proof. The architecture of such a network is given in Definition 6, and the learning function (called learnr in the MATLAB library) implements L = rewrite from Definition 7.
We have illustrated, on a limited class of ARSs,
that term-rewriting evolves naturally in unsupervised
learning neural networks. In the next section, we want
to exploit this idea to its full potential and apply it to
more complex rewriting systems.
4 TERM REWRITING NETS
In this section we consider ARSs and TRSs in their
full generality. Two major extensions will be needed.
We will need to arrange special training functions that would allow us to replace a redex by a contractum when they have different sizes. This first problem arises because in neural networks we use vectors instead of strings. Secondly, we must enrich the learning rule in such a way that several rewriting steps, possibly arising from several rewriting rules, can be applied in parallel.
Suppose we have the string s from Example 3 and a rewriting rule ρ_2 : [1 2] → [4 5 6]. Following the method described in the previous section, we can build a network with weight w = v_s. The training function will automatically attempt to compute w + ∆w; this is possible only if the error vector ∆w is of the same length as w, otherwise the vector addition is not defined. But clearly, the rewriting rule ρ_2 will produce a ∆w that is longer than w. To bring ∆w into an appropriate form, we introduce completion.
Definition 8. (Completion Algorithm). Let s be a given string, and v_s be the corresponding vector. Let ρ_1 : [a] → [b] be a given rewriting step, such that a and b are vectors of length l_a and l_b, and l_b > l_a. Then complete v_s as follows. Compute l = l_b − l_a, and form a zero vector v_Z of length l. Find the occurrences of the subvector a in v_s. Concatenate v_Z with each such subvector in v_s. Completion outputs the vector v′_s that contains v_Z after each occurrence of a, but otherwise contains all the elements of v_s in their given order.
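A possible MATLAB rendering of this completion step is sketched below; the name completion_sketch is hypothetical and the code is our own illustration (assuming row vectors), not the completion_r function of the library.

  function vprime = completion_sketch(v, a, b)
  % Sketch of Definition 8 (illustration only): insert length(b)-length(a)
  % zeros after every occurrence of the redex a in the row vector v.
    la  = length(a);
    pad = zeros(1, length(b) - la);              % the zero vector v_Z
    vprime = [];
    n = 1;
    while n <= length(v)
      if n + la - 1 <= length(v) && isequal(v(n:n+la-1), a)
        vprime = [vprime, v(n:n+la-1), pad];     % copy the redex, then pad
        n = n + la;
      else
        vprime = [vprime, v(n)];                 % copy the element unchanged
        n = n + 1;
      end
    end
  end

For the string of Example 3 and the rule [1 2] → [4 5 6] discussed above, completion_sketch([1 2 3 1 2 3 3 1 2], [1 2], [4 5 6]) returns [1 2 0 3 1 2 0 3 3 1 2 0].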
In the library of functions we present (Komendantskaya, 2009b), this function is called completion_r. Completion can easily be embedded into the definitions of the term-rewriting networks.
We now generalise rewrite from Definition 7 by adding completion to it.
Definition 9. (Generalised Rewrite). Let s be a given string, and v_s be the corresponding vector. Let ρ_1 : [a] → [b] be a given rewriting rule, such that a and b are vectors of arbitrary lengths l_a and l_b. Let v′_s be the completed v_s.
Form a zero vector Z of length l_{v′_s}.
Compute ∆ρ = [b] − [a] if l_a = l_b; otherwise concatenate the shorter of the two with a vector of zeros of length |l_a − l_b|, and then compute ∆ρ = [b] − [a] of length m = max(l_a, l_b).
For every n ∈ {1, . . . , l_{v′_s}}, do the following: if the n, n+1, . . . , (n+l_a−1)th elements of the vector v′_s are equal to a, put ∆ρ on the n, n+1, . . . , (n+m−1)th places of Z. The resulting vector is the change vector ∆w.
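Continuing the sketches above, the generalised rewrite might be rendered as follows; this is our own illustration (with the hypothetical name gen_rewrite_sketch), not the library implementation, and it scans the completed vector produced by completion_sketch.

  function dw = gen_rewrite_sketch(vprime, a, b)
  % Sketch of Definition 9 (illustration only). vprime is the completed
  % vector; a, b are row vectors of possibly different lengths.
    m   = max(length(a), length(b));
    ap  = [a, zeros(1, m - length(a))];  % pad the redex with zeros
    bp  = [b, zeros(1, m - length(b))];  % pad the contractum with zeros
    dro = bp - ap;                       % delta rho = [b] - [a], length m
    la  = length(a);
    dw  = zeros(size(vprime));
    for n = 1:(length(vprime) - m + 1)
      if isequal(vprime(n:n+la-1), a)    % redex found in the completed vector
        dw(n:n+m-1) = dro;               % place delta rho on m positions
      end
    end
  end

With the completed vector from the previous sketch and the rule [1 2] → [4 5 6], gen_rewrite_sketch returns [3 3 6 0 3 3 6 0 0 3 3 6]; adding it to the completed vector gives [4 5 6 3 4 5 6 3 3 4 5 6], the expected parallel rewrite.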
The generalised rewrite outputs the change vector for v′_s; its implementation in the MATLAB Neural Network Toolbox can be found in (Komendantskaya, 2009b). As in the previous section, the reduced (or rewritten) term can be found by computing v^{new}_s = v′_s + ∆w. This agrees with the training mechanism used in neural networks, and we use rewrite to generalise Lemma 1:
Lemma 2. Given an ARS A = (A, {→_1}), and a sequence s_A of elements of A, there exists a term-rewriting neural network (TRSNN) that performs the parallel rewriting step for s_A in A.
Proof. The architecture of such a network is given in Definition 6, and the learning rule (learn_trs in the MATLAB library) implements L = rewrite from Definition 9, see (Komendantskaya, 2009b).
So far, we have considered only rewriting on numbers. If we wish to apply the TRNN to terms, we need some numerical vector representation of the first-order syntax. We simply take the standard ASCII encoding provided by the MATLAB command double. In general, any one-to-one encoding will do.
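For instance, the encoding and its inverse can be obtained as follows (a small illustration, not part of the library):

  % Encoding a term as a numerical vector via ASCII codes (illustration).
  t = 'F(z,G(F(G(0),G(x))))';   % the term of Example 1 as a character string
  v = double(t);                % row vector of ASCII codes
  s = char(v);                  % char recovers the original string from v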
Example 5. We take the atomic rewriting step ρ^σ from Example 1. We train the TRNN constructed in Lemma 2 to rewrite the term F(z, G(F(G(0), G(x)))). For this, we take the numerical vector encoding v of
F(z, G(F(G(0), G(x)))). The weight vector is set to v. We get the learning function learn_trs to implement the generalised rewrite. On the next iteration, the network outputs the answer F(z, G(F(0, 0))); see experiments_TRS.mat in (Komendantskaya, 2009b).
The last extension we wish to introduce here concerns the number of rewriting rules. So far, we have considered only cases with one rewriting rule. However, there can be several disjoint redexes to which different rewriting steps are applied. Clearly, a composition of rewriting steps does not convey this idea (Terese, 2003). To implement parallel term rewriting for several rules, we need to customise the functions completion_r and rewrite. Thus, they need to have as many arguments as desired, depending on the number of different and disjoint rewriting steps. For example, rewrite was defined to have three arguments: v, the vector we rewrite, and r1, r2, if the rewriting rule is ρ_1 : r1 → r2. In the case of two rewriting rules, we will additionally have arguments r3 and r4, for the rule ρ_2 : r3 → r4.
Similarly to the TRSNNs that process TRSs with one rewriting rule, completion and rewrite will be applied hand-in-hand. We assume now that we already have the generalised completion defined for several rewriting rules, see (Komendantskaya, 2009b). We define the generalised rewrite for several rewriting rules (rewrite_mult in MATLAB).
Definition 10. (Rewrite for Several Rewriting Rules.) Let s be the given string, and v_s be the corresponding vector. Let ρ_1 : [a_1] → [b_1], . . . , ρ_n : [a_n] → [b_n] be disjoint atomic rewriting steps, such that each a_i and b_i are vectors of arbitrary lengths l_{a_i} and l_{b_i}. Let v′_s be the completed v_s.
Form a zero vector Z of length l_{v′_s}.
For every i ∈ {1, . . . , n}, do the following: compute ∆ρ_i = [b_i] − [a_i] if l_{a_i} = l_{b_i}; otherwise concatenate the shorter of the two with a vector of zeros of length |l_{a_i} − l_{b_i}|, and then compute ∆ρ_i = [b_i] − [a_i] of length l_{ρ_i}.
For every i ∈ {1, . . . , n}, find the occurrences of the first element of the vector a_i in v′_s, and form the vector v_i of the indexes of these occurrences. Concatenate all such v_i into one vector and sort its elements in ascending order. For every position k listed in this sorted vector and for every i ∈ {1, . . . , n}, do the following: if the k, k+1, . . . , (k+l_{ρ_i}−1)th elements of the vector v′_s are equal to a_i (padded with zeros as above, if necessary), put ∆ρ_i on the k, k+1, . . . , (k+l_{ρ_i}−1)th places of Z.
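A compact MATLAB sketch of this multi-rule rewrite is given below; it is our own illustration under the hypothetical name rewrite_mult_sketch, not the library's rewrite_mult, and it processes the rules one after another, which for disjoint redexes gives the same change vector as the position-ordered scan of Definition 10.

  function dw = rewrite_mult_sketch(vprime, rules)
  % Sketch of Definition 10 (illustration only). vprime is the completed
  % vector; rules is a cell array {a1, b1, a2, b2, ...} of row vectors.
    dw = zeros(size(vprime));
    for i = 1:2:length(rules)
      a   = rules{i};
      b   = rules{i+1};
      m   = max(length(a), length(b));
      ap  = [a, zeros(1, m - length(a))];  % pad the redex
      bp  = [b, zeros(1, m - length(b))];  % pad the contractum
      dro = bp - ap;                       % delta rho_i of length m
      la  = length(a);
      for k = 1:(length(vprime) - m + 1)
        if isequal(vprime(k:k+la-1), a)    % disjoint redex of rule i found
          dw(k:k+m-1) = dro;               % place delta rho_i into dw
        end
      end
    end
  end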
Rewrite_mult outputs the difference vector ∆w for w = v′_s, if v′_s is taken to be the weight vector of a network. And we come to the main theorem of the paper.
Theorem 3. Given an arbitrary ARS A (or an arbitrary TRS R), and a string s of elements of A (or any term t of R), there exists a neural network that performs a parallel rewriting step for s according to the rewriting rules of A (or R).
Proof. The architecture of such a network is given by Definition 6, the training function is conventional (trainbuwb), and the learning rule (learn_mult) implements rewrite_mult from Definition 10. The initial weight of the network is equal to the vector v′_s (respectively, v′_t), where v′_s and v′_t are the completed vectors obtained by applying the function completion_mult to v_s and v_t, respectively. See (Komendantskaya, 2009b) for a ready-to-use library.
Note that the network described in this paper is built in a very generic way, and in practice we only have to define such a network once (as we did in Figure 1), for one string or term. For other terms or strings of different lengths, one would simply need to re-define the length of the layer, given by the MATLAB command net.layers{1}.size, the new value of the weight w, given by the command net.iw{1,1}, and plug the given rewriting rules into the learning function. This can easily be automated.
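For instance, assuming a toolbox network object net has already been created as in Figure 1, the re-configuration could look roughly as follows; this is a sketch using the property names cited above, and the exact capitalisation (e.g. net.IW versus net.iw) may depend on the toolbox version.

  % Re-using the template network for a new term (sketch only).
  v_new = double('G(F(a,c),b)');         % numerical vector of the new term
  net.layers{1}.size = length(v_new);    % re-define the length of the layer
  net.IW{1,1} = v_new';                  % set the new value of the weight w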
Example 6. We return to Example 2. Suppose we have chosen the substitution σ = [x := c], and need to perform a parallel rewriting step for G(F(a, c), b) using ρ_1 and ρ_2. We again take the template definition of a neural network net from Example 3. We customise it by computing the numerical vector v for G(F(a, c), b), and taking l_v to be the length of the network's only layer. The learning function learn_mult implements rewrite_mult. The network outputs G(G(c, c), F(b, b)), the result of performing the parallel rewriting step for G(F(a, c), b), ρ^σ_1, and ρ_2. See also the file experiments_TRS.mat in (Komendantskaya, 2009b) for the MATLAB implementation.
In order to perform a sequence of parallel rewriting steps, one needs to iterate the unsupervised training of the given network: n parallel rewriting steps will be performed in n time steps. Additionally, we will need to embed the function completion_mult into the training function, so that at each iteration of learning the network can amend the number of neurons and the weights.
When embedded into the training function, completion_mult will give the effect of a growing neural gas (Fritzke, 1994); that is, the network may grow at each training step. The growth is always bounded by the length of the contracta appearing in the rewriting rules, and the contracta are always finite, and often not too big.
5 CONCLUSIONS
We have shown that the unsupervised learning used in Neurocomputing naturally implements parallel rewriting, both for ARSs and TRSs. For a simple and limited class of rewriting systems, where only one rewriting rule is allowed and its redex and contractum are of the same length, abstract rewriting is described naturally by a simple form of unsupervised learning. For ARSs and TRSs in their full generality, we have constructed neural networks that perform parallel rewriting steps with the help of a completion algorithm embedded into the learning rule.
The neural networks defined here are fully formalised in the MATLAB Neural Network Toolbox, and the library of functions is available in (Komendantskaya, 2009b). The implementation brings computational optimisation to the theory of TRSs, in that it achieves true parallelism, as opposed to the classical view of parallel term rewriting as a "sequence of disjoint reductions". Since term rewriting plays a central role in typed theories and functional programming, this implementation may prove to be an important step towards integrating computational logic with the learning techniques of neurocomputing; see also (Komendantskaya, 2009a).
The arguable part of the presented work is whether the new (unconventional) learning functions we defined are admissible in neural networks. There can be two responses to this criticism. The first and more general response (see also (Komendantskaya, 2008)) is that the division between unconventional ("symbolic") and conventional ("arithmetic", "statistical") functions is itself arguable, as there are no formal criteria that separate the two. Depending on the programming language we use, arithmetic functions can be represented symbolically (Komendantskaya, 2008), or, as we did here, symbolic functions can be represented numerically. Another, more concrete and practical response is that the clear advantage of the networks we presented here is the ease of implementation in hybrid systems: one and the same network can easily switch between conventional and "symbolic" learning functions, without any structural or other transformations.
ACKNOWLEDGEMENTS
The work was sponsored by EPSRC PF research grant EP/F044046/1. I thank Roy Dyckhoff for useful discussions. Finally, I thank the authors and presenters of the EIDMA/DIAMANT minicourse Lambda Calculus and Term Rewriting Systems, Henk Barendregt and Jan Willem Klop, for inspiration.
REFERENCES
Aleksander, I. and Morton, H. (1993). Neurons and Sym-
bols. Chapman and Hall.
Bader, S., Hitzler, P., and Hölldobler, S. (2008). Connec-
tionist model generation: A first-order approach. Neu-
rocomputing, 71:2420–2432.
Fritzke, B. (1994). Fast learning with incremental rbf net-
works. Neural Processing Letters, 1:1–5.
Gärtner, T. (2003). A survey of kernels for structured data.
SIGKDD Explorations, 5(1):49–58.
Haykin, S. (1994). Neural Networks. A Comprehensive
Foundation. Macmillan College Publishing Company.
Hecht-Nielsen, R. (1990). Neurocomputing. Addison-
Wesley.
Komendantskaya, E. (2008). Unification by error-
correction. In Proceedings of NeSy’08 workshop at
ECAI’08, 21-25 July 2008, Patras, Greece, volume
366. CEUR Workshop Proceedings.
Komendantskaya, E. (2009a). Neurons or symbols: why
does or remain exclusive? In Proceedings of
ICNC’09.
Komendantskaya, E. (2009b). Term rewriting in neural
networks: Library of functions and examples writ-
ten in MATLAB neural network toolbox. www.cs.st-
andrews.ac.uk/~ek/Term-Rewriting.zip.
Smolensky, P. and Legendre, G. (2006). The Harmonic
Mind. MIT Press.
Strickert, M., Hammer, B., and Blohm, S. (2005). Unsuper-
vised recursive sequence processing. Neurocomput-
ing, 63:69–97.
Terese (2003). Term Rewriting Systems. Cambridge Uni-
versity Press.