TORQUE CONTROL WITH RECURRENT NEURAL NETWORKS

Guillaume Jouffroy

Artiﬁcial Intelligence Laboratory, University Paris 8, France

Keywords: Joint constraint method, oscillatory recurrent neural network, generalized teacher forcing, feedback, adaptive systems.

Abstract:

In the robotics field, a lot of attention is given to the complexity of the mechanics and particularly to the number of degrees of freedom. Also, the oscillatory recurrent neural network architecture is often considered only as a black box, which prevents a careful study of the interesting features of the network's dynamics. In this paper we describe a generalized teacher forcing algorithm, and we build a default oscillatory recurrent neural network controller for a vehicle with one degree of freedom. We then build a feedback system as a constraint method for the joint. We show that even with the default oscillatory controller the vehicle behaves correctly, including during its transient from standing to moving, and is robust to the oscillatory controller's own transient period and its initial conditions. We finally discuss how the default oscillator can be modified, thus reducing the local feedback adaptation amplitude.

1 INTRODUCTION

Central Pattern Generators (CPG) are biological periodic oscillatory neural networks responsible for a wide range of rhythmic functions. They can be made of endogenous oscillatory neurons connected to non-oscillatory ones, or arise from the sole interaction between non-oscillatory neurons.

Particularly, they are a great source of inspiration

in the robotics ﬁeld, for the control of joints in loco-

motion. In general, an oscillatory network controls a

joint angle in both directions, and the phase relation-

ships needed between all joints arise from the cou-

pling between the different networks.

The parameters needed for an artificial Recurrent Neural Network (RNN) to have a periodic oscillatory behavior cannot be measured experimentally. Such a network is most of the time a relatively simplified model of its biological counterpart, when one is available. Only clinical temporal data of joint kinetics and kinematics can be of use, and even there it is difficult to isolate the real control of a particular joint from the influence of the others.

In the case of non-endogenous oscillatory neurons, parameters in the literature are thus mainly determined empirically or with genetic algorithms (Buono and Golubitsky, 2001), (Ghigliazza and Holmes, 2004), (Ishiguro et al., 2000), (Kamimura et al., 2003), (Taga, 1994), (Ijspeert, 2001), compared with relatively few learning methods (Mori et al., 2004), (Tsung and Cottrell, 1993), (Weiss, 1997). Though this is useful with large networks in complex mechanical models, there are two drawbacks. It is in general very difficult to isolate the resulting dynamics of the different networks, and to understand their interaction with each other and with the mechanical system dynamics. Also, it is not clear how to modify such networks in an adaptive context, e.g. in the case of a permanent constraint change on a joint due to injury.

Based on these considerations, we apply a generalized formulation of the so-called teacher forcing gradient descent-based learning algorithm to create an oscillatory RNN as a torque controller for an interesting vehicle with a single degree of freedom, the Roller Racer. The RNN is put in a closed loop with the Roller Racer, such that the vehicle can be freely controlled and the RNN can be modified permanently.

The paper is structured as follows. In section 2, we briefly present the Roller Racer model and we show how it can be controlled with a torque input. In section 3 we describe the control system. Subsection 3.1 presents the generalized formulation of the teacher forcing learning algorithm with which we build the oscillatory RNN as a basic torque controller for the Roller Racer vehicle. In subsection 3.2 we describe the local feedback control that can be built so that the vehicle direction can be controlled, limiting the effect of its transient state. In section 4, we give concluding remarks and discuss how the basic oscillatory system can be modified to better fit the needs of the Roller Racer, thus reducing the local feedback adaptation amplitude.

Jouffroy G. (2008). TORQUE CONTROL WITH RECURRENT NEURAL NETWORKS. In Proceedings of the Fifth International Conference on Informatics in Control, Automation and Robotics - RA, pages 109-114. DOI: 10.5220/0001501401090114. SciTePress.

2 VEHICLE MODEL

The Roller Racer is a toy vehicle with a single degree of freedom, the handlebar. The steering wheels are shifted back from the axis. Thus, when the handlebar oscillates from side to side, a component of the reaction force on the ground which points backward is created, moving the vehicle forward.

In (Jouffroy and Jouffroy, 2006), we revisited and synthesized a mathematical model of the Roller Racer from the original work of Krishnaprasad and Tsakriris (Krishnaprasad and Tsakriris, 1995). The input control was the angle of the axis. Here, we will describe the torque input control formalization.

Recall the state of the Roller Racer vehicle is

= (θ

, p,θ

)

∈ R

, and its dynamics is

x = f(x,u), where

f(x,u)







∆(θ

)



sinθ

p − δ(θ

)



cosθ

∆(θ

)



χ(θ

)p − γ(θ

)sinθ



sinθ

∆(θ

)



χ(θ

)p − γ(θ

)sinθ





(θ

)

−C

(θ

)





(θ

)

(θ

)









, (1)

is the angle of the handlebar, p is the momen-

tum of the vehicle and (x

) are the rear coordinates

respectively to the global reference space. Friction

constraint is built in the model through the functions

(θ

) and C

(θ

). Thus one does not need to deal

with mechanical aspects, leaving focus on the control

strategy, and on the learning aspects of the RNN.

Here we control the Roller Racer using the torque

input control T

with the following equation

= u = B

(θ

)

p + B

(θ

)

θ + B

(θ

(2)

The right hand side of the equation replaces u in (1).

The parameters are deﬁned as

= −

(θ

)

∆

(θ

)

γ(θ

)sinθ

∆(θ

)∆

(θ

)

[γ(θ

)cosθ

+ d

δ(θ

)]

∆(θ

)

∆

(θ

)

where

∆

= I

sin

+ m

+ I

cos

and the other parameters are as deﬁned in (Jouffroy

and Jouffroy, 2006).

3 DESIGN OF THE OSCILLATORY CONTROLLER

3.1 The Oscillatory Recurrent Neural Network Torque Controller

Let us consider the RNN system

$$\dot{x} = f(x, W), \tag{3}$$

where $x \in \mathbb{R}^n$ is the state vector of the network and $W \in \mathbb{R}^{n \times n}$ is the matrix of connection weights, $w_{ij}$ being the weight from neuron $i$ to neuron $j$. We consider a fully connected RNN, which means all neurons are interconnected and self-connected ($w_{ii} \neq 0$). For the neuron model we use the rate-based neuron model of the simplest form

$$f(x, W) = (I\tau^{-1})(-x + W s(x)), \tag{4}$$

with $s(x)$ a squashing function such as $\tanh(x)$. $I$ is the identity matrix and $\tau \in \mathbb{R}^n$ the time constant vector of the system.
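The rate-based model (4) can be sketched numerically as follows. This is only an illustration under stated assumptions: the squashing function is taken to be tanh, the index convention follows the text (w_ij is the weight from neuron i to neuron j, so neuron j receives the sum over i of w_ij s(x_i)), and the explicit Euler step with the names `rnn_rhs` and `euler_step` is a hypothetical choice, not the integration scheme used in the paper.

```python
import math


def rnn_rhs(x, W, tau):
    """Right-hand side of the rate model: (1/tau_j) * (-x_j + sum_i w_ij * tanh(x_i)).

    Assumption: W[i][j] is the weight from neuron i to neuron j.
    """
    n = len(x)
    s = [math.tanh(v) for v in x]  # squashed outputs s(x)
    return [
        (-x[j] + sum(W[i][j] * s[i] for i in range(n))) / tau[j]
        for j in range(n)
    ]


def euler_step(x, W, tau, dt):
    """One explicit Euler step of size dt (hypothetical integrator)."""
    dx = rnn_rhs(x, W, tau)
    return [xi + dt * dxi for xi, dxi in zip(x, dx)]
```

Changing a single entry of `tau` slows or speeds up only the corresponding neuron, which is the handle the feedback of section 3.2 acts on.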

Each component $x_i^*$ of the teacher vector $x^*$ is of the form

$$x_i^* = \sin(t + \phi_i). \tag{5}$$

The learning is achieved when an error criterion $E \in \mathbb{R}^n$ is less than or equal to a minimum $\varepsilon \in \mathbb{R}$, $\varepsilon \simeq 0$,

$$E = (x - x^*) \circ (x - x^*) < \varepsilon, \tag{6}$$

the operator $\circ$ being the Hadamard product.
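As a small illustration of (5) and (6), the teacher signals and the element-wise error can be computed as below. The function names and the default threshold `eps` are hypothetical; the paper only states that the minimum is close to zero.

```python
import math


def teacher(t, phases):
    """Teacher vector of (5): one unit-amplitude sine per neuron, shifted by its phase."""
    return [math.sin(t + phi) for phi in phases]


def error(x, x_star):
    """Hadamard (element-wise) product of the deviation with itself, as in (6)."""
    return [(a - b) ** 2 for a, b in zip(x, x_star)]


def converged(x, x_star, eps=1e-3):
    """True when every component of the error is below eps (eps is an assumed value)."""
    return all(e < eps for e in error(x, x_star))
```

For example, `teacher(0.0, [0.0, math.pi / 3, 2 * math.pi / 3, math.pi])` gives the four target values at t = 0 for the phase vector used later in this section.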

The weight matrix W is adjusted according to the following gradient rule

ICINCO 2008 - International Conference on Informatics in Control, Automation and Robotics


˙w

= −η

∑

i=1

∂E

∂x

, (7)

with $\eta \in \mathbb{R}$ the learning rate. $z \in \mathbb{R}^{n \times n}$ is the sensitivity of the state of the system with respect to a weight $w$, which can be written in the matrix form

$$\dot{z} = (I\tau^{-1})(J_x(M)\, z + J_w(M)), \tag{8}$$

where $J_x(M)$ and $J_w(M)$ are the Jacobian matrices of the function $f$ with respect to $x$ and $w$ at the point $M$. In the teacher forcing case (8) reduces to

$$\dot{z} = (I\tau^{-1})(-I z + J_w(M)). \tag{9}$$
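The step from (8) to (9) can be made explicit with a short derivation. It is a sketch that assumes every neuron is forced and uses the model (4) as printed; the symbol $h$ is introduced here only for convenience.

```latex
% Assumption: all neurons are forced, so s(.) is evaluated at the teacher x^*.
\begin{align*}
h(x, W) &= -x + W s(x), \qquad f = (I\tau^{-1})\, h \\
\text{free network:}\quad
\frac{\partial h}{\partial x} &= -I + W \,\mathrm{diag}\big(s'(x_1), \dots, s'(x_n)\big) \\
\text{teacher forced:}\quad
h(x, W) &= -x + W s(x^*)
\;\Rightarrow\; \frac{\partial h}{\partial x} = -I \\
\dot{z} &= (I\tau^{-1})\big(-I z + J_w(M)\big)
\end{align*}
```

Because the forced term $W s(x^*)$ no longer depends on the state, the recurrent part of the Jacobian disappears and only the leak term $-I$ remains, whichever index convention is used for $W$.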

For convenience z is of the form







··· z







(10)

Providing a target signal(s) x

∗

only for some neu-

ron(s) i, letting J

= A, the sensitivity equation (8)

can be written as

i j

= −a

i j

+ b

i j

∂s(x

)

∂x

, (11)

with a

i j

= 1 when i = j, 0 otherwise, b

i j

= 1 when i

is not a forced neuron, 0 otherwise. J

= B is of the

same form as z and its elements are such that B

= 0

when i 6= q, B

i j

= s(x

∗

) if p is a forced neuron,

i j

= s(x

) otherwise.

With this algorithm we build a fully connected RNN of 4 neurons. Teacher signals have an amplitude of 1, and we choose the phase difference vector $\phi^* = \{0; \pi/3; 2\pi/3; \pi\}$. To generate the torque control we use the outputs of neurons 1 and 4, which are in opposite phase, to control each direction of the handlebar angle using the transformation

$$T = \mathrm{pos}(x_1) - \mathrm{pos}(x_4). \tag{12}$$
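The transformation (12) can be illustrated with a few lines of code. The sketch assumes that pos denotes the positive part, pos(v) = max(v, 0), which the text does not define explicitly; the function names are hypothetical.

```python
def pos(v):
    """Positive part of v (assumed meaning of 'pos' in (12))."""
    return v if v > 0.0 else 0.0


def torque(x1, x4):
    """Torque command built from the two output neurons, following (12)."""
    return pos(x1) - pos(x4)
```

With this reading, each output neuron drives the handlebar in one direction only: when the two outputs are in exact opposite phase (x4 = -x1) the command simply equals x1, and any asymmetry between them shows up as an asymmetric torque.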

Three neurons would have been the minimum needed to obtain the phase difference of $\pi$ between the output neurons. But with recovery in mind, with at least 4 of them, if one breaks we have the opportunity to start the algorithm again and obtain the needed oscillator. The weight matrix $W$ obtained for the estimated oscillator is

$$W = \begin{pmatrix} 0.493 & -0.18 & -0.673 & -0.493 \\ 0.958 & 0.882 & -0.076 & -0.958 \\ 0.465 & 1.062 & 0.597 & -0.465 \\ -0.493 & 0.18 & 0.673 & 0.493 \end{pmatrix}$$

Convergence is reached in at most 300 timesteps, with $\eta = 0.1$.
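The learned weights can be tried out in a free-running simulation. The sketch below is not the author's simulation code: it assumes tanh as squashing function, reads W[i][j] as the weight from neuron i to neuron j, integrates with explicit Euler, and uses a step size `dt` and duration `t_end` chosen here for illustration. The initial condition (a small value on neuron 3) is the one reported for the first trial in section 3.3.

```python
import math

# Weight matrix as printed in the paper.
W = [
    [0.493, -0.180, -0.673, -0.493],
    [0.958, 0.882, -0.076, -0.958],
    [0.465, 1.062, 0.597, -0.465],
    [-0.493, 0.180, 0.673, 0.493],
]


def simulate(W, tau, x0, dt=0.01, t_end=80.0):
    """Explicit Euler integration of the rate model; returns the list of states."""
    n = len(x0)
    x = list(x0)
    history = [list(x)]
    for _ in range(int(round(t_end / dt))):
        s = [math.tanh(v) for v in x]
        dx = [
            (-x[j] + sum(W[i][j] * s[i] for i in range(n))) / tau[j]
            for j in range(n)
        ]
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
        history.append(list(x))
    return history


history = simulate(W, tau=[1.0] * 4, x0=[0.0, 0.0, 0.1, 0.0])
tail = history[-2000:]  # last 20 time units of the run
amplitude_x1 = max(abs(state[0]) for state in tail)
antiphase_gap = max(abs(state[0] + state[3]) for state in tail)
print("amplitude of neuron 1:", amplitude_x1)
print("max |x1 + x4|:", antiphase_gap)
```

Since the first and last columns of W are opposite, neurons 1 and 4 stay in opposite phase in this sketch as long as they start that way. Passing a different `tau` vector, for instance with a larger time constant on one output neuron, is a simple way to explore the kind of amplitude change discussed around Figure 1.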

Figure 1: Effect of the modiﬁcation of the time constant τ

on the state x

in the oscillatory neural network. τ

is kept

ﬁxed(τ

= 1).

3.2 Feedback Design of the System

The torque amplitude of the oscillatory controller, considering its frequency, may be too large for the needs of the Roller Racer. Besides, the energy needed to start a vehicle is generally different from the energy needed at full speed, and this is also true of the Roller Racer under torque control. It needs a transient oscillatory input which might depend on friction biases and inertia. Thus we create an angle feedback from the Roller Racer to the oscillatory controller, whose purpose is to constrain the angle within some limits. We control the vehicle in the forward direction, which means the average angle of the handlebar should be π (see (Jouffroy and Jouffroy, 2006)).

To apply a correction to the torque generated, we

use the feedback to modify the time constants τ

of the

oscillatory controller’s output neurons. In Figure 1

we illustrate the effect on the amplitude of the output

neurons state when changing only one time constant

(τ

on the ﬁgure). As might be expected, the state of

each neuron output decreases. However the correc-

tion applies more to x

, and the amplitude difference

is maximum at τ ≈ 4. It is not really desirable to have one output become zero, as this prevents the Roller Racer from getting energy. Therefore τ should not be set too high.

We now define the transfer function that constrains the angle within the boundaries [π − 1; π + 1], with a correction applied when τ < 5, since the effect on the amplitude is reduced above this limit. Note that the frequency is nearly unchanged in this range (a frequency modification would take place if all time constants were changed equally). The transfer function g for the feedback is defined as


Figure 2: Control architecture of the Roller Racer. T is the torque control applied to the vehicle. v is the angular input which is obtained from the user direction control δ.

g(u) =



(u)



, (13)

with

(u) =

1 + e

−(6u−4.5)

, (14)

being the transfer function for the feedback to the

neuron 1, and g

(u) = −g

(u) to the neuron 4.

The formalization of these transfer functions has been chosen so that they can be used as "feedback" neurons, with the same neuron model as in (3), replacing $f$ by $g$.

The signal $u \in \mathbb{R}$ is the actual feedback signal, which is the difference between the angle of the handlebar (in radians) and the desired average angle control $v \in \mathbb{R}$. For the forward direction $u$ is set as

$$u = \theta - v = \theta - (\pi + \delta), \tag{15}$$

where $\delta \in \mathbb{R}$ is the user direction input, which is the continuous component control of the RNN.
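The feedback signal (15) and the sigmoid shape appearing in (14) can be sketched as follows. Only the term 1 + e^{-(6u - 4.5)} is taken from (14); the numerator is not legible in this copy, so the `gain` parameter below is a placeholder assumption set to 1, and the function names are hypothetical.

```python
import math


def feedback_signal(theta, delta=0.0):
    """Deviation of the handlebar angle from the desired average angle, as in (15)."""
    return theta - (math.pi + delta)


def g1(u, gain=1.0):
    """Logistic curve with the denominator of (14); 'gain' is an assumed numerator."""
    return gain / (1.0 + math.exp(-(6.0 * u - 4.5)))
```

With these constants the curve is nearly flat around u = 0, crosses half of its maximum at u = 0.75 and is close to saturation near u = 1, which is in line with a correction that stays small near the average angle and acts mainly close to the boundary π + 1.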

The feedback information thus obtained is used to

modify the time constants τ

and τ

−1

1 + g

(u)

and τ

−1

1 + g

(u)

(16)

The whole architecture is summarized in Figure 2.

3.3 Results

We present here the results of two different trials. In both of them the purpose is to obtain a straight trajectory along the x axis of a physical reference frame. The architecture is of course able to freely control the vehicle in all directions, but those simulations are not shown.

In the first trial we start the RNN with a little energy given to one neuron. Neuron 3 in the following simulation has the initial condition $x_3 = 0.1$.

Figure 3 left plots the angle of the handlebar θ, whose continuous component is π for the forward direction. It clearly shows the transient time during which the system is extracting itself from reaction forces, until it reaches its permanent speed at around 40 timesteps. The transient period has an amplifying oscillation around π because the RNN is also in its transient state, with little energy.

This transient activity of the RNN is shown by the very weak correction applied by the feedback g(u) in Figure 3 right, and by the low speed along the x axis in Figure 3 bottom. One can clearly see on the right side of the figure a small deviation during this time. This is only a drawback of the transient state of the RNN, which does not provide a symmetric gain even if the angle is within the boundaries [π − 1; π + 1].

The correction applied once the vehicle is in its permanent state shows that the RNN's own oscillation is not optimal and that relearning could fix this. The trajectory however becomes quite straight.

In the second trial we initialize the RNN with a strong gain to see how the feedback behaves during the transient period. We set the initial condition $x = 1$.

In Figure 4 left we can see that the amplitude of the transient state of the RNN has been pushed too high. The angle of the handlebar θ no longer shows an amplifying oscillatory behavior, and reaches the boundaries we have specified. The feedback applies a high gain correction (see Figure 4 right).

After t ≈ 40, as in the first trial, the symmetric oscillations are recovered, and the correction reduces to the steady-state level seen in Figure 3 right. Interestingly, the correction has constrained the deviation well: it is not higher than in the first trial, except in the transient period. The vehicle also gains speed earlier and the trajectory is also straight (Figure 4 bottom).


Figure 3: Simulation results when the RNN is initialized with a weak energy. Left: angle of the handlebar θ. Right: feedback correction g(u). Bottom: trajectory evolution. During and after the transient times, little correction is applied.

4 CONCLUSIONS AND DISCUSSION

Nature is a great source of inspiration for engineers who deal with autonomous robots. Evolution has found optimally designed solutions for robustness and adaptability in a changing environment, which are exciting to discover. However, most of the research on the control of robots with neural networks tackles complex mechanics with many degrees of freedom and massive neural architectures, which appear as "black boxes" designed by genetic algorithms. This hides the dynamics of the neural system and, correlatively, the opportunity to build adaptive strategies.

In this work, we present a generalized version of the teacher forcing learning algorithm to build an estimated oscillatory controller for a vehicle with one degree of freedom, the Roller Racer. We create an angular feedback such that the degree of freedom is constrained within some boundaries. The purpose is to prevent the vehicle from going out of control during its transient state when it starts moving, as a consequence of the oscillator not being adapted to this particular moment.

Our simulation results show that the feedback makes the vehicle behave relatively well during the transient state, whether the oscillator is initialized with a weak energy or a strong one. The deviation also stays small. When the steady-state period is reached, the vehicle moves in a straight line as expected.

From a design point of view, the correction applied by the feedback system to the RNN never completely vanishes. This shows that a more optimal oscillatory behavior can be obtained, though the "default" one does not critically affect the system, thanks to a mutual entrainment between the vehicle dynamics and the controller, as first described by (Taga, 1994).

We are currently studying how an adaptive process or observer can permanently modify the RNN when the average correction is too high. The outputs of the network, with the help of the feedback, could be the desired targets for a second network, which could thus be trained in parallel with a partial teacher forcing. However this is highly computationally expensive, and not biologically viable.

The teacher forcing principle is designed so that an oscillatory behavior can be obtained with a gradient descent algorithm. However, forcing the outputs of the network means disconnecting it, and thus losing the interesting desired target obtained with feedback. Algorithms without direct gradient descent evaluation techniques may be more appropriate (e.g. (Kailath, 1990)).


Figure 4: Simulation results when the RNN is initialized with a stronger energy. The correction is high during the transient state. The deviation is not higher than in the first trial, which shows the effective action of the feedback.

Besides, the constraint method we used in this article can be of interest when studying the coupling of oscillatory neural networks, e.g. to synchronize different joints. When we attach an oscillatory RNN to another one which has a constraining feedback, we can find coupling parameters which do not yield an increase of the correction in the feedback. This constraint method thus helps to reduce the search space for suitable coupling parameters, and to better match the desired phase relationship.

REFERENCES

Buono, P. and Golubitsky, M. (2001). Models of central pattern generators for quadruped locomotion. Journal of Mathematical Biology, 42:291–326.

Ghigliazza, R. and Holmes, P. (2004). A minimal model of a central pattern generator and motoneurons for insects locomotion. SIAM Journal on Applied Dynamical Systems, 3(4):671–700.

Ijspeert, A. (2001). A connectionist central pattern generator for the aquatic and terrestrial gaits of a simulated salamander. Biological Cybernetics, 84:331–348.

Ishiguro, A., Otsu, K., Fujii, A., Uchikawa, Y., Aoki, T., and Eggenberger, P. (2000). Evolving and adaptive controller for a legged-robot with dynamically-rearranging neural networks. In Proceedings of the Sixth International Conference on Simulation of Adaptive Behavior, Cambridge, MA. MIT Press.

Jouffroy, G. and Jouffroy, J. (2006). A simple mechanical system for studying adaptive oscillatory neural networks. IEEE International Conference on Systems, Man and Cybernetics, pages 2584–2589.

Dembo, A. and Kailath, T. (1990). Model-free distributed learning. IEEE Trans. Neural Networks, 1(1):58–70.

Kamimura, A., Kurokawa, H., Yoshida, E., Tomita, K., Murata, S., and Kokaji, S. (2003). Automatic locomotion pattern generation for modular robots. In Proceedings of IEEE International Conference on Robotics and Automation, pages 714–720.

Krishnaprasad, P. and Tsakriris, D. (1995). Oscillations, SE(2)-snakes and motion control. New Orleans, Louisiana.

Mori, T., Nakamura, Y., Sato, M., and Ishii, S. (2004). Reinforcement learning for CPG-driven biped robot. Nineteenth National Conference on Artificial Intelligence, pages 623–630.

Taga, G. (1994). Emergence of bipedal locomotion through entrainment among the neuro-musculo-skeletal system and the environment. Physica D, 75:190–208.

Tsung, F. and Cottrell, G. (1993). Phase-space learning for recurrent networks. Technical Report CS93-285, Dept. Computer Science and Engineering, University of California, San Diego.

Weiss, M. (1997). Learning oscillations using adaptive control. International Conference on Artificial Neural Networks, pages 331–336.
