
Fig. 2. Two-link robot arm system.
error function itself. Therefore, we can design a direct neuro-controller without
knowing the Jacobian of the objective system, and the neuro-controller can then learn
the robot's kinematics and dynamics simultaneously. Moreover, the NC can adapt to a
changing environment through learning.
2 Simultaneous Perturbation for Neuro-Controller
The simultaneous perturbation (SP) method and its applications have been widely
reported [1–4]. Here we explain the SP learning rule with a sign vector for NCs. Let w,
J(·), and u denote the weight vector of the NC (including thresholds), the error
function, and the input to the robot arm, respectively. The SP learning rule is as follows:
$$ w_{t+1} = w_t - \alpha \, \frac{J(u(w_t + c s_t)) - J(u(w_t))}{c} \, s_t \qquad (1) $$
where α and c are positive coefficients: α is the learning coefficient, which adjusts the
magnitude of the weight modification, and c is the magnitude of the perturbation. s
denotes a sign vector whose components are +1 or −1, and t is the iteration index.
Note that only two values of the error function, J(u(w)) and J(u(w + cs)), are
used to update the weights of the neural network. The learning rule requires no
information about the objective plant, such as its Jacobian. Therefore, it is easily
applicable to direct control by NCs of a plant that includes unknown and/or
unmodeled factors.
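As an illustration of the update in Eq. (1), the following is a minimal Python sketch, not the implementation used in this work. The function `toy_error` is a hypothetical stand-in for J(u(w)): in the actual control scheme the error would come from operating the robot arm, whereas here it is a simple quadratic so that the example is self-contained. The names `sp_update`, `toy_error`, and the chosen values of α and c are assumptions made for illustration only.

```python
import numpy as np

def sp_update(w, error_fn, alpha, c, rng):
    """One simultaneous-perturbation step following Eq. (1)."""
    s = rng.choice([-1.0, 1.0], size=w.shape)  # sign vector s_t
    j_base = error_fn(w)                        # J(u(w_t))
    j_pert = error_fn(w + c * s)                # J(u(w_t + c s_t))
    # All weights are modified from the same scalar difference of errors.
    return w - alpha * (j_pert - j_base) / c * s

def toy_error(w):
    # Hypothetical stand-in for J(u(w)): squared distance to a fixed target.
    target = np.array([1.0, -2.0, 0.5])
    return float(np.sum((w - target) ** 2))

rng = np.random.default_rng(0)
w = np.zeros(3)
initial_error = toy_error(w)
for _ in range(500):
    w = sp_update(w, toy_error, alpha=0.05, c=0.01, rng=rng)
final_error = toy_error(w)
```

Each iteration calls the error function exactly twice, regardless of the length of `w`, and never evaluates a derivative of the error.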
It is known that the learning rule has the following property [1]: it is a kind of
stochastic gradient learning rule.
$$ E\left( \frac{J(u(w_t + c s_t)) - J(u(w_t))}{c} \, s_t \right) = \frac{\partial J(u(w_t))}{\partial w} \qquad (2) $$
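Property (2) can be checked numerically on a small example. The sketch below assumes that the components of the sign vector are independent and take the values +1 and −1 with equal probability, so the expectation can be computed exactly by averaging over all 2^n sign vectors. The functions `toy_error` and `toy_gradient` are hypothetical and serve only as a smooth test case; for a general error function the agreement is expected to hold up to terms that vanish as c becomes small.

```python
import itertools
import numpy as np

def toy_error(w):
    # Hypothetical stand-in for J(u(w)): a smooth, non-quadratic function.
    return float(np.sum(w ** 2) + np.sin(w[0]) * w[1])

def toy_gradient(w):
    # Analytic gradient of toy_error, used only for comparison.
    grad = 2.0 * w
    grad[0] += np.cos(w[0]) * w[1]
    grad[1] += np.sin(w[0])
    return grad

def expected_sp_estimate(w, error_fn, c):
    """Average the SP estimate over all 2^n equally likely sign vectors."""
    j_base = error_fn(w)
    total = np.zeros_like(w)
    count = 0
    for signs in itertools.product([-1.0, 1.0], repeat=w.size):
        s = np.array(signs)
        total += (error_fn(w + c * s) - j_base) / c * s
        count += 1
    return total / count

w = np.array([0.3, -1.2, 0.8])
estimate = expected_sp_estimate(w, toy_error, c=1e-3)
exact = toy_gradient(w)
max_gap = float(np.max(np.abs(estimate - exact)))
```

A single SP estimate is noisy, since it points along one sign vector only; it is the average over sign vectors that recovers the gradient, which is why the rule behaves as a stochastic gradient method.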
Note that the SP learning rule is not merely an extension of the ordinary finite
difference approximation. In our case, we have to adjust multiple parameters, namely
the weight values in the NC, and the number of weights is relatively large. If we simply
used the finite difference, we would need a separate value of the error J for each weight
in order to update all of them. The SP method, on the other hand, requires only two
values of the error, even if the number of weights in the NC is large.
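The difference in cost can be made concrete by counting error evaluations. The sketch below compares a one-sided finite difference, which perturbs one weight at a time, with a single SP estimate. The class `CountingError`, the quadratic error inside it, and the choice of 50 weights are assumptions for illustration only.

```python
import numpy as np

class CountingError:
    """Wraps a hypothetical error function and counts its evaluations."""
    def __init__(self):
        self.calls = 0

    def __call__(self, w):
        self.calls += 1
        return float(np.sum(w ** 2))

def finite_difference_gradient(w, error_fn, c):
    # One-sided finite difference: one extra evaluation per weight.
    base = error_fn(w)
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = c
        grad[i] = (error_fn(w + step) - base) / c
    return grad

def sp_gradient_estimate(w, error_fn, c, rng):
    # Simultaneous perturbation: all weights are perturbed at once.
    s = rng.choice([-1.0, 1.0], size=w.shape)
    return (error_fn(w + c * s) - error_fn(w)) / c * s

n_weights = 50
w = np.ones(n_weights)

fd_error = CountingError()
finite_difference_gradient(w, fd_error, c=1e-3)

sp_error = CountingError()
sp_gradient_estimate(w, sp_error, c=1e-3, rng=np.random.default_rng(0))

fd_calls, sp_calls = fd_error.calls, sp_error.calls
```

With 50 weights the finite difference uses 51 evaluations of the error, while the SP estimate uses 2; the first count grows with the number of weights and the second does not.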