Figure 8: Illustration of the trial-to-trial variability of the 6-DoF arm when reaching towards target (c). The plots depict the joint angles (1–6) over time; grey lines indicate iLQG, black lines stem from iLQG–LD.
framework. Most importantly, we carried over the favourable properties of iLQG to more realistic control problems, where the analytic dynamics model is often unknown, difficult to estimate accurately, or subject to change.
Utilising the derivatives (8) of the learned dynamics model $\tilde{f}$ avoids expensive finite difference calculations during the dynamics linearisation step of iLQG. This significantly reduces the computational complexity, allowing the framework to scale to systems with more DoF (a minimal sketch of this contrast is given below). We empirically showed that iLQG–LD
performs reliably in the presence of noise and that it
is adaptive with respect to systematic changes in the
dynamics; hence, the framework has the potential to
provide a unifying tool for modelling (and informing)
non-linear sensorimotor adaptation experiments even
under complex dynamic perturbations. As with iLQG
control, redundancies are implicitly resolved by the
OFC framework through a cost function, eliminating
the need for a separate trajectory planner and inverse
kinematics/dynamics computation.
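
To make the computational saving concrete, the following minimal sketch contrasts a finite-difference linearisation, which needs n + m additional evaluations of the dynamics at every point of the trajectory, with one that reuses analytic derivatives supplied by the learned model, as an LWPR-style predictor can. The interface predict_with_jacobian is a hypothetical placeholder, not the implementation used in this work.

    import numpy as np

    def linearise_fd(f, x, u, eps=1e-6):
        # Finite-difference linearisation of the dynamics f(x, u):
        # costs n + m extra dynamics evaluations per time step.
        n, m = x.size, u.size
        fx = f(x, u)
        A = np.zeros((n, n))
        B = np.zeros((n, m))
        for i in range(n):
            dx = np.zeros(n)
            dx[i] = eps
            A[:, i] = (f(x + dx, u) - fx) / eps   # df/dx_i
        for j in range(m):
            du = np.zeros(m)
            du[j] = eps
            B[:, j] = (f(x, u + du) - fx) / eps   # df/du_j
        return A, B

    def linearise_learned(model, x, u):
        # Analytic linearisation: a locally weighted regression
        # model can return the prediction together with its
        # Jacobian, so no extra dynamics evaluations are needed.
        # predict_with_jacobian is an assumed, illustrative API.
        fx, J = model.predict_with_jacobian(np.concatenate([x, u]))
        n = x.size
        return J[:, :n], J[:, n:]   # A = df/dx, B = df/du

For a 6-DoF arm with a 12-dimensional state and 6 controls, the finite-difference route in this sketch costs 18 additional model evaluations per time step in every iLQG iteration; the analytic route avoids them entirely.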
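In the same spirit, the implicit redundancy resolution can be illustrated with a toy reaching cost. The weights, the forward-kinematics function fk, and the state layout are illustrative assumptions, not the exact cost used in our experiments.

    def reaching_cost(x_seq, u_seq, target, fk, n_dof,
                      w_p=1e4, w_v=1.0):
        # Terminal end-effector accuracy and stillness plus
        # accumulated control effort. Minimising this selects one
        # joint-space solution among the many that reach the
        # target, so no explicit inverse kinematics or separate
        # trajectory planner is required.
        q_T = x_seq[-1][:n_dof]       # final joint angles
        qdot_T = x_seq[-1][n_dof:]    # final joint velocities
        terminal = (w_p * np.sum((fk(q_T) - target) ** 2)
                    + w_v * np.sum(qdot_T ** 2))
        effort = sum(np.sum(u ** 2) for u in u_seq)
        return terminal + effort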
Our future work will concentrate on implementing the iLQG–LD framework on the anthropomorphic LWR hardware; this will not only explore an alternative control paradigm, but will also provide the only viable and principled control strategy for the biomorphic, variable-stiffness, highly redundant actuation system that we are currently developing. Indeed, exploiting this framework for understanding OFC and its link to biological motor control is another important strand of our research.