# OPTIMAL CONTROL WITH ADAPTIVE INTERNAL DYNAMICS MODELS

### Djordje Mitrovic, Stefan Klanke, Sethu Vijayakumar

#### Abstract

Optimal feedback control has been proposed as an attractive movement generation strategy in goal-reaching tasks for anthropomorphic manipulator systems. The optimal feedback control law for systems with non-linear dynamics and non-quadratic costs can be found by iterative methods, such as the iterative Linear Quadratic Gaussian (iLQG) algorithm. So far, this framework has relied on an analytic form of the system dynamics, which is often unknown, difficult to estimate for more realistic control systems, or subject to frequent systematic changes. In this paper, we present a novel approach that learns a forward dynamics model and uses it within the iLQG framework. Such adaptive internal models can compensate for complex dynamic perturbations of the controlled system in an online fashion. The specific adaptive framework introduced also lends itself to a computationally more efficient implementation of the iLQG optimisation without sacrificing control accuracy, allowing the method to scale to systems with many degrees of freedom (DoF).
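
As an illustration of how a forward model can stand in for analytic dynamics, the sketch below linearizes a black-box model around a state-control pair by central finite differences, yielding the kind of local Jacobians that iterative methods such as iLQG work with. The names `linearize` and `toy_model`, the pendulum parameters, and the finite-difference scheme are assumptions made for illustration only; they are not taken from the paper, and a learned model may well supply its derivatives analytically instead.

```python
import math


def linearize(model, x, u, eps=1e-5):
    """Central-difference Jacobians of model(x, u) with respect to x and u.

    Returns (A, B) as nested lists, where A[j][i] = d f_j / d x_i and
    B[j][i] = d f_j / d u_i, evaluated at the point (x, u).
    """
    def jacobian(fn, z):
        cols = []
        for i in range(len(z)):
            hi, lo = list(z), list(z)
            hi[i] += eps
            lo[i] -= eps
            f_hi, f_lo = fn(hi), fn(lo)
            cols.append([(a - b) / (2 * eps) for a, b in zip(f_hi, f_lo)])
        # cols[i][j] holds d f_j / d z_i; transpose so rows index outputs.
        return [list(row) for row in zip(*cols)]

    A = jacobian(lambda xx: model(xx, u), x)
    B = jacobian(lambda uu: model(x, uu), u)
    return A, B


def toy_model(x, u, dt=0.01):
    """Hypothetical stand-in for a learned forward model.

    One explicit Euler step of a damped pendulum (illustrative values only).
    """
    theta, omega = x
    acc = -9.81 * math.sin(theta) - 0.1 * omega + u[0]
    return [theta + dt * omega, omega + dt * acc]


# Linearize around the resting state with zero control.
A, B = linearize(toy_model, [0.0, 0.0], [0.0])
```

Because `linearize` only calls the model, the same routine applies unchanged whether `model` is an analytic simulator or a regressor that is updated online from observed transitions.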

#### Paper Citation

#### in Harvard Style

Mitrovic D., Klanke S. and Vijayakumar S. (2008). **OPTIMAL CONTROL WITH ADAPTIVE INTERNAL DYNAMICS MODELS**. In *Proceedings of the Fifth International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO*, ISBN 978-989-8111-30-2, pages 141-148. DOI: 10.5220/0001484501410148

#### in Bibtex Style

@conference{icinco08,

author={Djordje Mitrovic and Stefan Klanke and Sethu Vijayakumar},

title={OPTIMAL CONTROL WITH ADAPTIVE INTERNAL DYNAMICS MODELS},

booktitle={Proceedings of the Fifth International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO},

year={2008},

pages={141-148},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0001484501410148},

isbn={978-989-8111-30-2},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the Fifth International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO

TI - OPTIMAL CONTROL WITH ADAPTIVE INTERNAL DYNAMICS MODELS

SN - 978-989-8111-30-2

AU - Mitrovic D.

AU - Klanke S.

AU - Vijayakumar S.

PY - 2008

SP - 141

EP - 148

DO - 10.5220/0001484501410148