be tackled using this problem formulation. Usually,
adaptive controllers rely on the certainty equivalence
principle and ignore parameter uncertainty in the con-
trol process (Åström and Wittenmark, 1995). In con-
trast, a controller based on the Bayesian control rule
considers this uncertainty for balancing exploration
and exploitation in a way that minimizes the expected
relative entropy with regard to the true control law.
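To make this distinction concrete, the following is a minimal sketch (not taken from the paper) for a scalar system with a hypothetical Gaussian posterior over the unknown parameters (a, b) and known cost weights. A certainty-equivalence controller acts optimally for the posterior mean, whereas a controller in the spirit of the Bayesian control rule draws the parameters from the posterior and acts optimally for the draw, so the resulting policy is stochastic and its exploration is driven by the remaining parameter uncertainty.

```python
import numpy as np

def lqr_gain(a, b, q=1.0, r=1.0, iters=200):
    """Feedback gain for the scalar discrete-time LQR problem
    x' = a*x + b*u with stage cost q*x^2 + r*u^2 (Riccati iteration)."""
    p = q
    for _ in range(iters):
        p = q + a**2 * p - (a * b * p) ** 2 / (r + b**2 * p)
    return a * b * p / (r + b**2 * p)

rng = np.random.default_rng(0)

# Hypothetical Gaussian posterior over the unknown parameters (a, b).
post_mean = np.array([0.9, 0.5])
post_cov = np.diag([0.05, 0.05])
x = 1.0  # current state

# Certainty equivalence: act optimally for the posterior mean (deterministic).
u_ce = -lqr_gain(*post_mean) * x

# Bayesian control rule (posterior sampling): act optimally for a draw from
# the posterior; the action is stochastic where the posterior is uncertain.
a_s, b_s = rng.multivariate_normal(post_mean, post_cov)
u_bcr = -lqr_gain(a_s, b_s) * x
```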
In particular, indirect control methods provide an
interesting perspective here, because they allow solv-
ing the adaptive control problem purely based on in-
ference and sampling methods drawn from the rich
arsenal of machine learning. Both infer-
ence and action sampling work forward in time and
are therefore applicable online. Also, they do not
require separate phases of policy evaluation and policy
improvement, unlike some previous reinforcement
learning methods. Inference can be done online,
independently of the sampled policy. Several other stud-
ies have previously proposed to solve adaptive con-
trol problems based on inference methods (Toussaint
et al., 2006; Engel et al., 2005; Haruno et al., 2001).
Crucially, however, these studies have concentrated
on the observation part of the learning problem with
no principled solution for the action selection prob-
lem. Usually, exploration noise has to be introduced
in an ad hoc fashion in order to avoid suboptimal per-
formance. In contrast, the minimum relative entropy
cost function naturally leads to stochastic policies.
The main contribution of this study is to illustrate
how a relative entropy formulation can be applied to
solve an adaptive control problem. This is done by
deriving a stochastic controller based on the Bayesian
control rule for the LQR problem with unknown sys-
tem and cost matrices. Similar minimum relative en-
tropy formulations have recently also been proposed
to solve optimal control problems with known system
dynamics (Todorov, 2009; Kappen et al., 2009). How
these two approaches for adaptive and optimal control
relate is an interesting question for future research.
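Before turning to possible generalizations, the following rough sketch indicates how a posterior-sampling adaptive LQR controller of this kind might be organized in practice. It is only an illustration under simplifying assumptions that differ from the paper's derivation: the cost matrices are taken as known, the posterior over the system matrices is a Gaussian maintained by recursive least squares, and a standard Riccati solver is used; the toy system and all variable names are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)
n, m = 2, 1                          # state and input dimensions (toy example)
Q, R = np.eye(n), np.eye(m)          # cost matrices, assumed known here

# True system, unknown to the controller.
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B_true = np.array([[0.0], [0.1]])

# Gaussian posterior over Phi = [A B]: row means M, shared covariance S
# (Bayesian linear regression with unit observation noise assumed).
M = np.zeros((n, n + m))
S = 10.0 * np.eye(n + m)

x = np.array([1.0, 0.0])
for t in range(100):
    # 1. Sample system matrices from the current posterior.
    Phi = np.vstack([rng.multivariate_normal(M[i], S) for i in range(n)])
    A_s, B_s = Phi[:, :n], Phi[:, n:]

    # 2. Act optimally for the sampled system (tailored LQR controller).
    try:
        P = solve_discrete_are(A_s, B_s, Q, R)
        K = np.linalg.solve(R + B_s.T @ P @ B_s, B_s.T @ P @ A_s)
        u = -K @ x
    except (np.linalg.LinAlgError, ValueError):
        u = rng.normal(size=m)       # sampled system not stabilizable; fall back

    # 3. Observe the transition and update the posterior (recursive least squares).
    z = np.concatenate([x, u])
    x_next = A_true @ x + B_true @ u + 0.01 * rng.normal(size=n)
    S_new = np.linalg.inv(np.linalg.inv(S) + np.outer(z, z))
    M = (M @ np.linalg.inv(S) + np.outer(x_next, z)) @ S_new
    S, x = S_new, x_next
```

In this sketch the stochasticity of the policy stems entirely from the posterior draw in step 1; as the posterior concentrates around the true matrices, the sampled controllers converge and exploration fades automatically.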
Also, the Bayesian control rule suggested here could
in principle be employed to solve more general adap-
tive control problems with possibly nonlinear dynam-
ics. However, finding optimal tailored controllers for
complex sub-environments can in general be highly
non-trivial. Therefore, finding inference and sam-
pling methods that work for more general classes of
adaptive control problems poses a future challenge.
REFERENCES
Åström, K. and Wittenmark, B. (1995). Adaptive Control.
Prentice Hall, 2nd edition.
Bradtke, S. (1993). Reinforcement learning applied to lin-
ear quadratic control. Advances in Neural Information
Processing Systems 5.
Campi, M. and Kumar, P. (1996). Optimal adaptive control
of an LQG system. Proc. 35th Conf. on Decision and
Control, pages 349–353.
Engel, Y., Mannor, S., and Meir, R. (2005). Reinforcement
learning with Gaussian processes. In Proceedings of
the 22nd international conference on Machine learn-
ing, pages 201–208.
Haruno, M., Wolpert, D., and Kawato, M. (2001). MOSAIC
model for sensorimotor learning and control. Neural
Computation, 13:2201–2220.
Haykin, S. (2001). Kalman filtering and neural networks.
John Wiley and Sons.
Julier, S. J., Uhlmann, J. K., and Durrant-Whyte, H. F. (1995).
A new approach for filtering nonlinear systems. Proc. Am.
Control Conference, pages 1628–1632.
Kappen, B., Gomez, V., and Opper, M. (2009). Opti-
mal control as a graphical model inference problem.
arXiv:0901.0633.
Ortega, P. and Braun, D. (2010). A Bayesian rule for adap-
tive control based on causal interventions. In Proceed-
ings of the third conference on artificial general intel-
ligence, pages 121–126. Atlantis Press.
Pearl, J. (2000). Causality: Models, Reasoning, and Infer-
ence. Cambridge University Press, Cambridge, UK.
Stengel, R. (1993). Optimal control and estimation. Dover
Publications.
Todorov, E. (2009). Efficient computation of optimal ac-
tions. Proceedings of the National Academy of Sci-
ences U.S.A., 106:11478–11483.
Todorov, E. and Jordan, M. (2002). Optimal feedback con-
trol as a theory of motor coordination. Nat. Neurosci.,
5:1226–1235.
Toussaint, M., Harmeling, S., and Storkey, A. (2006). Prob-
abilistic inference for solving (PO)MDPs. Technical
report, EDI-INF-RR-0934, University of Edinburgh,
School of Informatics.
Wittenmark, B. (1975). Stochastic adaptive control meth-
ods: a survey. International Journal of Control,
21:705–730.