INTERNALLY DRIVEN Q-LEARNING - Convergence and Generalization Results

Eduardo Alonso, Esther Mondragón, Niclas Kjäll-Ohlsson



We present an approach to solving the reinforcement learning problem in which agents are provided with internal drives against which they evaluate the value of states according to a similarity function. We extend Q-learning by substituting internally driven values for ad hoc rewards. The resulting algorithm, Internally Driven Q-learning (IDQ-learning), is shown experimentally to converge to optimality and to generalize well. These results are preliminary yet encouraging: IDQ-learning is more psychologically plausible than Q-learning, and it devolves control, and thus autonomy, to agents that are otherwise at the mercy of the environment (i.e., of the designer).
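The idea of replacing an external reward with an internally computed value can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity function, the state representation, or the experimental setup, so the one-dimensional chain world, the `similarity` function, the `drive` parameter, and all hyperparameters below are assumptions chosen purely for illustration. The only element taken from the text is that the standard Q-learning update uses an internally derived value in place of a designer-supplied reward.

```python
import random


def similarity(state, drive):
    """Hypothetical similarity between a state and an internal drive.

    Returns a value in (0, 1]; it equals 1.0 when the state matches the drive.
    The paper's actual similarity function is not given in the abstract.
    """
    return 1.0 / (1.0 + abs(state - drive))


def idq_learning(n_states=6, drive=5, episodes=1000, steps=20,
                 alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a chain, driven by an internal value signal.

    Assumed toy environment: states 0..n_states-1, actions move one step
    left (index 0) or right (index 1), clipped at the ends of the chain.
    """
    rng = random.Random(seed)
    moves = (-1, 1)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states)
        for _ in range(steps):
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max(range(2), key=lambda i: q[s][i])
            s_next = min(max(s + moves[a], 0), n_states - 1)
            # The agent scores the new state against its own drive
            # instead of receiving a reward from the environment.
            internal_value = similarity(s_next, drive)
            target = internal_value + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


def greedy_policy(q):
    """Index of the highest-valued action in each state (0 = left, 1 = right)."""
    return [max(range(2), key=lambda i: row[i]) for row in q]
```

Calling `greedy_policy(idq_learning())` on this toy chain yields a policy that moves toward the state most similar to the drive. Changing `drive` changes the learned behaviour without touching the environment, which is the sense in which control is handed to the agent in this sketch.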



Paper Citation

in Harvard Style

Alonso E., Mondragón E. and Kjäll-Ohlsson N. (2012). INTERNALLY DRIVEN Q-LEARNING - Convergence and Generalization Results. In Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-8425-95-9, pages 491-494. DOI: 10.5220/0003736404910494

in Bibtex Style

author={Eduardo Alonso and Esther Mondragón and Niclas Kjäll-Ohlsson},
title={INTERNALLY DRIVEN Q-LEARNING - Convergence and Generalization Results},
booktitle={Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},

in EndNote Style

JO - Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - INTERNALLY DRIVEN Q-LEARNING - Convergence and Generalization Results
SN - 978-989-8425-95-9
AU - Alonso E.
AU - Mondragón E.
AU - Kjäll-Ohlsson N.
PY - 2012
SP - 491
EP - 494
DO - 10.5220/0003736404910494