INTERNALLY DRIVEN Q-LEARNING - Convergence and Generalization Results
Eduardo Alonso, Esther Mondragón, Niclas Kjäll-Ohlsson
2012
Abstract
We present an approach to solving the reinforcement learning problem in which agents are provided with internal drives against which they evaluate the value of states according to a similarity function. We extend Q-learning by substituting internally driven values for ad hoc rewards. The resulting algorithm, Internally Driven Q-learning (IDQ-learning), is shown experimentally to converge to optimality and to generalize well. These results are preliminary yet encouraging: IDQ-learning is more psychologically plausible than Q-learning, and it devolves control, and thus autonomy, to agents that are otherwise at the mercy of the environment (i.e., of the designer).
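The substitution described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the six-state chain world, the scalar drive `DRIVE`, the distance-based `similarity` function, and all hyperparameters are assumptions chosen only to show where an internally computed value could take the place of an external reward in the standard tabular Q-learning update.

```python
import random

N_STATES = 6                  # hypothetical chain world: states 0..5
ACTIONS = (-1, +1)            # move left or right
DRIVE = float(N_STATES - 1)   # assumed internal drive: the state the agent "wants"


def similarity(state, drive=DRIVE):
    """Assumed similarity in [0, 1]; equals 1 when the state matches the drive."""
    return 1.0 - abs(state - drive) / (N_STATES - 1)


def step(state, action):
    """Deterministic transition, clipped at both ends of the chain."""
    return min(max(state + action, 0), N_STATES - 1)


def train(episodes=2000, steps=20, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        for _ in range(steps):
            # Epsilon-greedy action selection over the current Q-table.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
            s2 = step(s, ACTIONS[a])
            # The environment supplies no reward; the agent scores the
            # next state itself by comparing it with its internal drive.
            internal_value = similarity(s2)
            # Standard Q-learning update with the internal value as the reward term.
            q[s][a] += alpha * (internal_value + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q


def greedy_policy(q):
    """Return the greedy action (-1 or +1) for each state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q]
```

Calling `greedy_policy(train())` returns one action per state; in this toy setting the learned policy should move the agent toward the state that best matches the drive. Changing `DRIVE` changes the learned behaviour without touching the environment, which is the sense in which control sits with the agent rather than with a designer-supplied reward.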
Paper Citation
in Harvard Style
Alonso E., Mondragón E. and Kjäll-Ohlsson N. (2012). INTERNALLY DRIVEN Q-LEARNING - Convergence and Generalization Results. In Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-8425-95-9, pages 491-494. DOI: 10.5220/0003736404910494
in Bibtex Style
@conference{icaart12,
author={Eduardo Alonso and Esther Mondragón and Niclas Kjäll-Ohlsson},
title={INTERNALLY DRIVEN Q-LEARNING - Convergence and Generalization Results},
booktitle={Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2012},
pages={491-494},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003736404910494},
isbn={978-989-8425-95-9},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - INTERNALLY DRIVEN Q-LEARNING - Convergence and Generalization Results
SN - 978-989-8425-95-9
AU - Alonso E.
AU - Mondragón E.
AU - Kjäll-Ohlsson N.
PY - 2012
SP - 491
EP - 494
DO - 10.5220/0003736404910494