OPTIMAL SAMPLE SELECTION FOR BATCH-MODE REINFORCEMENT LEARNING

Emmanuel Rachelson, François Schnitzler, Louis Wehenkel, Damien Ernst

Abstract

We introduce the Optimal Sample Selection (OSS) meta-algorithm for solving discrete-time Optimal Control problems. This meta-algorithm maps the problem of finding a near-optimal closed-loop policy to the identification of a small set of one-step system transitions that leads to high-quality policies when used as input to a batch-mode Reinforcement Learning (RL) algorithm. We detail a particular instance of this OSS meta-algorithm that uses tree-based Fitted Q-Iteration as the batch-mode RL algorithm and Cross-Entropy search as the method for navigating efficiently in the space of sample sets. The results show that this particular instance of OSS algorithms is able to rapidly identify small sample sets leading to high-quality policies.
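The abstract's loop — score candidate sample sets by running a batch-mode RL algorithm on them, and steer the search with the Cross-Entropy method — can be sketched in miniature. The sketch below is not the authors' implementation: it swaps tree-based Fitted Q-Iteration for a tabular Q-iteration on an invented five-state chain MDP, and all names, parameters, and the smoothed CE update constants are illustrative assumptions.

```python
import random

# Toy deterministic chain MDP (invented for illustration): states 0..4,
# actions -1/+1, reward 1 for any transition that lands on the goal state 4.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, 1)
GAMMA = 0.9

def step(s, a):
    s2 = min(max(s + a, 0), GOAL)
    return (1.0 if s2 == GOAL else 0.0), s2

# Full pool of candidate one-step transitions (s, a, r, s').
POOL = [(s, a) + step(s, a) for s in range(N_STATES) for a in ACTIONS]

def fitted_q(samples, iters=30):
    """Stand-in for tree-based FQI: tabular Q-iteration restricted to the sample set."""
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(iters):
        new_q = dict(Q)
        for s, a, r, s2 in samples:
            new_q[(s, a)] = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q = new_q
    return Q

def mean_return(Q, horizon=20):
    """Average discounted return of the greedy policy, over all start states."""
    total = 0.0
    for s0 in range(N_STATES):
        s, disc = s0, 1.0
        for _ in range(horizon):
            a = max(ACTIONS, key=lambda b: Q[(s, b)])
            r, s = step(s, a)
            total += disc * r
            disc *= GAMMA
    return total / N_STATES

def cross_entropy_oss(k=4, pop=30, elite=5, rounds=15, seed=0):
    """CE search for a k-transition subset of POOL whose induced policy scores well."""
    rng = random.Random(seed)
    p = [0.5] * len(POOL)             # per-transition inclusion weights
    best_set, best_score = None, -1.0
    for _ in range(rounds):
        scored = []
        for _ in range(pop):
            # Weighted sampling of k indices without replacement (A-Res keys).
            keys = [rng.random() ** (1.0 / max(p[i], 1e-9)) for i in range(len(POOL))]
            idx = sorted(range(len(POOL)), key=lambda i: -keys[i])[:k]
            score = mean_return(fitted_q([POOL[i] for i in idx]))
            scored.append((score, idx))
            if score > best_score:
                best_score, best_set = score, [POOL[i] for i in idx]
        scored.sort(key=lambda t: -t[0])
        elites = scored[:elite]
        for i in range(len(p)):       # smoothed CE update of inclusion weights
            freq = sum(i in idx for _, idx in elites) / elite
            p[i] = 0.7 * p[i] + 0.3 * freq
    return best_set, best_score
```

On this toy problem the search typically recovers the short chain of rightward transitions that suffices to induce a near-optimal greedy policy; the point, as in the paper, is that the scoring function treats the batch-mode RL algorithm as a black box, so the same outer loop applies to richer approximators.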



Paper Citation


in Harvard Style

Rachelson E., Schnitzler F., Wehenkel L. and Ernst D. (2011). OPTIMAL SAMPLE SELECTION FOR BATCH-MODE REINFORCEMENT LEARNING. In Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-8425-40-9, pages 41-50. DOI: 10.5220/0003133500410050


in BibTeX Style

@conference{icaart11,
author={Emmanuel Rachelson and François Schnitzler and Louis Wehenkel and Damien Ernst},
title={OPTIMAL SAMPLE SELECTION FOR BATCH-MODE REINFORCEMENT LEARNING},
booktitle={Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2011},
pages={41-50},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003133500410050},
isbn={978-989-8425-40-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - OPTIMAL SAMPLE SELECTION FOR BATCH-MODE REINFORCEMENT LEARNING
SN - 978-989-8425-40-9
AU - Rachelson E.
AU - Schnitzler F.
AU - Wehenkel L.
AU - Ernst D.
PY - 2011
SP - 41
EP - 50
DO - 10.5220/0003133500410050