Reducing Sample Complexity in Reinforcement Learning by Transferring Transition and Reward Probabilities
Kouta Oguni, Kazuyuki Narisawa, Ayumi Shinohara
2014
Abstract
Most existing reinforcement learning algorithms require many trials before they obtain optimal policies. In this study, we apply transfer learning to reinforcement learning to improve sample efficiency. We propose a new algorithm called TR-MAX, based on the R-MAX algorithm. TR-MAX transfers the transition and reward probabilities from a source task to a target task as prior knowledge. We theoretically analyze the sample complexity of TR-MAX. Moreover, we show that TR-MAX performs much better in practice than R-MAX in maze tasks.
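To illustrate the idea described in the abstract, the sketch below shows an R-MAX-style tabular agent whose model is seeded with transition and reward estimates carried over from a source task. The class name, the transfer mechanism (marking transferred state-action pairs as already "known"), and all parameters are illustrative assumptions, not the authors' exact TR-MAX formulation.

```python
# Minimal sketch of an R-MAX-style agent seeded with transferred model estimates.
# Illustrative only; not the authors' exact TR-MAX algorithm.
import numpy as np

class TRMaxSketch:
    def __init__(self, n_states, n_actions, r_max, m, gamma=0.95,
                 source_P=None, source_R=None):
        self.S, self.A = n_states, n_actions
        self.r_max, self.m, self.gamma = r_max, m, gamma
        # Empirical counts gathered in the target task.
        self.counts = np.zeros((n_states, n_actions))
        self.trans_counts = np.zeros((n_states, n_actions, n_states))
        self.reward_sums = np.zeros((n_states, n_actions))
        # Optimistic initial model: unknown (s, a) pairs self-loop with reward r_max.
        self.P = np.zeros((n_states, n_actions, n_states))
        self.P[np.arange(n_states), :, np.arange(n_states)] = 1.0
        self.R = np.full((n_states, n_actions), float(r_max))
        self.known = np.zeros((n_states, n_actions), dtype=bool)
        # Transfer step (assumed form): use source-task probabilities as prior
        # knowledge, so covered pairs start out "known" instead of optimistic.
        if source_P is not None and source_R is not None:
            self.P, self.R = source_P.copy(), source_R.copy()
            self.known[:] = True

    def update(self, s, a, r, s_next):
        """Record one target-task transition; re-estimate (s, a) once it has m samples."""
        self.counts[s, a] += 1
        self.trans_counts[s, a, s_next] += 1
        self.reward_sums[s, a] += r
        if self.counts[s, a] >= self.m:
            self.known[s, a] = True
            self.P[s, a] = self.trans_counts[s, a] / self.counts[s, a]
            self.R[s, a] = self.reward_sums[s, a] / self.counts[s, a]

    def plan(self, iters=200):
        """Greedy policy from value iteration on the current (optimistic/transferred) model."""
        Q = np.zeros((self.S, self.A))
        for _ in range(iters):
            V = Q.max(axis=1)
            Q = self.R + self.gamma * (self.P @ V)
        return Q.argmax(axis=1)
```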
References
- Brafman, R. I. and Tennenholtz, M. (2003). R-max - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3:213-231.
- Kakade, S. M. (2003). On the Sample Complexity of Reinforcement Learning. PhD thesis, University College London.
- Konidaris, G. and Barto, A. (2006). Autonomous shaping: Knowledge transfer in reinforcement learning. In Proc. of ICML, pages 489-496.
- Kretchmar, R. M. (2002). Parallel reinforcement learning. In Proc. of SCI, pages 114-118.
- Mann, T. A. and Choe, Y. (2012). Directed exploration in reinforcement learning with transferred knowledge. In Proc. of EWRL, pages 59-76.
- Miyazaki, K., Yamamura, M., and Kobayashi, S. (1997). k-certainty exploration method: an action selector to identify the environment in reinforcement learning. Artificial Intelligence, 91(1):155-171.
- Rummery, G. A. and Niranjan, M. (1994). On-line Q-learning using connectionist systems. Technical report, Cambridge University Engineering Department.
- Saito, J., Narisawa, K., and Shinohara, A. (2012). Prediction for control delay on reinforcement learning. In Proc. of ICAART, pages 579-586.
- Schuitema, E., Busoniu, L., Babuska, R., and Jonker, P. (2010). Control delay in reinforcement learning for real-time dynamic systems: a memoryless approach. In Proc. of IROS, pages 3226-3231. IEEE.
- Strehl, A. L., Li, L., and Littman, M. L. (2009). Reinforcement learning in finite MDPs: PAC analysis. Journal of Machine Learning Research, 10:2413-2444.
- Strehl, A. L. and Littman, M. L. (2005). A theoretical analysis of model-based interval estimation. In Proc. of ICML, pages 857-864.
- Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press.
- Tan, M. (1993). Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proc. of ICML, pages 330-337.
- Taylor, M. E. and Stone, P. (2009). Transfer learning for reinforcement learning domains: A survey. Journal of Machine Learning Research, 10:1633-1685.
- Taylor, M. E., Stone, P., and Liu, Y. (2007). Transfer learning via inter-task mappings for temporal difference learning. Journal of Machine Learning Research, 8:2125-2167.
Paper Citation
in Harvard Style
Oguni K., Narisawa K. and Shinohara A. (2014). Reducing Sample Complexity in Reinforcement Learning by Transferring Transition and Reward Probabilities. In Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-758-015-4, pages 632-638. DOI: 10.5220/0004915606320638
in Bibtex Style
@conference{icaart14,
author={Kouta Oguni and Kazuyuki Narisawa and Ayumi Shinohara},
title={Reducing Sample Complexity in Reinforcement Learning by
Transferring Transition and Reward Probabilities},
booktitle={Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2014},
pages={632-638},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004915606320638},
isbn={978-989-758-015-4},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - Reducing Sample Complexity in Reinforcement Learning by
Transferring Transition and Reward Probabilities
SN - 978-989-758-015-4
AU - Oguni K.
AU - Narisawa K.
AU - Shinohara A.
PY - 2014
SP - 632
EP - 638
DO - 10.5220/0004915606320638