# Reducing Sample Complexity in Reinforcement Learning by Transferring Transition and Reward Probabilities

### Kouta Oguni, Kazuyuki Narisawa, Ayumi Shinohara

#### Abstract

Most existing reinforcement learning algorithms require many trials before they obtain an optimal policy. In this study, we apply transfer learning to reinforcement learning to improve sample efficiency. We propose a new algorithm called TR-MAX, based on the R-MAX algorithm. TR-MAX transfers the transition and reward probabilities from a source task to a target task as prior knowledge. We theoretically analyze the sample complexity of TR-MAX. Moreover, we show that TR-MAX performs much better in practice than R-MAX in maze tasks.
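The core idea can be sketched in code. The following is a minimal, hypothetical illustration (not the authors' implementation) of an R-MAX-style planner: state-action pairs visited fewer than `M` times are treated optimistically with value `R_MAX / (1 - GAMMA)`, while known pairs use empirical transition and reward estimates. The TR-MAX-style transfer is then simulated by seeding the visit counts with a source task's model instead of starting from zero; all environment sizes and the source model below are invented for the example.

```python
import numpy as np

R_MAX = 1.0      # upper bound on the reward
GAMMA = 0.95     # discount factor
M = 5            # visits needed before a state-action pair counts as "known"

def plan(trans_counts, reward_sums, n_states, n_actions, iters=200):
    """R-MAX-style value iteration over an empirical model.

    Unknown (s, a) pairs (fewer than M visits) get the optimistic value
    R_MAX / (1 - GAMMA); known pairs use the empirical Bellman backup.
    """
    v_opt = R_MAX / (1.0 - GAMMA)
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = np.full((n_states, n_actions), v_opt)
        for s in range(n_states):
            for a in range(n_actions):
                n = trans_counts[s, a].sum()
                if n >= M:  # known pair: empirical transition/reward estimates
                    p = trans_counts[s, a] / n
                    r = reward_sums[s, a] / n
                    Q[s, a] = r + GAMMA * p.dot(V)
        V = Q.max(axis=1)
    return Q

# TR-MAX-style transfer (sketch): seed the counts with a source task's
# model so that many pairs are already "known" before any target-task trial.
n_states, n_actions = 3, 2
counts = np.zeros((n_states, n_actions, n_states))
rewards = np.zeros((n_states, n_actions))
counts[:, :, 0] = M      # hypothetical source model: every action leads to state 0
rewards[:, 0] = M * 1.0  # action 0 always earned reward 1 in the source task
Q = plan(counts, rewards, n_states, n_actions)
```

With the transferred counts, the planner can act greedily from the first step rather than spending trials to make each pair known, which is the intuition behind the reduced sample complexity.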

#### References

- Brafman, R. I. and Tennenholtz, M. (2003). R-max - a general polynomial time algorithm for near-optimal reinforcement learning. The Journal of Machine Learning Research, 3:213-231.
- Kakade, S. M. (2003). On the Sample Complexity of Reinforcement Learning. PhD thesis, University College London.
- Konidaris, G. and Barto, A. (2006). Autonomous shaping: Knowledge transfer in reinforcement learning. In Proc of ICML, pages 489-496.
- Kretchmar, R. M. (2002). Parallel reinforcement learning. In Proc. of SCI, pages 114-118.
- Mann, T. A. and Choe, Y. (2012). Directed exploration in reinforcement learning with transferred knowledge. In Proc. of EWRL, pages 59-76.
- Miyazaki, K., Yamamura, M., and Kobayashi, S. (1997). k-certainty exploration method: an action selector to identify the environment in reinforcement learning. Artificial Intelligence, 91(1):155-171.
- Rummery, G. A. and Niranjan, M. (1994). On-line Q-learning using connectionist systems. Technical report, Cambridge University Engineering Department.
- Saito, J., Narisawa, K., and Shinohara, A. (2012). Prediction for control delay on reinforcement learning. In Proc. of ICAART, pages 579-586.
- Schuitema, E., Busoniu, L., Babuska, R., and Jonker, P. (2010). Control delay in reinforcement learning for real-time dynamic systems: a memoryless approach. In Proc. of IROS, pages 3226-3231. IEEE.
- Strehl, A. L., Li, L., and Littman, M. L. (2009). Reinforcement learning in finite MDPs: PAC analysis. The Journal of Machine Learning Research, 10:2413-2444.
- Strehl, A. L. and Littman, M. L. (2005). A theoretical analysis of model-based interval estimation. In Proc. of ICML, pages 857-864.
- Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press.
- Tan, M. (1993). Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proc. of ICML, pages 330-337.
- Taylor, M. E. and Stone, P. (2009). Transfer learning for reinforcement learning domains: A survey. The Journal of Machine Learning Research, 10:1633-1685.
- Taylor, M. E., Stone, P., and Liu, Y. (2007). Transfer learning via inter-task mappings for temporal difference learning. The Journal of Machine Learning Research, 8:2125-2167.

#### Paper Citation

#### in Harvard Style

Oguni K., Narisawa K. and Shinohara A. (2014). **Reducing Sample Complexity in Reinforcement Learning by Transferring Transition and Reward Probabilities**. In *Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART*, ISBN 978-989-758-015-4, pages 632-638. DOI: 10.5220/0004915606320638

#### in Bibtex Style

@conference{icaart14,
author={Kouta Oguni and Kazuyuki Narisawa and Ayumi Shinohara},
title={Reducing Sample Complexity in Reinforcement Learning by Transferring Transition and Reward Probabilities},
booktitle={Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2014},
pages={632-638},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004915606320638},
isbn={978-989-758-015-4},
}

#### in EndNote Style

TY - CONF

JO - Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART

TI - Reducing Sample Complexity in Reinforcement Learning by Transferring Transition and Reward Probabilities

SN - 978-989-758-015-4

AU - Oguni K.

AU - Narisawa K.

AU - Shinohara A.

PY - 2014

SP - 632

EP - 638

DO - 10.5220/0004915606320638