PREDICTION FOR CONTROL DELAY ON REINFORCEMENT LEARNING

Junya Saito, Kazuyuki Narisawa, Ayumi Shinohara

Abstract

This paper addresses reinforcement learning problems with constant control delay, for both the case where the delay is known and the case where it is unknown. First, we propose an algorithm for known delay, which is a simple extension of the model-free learning algorithm introduced by Schuitema et al. (2010). We extend it to predict current states explicitly, and empirically show that it is more efficient than existing algorithms. Next, we consider the case in which the delay is unknown but bounded above. For this case, we propose an algorithm that uses the accuracy of state prediction. We show that this algorithm performs as efficiently as one that knows the true delay.
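The abstract does not spell out either algorithm, so the following is only an illustrative sketch of the two ideas it names, not the authors' method. It assumes a deterministic, tabular one-step model stored in a dictionary; the names `predict_current_state`, `pick_delay`, `model`, `hits`, and `trials` are hypothetical. The first function rolls the model forward through the actions that have been issued but not yet applied, and the second selects the candidate delay whose predictions have matched later observations most often.

```python
from collections import deque


def predict_current_state(observed_state, pending_actions, model):
    """Roll a one-step model forward through actions not yet applied.

    `model` maps (state, action) -> next_state (assumed deterministic).
    Unseen transitions leave the predicted state unchanged.
    """
    state = observed_state
    for action in pending_actions:
        state = model.get((state, action), state)
    return state


def pick_delay(hits, trials):
    """Return the candidate delay with the highest prediction accuracy.

    hits[d] counts correct predictions made under the assumption that
    the delay is d; trials[d] counts all predictions made under it.
    """
    def accuracy(d):
        return hits[d] / trials[d] if trials[d] else 0.0

    return max(range(len(hits)), key=accuracy)


# Toy chain world (an assumption for illustration): states 0..4,
# actions -1 and +1, with movement clipped at both ends.
model = {(s, a): min(4, max(0, s + a)) for s in range(5) for a in (-1, 1)}

# With a known delay of 3 steps, three actions are still in flight.
pending = deque([1, 1, -1], maxlen=3)
predicted = predict_current_state(0, pending, model)

# With an unknown delay bounded by 2, compare candidates 0, 1, and 2.
best_delay = pick_delay(hits=[1, 9, 3], trials=[10, 10, 10])
```

In this toy setting a learner would select its next action from `predicted` rather than from the stale observation, and would re-evaluate `best_delay` as more predictions are scored.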

References

  1. Abbeel, P., Coates, A., Quigley, M., and Ng, A. Y. (2007). An application of reinforcement learning to aerobatic helicopter flight. In Advances in Neural Information Processing Systems 19, pages 1-8.
  2. Brafman, R. I. and Tennenholtz, M. (2003). R-max - a general polynomial time algorithm for near-optimal reinforcement learning. The Journal of Machine Learning Research, 3:213-231.
  3. Freund, Y. and Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, volume 55, pages 119-139.
  4. Katsikopoulos, K. and Engelbrecht, S. (2003). Markov decision processes with delays and asynchronous cost collection. IEEE Transactions on Automatic Control, 48(4):568-574.
  5. Loch, J. and Singh, S. (1998). Using eligibility traces to find the best memoryless policy in partially observable Markov decision processes. In Proceedings of the 15th International Conference on Machine Learning (ICML '98), pages 323-331.
  6. Schuitema, E., Busoniu, L., Babuška, R., and Jonker, P. (2010). Control delay in reinforcement learning for real-time dynamic systems: a memoryless approach. In Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3226-3231.
  7. Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning). The MIT Press.
  8. Szita, I. and Szepesvári, C. (2010). Model-based reinforcement learning with nearly tight exploration complexity bounds. In Proceedings of the 27th International Conference on Machine Learning (ICML '10), pages 1031-1038.
  9. Walsh, T. J., Nouri, A., Li, L., and Littman, M. L. (2007). Planning and learning in environments with delayed feedback. In Proceedings of the 18th European Conference on Machine Learning (ECML '07), pages 442-453.


Paper Citation


in Harvard Style

Saito J., Narisawa K. and Shinohara A. (2012). PREDICTION FOR CONTROL DELAY ON REINFORCEMENT LEARNING. In Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 1: SSML, (ICAART 2012) ISBN 978-989-8425-95-9, pages 579-586. DOI: 10.5220/0003883405790586


in Bibtex Style

@conference{ssml12,
author={Junya Saito and Kazuyuki Narisawa and Ayumi Shinohara},
title={PREDICTION FOR CONTROL DELAY ON REINFORCEMENT LEARNING},
booktitle={Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 1: SSML, (ICAART 2012)},
year={2012},
pages={579-586},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003883405790586},
isbn={978-989-8425-95-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 1: SSML, (ICAART 2012)
TI - PREDICTION FOR CONTROL DELAY ON REINFORCEMENT LEARNING
SN - 978-989-8425-95-9
AU - Saito J.
AU - Narisawa K.
AU - Shinohara A.
PY - 2012
SP - 579
EP - 586
DO - 10.5220/0003883405790586