# CONTINUOUS ACTION REINFORCEMENT LEARNING AUTOMATA - Performance and Convergence

### Abdel Rodríguez, Ricardo Grau, Ann Nowé

#### Abstract

Reinforcement learning (RL) is a powerful technique that lets agents solve unknown Markov decision processes from the possibly delayed signals they receive. Most RL work, in particular for multi-agent settings, assumes a discrete action set. Learning automata are reinforcement learners, belonging to the category of policy iterators, that exhibit good convergence properties in discrete action settings. Unfortunately, most applications involve continuous actions. A formulation of a continuous action reinforcement learning automaton already exists, but it offers no guarantee of convergence to optimal decisions. This paper proposes an improvement to the performance of the method, together with a proof of its local convergence.

#### Paper Citation

#### in Harvard Style

Rodríguez A., Grau R. and Nowé A. (2011). **CONTINUOUS ACTION REINFORCEMENT LEARNING AUTOMATA - Performance and Convergence**. In *Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 2: ICAART,* ISBN 978-989-8425-41-6, pages 473-478. DOI: 10.5220/0003287104730478

#### in Bibtex Style

@conference{icaart11,

author={Abdel Rodríguez and Ricardo Grau and Ann Nowé},

title={CONTINUOUS ACTION REINFORCEMENT LEARNING AUTOMATA - Performance and Convergence},

booktitle={Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},

year={2011},

pages={473-478},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0003287104730478},

isbn={978-989-8425-41-6},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 2: ICAART

TI - CONTINUOUS ACTION REINFORCEMENT LEARNING AUTOMATA - Performance and Convergence

SN - 978-989-8425-41-6

AU - Rodríguez A.

AU - Grau R.

AU - Nowé A.

PY - 2011

SP - 473

EP - 478

DO - 10.5220/0003287104730478