Hierarchical Reinforcement Learning Introducing Genetic Algorithm for POMDPs Environments
Kohei Suzuki, Shohei Kato
2019
Abstract
Perceptual aliasing is one of the major problems in applying reinforcement learning to the real world. It occurs in POMDP environments, where agents cannot observe states correctly, and it causes reinforcement learning to fail. HQ-learning has been proposed as a solution to perceptual aliasing; it resolves aliasing by using subgoals and subagents. However, the subagents learn independently and must relearn whenever the subgoals change. In addition, the number of subgoals is fixed, and unless that number is appropriate, the number of episodes needed for learning increases. In this paper, we propose a reinforcement learning method that generates subgoals using a genetic algorithm. We also report the effectiveness of our method through experiments with partially observable mazes.
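The abstract gives no algorithmic detail, so the following Python is only a minimal sketch of the two ideas it names, under clearly stated assumptions. The corridor maze, the chain encoding, the fitness score, `MAX_LEN`, and every function name (`run_chain`, `reactive_policies`, `fitness`, `mutate`, `crossover`, `evolve`) are hypothetical. The maze makes perceptual aliasing concrete: cells 1 and 3 both emit observation `B`, so no memoryless policy can first visit the key in cell 0 and then reach the goal in cell 4, while a chain of subagents that hands over control when a subgoal observation is seen, in the style of HQ-learning, can. The genetic algorithm searches over variable-length chains, so the number of subgoals is not fixed in advance. Unlike the method described above, where subagents learn by reinforcement learning, this sketch lets the GA evolve each subagent's reactive policy directly; it is not the authors' method.

```python
import itertools
import random

# Hypothetical 5-cell corridor (an assumption, not the paper's maze).
# Cells 1 and 3 both emit observation "B": perceptual aliasing.
# The agent starts in cell 2, must visit the key cell 0, then reach goal cell 4.
OBS = ["A", "B", "C", "B", "G"]
START, KEY, GOAL = 2, 0, 4
ACTIONS = (-1, 1)       # move left, move right
SEEN = ("A", "B", "C")  # observations on which a subagent must choose an action
MAX_LEN = 4             # hypothetical cap on the number of subagents


def run_chain(chain, max_steps=20):
    """Run subagents in order; return the step count on success, else None.

    chain is a list of (policy, subgoal) pairs. A policy maps an observation
    to an action. When the active subagent observes its subgoal, control
    passes to the next subagent (an HQ-learning-style handover).
    """
    pos, has_key, idx = START, False, 0
    for step in range(1, max_steps + 1):
        policy, subgoal = chain[idx]
        pos = min(max(pos + policy[OBS[pos]], 0), len(OBS) - 1)  # walls clamp moves
        has_key = has_key or pos == KEY
        if pos == GOAL:
            return step if has_key else None  # reaching the goal without the key fails
        if OBS[pos] == subgoal and idx + 1 < len(chain):
            idx += 1
    return None


def reactive_policies():
    """All memoryless policies: one fixed action per observation."""
    for acts in itertools.product(ACTIONS, repeat=len(SEEN)):
        yield dict(zip(SEEN, acts))


def fitness(chain):
    """Hypothetical score: 0 on failure; otherwise prefer short paths and few subagents."""
    steps = run_chain(chain)
    return 0 if steps is None else 100 - steps - len(chain)


def random_gene(rng):
    return ({o: rng.choice(ACTIONS) for o in SEEN}, rng.choice(SEEN))


def mutate(chain, rng):
    chain = [(dict(p), g) for p, g in chain]  # copy so parents stay intact
    r = rng.random()
    if r < 0.2 and len(chain) < MAX_LEN:
        chain.insert(rng.randrange(len(chain) + 1), random_gene(rng))  # add a subgoal
    elif r < 0.4 and len(chain) > 1:
        chain.pop(rng.randrange(len(chain)))  # remove a subgoal
    else:
        i = rng.randrange(len(chain))
        policy, subgoal = chain[i]
        if rng.random() < 0.5:
            subgoal = rng.choice(SEEN)  # move the subgoal
        else:
            o = rng.choice(SEEN)
            policy[o] = -policy[o]  # flip one action
        chain[i] = (policy, subgoal)
    return chain


def crossover(a, b, rng):
    """One-point crossover on variable-length chains, capped at MAX_LEN genes."""
    return (a[: rng.randint(1, len(a))] + b[rng.randint(0, len(b)):])[:MAX_LEN]


def evolve(generations=60, pop_size=30, seed=0):
    """Elitist GA with tournament selection over subgoal chains."""
    rng = random.Random(seed)
    pop = [[random_gene(rng) for _ in range(rng.randint(1, 3))] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        nxt = pop[: pop_size // 5]  # elites survive unchanged
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=fitness)
            b = max(rng.sample(pop, 3), key=fitness)
            nxt.append(mutate(crossover(a, b, rng), rng))
        pop = nxt
    return max(pop, key=fitness)
```

Because every single reactive policy fails in this maze, any successful chain that `evolve()` returns must contain at least two subagents. The `- len(chain)` term in `fitness` is one hypothetical way to push the search toward fewer subgoals rather than fixing their number beforehand.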
Paper Citation
in Harvard Style
Suzuki K. and Kato S. (2019). Hierarchical Reinforcement Learning Introducing Genetic Algorithm for POMDPs Environments. In Proceedings of the 11th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-350-6, pages 318-327. DOI: 10.5220/0007405403180327
in BibTeX Style
@conference{icaart19,
author={Kohei Suzuki and Shohei Kato},
title={Hierarchical Reinforcement Learning Introducing Genetic Algorithm for POMDPs Environments},
booktitle={Proceedings of the 11th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2019},
pages={318-327},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0007405403180327},
isbn={978-989-758-350-6},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 11th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Hierarchical Reinforcement Learning Introducing Genetic Algorithm for POMDPs Environments
SN - 978-989-758-350-6
AU - Suzuki K.
AU - Kato S.
PY - 2019
SP - 318
EP - 327
DO - 10.5220/0007405403180327