Deep Learning Policy Quantization
Jos van de Wolfshaar, Marco Wiering, Lambert Schomaker
2018
Abstract
We introduce a novel actor-critic approach for deep reinforcement learning that is based on learning vector quantization. We replace the softmax operator of the policy with a more general and more flexible operator that is similar to the robust soft learning vector quantization algorithm. We compare our approach to the default A3C architecture on three Atari 2600 games and a simple game called Catch. The proposed algorithm outperforms the softmax architecture on Catch; on the Atari games, the best-performing model varies across games.
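The paper itself is not reproduced here, but the operator the abstract describes can be sketched roughly. In robust soft LVQ (RSLVQ), class probabilities come from normalized similarities between a feature vector and learned prototypes, with possibly several prototypes per class; this generalizes a softmax over per-action logits. The following is a hedged illustration under that reading, not the authors' implementation — all function names, shapes, and the Gaussian similarity choice are assumptions for the sketch:

```python
import numpy as np

def softmax_policy(logits):
    # Standard softmax policy head: one logit per action.
    z = logits - logits.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def lvq_policy(features, prototypes, labels, n_actions):
    # RSLVQ-style policy head (illustrative): action probabilities
    # from distances between a feature vector and learned prototypes.
    #   features:   (d,)   feature vector produced by the network
    #   prototypes: (m, d) prototype vectors, m >= n_actions
    #   labels:     (m,)   action label attached to each prototype
    d2 = ((prototypes - features) ** 2).sum(axis=1)  # squared distances
    sim = np.exp(-(d2 - d2.min()))  # Gaussian similarity, stabilized
    # An action's probability is the normalized similarity mass of the
    # prototypes carrying that action's label.
    mass = np.array([sim[labels == a].sum() for a in range(n_actions)])
    return mass / mass.sum()

# Toy usage: two prototypes per action, query at an action-0 prototype.
rng = np.random.default_rng(0)
prototypes = rng.normal(size=(6, 4))
labels = np.array([0, 0, 1, 1, 2, 2])
pi = lvq_policy(prototypes[0], prototypes, labels, n_actions=3)
```

With a single prototype per action and distances replaced by raw logits, this reduces to the ordinary softmax, which is why the abstract calls it a more general operator.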
in Harvard Style
van de Wolfshaar, J., Wiering, M. and Schomaker, L. (2018). Deep Learning Policy Quantization. In Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-275-2, pages 122-130. DOI: 10.5220/0006592901220130
in Bibtex Style
@conference{icaart18,
author={Jos van de Wolfshaar and Marco Wiering and Lambert Schomaker},
title={Deep Learning Policy Quantization},
booktitle={Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2018},
pages={122-130},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006592901220130},
isbn={978-989-758-275-2},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Deep Learning Policy Quantization
SN - 978-989-758-275-2
AU - van de Wolfshaar J.
AU - Wiering M.
AU - Schomaker L.
PY - 2018
SP - 122
EP - 130
DO - 10.5220/0006592901220130