Reward Prediction for Representation Learning and Reward Shaping
Hlynur Davíð Hlynsson, Laurenz Wiskott
2021
Abstract
A fundamental challenge in reinforcement learning (RL) is data efficiency: modern algorithms require a very large number of training samples, especially compared to humans, to solve environments with high-dimensional observations. The problem becomes more severe when the reward signal is sparse. In this work, we propose learning a state representation in a self-supervised manner for reward prediction. The reward predictor learns to estimate either a raw or a smoothed version of the true reward signal in an environment with a single terminating goal state. We augment the training of out-of-the-box RL agents in single-goal environments with visual inputs by shaping the reward with our reward predictor during policy learning. Using our representation to preprocess high-dimensional observations, and using the predictor for reward shaping, is shown to speed up learning for Actor Critic using Kronecker-factored Trust Region (ACKTR) and Proximal Policy Optimization (PPO).
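As a rough illustration of the two ideas named in the abstract (a smoothed reward target and reward shaping with a predictor), the following is a minimal sketch only. The abstract does not specify the smoothing scheme or how the predicted reward is combined with the environment reward, so the backward exponential decay, the additive combination, and the names `smooth_rewards`, `shaped_reward`, `decay`, and `weight` are all assumptions made here for illustration, not the authors' method.

```python
from typing import Callable, List, Sequence


def smooth_rewards(rewards: Sequence[float], decay: float = 0.9) -> List[float]:
    """Spread a sparse terminal reward backwards along one trajectory.

    Assumption: smoothing is a backward exponential decay from the goal;
    the paper may define its smoothed reward differently.
    """
    smoothed = [0.0] * len(rewards)
    running = 0.0
    # Walk from the last step to the first so earlier states inherit a
    # decayed share of the reward that was reached later.
    for t in range(len(rewards) - 1, -1, -1):
        running = max(rewards[t], decay * running)
        smoothed[t] = running
    return smoothed


def shaped_reward(
    env_reward: float,
    observation: object,
    predictor: Callable[[object], float],
    weight: float = 0.5,
) -> float:
    """Add a weighted predicted reward to the environment reward.

    Assumption: additive shaping with a fixed weight; `predictor` stands in
    for a trained reward predictor and is hypothetical here.
    """
    return env_reward + weight * predictor(observation)


# Sparse trajectory: reward only at the final (goal) step.
targets = smooth_rewards([0.0, 0.0, 0.0, 1.0], decay=0.5)

# Toy predictor that maps an integer "observation" to a score in [0, 1].
bonus = shaped_reward(0.0, 3, lambda obs: obs / 4, weight=0.5)
```

In this sketch the smoothed values would serve as regression targets for the predictor, and `shaped_reward` would be applied to each transition during policy learning in place of the raw environment reward.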
Paper Citation
in Harvard Style
Hlynsson H. and Wiskott L. (2021). Reward Prediction for Representation Learning and Reward Shaping. In Proceedings of the 13th International Joint Conference on Computational Intelligence (IJCCI 2021) - Volume 1: NCTA; ISBN 978-989-758-534-0, SciTePress, pages 267-276. DOI: 10.5220/0010640200003063
in Bibtex Style
@conference{ncta21,
author={Hlynur Davíð Hlynsson and Laurenz Wiskott},
title={Reward Prediction for Representation Learning and Reward Shaping},
booktitle={Proceedings of the 13th International Joint Conference on Computational Intelligence (IJCCI 2021) - Volume 1: NCTA},
year={2021},
pages={267-276},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010640200003063},
isbn={978-989-758-534-0},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 13th International Joint Conference on Computational Intelligence (IJCCI 2021) - Volume 1: NCTA
TI - Reward Prediction for Representation Learning and Reward Shaping
SN - 978-989-758-534-0
AU - Hlynsson H.
AU - Wiskott L.
PY - 2021
SP - 267
EP - 276
DO - 10.5220/0010640200003063
PB - SciTePress