Reward Prediction for Representation Learning and Reward Shaping

Hlynur Davíð Hlynsson, Laurenz Wiskott

2021

Abstract

A fundamental challenge in reinforcement learning (RL) is data efficiency: modern algorithms require a very large number of training samples, especially compared to humans, to solve environments with high-dimensional observations. The problem becomes more severe when the reward signal is sparse. In this work, we propose learning a state representation in a self-supervised manner for reward prediction. The reward predictor learns to estimate either a raw or a smoothed version of the true reward signal in an environment with a single terminating goal state. We augment the training of out-of-the-box RL agents in single-goal environments with visual inputs by shaping the reward with our reward predictor during policy learning. Using our representation to preprocess high-dimensional observations, and using the predictor for reward shaping, is shown to speed up learning for Actor Critic using Kronecker-factored Trust Region (ACKTR) and Proximal Policy Optimization (PPO).
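The abstract does not give the exact smoothing or shaping formulas, so the following is only a minimal sketch of the two ideas it names: a smoothed version of a sparse goal reward, and reward shaping with a predicted reward. The exponential backward decay, the additive shaping rule, and the names `smooth_sparse_rewards`, `shaped_reward`, `decay`, and `weight` are all assumptions made for illustration, not the authors' method.

```python
def smooth_sparse_rewards(rewards, decay=0.9):
    """Spread each reward backward in time with exponential decay.

    Hypothetical smoothing rule: a state k steps before a reward r
    receives max(own reward, decay**k * r).
    """
    smoothed = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backward so later rewards leak into earlier steps.
    for t in reversed(range(len(rewards))):
        running = max(rewards[t], decay * running)
        smoothed[t] = running
    return smoothed


def shaped_reward(env_reward, predicted_reward, weight=0.5):
    """Add a weighted predicted reward to the environment reward.

    Hypothetical additive shaping; `weight` is an illustrative parameter.
    """
    return env_reward + weight * predicted_reward


# One episode with a single terminating goal state at the last step.
episode_rewards = [0.0, 0.0, 0.0, 1.0]
targets = smooth_sparse_rewards(episode_rewards, decay=0.5)
```

In this sketch, the smoothed sequence would serve as the regression target for a reward predictor, and the predictor's output would then be passed to `shaped_reward` at every step of policy learning, so the agent receives a denser signal before it reaches the goal.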


Paper Citation


in Harvard Style

Hlynsson H. and Wiskott L. (2021). Reward Prediction for Representation Learning and Reward Shaping. In Proceedings of the 13th International Joint Conference on Computational Intelligence (IJCCI 2021) - Volume 1: NCTA; ISBN 978-989-758-534-0, SciTePress, pages 267-276. DOI: 10.5220/0010640200003063


in Bibtex Style

@conference{ncta21,
author={Hlynur Davíð Hlynsson and Laurenz Wiskott},
title={Reward Prediction for Representation Learning and Reward Shaping},
booktitle={Proceedings of the 13th International Joint Conference on Computational Intelligence (IJCCI 2021) - Volume 1: NCTA},
year={2021},
pages={267--276},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010640200003063},
isbn={978-989-758-534-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 13th International Joint Conference on Computational Intelligence (IJCCI 2021) - Volume 1: NCTA
TI - Reward Prediction for Representation Learning and Reward Shaping
SN - 978-989-758-534-0
AU - Hlynsson H.
AU - Wiskott L.
PY - 2021
SP - 267
EP - 276
DO - 10.5220/0010640200003063
PB - SciTePress