Authors:
Juan M. Montoya 1; Imant Daunhawer 2; Julia E. Vogt 2 and Marco Wiering 3
Affiliations:
1 Department of Computer Science, University of Konstanz, Germany
2 Department of Computer Science, ETH Zurich, Switzerland
3 Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen, The Netherlands
Keyword(s):
Deep Reinforcement Learning, State Representation Learning, Variational Autoencoders, Contrastive Learning.
Abstract:
In the quest for efficient and robust learning methods, combining unsupervised state representation learning and reinforcement learning (RL) could help scale RL algorithms by providing the models with a useful inductive bias. To achieve this, an encoder is trained in an unsupervised manner with two state representation methods, a variational autoencoder and a contrastive estimator. The learned features are then fed to the actor-critic RL algorithm Proximal Policy Optimization (PPO) to learn a policy for playing OpenAI's car racing environment. This procedure decouples the state representation from the RL controller. For the integration of RL with unsupervised learning, we explore various designs for variational autoencoders and contrastive learning. The proposed method is compared to a deep network trained directly on pixel inputs with PPO. The results show that the proposed method performs slightly worse than learning directly from pixel inputs; however, it has a more stable learning curve, substantially reduces the buffer size, and requires optimizing 88% fewer parameters. These results indicate that the use of pre-trained state representations has several benefits for solving RL tasks.
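To illustrate the decoupling described in the abstract, the following is a minimal PyTorch sketch, not the authors' implementation: a pre-trained convolutional encoder (here VAE-style, returning the latent mean) is frozen, and only a small actor-critic head, of the kind PPO optimizes, is trained on the resulting features. The class names, the 32-dimensional latent, the 64x64 resized frames, and the discrete 5-action space are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: frozen unsupervised encoder + trainable PPO actor-critic head.
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    """VAE-style convolutional encoder; returns the mean of the Gaussian latent."""
    def __init__(self, latent_dim: int = 32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2), nn.ReLU(),
            nn.Conv2d(128, 256, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # 1024 = 256 * 2 * 2, the flattened conv output for a 64x64 RGB input
        self.mu = nn.Linear(1024, latent_dim)

    def forward(self, x):
        return self.mu(self.conv(x))

class ActorCritic(nn.Module):
    """Small policy and value heads operating on the frozen latent features."""
    def __init__(self, latent_dim: int = 32, n_actions: int = 5):
        super().__init__()
        self.policy = nn.Sequential(nn.Linear(latent_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))
        self.value = nn.Sequential(nn.Linear(latent_dim, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, z):
        return self.policy(z), self.value(z)

encoder = ConvEncoder()
encoder.eval()                      # encoder is pre-trained unsupervised and kept frozen
for p in encoder.parameters():
    p.requires_grad_(False)

agent = ActorCritic()               # only these (far fewer) parameters are updated by PPO
obs = torch.rand(1, 3, 64, 64)      # dummy frame standing in for a car-racing observation
with torch.no_grad():
    z = encoder(obs)                # low-dimensional state representation
logits, value = agent(z)            # inputs to the PPO policy/value losses
```

Because the encoder is never updated by the RL loss, the replay data only needs to be stored as low-dimensional latents rather than raw frames, which is consistent with the reduced buffer size and the much smaller number of optimized parameters reported above.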