Authors:
Stefan J. L. Knegt 1; Madalina M. Drugan 2 and Marco A. Wiering 1
Affiliations:
1 University of Groningen, Netherlands; 2 ITLearns.Online, Netherlands
Keyword(s):
Reinforcement Learning, Opponent Modelling, Q-learning, Computer Games.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Biomedical Engineering; Biomedical Signal Processing; Computational Intelligence; Evolutionary Computing; Health Engineering and Technology Applications; Human-Computer Interaction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Methodologies and Methods; Neural Networks; Neurocomputing; Neurotechnology, Electronics and Informatics; Pattern Recognition; Physiological Computing Systems; Sensor Networks; Signal Processing; Soft Computing; Symbolic Systems; Theory and Methods
Abstract:
In this paper we propose the use of vision grids as state representation to learn to play the game Tron using neural networks and reinforcement learning. This approach speeds up learning by significantly reducing the number of unique states. Furthermore, we introduce a novel opponent modelling technique, which is used to predict the opponent's next move. The learned model of the opponent is subsequently used in Monte-Carlo roll-outs, in which the game is simulated n steps ahead in order to determine the expected value of performing a certain action. Finally, we compare the performance of two different activation functions in the multi-layer perceptron, namely the sigmoid and the exponential linear unit (Elu). The results show that the Elu activation function outperforms the sigmoid activation function in most cases. Furthermore, vision grids significantly increase learning speed and in most cases also improve the agent's performance compared to using the full grid as state representation. Finally, the opponent modelling technique allows the agent to learn a predictive model of the opponent's actions, which in combination with Monte-Carlo roll-outs significantly increases the agent's performance.
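The abstract refers to the Elu activation function and to Monte-Carlo roll-outs that use the learned opponent model to simulate the game n steps ahead. As a rough illustration only, the Python sketch below shows one way these ideas could fit together; the env, q_network and opponent_model objects (and their copy, step, sample, predict and observation methods) are assumed interfaces for this sketch, not the implementation from the paper.

```python
import numpy as np

# Hedged sketch only: the environment, Q-network, and opponent-model
# interfaces below are hypothetical placeholders, not the authors' code.

def sigmoid(x):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-x))

def elu(x, alpha=1.0):
    """Exponential linear unit: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return np.where(x > 0, x, alpha * np.expm1(x))

def rollout_value(env, first_action, q_network, opponent_model,
                  n_steps=10, n_rollouts=20, gamma=0.95):
    """Estimate the value of `first_action` by simulating the game n steps
    ahead, sampling the opponent's moves from the learned opponent model and
    averaging the discounted returns over several roll-outs."""
    returns = []
    for _ in range(n_rollouts):
        sim = env.copy()                      # roll out on a copy of the game
        action, total, discount = first_action, 0.0, 1.0
        for _ in range(n_steps):
            # Sample the opponent's next move from the learned model.
            opp_action = opponent_model.sample(sim.opponent_observation())
            reward, done = sim.step(action, opp_action)
            total += discount * reward
            discount *= gamma
            if done:
                break
            # After the first step, act greedily on the agent's own Q-values.
            action = int(np.argmax(q_network.predict(sim.agent_observation())))
        returns.append(total)
    return float(np.mean(returns))
```

In such a scheme the agent would call rollout_value once per candidate action and pick the action with the highest estimated return, while the elu or sigmoid function would be used as the hidden-layer activation inside the multi-layer perceptron.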