Authors:
Joseph Groot Kormelink 1; Madalina M. Drugan 2 and Marco A. Wiering 1
Affiliations:
1 University of Groningen, Netherlands; 2 ITLearns.Online, Netherlands
Keyword(s):
Reinforcement Learning, Computer Games, Exploration Methods, Neural Networks.
Related Ontology Subjects/Areas/Topics:
Agents; Artificial Intelligence; Artificial Intelligence and Decision Support Systems; Autonomous Systems; Biomedical Engineering; Biomedical Signal Processing; Computational Intelligence; Distributed and Mobile Software Systems; Enterprise Information Systems; Evolutionary Computing; Health Engineering and Technology Applications; Human-Computer Interaction; Knowledge Discovery and Information Retrieval; Knowledge Engineering and Ontology Development; Knowledge-Based Systems; Machine Learning; Methodologies and Methods; Multi-Agent Systems; Neural Networks; Neurocomputing; Neurotechnology, Electronics and Informatics; Pattern Recognition; Physiological Computing Systems; Sensor Networks; Signal Processing; Soft Computing; Software Engineering; Symbolic Systems; Theory and Methods
Abstract:
In this paper, we investigate which exploration method yields the best performance in the game Bomberman.
In Bomberman the controlled agent has to kill opponents by placing bombs. The agent is represented by
a multi-layer perceptron that learns to play the game with the use of Q-learning. We introduce two novel
exploration strategies: Error-Driven-ε and Interval-Q, which base their explorative behavior on the temporal-difference
error of Q-learning. The learning capabilities of these exploration strategies are compared to five
existing methods: Random-Walk, Greedy, ε-Greedy, Diminishing ε-Greedy, and Max-Boltzmann. The results
show that the methods that combine exploration with exploitation perform much better than the Random-Walk
and Greedy strategies, which only select exploration or exploitation actions. Furthermore, the results show that
Max-Boltzmann exploration performs best overall among the different techniques. The Error-Driven-ε
exploration strategy also performs very well, but suffers from unstable learning behavior.
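As a concrete illustration of two of the baseline selection rules mentioned in the abstract, the sketch below contrasts ε-Greedy with Max-Boltzmann action selection over a vector of Q-values: ε-Greedy explores uniformly at random, while Max-Boltzmann biases its exploratory moves toward actions with higher estimated Q-values. The action set, Q-values, ε, and temperature shown here are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Epsilon-Greedy: with probability epsilon pick a uniformly random action,
    otherwise pick the greedy (highest-Q) action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def max_boltzmann(q_values, epsilon, temperature, rng):
    """Max-Boltzmann: with probability 1 - epsilon act greedily; otherwise
    sample an action from the Boltzmann (softmax) distribution over Q-values,
    so promising actions are explored more often than poor ones."""
    if rng.random() >= epsilon:
        return int(np.argmax(q_values))
    prefs = q_values / temperature
    prefs -= prefs.max()                      # subtract max for numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return int(rng.choice(len(q_values), p=probs))

# Illustrative usage with placeholder Q-values for a six-action Bomberman agent
# (e.g. up, down, left, right, wait, place bomb) -- values are made up.
rng = np.random.default_rng(0)
q = np.array([0.1, 0.5, -0.2, 0.3, 0.0, 0.4])
print(epsilon_greedy(q, epsilon=0.1, rng=rng))
print(max_boltzmann(q, epsilon=0.1, temperature=0.5, rng=rng))
```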