the network’s input. This also allows a CNN to process a higher-resolution input while keeping its number of parameters low, which helps explain why the CNNs reach a higher performance faster than the MLPs. The deeper CNN architecture at a 42 by 42 input resolution has 118,969 trainable parameters, while the MLP architecture has 169,753.
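To illustrate why this difference arises, the sketch below builds a small CNN and a small MLP in Keras and compares their parameter counts. The layer sizes are purely hypothetical and do not reproduce the exact architectures used in this paper; the point is only that convolutional weights are shared across spatial positions, so mainly the dense head after flattening grows with the input resolution, whereas the first fully connected layer of an MLP grows directly with the number of input pixels.

```python
# Minimal sketch with hypothetical layer sizes (not the paper's exact networks).
from tensorflow.keras import layers, models

def build_cnn(input_shape=(42, 42, 3), n_actions=4):
    # Convolution weights are shared over spatial positions, so their count is
    # independent of the input resolution; only the dense head after flattening
    # grows, and the strided convolutions shrink that flattened vector first.
    return models.Sequential([
        layers.Conv2D(16, kernel_size=4, strides=2, activation='relu',
                      input_shape=input_shape),
        layers.Conv2D(32, kernel_size=4, strides=2, activation='relu'),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(n_actions, activation='linear'),
    ])

def build_mlp(input_shape=(42, 42, 3), n_actions=4):
    # The first dense layer scales with the number of input pixels
    # (42 * 42 * 3 inputs per hidden unit at this resolution).
    return models.Sequential([
        layers.Flatten(input_shape=input_shape),
        layers.Dense(256, activation='relu'),
        layers.Dense(n_actions, activation='linear'),
    ])

print(build_cnn().count_params())  # much smaller than the MLP at the same resolution
print(build_mlp().count_params())  # grows with the number of input pixels
```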
The semantic representations yield a surprisingly strong performance in comparison to the RGB and grayscale pixel values. Even at a resolution of 11 by 11, the MLP yields a significantly higher performance than with RGB or grayscale pixel inputs. It should be noted that the pixel-value representations have not fully converged after 300,000 training steps (see Figure 11), so given enough training time they might still match the performance of the vision grids. The same caveat applies to the higher vision grid resolutions (particularly 84 by 84), which had also not converged by the end of the training period.
A possible reason why the 63 by 63 resolution performed worse is that the input dimensions are odd while the kernel strides are even. This causes the network to ignore the 3 rightmost columns and the 3 bottom rows of the input, leading to a loss of possibly important information.
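To make this concrete, a "valid" convolution with kernel size k and stride s covers only k + s⌊(W − k)/s⌋ of the W input columns, and the remaining columns never influence the output. The sketch below assumes, purely for illustration, a DQN-style first layer with an 8 by 8 kernel and stride 4 (the exact kernel sizes are not restated in this section); under that assumption, a 63 by 63 input leaves exactly 3 columns and 3 rows unused, while an 84 by 84 input is covered exactly.

```python
def uncovered(input_size, kernel, stride):
    # Number of sliding-window positions of a "valid" convolution.
    n_windows = (input_size - kernel) // stride + 1
    # Extent of the input actually reached by the last window.
    covered = kernel + stride * (n_windows - 1)
    return input_size - covered

# Hypothetical DQN-style first layer: 8x8 kernel with stride 4.
print(uncovered(63, kernel=8, stride=4))  # -> 3: three columns (or rows) are ignored
print(uncovered(84, kernel=8, stride=4))  # -> 0: the input is covered exactly
```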
6 CONCLUSION
Deep reinforcement learning has obtained many successes in optimizing the behavior of agents that receive pixel information as input. This paper focused on using deep reinforcement learning for the pellet collection task in the game Agar.io. We researched different types of state representations and their influence on the learning process of the Q-learning algorithm combined with deep neural networks. Furthermore, different resolutions of these state representations have been examined and combined with different artificial neural network architectures.
The results show that the use of a vision grid representation, which transforms the raw pixel input into a more meaningful representation, helps to improve the training speed and final performance of the deep Q-network. Furthermore, a lower resolution (of 42 × 42) for both the vision grid representation and the raw pixel inputs leads to a higher performance. Finally, the results show that a convolutional neural network with 3 convolutional layers generally outperforms a smaller CNN with 2 convolutional layers.
In future work, we aim to extend the algorithm so
that the agent learns to play the full game of Agar.io.
For this, it would also be interesting to use the sampled policy gradient (SPG) algorithm from (Wiehe et al., 2018) and combine it with the methods used in this paper. Finally, we want to compare our deep Q-network approach to Deep Quality-Value (DQV) learning (Sabatelli et al., 2018) for learning to play the game Agar.io.
ACKNOWLEDGMENTS
We would like to thank the Center for Information
Technology of the University of Groningen for their
support and for providing access to the Peregrine high
performance computing cluster.
REFERENCES
Bom, L., Henken, R., and Wiering, M. A. (2013). Reinforcement learning to train Ms. Pac-Man using higher-order action-relative inputs. In Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pages 156–163.
Chollet, F. et al. (2015). Keras.
Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., and Wu, Y. (2017). OpenAI baselines.
Hasselt, H. V. (2010). Double Q-learning. In Neural Information Processing Systems (NIPS), pages 2613–2621. Curran Associates, Inc.
Hochreiter, S. and Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8):1735–1780.
Knegt, S. J. L., Drugan, M. M., and Wiering, M. A. (2018). Opponent modelling in the game of Tron using reinforcement learning. In International Conference on Agents and Artificial Intelligence (ICAART), pages 29–40. SciTePress.
Lample, G. and Chaplot, D. S. (2017). Playing FPS games with deep reinforcement learning. In AAAI Conference on Artificial Intelligence, pages 2140–2146.
Lin, L.-J. (1992). Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8(3):293–321.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529.
Sabatelli, M., Louppe, G., Geurts, P., and Wiering, M. A. (2018). Deep quality-value (DQV) learning. arXiv preprint arXiv:1810.00368.