Because PV-MCTS generates its own training data through
self-play, it eliminates the need for a huge database of
game records, reduces the burden of designing the neural
network, and improves learning efficiency.
The design of the search tree had to be modified so that
the neural network could handle the complex data
structures that represent the actions of the units in
turn-based strategy games. By moving the information
about the operating unit from the output of the neural
network to its input, we significantly reduced the design
load of the network.
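As an illustration of this design (a minimal sketch only; the layer
sizes, plane counts, and names below are assumptions for exposition,
not the network actually used in this work), the selected unit is
encoded as an extra input plane, so the policy head only has to score
the actions of that single unit:

    import tensorflow as tf
    from tensorflow.keras import layers

    def build_policy_value_net(h=16, w=16, n_map_planes=8, n_actions=64):
        # Map features plus a one-hot plane marking the unit to move are
        # concatenated on the input side, so the network never outputs
        # "which unit" -- only what the marked unit should do.
        board = layers.Input(shape=(h, w, n_map_planes), name="board")
        unit = layers.Input(shape=(h, w, 1), name="selected_unit")
        x = layers.Concatenate()([board, unit])
        x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
        flat = layers.Flatten()(x)
        policy = layers.Dense(n_actions, activation="softmax", name="policy")(flat)
        value = layers.Dense(1, activation="tanh", name="value")(flat)
        return tf.keras.Model([board, unit], [policy, value])

With this arrangement, a single fixed-size policy vector suffices no
matter how many units are on the map.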
The unit-selection problem, which arises from the
multi-unit operation peculiar to turn-based strategy
games, was solved by integrating unit selection into the
search tree.
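A minimal sketch of that integration follows (hypothetical code
assuming a standard PUCT selection rule in the AlphaZero style; the
authors' actual implementation may differ). The tree interleaves
unit-selection levels and action levels, so the same bandit rule
decides both which unit to move and what it should do:

    import math

    class Node:
        # One tree node; levels alternate between choosing a unit and
        # choosing that unit's action.
        def __init__(self, prior):
            self.prior = prior      # P(s, a) from the policy network
            self.visits = 0
            self.value_sum = 0.0
            self.children = {}      # unit ids at selection levels,
                                    # action ids at action levels

    def puct_select(node, c_puct=1.0):
        # Standard PUCT: Q + c * P * sqrt(N_parent) / (1 + N_child).
        # Because unit choice is just another tree level, no separate
        # mechanism outside the search is needed.
        total = sum(child.visits for child in node.children.values())
        def score(child):
            q = child.value_sum / child.visits if child.visits else 0.0
            return q + c_puct * child.prior * math.sqrt(total) / (1 + child.visits)
        return max(node.children.items(), key=lambda kv: score(kv[1]))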
Effective changes such as SepConv2D, B_attack, and
dropout were also introduced and evaluated, as was the
number of blocks in the residual layers.
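To make the SepConv2D and dropout changes concrete, here is a sketch
of one possible residual block built from depthwise-separable
convolutions (the filter count and dropout rate are assumptions, not
the values evaluated in this work):

    from tensorflow.keras import layers

    def sep_conv_residual_block(x, filters=64, dropout_rate=0.3):
        # SeparableConv2D factorizes a full convolution into a depthwise
        # and a pointwise step, cutting parameters and inference cost.
        # Assumes x already has `filters` channels so the shapes match.
        y = layers.SeparableConv2D(filters, 3, padding="same", activation="relu")(x)
        y = layers.SeparableConv2D(filters, 3, padding="same")(y)
        y = layers.Dropout(dropout_rate)(y)
        # Skip connection as in standard residual networks.
        return layers.ReLU()(layers.Add()([x, y]))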
The new method showed excellent performance compared
with two simple, classical algorithms. It also performed
well on maps it had not been trained on, demonstrating
that the learning generalizes. However, the operation
time increased as the number of units grew, a problem
that remains to be addressed.
In future research, we aim to test the method on a
wide variety of map situations and on maps with more
units.