The paper tested four agents in total: one DQN agent enhanced with multi-head attention and three standard DQN agents. Each episode consisted of 1,000 games, and win rates were calculated over a total of 100 episodes. Figure 5 shows that the win rate of the multi-head attention-enhanced DQN agent is significantly higher than that of the baseline DQN agents, demonstrating a substantial improvement in model performance.
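As a minimal sketch of this evaluation protocol (the `env.play_game` interface and the win-rate tallying below are hypothetical stand-ins, not the paper's actual experiment harness), the procedure can be expressed as:

```python
# Minimal sketch of the evaluation protocol described above.
# `env` and the agents are hypothetical stand-ins: env.play_game(agents)
# is assumed to return the index of the winning seat, or -1 for a draw.

NUM_EPISODES = 100        # win rates averaged over 100 episodes
GAMES_PER_EPISODE = 1000  # 1,000 games per episode

def evaluate(env, agents):
    """Return each agent's win rate over all episodes."""
    wins = [0] * len(agents)
    for _ in range(NUM_EPISODES):
        for _ in range(GAMES_PER_EPISODE):
            winner = env.play_game(agents)  # index of winning agent, -1 if draw
            if winner >= 0:
                wins[winner] += 1
    total_games = NUM_EPISODES * GAMES_PER_EPISODE
    return [w / total_games for w in wins]
```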
5 CONCLUSION
In this article, a mahjong agent based on Deep Q-Network (DQN) is chosen as the baseline. By combining DQN with a multi-head attention mechanism, the paper improves the model's performance. The experimental results show that the algorithm improves the win rate in mahjong by 12 percentage points compared to the original DQN implementation. This superior performance shows that the multi-head attention mechanism can significantly improve the decision-making ability of DQN models in adversarial environments.
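A minimal PyTorch sketch of this combination is given below; the token-based state encoding, layer sizes, and placement of the attention block are illustrative assumptions, not the paper's exact architecture. In a standard DQN training loop, this network simply replaces the usual feed-forward Q-network, while the replay buffer, target network, and epsilon-greedy policy remain unchanged.

```python
import torch
import torch.nn as nn

class AttentionDQN(nn.Module):
    """Q-network with a multi-head self-attention block.

    Illustrative sketch: the state is assumed to be a set of feature
    tokens (e.g., per-tile-group feature planes) of shape
    (batch, num_tokens, token_dim); the paper's exact encoding and
    layer sizes are not reproduced here.
    """

    def __init__(self, token_dim, num_tokens, num_actions, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(token_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(token_dim)
        self.head = nn.Sequential(
            nn.Linear(num_tokens * token_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_actions),  # one Q-value per action
        )

    def forward(self, x):
        # Self-attention over the state tokens, with a residual connection.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm(x + attn_out)
        # Flatten the attended tokens and map to Q-values.
        return self.head(x.flatten(start_dim=1))

if __name__ == "__main__":
    # Smoke test with made-up dimensions (34 tile-type tokens is an assumption).
    net = AttentionDQN(token_dim=64, num_tokens=34, num_actions=38)
    q = net(torch.randn(8, 34, 64))  # -> shape (8, 38)
    print(q.shape)
```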
It is acknowledged that many real-world problems share common traits with mahjong: imperfect information, intricate operational rules, and complex distributions of incentives. The paper therefore offers a new way of optimising decision-making models for real-life problems.