cases where redundant additional information worsens learning accuracy and identified priorities for additional information in fully and partially observable environments.
In this paper, experiments were conducted with additional information limited to agent observations and actions. Future research should consider applying this approach to other types of additional information. Furthermore, the information selection method in this study was defined at runtime and remained fixed throughout the learning process. Given the complexity of multi-agent reinforcement learning (MARL), the critical information is likely to vary depending on the learning stage. Therefore, dynamic selection based on the progress of learning would be a valuable direction for future work.