
mine the broader effectiveness of PIMAEX in multi-agent reinforcement learning.
ACKNOWLEDGEMENTS
This work is part of the Munich Quantum Valley, which is supported by the Bavarian state government with funds from the Hightech Agenda Bayern Plus. This paper was partly funded by the German Federal Ministry of Education and Research through the funding program “quantum technologies — from basic research to market” (contract number: 13N16196).