the survival of non-swarming individuals, thus urging
them to join the swarm even when doing so is suboptimal
for every individual's survival. Note that this affects
rational agents, i.e., swarm participants that act locally
optimally at each of their decisions.
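To make this concrete, consider the following minimal sketch in Python (the payoff values are hypothetical and chosen purely for illustration): a symmetric two-agent game in which joining the swarm strictly dominates staying outside, so that mutual joining is the unique pure Nash equilibrium even though both agents would fare better if both stayed outside.

STAY, JOIN = 0, 1

# payoff[(a, b)] = (utility of agent 1, utility of agent 2); the values
# are hypothetical and chosen only to exhibit the dilemma structure.
payoff = {
    (STAY, STAY): (3, 3),   # both stay outside the swarm
    (STAY, JOIN): (0, 4),   # lone outsider suffers, joiner benefits
    (JOIN, STAY): (4, 0),
    (JOIN, JOIN): (1, 1),   # everyone joins: worse for all than (STAY, STAY)
}

def is_pure_nash(a, b):
    # A profile is a pure Nash equilibrium iff no agent gains by a
    # unilateral deviation.
    u1, u2 = payoff[(a, b)]
    return (all(payoff[(d, b)][0] <= u1 for d in (STAY, JOIN)) and
            all(payoff[(a, d)][1] <= u2 for d in (STAY, JOIN)))

equilibria = [(a, b) for a in (STAY, JOIN) for b in (STAY, JOIN)
              if is_pure_nash(a, b)]
print(equilibria)  # [(1, 1)] -- mutual joining is the only equilibrium

Running the sketch prints [(1, 1)]: the only stable profile is mutual joining, whose payoffs (1, 1) are Pareto-dominated by mutual staying at (3, 3), mirroring the pressure described above.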
Interestingly, the phenomenon of pressure arising
from a lack of communication and control structures
has been observed in natural evolution as
well (Dawkins, 1976). Thus, swarms can (under
certain conditions) also be interpreted as self-perpetuating,
which means that they should be handled
with additional care in practical applications.
Self-perpetuating swarms might introduce additional
targets for emergent behavior that interfere with the
system designer's intended purpose. It is up to future
research to examine the interplay between exploiting
such emergent behavior and controlling it in order to
build useful swarm applications.
REFERENCES
Bellman, R. (1957). Dynamic Programming. Princeton
University Press, Princeton, NJ, USA, 1st edition.
Brambilla, M., Ferrante, E., Birattari, M., and Dorigo, M.
(2013). Swarm robotics: a review from the swarm
engineering perspective. Swarm Intelligence, 7(1):1–
41.
Christensen, A. L., Oliveira, S., Postolache, O., De Oliveira,
M. J., Sargento, S., Santana, P., Nunes, L., Velez, F. J.,
Sebastião, P., Costa, V., et al. (2015). Design of com-
munication and control for swarms of aquatic surface
drones. In ICAART (2), pages 548–555.
Dawkins, R. (1976). The Selfish Gene. Oxford University
Press, Oxford, UK.
de Cote, E. M., Lazaric, A., and Restelli, M. (2006). Learn-
ing to cooperate in multi-agent social dilemmas. In
Proceedings of the Fifth International Joint Confer-
ence on Autonomous Agents and Multiagent Systems,
AAMAS ’06, pages 783–785, New York, NY, USA.
ACM.
Egorov, M. (2016). Multi-agent deep reinforcement learn-
ing. CS231n: Convolutional Neural Networks for Vi-
sual Recognition.
Foerster, J., Assael, I. A., de Freitas, N., and Whiteson, S.
(2016). Learning to communicate with deep multi-
agent reinforcement learning. In Advances in Neural
Information Processing Systems, pages 2137–2145.
Foerster, J. N., Farquhar, G., Afouras, T., Nardelli, N., and
Whiteson, S. (2018). Counterfactual multi-agent pol-
icy gradients. In Thirty-Second AAAI Conference on
Artificial Intelligence.
Hahn, C., Phan, T., Gabor, T., Belzner, L., and Linnhoff-
Popien, C. (2019). Emergent escape-based flock-
ing behavior using multi-agent reinforcement learn-
ing. The 2019 Conference on Artificial Life, (31):598–
605.
Hausknecht, M. and Stone, P. (2015). Deep recurrent Q-
learning for partially observable MDPs. In 2015 AAAI
Fall Symposium Series.
Howard, R. A. (1961). Dynamic Programming and Markov
Processes. The MIT Press.
Leibo, J. Z., Zambaldi, V., Lanctot, M., Marecki, J., and
Graepel, T. (2017). Multi-agent reinforcement learn-
ing in sequential social dilemmas. In Proceedings of
the 16th Conference on Autonomous Agents and Mul-
tiAgent Systems, pages 464–473. International Foun-
dation for Autonomous Agents and Multiagent Sys-
tems.
Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P.,
and Mordatch, I. (2017). Multi-agent actor-critic
for mixed cooperative-competitive environments. In
Advances in Neural Information Processing Systems,
pages 6379–6390.
McKelvey, R. D., McLennan, A. M., and Turocy, T. L.
(2016). Gambit: Software tools for game theory.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness,
J., Bellemare, M. G., Graves, A., Riedmiller, M., Fid-
jeland, A. K., Ostrovski, G., et al. (2015). Human-
level control through deep reinforcement learning.
Nature, 518(7540):529–533.
Morihiro, K., Nishimura, H., Isokawa, T., and Matsui,
N. (2008). Learning grouping and anti-predator be-
haviors for multi-agent systems. In International
Conference on Knowledge-Based and Intelligent
Information and Engineering Systems. Springer.
Özgüler, A. B. and Yıldız, A. (2013). Foraging swarms as
Nash equilibria of dynamic games. IEEE Transactions
on Cybernetics, 44(6):979–987.
Phillips, S. J., Anderson, R. P., and Schapire, R. E. (2006).
Maximum entropy modeling of species geographic
distributions. Ecological Modelling, 190(3-4):231–259.
Pinciroli, C. and Beltrame, G. (2016). Swarm-oriented pro-
gramming of distributed robot networks. Computer,
49(12):32–41.
Plappert, M. (2016). keras-rl. https://github.com/keras-rl/keras-rl.
Puterman, M. L. (2014). Markov Decision Processes: Dis-
crete Stochastic Dynamic Programming. John Wiley &
Sons.
Rashid, T., Samvelyan, M., de Witt, C. S., Farquhar, G.,
Foerster, J., and Whiteson, S. (2018). QMIX: Monotonic
value function factorisation for deep multi-agent rein-
forcement learning. In International Conference on
Machine Learning, pages 4292–4301.
Reynolds, C. W. (1987). Flocks, herds and schools: A dis-
tributed behavioral model. In ACM SIGGRAPH Com-
puter Graphics, volume 21. ACM.
Tan, M. (1993). Multi-agent reinforcement learning: in-
dependent versus cooperative agents. In Proceed-
ings of the Tenth International Conference on Interna-
tional Conference on Machine Learning, pages 330–
337. Morgan Kaufmann Publishers Inc.
Watkins, C. J. C. H. (1989). Learning from Delayed Re-
wards. PhD thesis, King’s College, Cambridge, UK.