Decentralized Multi-agent Formation Control via Deep
Reinforcement Learning
Aniket Gupta¹,* and Raghava Nallanthighal²,†
¹Department of Electrical Engineering, Delhi Technological University, New Delhi, India
²Department of Electronics and Communication Engineering, Delhi Technological University, New Delhi, India
*http://www.dtu.ac.in/Web/Departments/Electrical/about/
†http://www.dtu.ac.in/Web/Departments/Electronics/about/
Keywords: Multi-agent Systems, Swarm Robotics, Formation Control, Policy Gradient Methods.
Abstract: Multi-agent formation control is a well-studied problem, and while several methods from control theory exist, they demand considerable expertise to tune properly, which is highly resource-intensive, and they often fail to adapt to even slight changes in the environment. This paper presents an end-to-end decentralized approach to multi-agent formation control, using only the information available from onboard sensors, within a deep reinforcement learning framework. The proposed method maps raw sensor readings directly to the agent's movement velocity through a deep neural network. The approach uses policy gradient methods to generalize efficiently across various simulation scenarios and is trained over a large number of agents. We validate the performance of the learned policy using numerous simulated scenarios and a comprehensive evaluation. Finally, the learned policy is demonstrated in new scenarios with non-cooperative agents that were not introduced during the training process.
1 INTRODUCTION
Multi-agent systems are rapidly gaining momentum owing to their many real-world applications in disaster relief, rescue operations, military operations, warehouse management, agriculture, and more. All of these tasks require teams of robots to cooperate autonomously to produce the desired results, much like the swarming behaviour displayed by many animals and insects in nature.
One of the major challenges for multi-agent systems is autonomous navigation while adhering to Reynolds' three flocking rules of separation, alignment, and cohesion (Reynolds, 1987). Modern control theory presents numerous solutions to this problem, supported by rigorous proofs, and demonstrates the feasibility of multi-agent formation control and obstacle avoidance.
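For concreteness, the listing below gives a minimal sketch of these three rules for a single agent. The vector representation, neighbourhood inputs, and gain values are illustrative assumptions for this sketch, not parameters from any of the cited controllers.

import numpy as np

def reynolds_velocity(pos, vel, neighbour_pos, neighbour_vel,
                      k_sep=1.5, k_ali=1.0, k_coh=1.0):
    """Combine separation, alignment, and cohesion into one velocity command.

    pos, vel: (2,) arrays for this agent.
    neighbour_pos, neighbour_vel: (N, 2) arrays for visible neighbours.
    Gains k_* are illustrative; real controllers tune them carefully.
    """
    if len(neighbour_pos) == 0:
        return vel  # no neighbours in range: keep current velocity

    offsets = neighbour_pos - pos                    # vectors to neighbours
    dists = np.linalg.norm(offsets, axis=1, keepdims=True) + 1e-6

    separation = -(offsets / dists**2).mean(axis=0)  # steer away, weighted by 1/d
    alignment = neighbour_vel.mean(axis=0) - vel     # match average heading
    cohesion = offsets.mean(axis=0)                  # steer toward local centroid

    return vel + k_sep * separation + k_ali * alignment + k_coh * cohesion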
(Marko and Stiepan, 2012), (Anuj et al., 2020), and (Egerstedt, 2007) present an artificial potential field approach and demonstrate stable autonomous navigation of a swarm of unmanned aerial vehicles (UAVs). While this approach gets the job done, it neglects the non-linearities in the system dynamics and thus fails to demonstrate optimal behaviour in
conjunction with the vehicle dynamics. Moreover, these approaches require extensive tuning to achieve the desired performance.
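As a point of reference, a textbook artificial potential field controller descends the gradient of a combined potential U = U_att + U_rep. The sketch below assumes a quadratic attractive term and the classic inverse-distance repulsive term; all gains and radii are illustrative assumptions, not values from the cited works.

import numpy as np

def potential_field_velocity(pos, goal, obstacles,
                             k_att=1.0, k_rep=0.5, d0=2.0):
    """Gradient-descent step on U = U_att + U_rep.

    U_att = 0.5 * k_att * ||goal - pos||^2 pulls the agent to the goal;
    U_rep pushes it away from obstacles closer than the influence radius d0.
    """
    # Attractive term: negative gradient of the quadratic goal potential.
    v = k_att * (goal - pos)

    for obs in obstacles:
        diff = pos - obs
        d = np.linalg.norm(diff)
        if 0 < d < d0:
            # Classic repulsive gradient: grows sharply as d -> 0.
            v += k_rep * (1.0 / d - 1.0 / d0) * diff / d**3
    return v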
Further, in (Hung and Givigi, 2017), a Q-learning based controller is presented for flocking a swarm of UAVs in unknown environments using a leader-follower approach. This approach does not yield an optimal solution because the state and action spaces are discretized to limit the size of the Q-table, and as the number of states increases, computing the Q-table becomes increasingly inefficient. Another disadvantage of this approach is the single point of failure introduced by the leader agent.
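For context, the tabular Q-learning update at the heart of such a controller is a single Bellman backup per transition, and the table must hold one row per discretized state, which is exactly what makes fine discretization intractable. The sizes and hyperparameters below are illustrative assumptions.

import numpy as np

# Tabular Q-learning: a coarse grid over the continuous state keeps the
# table tractable but sacrifices optimality.
n_states, n_actions = 10_000, 8       # illustrative discretization
Q = np.zeros((n_states, n_actions))   # table grows with every state bin

alpha, gamma = 0.1, 0.99              # illustrative learning rate / discount

def q_update(s, a, r, s_next):
    """One Bellman backup: Q(s,a) <- Q(s,a) + alpha * (TD error)."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])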
In (Johns and Rasmus, 2018), reinforcement learning is utilized to improve the performance of a behaviour-based control algorithm, which serves both as a baseline against which the RL algorithm is compared and as an initial policy from which training starts. This approach produces significant results but is not an end-to-end solution that can be applied directly to any kind of system: appropriate tuning of the behaviour-based control algorithm is still required for the complete controller to perform efficiently.
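One common way to realize such a combination, sketched below purely as an illustrative pattern rather than the cited authors' implementation, is to let the learned policy output a residual correction on top of the hand-tuned behaviour-based command; both callables here are hypothetical placeholders.

def combined_action(state, base_controller, policy):
    """Behaviour-based command plus a learned residual correction.

    base_controller: hand-tuned behaviour-based law (the baseline).
    policy: learned network that outputs a small corrective velocity.
    """
    u_base = base_controller(state)   # tuned behaviour-based command
    u_residual = policy(state)        # learned correction
    return u_base + u_residual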