tal training time. The additional overhead involves
DNN layer computation for the encoding and
extraction of an agent’s belief. This additional
communication overhead justifies the need for
fine-grained acceleration of the various added kernels.
In this work, we provided an extensive analysis of
various MARL algorithms based on a taxonomy. This is
also the first work in the field of MARL that
characterizes key algorithms from a parallelization and
acceleration perspective. We proposed that
latency-bounded throughput be considered a key
optimization metric in future literature. Based on our
observations, communication introduces a non-trivial
overhead that requires fine-grained optimization and
acceleration, depending on the category of the algorithm
in our taxonomy. Several directions remain for future
work on accelerating MARL:
• Specialized accelerator design for reducing
communication overheads: Specialized acceleration
platforms such as Field Programmable Gate Arrays
(FPGAs) offer pipeline parallelism along with
large distributed on-chip memory that features
single-cycle data access. To take full advantage
of this low-latency memory, specialized data layout
and partitioning of the communicated message pool
need to be exploited.
• Fine-grained task mapping using heterogeneous
platforms: We have seen the success of bringing
single-agent RL algorithms to heterogeneous
platforms composed of CPUs, GPUs, and FPGAs (Meng
et al., 2021; Zhang et al., 2023) and plan to extend
this to MARL.
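
As an illustration of why the data layout of the message pool matters in the first direction above, the following minimal Python sketch compares two common ways of partitioning an array across on-chip memory banks. It is not a design from this work. The pool size, bank count, access pattern, and all function names are hypothetical, and each bank is assumed to be single-ported and to serve one read per cycle.

```python
from collections import Counter


def cyclic_bank(index: int, num_banks: int) -> tuple[int, int]:
    """Map a message-pool slot to (bank, offset) under cyclic partitioning."""
    return index % num_banks, index // num_banks


def block_bank(index: int, num_banks: int, pool_size: int) -> tuple[int, int]:
    """Map a message-pool slot to (bank, offset) under block partitioning."""
    block = -(-pool_size // num_banks)  # ceiling division: slots per bank
    return index // block, index % block


def access_cycles(banks: list[int]) -> int:
    """Cycles needed to serve one batch of reads.

    Assumption: every bank is single-ported, so reads that hit the
    same bank are serialized and the busiest bank sets the cost.
    """
    return max(Counter(banks).values()) if banks else 0


# Hypothetical pool of 8 agent messages spread over 4 banks.
POOL_SIZE, NUM_BANKS = 8, 4
# Hypothetical access pattern: one agent reads 4 consecutive messages.
reads = [0, 1, 2, 3]

cyclic_cycles = access_cycles(
    [cyclic_bank(i, NUM_BANKS)[0] for i in reads]
)
block_cycles = access_cycles(
    [block_bank(i, NUM_BANKS, POOL_SIZE)[0] for i in reads]
)

print(f"cyclic partitioning: {cyclic_cycles} cycle(s)")
print(f"block partitioning:  {block_cycles} cycle(s)")
```

Under these assumptions, the cyclic layout places the four consecutive messages in four different banks, so they can be read in parallel, whereas the block layout places two of them in each of two banks and serializes the reads. Which layout is preferable in practice depends on the communication pattern of the MARL algorithm.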
