tal training time. The additional overhead stems from the DNN layer computation required to encode and extract an agent's belief. This communication overhead justifies the need for fine-grained acceleration of the added kernels.
5 DISCUSSION & CONCLUSION
In this work, we provided an extensive, taxonomy-driven analysis of various MARL algorithms. This is also the first work in the field of MARL that characterizes key algorithms from a parallelization and acceleration perspective. We proposed that latency-bounded throughput be treated as a key optimization metric in future literature. Our observations show that inter-agent communication introduces a non-trivial overhead that calls for fine-grained optimization and acceleration, tailored to the algorithm's category in our taxonomy. Several directions for future work on accelerating MARL remain:
• Specialized accelerator design for reducing communication overheads: Specialized acceleration platforms such as Field Programmable Gate Arrays (FPGAs) offer pipeline parallelism along with large, distributed on-chip memory that provides single-cycle data access. To take full advantage of this low-latency memory, specialized data layouts and partitioning schemes for the communicated message pool need to be devised (a minimal sketch appears after this list).
• Fine-grained task mapping on heterogeneous platforms: Single-agent RL algorithms have been mapped successfully onto heterogeneous platforms composed of CPUs, GPUs, and FPGAs (Meng et al., 2021; Zhang et al., 2023), and we plan to extend this approach to MARL.
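The sketch below is a minimal host-side illustration of the data-layout idea in the first bullet; the function name and layout are our own assumptions, not a proposed accelerator design. It groups one communication round's message pool into per-receiver partitions, so that on an FPGA each partition could be mapped to its own on-chip memory bank and read in a single cycle by the pipeline stage serving that agent.

```python
import numpy as np

def partition_message_pool(messages, n_agents):
    """Group one round's messages into per-receiver partitions.

    messages: iterable of (sender, receiver, payload) tuples, where
    payload is a fixed-size NumPy vector. Each returned partition
    corresponds to one dedicated on-chip memory bank per agent.
    """
    pool = {r: [] for r in range(n_agents)}
    for _sender, receiver, payload in messages:
        pool[receiver].append(payload)
    # One dense, contiguous buffer per receiver -> one bank per agent.
    return {r: np.stack(msgs) if msgs else np.empty((0, 0))
            for r, msgs in pool.items()}
```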
REFERENCES
Cho, H., Oh, P., Park, J., Jung, W., and Lee, J. (2019). FA3C: FPGA-accelerated deep reinforcement learning. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 499–513.
Choi, H.-B., Kim, J.-B., Han, Y.-H., Oh, S.-W., and Kim, K. (2022). MARL-based cooperative multi-AGV control in warehouse systems. IEEE Access, 10:100478–100488.
Chu, T., Chinchali, S., and Katti, S. (2020). Multi-agent
reinforcement learning for networked system control.
arXiv preprint arXiv:2004.01339.
Ivan, M.-C. and Ivan, G. (2020). Methods of exercising
the surveillance of criminal prosecution. Rev. Stiinte
Juridice, page 160.
Jiang, J., Dun, C., Huang, T., and Lu, Z. (2018). Graph
convolutional reinforcement learning. arXiv preprint
arXiv:1810.09202.
Kaelbling, L. P., Littman, M. L., and Moore, A. W.
(1996). Reinforcement learning: A survey. CoRR,
cs.AI/9605103.
Kim, W., Park, J., and Sung, Y. (2021). Communication in
multi-agent reinforcement learning: Intention sharing.
In International Conference on Learning Representa-
tions.
Liang, E., Liaw, R., Nishihara, R., Moritz, P., Fox, R., Goldberg, K., Gonzalez, J., Jordan, M., and Stoica, I. (2018). RLlib: Abstractions for distributed reinforcement learning. In International Conference on Machine Learning, pages 3053–3062. PMLR.
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., and Mordatch, I. (2017). Multi-agent actor-critic for mixed cooperative-competitive environments. Advances in Neural Information Processing Systems, 30.
Meng, Y., Kuppannagari, S., Kannan, R., and Prasanna, V. (2021). PPOAccel: A high-throughput acceleration framework for proximal policy optimization. IEEE Transactions on Parallel and Distributed Systems, 33(9):2066–2078.
Meng, Y., Yang, Y., Kuppannagari, S., Kannan, R., and Prasanna, V. (2020). How to efficiently train your AI agent? Characterizing and evaluating deep reinforcement learning on heterogeneous platforms. In 2020 IEEE High Performance Extreme Computing Conference (HPEC), pages 1–7. IEEE.
Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pages 1928–1937. PMLR.
Rocki, K. and Suda, R. (2011). Parallel Monte Carlo tree search on GPU. In Eleventh Scandinavian Conference on Artificial Intelligence, pages 80–89. IOS Press.
Shapley, L. S. (1953). Stochastic games. Proceedings of the National Academy of Sciences, 39(10):1095–1100.
Wang, X., Zhang, Z., and Zhang, W. (2022). Model-based
multi-agent reinforcement learning: Recent progress
and prospects. arXiv preprint arXiv:2203.10603.
Wang, Y., Zhong, F., Xu, J., and Wang, Y. (2021). ToM2C: Target-oriented multi-agent communication and cooperation with theory of mind. arXiv preprint arXiv:2111.09189.
Zhang, C., Kuppannagari, S. R., and Prasanna, V. K. (2021). Parallel actors and learners: A framework for generating scalable RL implementations. In 2021 IEEE 28th International Conference on High Performance Computing, Data, and Analytics (HiPC), pages 1–10. IEEE.
Zhang, C., Meng, Y., and Prasanna, V. (2023). A framework for mapping DRL algorithms with prioritized replay buffer onto heterogeneous platforms. IEEE Transactions on Parallel and Distributed Systems.