
of knowing resource profiles in advance, an assumption that rarely holds in practice yet one that every practical approach must address.
In conclusion, our findings suggest that while cluster scheduling remains a particularly difficult problem, modern MARL methods are not only effective at managing resources but also a step toward fully autonomous and scalable cloud systems. The ability of RL agents to learn and improve over time, without human intervention or hand-crafted features, aligns with the goal of developing cloud systems that can self-manage and dynamically adapt to changing conditions and requirements.
Multi-Agent Deep Reinforcement Learning for Collaborative Task Scheduling