
vances in Neural Information Processing Systems 34, pages 11108–11122.
Fujii, K., Takeuchi, K., Kuribayashi, A., Takeishi, N., Kawahara, Y., and Takeda, K. (2022). Estimating counterfactual treatment outcomes over time in complex multi-agent scenarios. arXiv preprint arXiv:2206.01900.
Gangwani, T., Zhou, Y., and Peng, J. (2022). Imitation learning from observations under transition model disparity. In International Conference on Learning Representations.
Helbing, D. and Molnar, P. (1995). Social force model for pedestrian dynamics. Physical Review E, 51(5):4282.
Hester, T., Vecerik, M., Pietquin, O., Lanctot, M., Schaul, T., Piot, B., Horgan, D., Quan, J., Sendonaris, A., Osband, I., et al. (2018). Deep Q-learning from demonstrations. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference, pages 3223–3230.
Ho, J. and Ermon, S. (2016). Generative adversarial imitation learning. In Proceedings of the 30th International Conference on Neural Information Processing Systems, pages 4572–4580.
Hu, Y., Li, J., Li, X., Pan, G., and Xu, M. (2018). Knowledge-guided agent-tactic-aware learning for StarCraft micromanagement. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pages 1471–1477.
Hua, J., Zeng, L., Li, G., and Ju, Z. (2021). Learning for a robot: Deep reinforcement learning, imitation learning, transfer learning. Sensors, 21(4):1278.
Hussein, A., Elyan, E., and Jayne, C. (2018). Deep imitation learning with memory for RoboCup soccer simulation. In International Conference on Engineering Applications of Neural Networks, pages 31–43. Springer.
Ishiwaka, Y., Zeng, X. S., Ogawa, S., Westwater, D. M., Tone, T., and Nakada, M. (2022). DeepFoids: Adaptive bio-inspired fish simulation with deep reinforcement learning. Advances in Neural Information Processing Systems, 35.
Kitano, H., Asada, M., Kuniyoshi, Y., Noda, I., and Osawa, E. (1997). RoboCup: The Robot World Cup initiative. In Proceedings of the First International Conference on Autonomous Agents, pages 340–347.
Kolter, J., Abbeel, P., and Ng, A. (2007). Hierarchical apprenticeship learning with application to quadruped locomotion. Advances in Neural Information Processing Systems, 20.
Kraemer, L. and Banerjee, B. (2016). Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing, 190:82–94.
Kurach, K., Raichuk, A., Stańczyk, P., Zając, M., Bachem, O., Espeholt, L., Riquelme, C., Vincent, D., Michalski, M., Bousquet, O., et al. (2020). Google Research Football: A novel reinforcement learning environment. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 4501–4510.
Lakshminarayanan, A. S., Ozair, S., and Bengio, Y. (2016). Reinforcement learning with few expert demonstrations. In NIPS Workshop on Deep Learning for Action and Interaction.
Le, H. M., Yue, Y., Carr, P., and Lucey, P. (2017). Coordinated multi-agent imitation learning. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 1995–2003. JMLR.org.
Lee, H.-R. and Lee, T. (2019). Improved cooperative multi-agent reinforcement learning algorithm augmented by mixing demonstrations from centralized policy. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pages 1089–1098.
Li, C., Wang, T., Wu, C., Zhao, Q., Yang, J., and Zhang, C. (2021). Celebrating diversity in shared multi-agent reinforcement learning. Advances in Neural Information Processing Systems, 34:3991–4002.
Liu, G., Luo, Y., Schulte, O., and Kharrat, T. (2020). Deep soccer analytics: Learning an action-value function for evaluating soccer players. Data Mining and Knowledge Discovery, 34(5):1531–1559.
Liu, G. and Schulte, O. (2018). Deep reinforcement learning in ice hockey for context-aware player evaluation. arXiv preprint arXiv:1805.11088.
Liu, I.-J., Ren, Z., Yeh, R. A., and Schwing, A. G. (2021). Semantic tracklets: An object-centric representation for visual multi-agent reinforcement learning. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5603–5610. IEEE.
Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., and Mordatch, I. (2017). Multi-agent actor-critic for mixed cooperative-competitive environments. Advances in Neural Information Processing Systems, 30:6382–6393.
Luo, Y., Schulte, O., and Poupart, P. (2020). Inverse reinforcement learning for team sports: Valuing actions and players. In Bessiere, C., editor, Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, pages 3356–3363. International Joint Conferences on Artificial Intelligence Organization.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529–533.
Myers, C., Rabiner, L., and Rosenberg, A. (1980). Performance tradeoffs in dynamic time warping algorithms for isolated word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(6):623–635.
Nakahara, H., Tsutsui, K., Takeda, K., and Fujii, K. (2023). Action valuation of on- and off-ball soccer players based on multi-agent deep reinforcement learning. IEEE Access, 11:131237–131244.
Nguyen, Q. D. and Prokopenko, M. (2020). Structure-preserving imitation learning with delayed reward: An evaluation within the RoboCup Soccer 2D simulation environment. Frontiers in Robotics and AI, 7:123.