a whole, leading to worse overall outcomes. In addition, agents exhibiting such undesirable behaviours would not be suited to being placed in a new team of other trained agents. For instance, if we train a group of agents to solve a certain task and then want to transfer them to a different team to assist other agents, they would not be able to do so effectively.
5 CONCLUSION AND FUTURE WORK
This paper introduced Independent Causal Learning (ICL), a method for learning fully independent behaviours in cooperative MARL tasks that bridges the concepts of causality and MARL. When some prior knowledge of the environment is available, a causal relationship between the individual observations and the team reward becomes perceptible. This allows the proposed method to improve learning in a fully decentralised and fully independent manner. The results showed that providing an environment-dependent causality estimation allows agents to act efficiently in a fully independent manner and achieve a common goal as a team. In addition, we showed how using causality in MARL can improve individual behaviours, eliminating the lazy agents that arise with conventional independent learners and enabling more intelligent behaviours, which leads to better overall performance in the tasks. These preliminary results are encouraging, as they highlight the potential of the link between causality and MARL.
In the future, we aim to study how causality estimations can also be used to improve centralised learning. In addition, although recognising the patterns that identify causal relations has proven challenging for machine learning methods, we intend to extend this method to more cases and show that causality discovery can be generalised to MARL problems. Furthermore, we intend to study how ICL can be applied in real scenarios that require online learning and prohibit excessive trial-and-error episodes due to potentially catastrophic events caused by the learning agents. We believe that this can be a breakthrough for online learning in real scenarios where reliable communication or a centralised oracle is not available and agents must learn to coordinate independently. Finally, we expect that this link can bring more relevant research questions to the field of MARL.
ACKNOWLEDGEMENTS
This work was funded by the Engineering and Physical Sciences Research Council (EPSRC) in the United Kingdom, under grant number EP/T000783/1.