Figure 9: Process of improving the archive trajectories (Archive MA-AIRL). (a) Learns non-collision trajectories; (b) learns slightly shorter trajectories; (c) learns short collision trajectories; (d) learns shorter non-collision trajectories.
proves the expert behaviors according to both the individual and cooperative behaviors, in order to obtain agent behaviors that are better than those of the experts. For this purpose, the discriminator in Archive MA-AIRL evaluates whether the behaviors generated by the generator are close to the expert behaviors improved from both the individual and collective trajectories. To investigate the effectiveness of Archive MA-AIRL, this paper applied it to the continuous maze problem, which revealed the following implications: (1) trajectories that avoid collisions among the agents can be acquired from suboptimal expert trajectories that may collide with other agents; and (2) Archive MA-AIRL outperforms the conventional methods MA-GAIL and MA-AIRL, as well as the experts, in terms of both the number of agent collisions and the expected return.
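To make the above mechanism concrete, the following is a minimal sketch of an AIRL-style per-agent discriminator trained to separate archive-improved expert transitions from generator transitions. It is an illustration under stated assumptions rather than the paper's implementation; names such as AIRLDiscriminator, reward_net, and archive_batch are hypothetical.

```python
# Illustrative sketch (assumed structure, not the authors' code): an
# AIRL-style discriminator for one agent, trained to separate transitions
# drawn from the archive-improved expert data from transitions produced by
# the generator (the learned policy).
import torch
import torch.nn as nn


class AIRLDiscriminator(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        # Learned reward r(s, a) and shaping value V(s), as in AIRL.
        self.reward_net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.value_net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, obs, act, next_obs, log_pi, gamma=0.99):
        # f(s, a, s') = r(s, a) + gamma * V(s') - V(s)
        f = (self.reward_net(torch.cat([obs, act], dim=-1))
             + gamma * self.value_net(next_obs) - self.value_net(obs))
        # D = exp(f) / (exp(f) + pi(a|s)); returning f - log pi gives the
        # logit of D, which is numerically more stable to train on.
        return f - log_pi


def discriminator_loss(disc, archive_batch, policy_batch):
    """Binary cross-entropy: archive-improved expert data labeled 1,
    generator data labeled 0. Each batch is a hypothetical tuple
    (obs, act, next_obs, log_pi)."""
    logits_exp = disc(*archive_batch)
    logits_gen = disc(*policy_batch)
    bce = nn.functional.binary_cross_entropy_with_logits
    return (bce(logits_exp, torch.ones_like(logits_exp))
            + bce(logits_gen, torch.zeros_like(logits_gen)))
```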
It should be noted that these results have only been obtained on a simple testbed, i.e., the maze problem; therefore, further careful qualifications and justifications, such as experiments on more complex maze problems, are needed to generalize the obtained implications. Such important directions must be pursued in the near future, in addition to (1) an exploration of the proper evaluation of trajectories, because an incorrect evaluation of trajectories might deteriorate the archived trajectories, and (2) an increase in the number of agents.
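Regarding point (1), a minimal sketch of one possible evaluation rule is given below, assuming the archive accepts a candidate trajectory only when it is no worse on both the individual criterion (episodic return) and the cooperative criterion (collision count). The names Trajectory and maybe_update_archive are hypothetical, and this is not necessarily the rule used in Archive MA-AIRL.

```python
# A minimal sketch, assuming a conservative archive-update rule: accept a
# candidate only if it does not degrade either criterion.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trajectory:
    steps: List[Tuple]   # hypothetical container of (state, action) pairs
    ret: float           # individual criterion: episodic return
    collisions: int      # cooperative criterion: number of collisions caused


def maybe_update_archive(archive: List[Trajectory], candidate: Trajectory,
                         slot: int) -> bool:
    """Replace archive[slot] only if the candidate is no worse on both
    criteria and strictly better on at least one; a looser acceptance rule
    risks deteriorating the archived trajectories."""
    old = archive[slot]
    no_worse = candidate.ret >= old.ret and candidate.collisions <= old.collisions
    better = candidate.ret > old.ret or candidate.collisions < old.collisions
    if no_worse and better:
        archive[slot] = candidate
        return True
    return False
```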
REFERENCES
Finn, C., Levine, S., and Abbeel, P. (2016). Guided cost
learning: Deep inverse optimal control via policy op-
timization. In the 33rd International Conference on
Machine Learning, volume 48, pages 49–58.
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B.,
Warde-Farley, D., Ozair, S., Courville, A., and Ben-
gio, Y. (2014). Generative adversarial nets. In Ad-
vances in Neural Information Processing Systems,
pages 2672–2680.
Ng, A. Y. and Russell, S. (2000). Algorithms for inverse re-
inforcement learning. In the 17th International Con-
ference on Machine Learning, pages 663–670.
Ramachandran, D. and Amir, E. (2007). Bayesian inverse reinforcement learning. In the 20th International Joint Conference on Artificial Intelligence, pages 2586–2591.
Russell, S. (1998). Learning agents for uncertain environments. In the Eleventh Annual Conference on Computational Learning Theory, pages 101–103.
Sutton, R. S. and Barto, A. G. (1998). Reinforcement learn-
ing: An introduction. A Bradford Book.
Wang, X. and Klabjan, D. (2018). Competitive multi-
agent inverse reinforcement learning with sub-optimal
demonstrations. In the 35th International Conference
on Machine Learning, volume 80, pages 5143–5151.
Watkins, C. J. C. H. and Dayan, P. (1992). Q-learning. Ma-
chine Learning, 8:279–292.
Wu, Y., Mansimov, E., Liao, S., Grosse, R., and Ba, J. (2017). Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. In the 31st International Conference on Neural Information Processing Systems, pages 5285–5294.
Yu, L., Song, J., and Ermon, S. (2019). Multi-agent ad-
versarial inverse reinforcement learning. In the 36th
International Conference on Machine Learning, vol-
ume 97, pages 7194–7201.
Ziebart, B. D., Maas, A., Bagnell, J., and Dey, A. K. (2008).
Maximum entropy inverse reinforcement learning. In
the 23rd AAAI Conference on Artificial Intelligence,
pages 1433–1438.