Figure 9: Process of improving the archive trajectories (Archive MA-AIRL). Panels: (a) learns non-collision trajectories; (b) learns slightly shorter trajectories; (c) learns short collision trajectories; (d) learns shorter non-collision trajectories.
proves the expert behaviors according to both the individual and cooperative behaviors so that the agents obtain better behaviors than those of the experts. For this purpose, the discriminator in Archive MA-AIRL evaluates whether the behaviors generated by the generator are close to the expert behaviors improved from both individual and collective trajectories. To investigate the effectiveness of Archive MA-AIRL, this paper applied it to the continuous maze problem, and the following implications were revealed: (1) trajectories that avoid collisions among the agents can be acquired from suboptimal expert trajectories that may collide with other agents; and (2) Archive MA-AIRL outperforms the conventional methods MA-GAIL and MA-AIRL, as well as the experts, in terms of the number of agent collisions and the expected return.
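The idea of an archive whose trajectories are gradually replaced by better ones can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the method itself: the names `JointTrajectory`, `count_collisions`, `score`, and `update_archive` are hypothetical, a collision is assumed to mean two agents coming within a fixed radius at the same time step, and candidates are ranked lexicographically by collision count and then by total path length rather than by a learned reward.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class JointTrajectory:
    """paths[agent][t] is the (x, y) position of that agent at time step t."""
    paths: List[List[Point]]


def position(path: List[Point], t: int) -> Point:
    # An agent that has finished its path is assumed to stay at its last position.
    return path[min(t, len(path) - 1)]


def count_collisions(traj: JointTrajectory, radius: float = 0.5) -> int:
    """Count (time step, agent pair) combinations closer than `radius` (assumed rule)."""
    horizon = max(len(p) for p in traj.paths)
    n_agents = len(traj.paths)
    collisions = 0
    for t in range(horizon):
        for i in range(n_agents):
            for j in range(i + 1, n_agents):
                xi, yi = position(traj.paths[i], t)
                xj, yj = position(traj.paths[j], t)
                if (xi - xj) ** 2 + (yi - yj) ** 2 < radius ** 2:
                    collisions += 1
    return collisions


def score(traj: JointTrajectory) -> Tuple[int, int]:
    """Lower is better: fewer collisions first, then fewer total steps (assumed ranking)."""
    total_steps = sum(len(p) - 1 for p in traj.paths)
    return (count_collisions(traj), total_steps)


def update_archive(archive: List[JointTrajectory],
                   candidate: JointTrajectory,
                   capacity: int = 4) -> List[JointTrajectory]:
    """Return the best `capacity` trajectories among the archive plus the candidate."""
    pool = sorted(archive + [candidate], key=score)
    return pool[:capacity]


# Suboptimal expert: two agents cross paths and meet at (1, 1) at t = 1.
expert = JointTrajectory(paths=[
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
    [(2.0, 0.0), (1.0, 1.0), (0.0, 2.0)],
])
# Generated candidate: the second agent waits one step, which avoids the collision.
candidate = JointTrajectory(paths=[
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
    [(2.0, 0.0), (2.0, 0.0), (1.0, 1.0), (0.0, 2.0)],
])
archive = update_archive([expert], candidate, capacity=1)
```

In this toy case the candidate is one step longer than the expert but collision-free, so it takes the expert's place in a one-slot archive. The ranking rule is the part most open to design choice: a rule that scores trajectories incorrectly would let worse trajectories displace better ones.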
It should be noted that these results have only been obtained on a simple testbed, i.e., the maze problem; further careful qualification and justification, for example on more complex maze problems, are therefore needed to generalize the obtained implications. Such important directions must be pursued in the near future, in addition to (1) an exploration of the proper evaluation of trajectories, because an incorrect evaluation of trajectories might deteriorate the archived trajectories, and (2) an increase in the number of agents.
Multi-Agent Archive-Based Inverse Reinforcement Learning by Improving Suboptimal Experts