Behavior Cloning (BC) is a crucial method within Imitation Learning, enabling agents to be trained safely using a dataset of pre-collected state-action pairs provided by an expert. However, when applied in an ensemble framework, BC can suffer from the issue of increasing action differences, particularly in states that are underrepresented in the training data D = {(s, a)}. These large mean action differences among the ensemble policies can lead to suboptimal aggregated actions, which degrade the overall performance of the agent.
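
To make the quantities in the preceding paragraph concrete, the following minimal sketch computes a mean action difference and an aggregated action for one state. It assumes continuous actions, Euclidean distance between action vectors, and aggregation by simple averaging; these choices, the function names, and the example numbers are illustrative assumptions rather than details taken from the experiments.

```python
import numpy as np

def aggregate_action(actions: np.ndarray) -> np.ndarray:
    """Average the ensemble's actions; `actions` has shape (N, action_dim).

    Averaging is an assumed aggregation rule for this sketch.
    """
    return actions.mean(axis=0)

def mean_action_difference(actions: np.ndarray) -> float:
    """Mean Euclidean distance over all unordered pairs of ensemble actions."""
    n = actions.shape[0]
    dists = [np.linalg.norm(actions[i] - actions[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))

# Hypothetical predictions of N = 3 policies for a single state (2-D action).
agree = np.array([[0.50, 0.10], [0.52, 0.08], [0.48, 0.12]])   # well-covered state
disagree = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])     # underrepresented state

diff_agree = mean_action_difference(agree)
diff_disagree = mean_action_difference(disagree)
aggregated = aggregate_action(disagree)
```

In the second case the averaged action is (0, 1/3), an action that none of the three policies proposed, which illustrates how large disagreement can yield a poor aggregated action.
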
In this paper, we proposed Swarm Behavior Cloning (Swarm BC) to address this challenge. By fostering greater alignment among the policies while preserving the diversity of their computations, our approach encourages the ensemble to learn more similar hidden feature representations. This adjustment effectively reduces action prediction divergence, allowing the ensemble to retain its inherent strengths, such as robustness and varied decision-making, while producing more consistent and reliable actions.
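
The idea of aligning hidden features across the ensemble can be sketched as a regularized BC loss. The code below is a minimal illustration under stated assumptions, not the exact objective used in this work: each policy is a one-hidden-layer network, the BC term is a mean squared error, the alignment term is the mean squared difference between hidden activations over all policy pairs, and the weight `tau` is a hypothetical hyperparameter name.

```python
import numpy as np

def forward(params, states):
    """One-hidden-layer policy: returns (hidden features, predicted actions)."""
    w1, b1, w2, b2 = params
    hidden = np.tanh(states @ w1 + b1)
    return hidden, hidden @ w2 + b2

def swarm_bc_loss(ensemble, states, expert_actions, tau=0.1):
    """Mean BC error over policies plus tau times the mean pairwise hidden-feature gap.

    Both the form of the penalty and the name `tau` are assumptions of this sketch.
    """
    hiddens, bc_terms = [], []
    for params in ensemble:
        hidden, actions = forward(params, states)
        hiddens.append(hidden)
        bc_terms.append(np.mean((actions - expert_actions) ** 2))
    n = len(ensemble)
    gaps = [np.mean((hiddens[i] - hiddens[j]) ** 2)
            for i in range(n) for j in range(i + 1, n)]
    alignment = float(np.mean(gaps)) if gaps else 0.0
    return float(np.mean(bc_terms)) + tau * alignment

def init_policy(rng, state_dim, hidden_dim, action_dim):
    """Random weights and zero biases for one ensemble member."""
    return (rng.normal(size=(state_dim, hidden_dim)), np.zeros(hidden_dim),
            rng.normal(size=(hidden_dim, action_dim)), np.zeros(action_dim))

# Synthetic stand-in for expert data; shapes are arbitrary.
rng = np.random.default_rng(0)
states = rng.normal(size=(32, 4))
expert_actions = rng.normal(size=(32, 2))
ensemble = [init_policy(rng, 4, 8, 2) for _ in range(3)]

loss_plain = swarm_bc_loss(ensemble, states, expert_actions, tau=0.0)  # plain ensemble BC
loss_swarm = swarm_bc_loss(ensemble, states, expert_actions, tau=0.1)  # with alignment term
```

With `tau=0.0` the sketch reduces to ordinary ensemble BC. A positive `tau` penalizes policies whose hidden activations drift apart, while each policy keeps its own weights and therefore its own computation.
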
We evaluated Swarm BC across eight diverse OpenAI Gym environments, demonstrating that it effectively reduces mean action differences and significantly improves the agent’s test performance, measured by episode returns.
Finally, we provided a theoretical analysis showing that our method approximates the hidden feature activations with the highest probability density, effectively learning the global mode h* = argmax_h p(h | D) based on the training data D. This theoretical insight further supports the practical performance gains observed in our experiments.