
8 CONCLUSION
Behavior Cloning (BC) is a crucial method within Imitation Learning, enabling agents to be trained safely using a dataset of pre-collected state-action pairs provided by an expert. However, when applied in an ensemble framework, BC can suffer from increasing action differences, particularly in states that are underrepresented in the training data $D = (s_t, a_t)_t$. These large mean action differences among the ensemble policies can lead to suboptimal aggregated actions, which degrade the overall performance of the agent.
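To make this quantity concrete, the following is a minimal Python sketch of how a mean pairwise action difference and a simple averaged aggregation could be computed; the list policies of callables mapping a state to a continuous action is a hypothetical interface, not the implementation used in the paper.

import numpy as np

def mean_action_difference(policies, state):
    # Mean pairwise L2 distance between the actions proposed by the
    # ensemble members for one state (illustrative sketch only).
    actions = [policy(state) for policy in policies]
    diffs = [np.linalg.norm(a_i - a_j)
             for i, a_i in enumerate(actions)
             for a_j in actions[i + 1:]]
    return float(np.mean(diffs)) if diffs else 0.0

def aggregated_action(policies, state):
    # Simple mean aggregation; when pairwise differences are large, this
    # average can lie far from every individual policy's suggestion.
    return np.mean([policy(state) for policy in policies], axis=0)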
In this paper, we proposed Swarm Behavior Cloning (Swarm BC) to address this challenge. By fostering greater alignment among the policies while preserving the diversity of their computations, our approach encourages the ensemble to learn more similar hidden feature representations. This adjustment effectively reduces action prediction divergence, allowing the ensemble to retain its inherent strengths, such as robustness and varied decision-making, while producing more consistent and reliable actions.
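As an illustration of this idea (a sketch only, not the exact Swarm BC objective, which is defined in the method section), a per-batch training loss could combine the usual BC regression term with a term that pulls each policy's hidden activations towards the ensemble mean of those activations; the assumption that each policy returns both its predicted actions and its hidden features is hypothetical.

import torch
import torch.nn.functional as F

def alignment_regularized_bc_loss(policies, states, expert_actions, align_coef=1.0):
    # Assumption: each policy maps a batch of states to a tuple
    # (predicted_actions, hidden_features).
    outputs = [policy(states) for policy in policies]
    actions = [a for a, _ in outputs]
    hiddens = torch.stack([h for _, h in outputs])  # (n_policies, batch, dim)
    hidden_target = hiddens.mean(dim=0).detach()    # shared alignment target

    bc_loss = sum(F.mse_loss(a, expert_actions) for a in actions)
    align_loss = sum(F.mse_loss(h, hidden_target) for h in hiddens)
    return bc_loss + align_coef * align_loss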
We evaluated Swarm BC across eight diverse OpenAI Gym environments, demonstrating that it effectively reduces mean action differences and significantly improves the agent's test performance, measured by episode returns.
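Here, the episode return is the undiscounted sum of rewards collected in one episode; a minimal evaluation sketch under the classic (pre-0.26) Gym step API, with a hypothetical act_fn mapping observations to actions, could look as follows.

import gym
import numpy as np

def mean_episode_return(env_name, act_fn, episodes=10):
    # Average undiscounted return over several evaluation episodes.
    env = gym.make(env_name)
    returns = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(act_fn(obs))
            total += reward
        returns.append(total)
    env.close()
    return float(np.mean(returns))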
Finally, we provided a theoretical analysis showing that our method approximates the hidden feature activations with the highest probability density, effectively learning the global mode $h_k^* = \arg\max_{h_k} p(h_k \mid D)$ based on the training data $D$. This theoretical insight further supports the practical performance gains observed in our experiments.