multi-environment setting, randomly selecting environments
for agent interaction. By averaging rewards across multiple
environments, our approach mitigates the impact of poisoning
and enhances agent robustness. Our method preserves true
reward performance while providing provable guarantees for
the effectiveness of the defense policy, supporting safety
and reliability in critical applications.
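As a minimal illustration of why averaging dilutes a poisoned reward signal, consider the following sketch. It is not the implementation evaluated in this work: the additive-bias attack model, the shared ground-truth reward, and all names (`true_reward`, `make_envs`, `averaged_reward`) are assumptions made only for the example.

```python
import random
from statistics import mean


def true_reward(action: int) -> float:
    """Hypothetical ground-truth reward shared by every environment copy."""
    return 1.0 if action == 1 else 0.0


def make_envs(n_envs: int, poisoned: set, bias: float) -> list:
    """Build one reward function per environment.

    Assumption: the attacker adds a constant `bias` to the reward of each
    environment whose index is in `poisoned`; the others are clean.
    """
    def make(i: int):
        shift = bias if i in poisoned else 0.0
        return lambda action: true_reward(action) + shift

    return [make(i) for i in range(n_envs)]


def averaged_reward(envs: list, action: int, k: int, rng: random.Random) -> float:
    """Sample k distinct environments at random and average their rewards."""
    chosen = rng.sample(envs, k)
    return mean(env(action) for env in chosen)


rng = random.Random(0)
envs = make_envs(n_envs=10, poisoned={3}, bias=-5.0)

# Reward seen by an agent that interacts only with the poisoned environment.
single = envs[3](1)

# Reward averaged over all ten environments: the bias is divided by ten.
avg_all = averaged_reward(envs, action=1, k=10, rng=rng)

# Reward averaged over a random subset of five environments.
avg_subset = averaged_reward(envs, action=1, k=5, rng=rng)

print(single, avg_all, avg_subset)
```

Under these assumptions, a single poisoned environment shifts the averaged reward by at most `bias / k`, so the distortion shrinks as more environments are sampled, whereas an agent confined to the poisoned environment observes the full bias.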
This contribution represents a significant advance in the
development of robust and secure deep reinforcement learning
systems for real-world scenarios. Future work includes
conducting experiments to compare our approach with existing
defenses, validating its effectiveness and practicality, and
leveraging the insights gained to further enhance our defense