
9 CONCLUSIONS
This work designed agents built on pre-trained LLMs to
solve cybersecurity environments. The LLM agents
solved two security environments without any additional
training steps and without learning between episodes,
in contrast to traditional RL agents, which require tens
of thousands of training episodes.
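To make this setup concrete, the following is a minimal sketch of such an episodic, training-free LLM-agent loop. It assumes a generic environment interface and a placeholder query_llm() helper; all names (query_llm, render_prompt, env.valid_actions, env.step) are hypothetical illustrations and do not reflect the paper's actual implementation.

def query_llm(prompt: str) -> str:
    """Placeholder for a call to a pre-trained LLM (e.g. a commercial API)."""
    raise NotImplementedError

def render_prompt(observation, valid_actions) -> str:
    """Serialize the current state and the action set into natural language."""
    lines = ["You are a network attacker. Current state:", str(observation),
             "Choose exactly one of the following actions:"]
    lines += [f"- {a}" for a in valid_actions]
    return "\n".join(lines)

def run_episode(env, max_steps: int = 50) -> float:
    """Run one episode: no gradient updates, no memory across episodes."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        # The pre-trained LLM plans the next action directly from the prompt.
        action = query_llm(render_prompt(observation, env.valid_actions(observation)))
        observation, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward

The key point the sketch illustrates is that the only "policy" is the frozen pre-trained model: each episode starts from scratch, and performance comes from the LLM's prior knowledge rather than from environment-specific training.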
Pre-trained LLMs come with limitations and costs, in-
cluding the difficulty of reproducing results obtained
with black-box commercial models. Nevertheless,
LLMs show potential for the high-level planning of au-
tonomous cybersecurity agents. Future work will fo-
cus on more complex scenarios and environments.
NetSecGame is designed to be realistic while pro-
viding a high-level interaction API for agents. It im-
plements a modular configuration of network topologies,
goal definitions, and a reward system that does not leak
information to the agents. It also implements a de-
fender for testing agents in adversarial settings.
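As an illustration of this modular design, the sketch below shows what such a scenario configuration could look like. The schema and every key in it are hypothetical, chosen only to mirror the description above; this is not NetSecGame's actual configuration format.

# Hypothetical scenario configuration mirroring the modular design
# described above: topology, goal, rewards, and defender are declared
# separately, and the goal is never exposed to the agent.
scenario_config = {
    "topology": {  # networks and hosts are configured, not hard-coded
        "networks": ["192.168.1.0/24", "192.168.2.0/24"],
        "hosts": {
            "192.168.1.2": {"services": ["ssh", "web"], "data": ["credentials"]},
            "192.168.2.5": {"services": ["database"], "data": ["user_db"]},
        },
    },
    "goal": {  # win condition, evaluated by the environment only
        "exfiltrate": {"data": "user_db", "to": "attacker_host"},
    },
    "rewards": {  # sparse rewards that do not leak the goal to the agent
        "step": -1,            # small cost per action
        "goal_reached": 100,
        "detected": -50,       # penalty if the defender catches the agent
    },
    "defender": {"enabled": True, "detection_probability": 0.2},
}

Keeping the goal and reward definitions on the environment side, as in this sketch, is what prevents information leakage: the agent only ever observes states, actions, and scalar rewards.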
ACKNOWLEDGMENTS
The authors acknowledge support from the Strategic
Support for the Development of Security Research
in the Czech Republic 2019–2025 (IMPAKT 1) pro-
gram of the Ministry of the Interior of the Czech
Republic under grant No. VJ02010020 – AI-Dojo: Multi-
agent testbed for the research and testing of AI-driven
cyber security technologies.