
5 PROMISING RESEARCH DIRECTIONS
5.1 Multiplayer Experiments
PenQuestEnv allows an agent to be trained against different
opponents, one at a time. This style of training carries the
risk of overfitting to a specific opponent strategy by
exploiting one particular weakness. The resulting agent may
win against one specific (advanced) strategy yet, for lack of
generalization, lose against a rather simple opponent. Such
behaviour has already been observed in other games where
no single best strategy exists, such as football (Kurach
et al., 2020) or StarCraft II (Vinyals et al., 2019). This
non-transitivity of strategies is also characteristic of
real-world cyber incidents, where an ever-evolving
arms race takes place between attackers and defenders,
each constantly adapting to the opponent's strategy. We
therefore think that PenQuestEnv is a fitting opportunity
to inspire research in this area, as little comparable
work currently exists in IT security.
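The non-transitivity described above can be illustrated with a minimal sketch. The three strategies and their win probabilities are hypothetical and chosen only for illustration; they are not taken from PenQuestEnv or from any measured result. The sketch shows that the best response to a single opponent can perform poorly against another opponent, which is why evaluation against a pool of opponents is more informative.

```python
# Hypothetical win probabilities: WIN_PROB[a][b] is the chance that
# strategy a beats strategy b. All numbers are illustrative assumptions.
WIN_PROB = {
    "A": {"A": 0.5, "B": 0.8, "C": 0.2},
    "B": {"A": 0.2, "B": 0.5, "C": 0.8},
    "C": {"A": 0.8, "B": 0.2, "C": 0.5},
}


def best_response(opponent):
    """Return the strategy with the highest win probability against one opponent."""
    return max(WIN_PROB, key=lambda s: WIN_PROB[s][opponent])


def mean_win_rate(strategy, pool):
    """Average win probability of a strategy against a pool of opponents."""
    return sum(WIN_PROB[strategy][o] for o in pool) / len(pool)


def is_cyclic(a, b, c):
    """True if a beats b, b beats c, and c beats a (a non-transitive cycle)."""
    return (
        WIN_PROB[a][b] > 0.5
        and WIN_PROB[b][c] > 0.5
        and WIN_PROB[c][a] > 0.5
    )


# An agent trained only against B converges to B's best response ...
specialist = best_response("B")
# ... which wins often against B but loses most games against C.
print(specialist, WIN_PROB[specialist]["B"], WIN_PROB[specialist]["C"])
print(mean_win_rate(specialist, ["A", "B", "C"]))
print(is_cyclic("A", "B", "C"))
```

In this toy setting no strategy dominates the others, so a win rate measured against a single opponent says little about general strength.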
5.2 Risk Assessment Experiments
Risk assessment in computer systems is a non-trivial
task. By finding the strategies that are most likely to
succeed in a given scenario, trained agents can also
support risk assessment decisions. Such agents
may provide additional information for security management
decisions on where to allocate resources such as personnel
attention or money. The insights gathered by these agents,
which can be adapted to different risk tolerances, can
be invaluable to decision makers. We believe
PenQuestEnv provides a unique setting to enable
future research into this area.
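One way such decision support could look is sketched below. It assumes that episodes played by a trained attacker agent have been logged as pairs of targeted asset and outcome, and that an impact value per asset is available. The asset names, the log, the impact values, and the scoring rule (success rate times impact) are all hypothetical and serve only to illustrate how a risk tolerance could enter the decision.

```python
from collections import defaultdict

# Hypothetical episode log: (targeted asset, whether the attack succeeded).
EPISODES = [
    ("mail_server", True), ("mail_server", True), ("mail_server", False),
    ("database", True), ("database", False),
    ("database", False), ("database", False),
    ("workstation", True), ("workstation", True),
    ("workstation", True), ("workstation", False),
]

# Hypothetical impact of a successful attack, in arbitrary cost units.
IMPACT = {"mail_server": 30.0, "database": 100.0, "workstation": 8.0}


def success_rates(episodes):
    """Estimate the attack success rate per asset from logged episodes."""
    wins = defaultdict(int)
    total = defaultdict(int)
    for asset, succeeded in episodes:
        total[asset] += 1
        wins[asset] += int(succeeded)
    return {asset: wins[asset] / total[asset] for asset in total}


def prioritize(episodes, impact, tolerance):
    """Return assets whose estimated risk exceeds the tolerance, highest first.

    Risk is approximated as success rate times impact, a simplifying assumption.
    """
    rates = success_rates(episodes)
    risk = {asset: rates[asset] * impact[asset] for asset in rates}
    flagged = [asset for asset in risk if risk[asset] > tolerance]
    return sorted(flagged, key=lambda asset: risk[asset], reverse=True)


# A risk-averse setting flags more assets than a risk-tolerant one.
print(prioritize(EPISODES, IMPACT, tolerance=10.0))
print(prioritize(EPISODES, IMPACT, tolerance=22.0))
```

Lowering the tolerance flags more assets for attention, while raising it concentrates resources on the few assets with the highest estimated risk.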
6 CONCLUSION
In this paper, we introduced PenQuestEnv, a novel
open-source reinforcement learning environment extension
to PenQuest, a partial-information, turn-based, digital
cyber security board game. The environment is asymmetric
in its action choices, highly diverse, and challenging
to win against a wide variety of opponents. PenQuestEnv
comes with a diverse set of scenarios, making it a fitting
environment for training multipurpose cyber agents, as well
as two baseline bots that help evaluate new RL agents. We
expect that this environment will be useful to AI and
security researchers alike for investigating current
scientific challenges.
ACKNOWLEDGEMENTS
This research was primarily funded by the Austrian
Science Fund (FWF) [P 33656-N]. Additionally, the
financial support by the Austrian Federal Ministry
of Labour and Economy, the National Foundation
for Research, Technology and Development and the
Christian Doppler Research Association is gratefully
acknowledged. For the purpose of open access, the
author has applied a CC BY public copyright licence
to any Author Accepted Manuscript version arising
from this submission.
REFERENCES
Bellemare, M. G., Naddaf, Y., Veness, J., and Bowling, M.
(2013). The arcade learning environment: An evaluation
platform for general agents. Journal of Artificial
Intelligence Research, 47:253–279.
Brockman, G., Cheung, V., Pettersson, L., Schneider, J.,
Schulman, J., Tang, J., and Zaremba, W. (2016).
OpenAI Gym.
Caturano, F., Perrone, G., and Romano, S. P. (2021).
Discovering reflected cross-site scripting vulnerabilities
using a multiobjective reinforcement learning
environment. Computers & Security, 103:102204.
Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., and
Koltun, V. (2017). CARLA: An open urban driving
simulator. In Proceedings of the 1st Annual Conference
on Robot Learning, pages 1–16.
Hammar, K. and Stadler, R. (2020). Finding effective
security strategies through reinforcement learning and
self-play. In 2020 16th International Conference on
Network and Service Management (CNSM), pages 1–9.
IEEE.
Kunz, T., Fisher, C., La Novara-Gsell, J., Nguyen, C., and
Li, L. (2022). A multiagent CyberBattleSim for RL cyber
operation agents. In 2022 International Conference
on Computational Science and Computational
Intelligence (CSCI), pages 897–903. IEEE.
Kurach, K., Raichuk, A., Stańczyk, P., Zając, M., Bachem,
O., Espeholt, L., Riquelme, C., Vincent, D., Michalski,
M., Bousquet, O., et al. (2020). Google Research
Football: A novel reinforcement learning environment.
In Proceedings of the AAAI Conference on
Artificial Intelligence, volume 34, pages 4501–4510.
Liu, X.-Y., Yang, H., Chen, Q., Zhang, R., Yang, L., Xiao,
B., and Wang, C. D. (2020). FinRL: A deep
reinforcement learning library for automated stock
trading in quantitative finance. In Proceedings of the 34th
Conference on Neural Information Processing
Systems (NeurIPS 2020).
Luh, R., Eresheim, S., Größbacher, S., Petelin, T., Mayr, F.,
Tavolato, P., and Schrittwieser, S. (2022). PenQuest
reloaded: A digital cyber defense game for technical
education. In 2022 IEEE Global Engineering Education
Conference (EDUCON), pages 906–914. IEEE.
PenQuestEnv: A Reinforcement Learning Environment for Cyber Security