and world representation model can be easily extended to a more complex setup in terms of network size and action space, the scalability and computational feasibility of such extensions have yet to be evaluated. The natural direction for future research is therefore to expand our approach to larger environments, with the scalability testing that these more complex setups require. We also plan to incorporate other types of cyber attacks from the MITRE taxonomy and to model the defender as a rational entity with its own set of actions in the interaction. In addition, we plan to evaluate the performance of our agent in a simulated environment.
Along with increasing the environmental complexity, we also plan to address more complex goals for the attacker, which will require the agent to perform more reconnaissance.
ACKNOWLEDGMENTS
The authors acknowledge support from the Research Center for Informatics (CZ.02.1.01/0.0/0.0/16 019/0000765) and from the Strategic Support for the Development of Security Research in the Czech Republic 2019–2025 (IMPAKT 1) program of the Ministry of the Interior of the Czech Republic, under grant No. VJ02010020 – AI-Dojo: Multi-agent testbed for the research and testing of AI-driven cyber security technologies.
REFERENCES
Anwar, A. H. and Kamhoua, C. A. (2022). Cyber deception
using honeypot allocation and diversity: A game theo-
retic approach. In 2022 IEEE 19th Annual Consumer
Communications & Networking Conference (CCNC).
Bonet, B. and Geffner, H. (2001). Planning as heuristic
search. Artificial Intelligence, 129(1):5–33.
Chung, K., Kamhoua, C. A., Kwiat, K. A., Kalbarczyk,
Z. T., and Iyer, R. K. (2016). Game theory with learn-
ing for cyber security monitoring. In 2016 IEEE 17th
International Symposium on High Assurance Systems
Engineering (HASE), pages 1–8.
Dehghan, M., Sadeghiyan, B., Khosravian, E., Moghad-
dam, A. S., and Nooshi, F. (2022). ProAPT: Projection
of APT Threats with Deep Reinforcement Learning.
arXiv:2209.07215 [cs].
Drašar, M., Moskal, S., Yang, S., and Zat'ko, P. (2020).
Session-level adversary intent-driven cyberattack simulator. In 2020 IEEE/ACM 24th International Symposium on Distributed Simulation and Real Time Applications (DS-RT), pages 1–9.
Du, Y., Song, Z., Milani, S., Gonzales, C., and Fang, F.
(2022). Learning to play an adaptive cyber decep-
tion game. In The 13th Workshop on Optimization
and Learning in Multiagent Systems, AAMAS.
Durkota, K., Lisy, V., Kiekintveld, C., Bosansky, B., and
Pechoucek, M. (2016). Case studies of network de-
fense with attack graph games. IEEE Intelligent Sys-
tems.
ENISA (2022). ENISA threat landscape for ransomware
attacks. Technical report, ENISA, LU.
Fikes, R. E. and Nilsson, N. J. (1971). STRIPS: A new approach to the application of theorem proving to problem solving. Artificial Intelligence, 2(3):189–208.
Guo, M., Li, J., Neumann, A., Neumann, F., and Nguyen,
H. (2021). Practical fixed-parameter algorithms for
defending active directory style attack graphs.
Hammar, K. and Stadler, R. (2020). Finding effective se-
curity strategies through reinforcement learning and
self-play. In 2020 16th International Conference on
Network and Service Management (CNSM). IEEE.
Hasselt, H. (2010). Double Q-learning. Advances in Neural Information Processing Systems, 23.
Hou, X., Jiang, Z., and Tian, X. (2010). The detection and prevention for ARP spoofing based on Snort. In 2010 International Conference on Computer Application and System Modeling (ICCASM 2010), volume 5, pages V5–137–V5–139.
Huang, Y.-T., Lin, C. Y., Guo, Y.-R., Lo, K.-C., Sun, Y. S.,
and Chen, M. C. (2022). Open source intelligence for
malicious behavior discovery and interpretation. IEEE
Transactions on Dependable and Secure Computing.
Jang, B., Kim, M., Harerimana, G., and Kim, J. W. (2019a).
Q-learning algorithms: A comprehensive classifica-
tion and applications. IEEE Access.
Jang, B., Kim, M., Harerimana, G., and Kim, J. W. (2019b).
Q-learning algorithms: A comprehensive classification and applications. IEEE Access.
Liu, P., Zang, W., and Yu, M. (2005). Incentive-based mod-
eling and inference of attacker intent, objectives, and
strategies. ACM Transactions on Information and Sys-
tem Security (TISSEC), 8(1):78–118.
Mitchell, R. and Healy, B. (2018). A game theoretic
model of computer network exploitation campaigns.
In 2018 IEEE 8th Annual Computing and Communi-
cation Workshop and Conference (CCWC).
Moskal, S., Yang, S. J., and Kuhl, M. E. (2018). Cyber
threat assessment via attack scenario simulation us-
ing an integrated adversary and network modeling ap-
proach. Journal of Defense Modeling and Simulation.
Niculae, S., Dichiu, D., Yang, K., and Bäck, T. (2020).
Automating penetration testing using reinforcement learning.
Patil, A., Bharath, S., and Annigeri, N. (2018). Applica-
tions of game theory for cyber security system: A sur-
vey. International Journal of Applied Engineering Re-
search, 13(17):12987–12990.
Shiva, S., Roy, S., and Dasgupta, D. (2010). Game theory
for cyber security. In Proceedings of the Sixth Annual
Workshop on Cyber Security and Information Intelli-
gence Research, pages 1–4.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement learn-
ing: An introduction. MIT press.
Catch Me if You Can: Improving Adversaries in Cyber-Security with Q-Learning Algorithms