
tailed comparison of the performance of these different strategy combinations is presented in Table 1.
5 CONCLUSION AND FUTURE WORK
In conclusion, this work demonstrated the potential of quantum-inspired techniques (QAOA and QER) to improve the training efficiency of defensive agents in Autonomous Cyber Defence. As cyber-attacks grow increasingly complex, the integration of these methods with DRL can enhance decision-making and responsiveness against threats, including APTs and zero-day exploits (Li and Hankin, 2017).
QER buffers represent a substantial improvement in experience sampling, leveraging quantum-inspired principles to produce more diverse and representative memory retrieval. This leads to more effective learning and enhanced defensive capabilities. Meanwhile, integrating QAOA with DRL helps solve complex optimization tasks, enabling agents to navigate intricate decision spaces and reach globally optimized solutions. This combined approach strengthens agent adaptability, producing more robust strategies for managing sophisticated cyber threats.
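To make the sampling idea concrete, the sketch below stores an amplitude alongside each transition and draws minibatches with probability proportional to the squared amplitude, loosely following the quantum-inspired experience replay of Wei et al. (2021). The class name, default capacity, and the simple amplitude-update rule are illustrative assumptions, not the exact implementation evaluated in this work.

import numpy as np

class QuantumInspiredReplayBuffer:
    """Amplitude-weighted replay buffer (illustrative sketch only)."""

    def __init__(self, capacity=10000):
        self.capacity = capacity
        self.transitions = []   # stored (s, a, r, s_next, done) tuples
        self.amplitudes = []    # one amplitude per transition

    def add(self, transition, td_error=1.0):
        # drop the oldest transition once the buffer is full
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)
            self.amplitudes.pop(0)
        self.transitions.append(transition)
        self.amplitudes.append(abs(td_error) + 1e-6)

    def sample(self, batch_size):
        amps = np.asarray(self.amplitudes)
        probs = amps ** 2 / np.sum(amps ** 2)  # measurement-like probabilities
        idx = np.random.choice(len(self.transitions), size=batch_size, p=probs)
        return [self.transitions[i] for i in idx], idx

    def update_amplitudes(self, idx, td_errors):
        # reinforce surprising transitions, damp ones already replayed
        for i, err in zip(idx, td_errors):
            self.amplitudes[i] = 0.5 * self.amplitudes[i] + 0.5 * (abs(err) + 1e-6)

Compared with uniform replay, the squared-amplitude weighting behaves much like prioritized sampling while keeping every stored transition reachable.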
However, quantum-inspired methods impose additional computational demands and complexity. Although combining DDQN, Boltzmann exploration, and PER struck a balance between performance and feasibility, current quantum resources are limited, and simulating quantum computation on classical hardware can introduce bottlenecks that affect scalability and realism.
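For reference, the Boltzmann exploration component mentioned above can be sketched as a softmax policy over a value network's Q-estimates, as below; the Q-values and temperature shown are placeholders rather than the settings used in our experiments.

import numpy as np

def boltzmann_action(q_values, temperature=1.0):
    # Softmax over Q-values: higher temperature means more exploration,
    # and the policy approaches greedy selection as temperature -> 0.
    q = np.asarray(q_values, dtype=np.float64)
    q = (q - q.max()) / temperature   # shift for numerical stability
    probs = np.exp(q)
    probs /= probs.sum()
    return np.random.choice(len(q), p=probs)

# Example with placeholder Q-values for a single observation
action = boltzmann_action([0.2, 1.3, 0.7], temperature=0.5)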
Future research should focus on larger, more complex environments and real-world scenarios. Validating these techniques outside simulated settings will help identify challenges and guide practical deployments. Further exploration of QER's underlying mechanisms, dynamic parameter tuning, and efficient resource management can refine these quantum-inspired approaches. Ultimately, these methods offer promising avenues for advancing cyber defence strategies and resilience.
REFERENCES
Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., McGrew, B., Tobin, J., Abbeel, P., and Zaremba, W. (2017a). Hindsight experience replay. In Advances in Neural Information Processing Systems, volume 30.
Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., McGrew, B., Tobin, J., Abbeel, P., and Zaremba, W. (2017b). Hindsight experience replay. CoRR, abs/1707.01495.
Baillie, C., Standen, M., Schwartz, J., Docking, M., Bowman, D., and Kim, J. (2020). CybORG: An autonomous cyber operations research gym. arXiv preprint arXiv:2002.10667.
Benaddi, H., Elhajji, S., Benaddi, A., Amzazi, S., Benaddi, H., and Oudani, H. (2022). Robust enhancement of intrusion detection systems using deep reinforcement learning and stochastic game. IEEE Transactions on Vehicular Technology, 71(10):11089–11102.
Cercignani, C. (1988). The Boltzmann Equation. Springer New York.
Van Hasselt, H., Guez, A., and Silver, D. (2016). Deep reinforcement learning with double Q-learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 30.
Kiely, M., Bowman, D., Standen, M., and Moir, C. (2023). On autonomous agents in a cyber defence environment. arXiv preprint arXiv:2309.07388.
Lagoudakis, M. and Parr, R. (2012). Value function approximation in zero-sum Markov games. arXiv preprint arXiv:1301.0580.
Li, T. and Hankin, C. (2017). Effective defence against zero-day exploits using Bayesian networks. In Critical Information Infrastructures Security: 11th International Conference, pages 123–136. Springer.
Liu, X., Zhang, H., Dong, S., and Zhang, Y. (2021). Network defense decision-making based on a stochastic game system and a deep recurrent Q-network. Computers & Security, 111:102480.
Schaul, T., Quan, J., Antonoglou, I., and Silver, D. (2015). Prioritized experience replay. arXiv preprint arXiv:1511.05952.
Shen, Y., Shepherd, C., Ahmed, C. M., Yu, S., and Li, T. (2024). Comparative DQN-improved algorithms for stochastic games-based automated edge intelligence-enabled IoT malware spread-suppression strategies. IEEE Internet of Things Journal, 11(12):22550–22561.
Standen, M., Bowman, D., Hoang, S., Richer, T., Lucas, M., Van Tassel, R., Vu, P., Kiely, M., KC, Konschnik, N., and Collyer, J. (2022). Cyber Operations Research Gym. https://github.com/cage-challenge/CybORG.
Vyas, S., Hannay, J., Bolton, A., and Burnap, P. (2023). Automated cyber defence: A review. arXiv preprint arXiv:2303.04926.
Wei, Q., Ma, H., Chen, C., and Dong, D. (2021). Deep reinforcement learning with quantum-inspired experience replay. IEEE Transactions on Cybernetics, 52(9):9326–9338.
Zhou, L., Wang, S. T., Choi, S., Pichler, H., and Lukin, M. D. (2020). Quantum approximate optimization algorithm: Performance, mechanism, and implementation on near-term devices. Physical Review X, 10(2):021067.