
Degrave, J., Felici, F., Buchli, J., Neunert, M., Tracey, B., Carpanese, F., Ewalds, T., Hafner, R., Abdolmaleki, A., de las Casas, D., et al. (2022). Magnetic control of tokamak plasmas through deep reinforcement learning. Nature, 602(7897):414–419.
Dulac-Arnold, G., Mankowitz, D., and Hester, T. (2019).
Challenges of real-world reinforcement learning.
arXiv preprint arXiv:1904.12901.
Farhi, E. and Harrow, A. W. (2016). Quantum supremacy
through the quantum approximate optimization algo-
rithm. arXiv preprint arXiv:1602.07674.
Glover, F., Kochenberger, G., and Du, Y. (2019). Quantum bridge analytics I: A tutorial on formulating and using QUBO models. 4OR, 17(4):335–371.
Gu, S., Yang, L., Du, Y., Chen, G., Walter, F., Wang, J.,
Yang, Y., and Knoll, A. (2022). A review of safe re-
inforcement learning: Methods, theory and applica-
tions. arXiv preprint arXiv:2205.10330.
Koseoglu, M. and Ozcelikkale, A. (2020). How to miss data? Reinforcement learning for environments with high observation cost. In ICML Workshop on the Art of Learning with Missing Values (Artemiss).
Liang, Y., Sun, Y., Zheng, R., and Huang, F. (2022). Ef-
ficient adversarial training without attacking: Worst-
case-aware robust reinforcement learning. Advances
in Neural Information Processing Systems, 35:22547–
22561.
Lodewijks, B. (2020). Mapping NP-hard and NP-complete optimisation problems to quadratic unconstrained binary optimisation problems. arXiv preprint arXiv:2009.07880.
Lucas, A. (2014). Ising formulations of many NP problems. Frontiers in Physics, 2:5.
Lütjens, B., Everett, M., and How, J. P. (2020). Certified adversarial robustness for deep reinforcement learning. In Conference on Robot Learning, pages 1328–1337. PMLR.
Mandlekar, A., Zhu, Y., Garg, A., Fei-Fei, L., and Savarese,
S. (2017). Adversarially robust policy learning:
Active construction of physically-plausible perturba-
tions. In 2017 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS), pages 3932–
3939. IEEE.
Martello, S. and Toth, P. (1990). Knapsack problems: algo-
rithms and computer implementations. John Wiley &
Sons, Inc.
Moos, J., Hansel, K., Abdulsamad, H., Stark, S., Clever, D.,
and Peters, J. (2022). Robust reinforcement learning:
A review of foundations and recent advances. Ma-
chine Learning and Knowledge Extraction, 4(1):276–
315.
Morita, S. and Nishimori, H. (2008). Mathematical foundation of quantum annealing. Journal of Mathematical Physics, 49(12):125210.
Nam, H. A., Fleming, S., and Brunskill, E. (2021). Reinforcement learning with state observation costs in action-contingent noiselessly observable Markov decision processes. Advances in Neural Information Processing Systems, 34:15650–15666.
Nüßlein, J., Roch, C., Gabor, T., Stein, J., Linnhoff-Popien, C., and Feld, S. (2023a). Black box optimization using QUBO and the cross entropy method. In International Conference on Computational Science, pages 48–55. Springer.
Nüßlein, J., Zielinski, S., Gabor, T., Linnhoff-Popien, C., and Feld, S. (2023b). Solving (Max) 3-SAT via quadratic unconstrained binary optimization. In International Conference on Computational Science, pages 34–47. Springer.
Pattanaik, A., Tang, Z., Liu, S., Bommannan, G., and
Chowdhary, G. (2017). Robust deep reinforcement
learning with adversarial attacks. arXiv preprint
arXiv:1712.03632.
Pinto, L., Davidson, J., Sukthankar, R., and Gupta, A.
(2017). Robust adversarial reinforcement learning.
In International Conference on Machine Learning,
pages 2817–2826. PMLR.
Quintero, R. A. and Zuluaga, L. F. (2021). Characterizing and benchmarking QUBO reformulations of the knapsack problem. Technical report, Department of Industrial and Systems Engineering, Lehigh University.
Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus,
M., and Dormann, N. (2021). Stable-baselines3: Reli-
able reinforcement learning implementations. Journal
of Machine Learning Research, 22(268):1–8.
Salkin, H. M. and De Kluyver, C. A. (1975). The knapsack
problem: a survey. Naval Research Logistics Quar-
terly, 22(1):127–144.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and
Klimov, O. (2017). Proximal policy optimization al-
gorithms. arXiv preprint arXiv:1707.06347.
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai,
M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D.,
Graepel, T., et al. (2018). A general reinforcement
learning algorithm that masters chess, shogi, and go
through self-play. Science, 362(6419):1140–1144.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
Valdenegro-Toro, M. and Mori, D. S. (2022). A deeper look
into aleatoric and epistemic uncertainty disentangle-
ment. In 2022 IEEE/CVF Conference on Computer
Vision and Pattern Recognition Workshops (CVPRW),
pages 1508–1516. IEEE.