
5 DISCUSSION
In this study, we conducted a comprehensive evaluation of three QRL classes (PQC-QRL with QPG and QDQN, FE-QRL, and AA-QRL). Our evaluation extends beyond previous works in the number of QRL algorithms considered and in the incorporation of additional metrics such as circuit executions and quantum clock time, providing a more holistic and realistic assessment of these algorithms' practical feasibility.
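The circuit-execution metric can be tracked with a thin counting wrapper around whatever callable evaluates a circuit. The sketch below is purely illustrative; the `CircuitExecutionCounter` class and the dummy `run_circuit` callable are hypothetical names, not part of our implementation:

```python
class CircuitExecutionCounter:
    """Wraps a circuit-evaluation callable and tallies how often it runs."""

    def __init__(self, run_circuit):
        self.run_circuit = run_circuit
        self.count = 0

    def __call__(self, *args, **kwargs):
        self.count += 1  # one more circuit execution
        return self.run_circuit(*args, **kwargs)


# Dummy circuit stand-in: in practice this would call a simulator or QPU.
counter = CircuitExecutionCounter(lambda params: 0.0)
for _ in range(5):  # e.g. five policy evaluations during training
    counter([0.1])
```

After training, `counter.count` holds the total number of circuit executions, which can be reported alongside episode returns.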
For PQC-QRL, we observed only a minor dependence on quantum entanglement, with performance deteriorating only slightly when entanglement was removed. Interestingly, our investigation of FE-QRL showed no clear correlation between performance and the number of replicas used to approximate the Hamiltonian of the QBM, H_v^QBM, but rather a strong dependence on hyperparameters. These findings suggest that most QRL approaches may not rely heavily on their quantum components.
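An entanglement ablation of this kind can be sketched for a toy two-qubit ansatz: the entangling layer is simply toggled off and the rest of the circuit is left unchanged. The `ry`, `CNOT`, and `pqc_state` definitions below are illustrative assumptions for a minimal statevector simulation, not the circuits used in our experiments:

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation as a 2x2 matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

# Two-qubit CNOT (control on qubit 0).
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

def pqc_state(params, entangle=True):
    """Two-qubit layered RY ansatz; the CNOT layer can be ablated."""
    state = np.zeros(4)
    state[0] = 1.0  # start in |00>
    for layer in params:  # params has shape (num_layers, 2)
        state = np.kron(ry(layer[0]), ry(layer[1])) @ state
        if entangle:
            state = CNOT @ state  # entangling layer under ablation study
    return state

params = np.array([[0.3, 1.1], [0.7, -0.4]])
with_ent = pqc_state(params, entangle=True)
without_ent = pqc_state(params, entangle=False)
```

Running the agent with `entangle=False` while keeping all other hyperparameters fixed isolates the contribution of entanglement to performance.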
QRL, particularly when applied to gridworld games, demonstrates promising scalability to larger problems through binary encoding, even with current hardware limitations. However, the algorithms we evaluated still require substantial improvement to achieve competitive performance levels. Our work can serve as an underlying benchmarking reference for this future development.
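The scalability of binary encoding comes from its logarithmic qubit requirement: a discrete state index is mapped to a bitstring whose length grows only as log2 of the number of states. A minimal sketch, assuming a hypothetical `binary_encoding` helper (the resulting bits would typically drive per-qubit basis or angle encoding gates):

```python
import math

def binary_encoding(state_index, num_states):
    """Map a discrete gridworld state index to a little-endian bitstring."""
    n_qubits = math.ceil(math.log2(num_states))  # qubits scale as log2
    return [(state_index >> i) & 1 for i in range(n_qubits)]

# A 16x16 gridworld has 256 states but needs only 8 qubits:
bits = binary_encoding(5, 256)
```

By contrast, a one-hot encoding of the same gridworld would require 256 qubits, which is why binary encoding remains feasible on current hardware.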
Future work should aim to include the evaluation of noise resilience as an additional metric in order to assess these algorithms' practical viability on real quantum hardware. Additionally, not only the quantum clock time but also the overall clock time of these hybrid algorithms should be considered when comparing QRL to classical RL.
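Separating the two clock times amounts to timing the quantum and classical halves of each hybrid training step independently and summing them. The sketch below is a minimal illustration; `quantum_part` and `classical_part` are hypothetical stand-ins (e.g. circuit execution versus optimizer and replay-buffer updates), not functions from our codebase:

```python
import time

def timed(fn):
    """Run fn and return (result, wall-clock duration in seconds)."""
    t0 = time.perf_counter()
    out = fn()
    return out, time.perf_counter() - t0

def quantum_part():    # stand-in for circuit execution on hardware/simulator
    time.sleep(0.01)

def classical_part():  # stand-in for gradient update, buffer handling, etc.
    time.sleep(0.01)

quantum_time = classical_time = 0.0
for _ in range(3):  # three hybrid training steps
    _, dt = timed(quantum_part)
    quantum_time += dt
    _, dt = timed(classical_part)
    classical_time += dt

overall_time = quantum_time + classical_time
```

Reporting `quantum_time` alone can flatter QRL; `overall_time` is the quantity directly comparable to a classical RL baseline's wall-clock time.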
CODE AVAILABILITY
The code to reproduce the results, as well as the data used to generate the plots in this work, can be found at: https://github.com/georgkruse/cleanqrl
ACKNOWLEDGEMENTS
The research is part of the Munich Quantum Valley,
which is supported by the Bavarian state government
with funds from the Hightech Agenda Bayern Plus.
Benchmarking Quantum Reinforcement Learning