
ACKNOWLEDGEMENTS
This research has also received funding from the KU Leuven Research Funds (C14/24/092, iBOF/21/075), and from the Flemish Government under the "Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen" programme.
REFERENCES
Browne, C. B., Powley, E., Whitehouse, D., Lucas, S. M., Cowling, P. I., Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S., and Colton, S. (2012). A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1):1–43.
Charikar, M. S. (2002). Similarity estimation techniques from rounding algorithms. In Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, pages 380–388.
Chen, L., Esfandiari, H., Fu, G., and Mirrokni, V. (2019). Locality-sensitive hashing for f-divergences: Mutual information loss and beyond. Advances in Neural Information Processing Systems, 32.
Choe, J. S. B. and Kim, J.-K. (2019). Enhancing Monte Carlo tree search for playing Hearthstone. In 2019 IEEE Conference on Games (CoG), pages 1–7. IEEE.
Cohen, D. et al. (1997). Precalculus: A Problems-Oriented Approach.
Czech, J., Korus, P., and Kersting, K. (2021). Improving AlphaZero using Monte-Carlo graph search. In Proceedings of the International Conference on Automated Planning and Scheduling, volume 31, pages 103–111.
Datar, M., Immorlica, N., Indyk, P., and Mirrokni, V. S. (2004). Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the Twentieth Annual Symposium on Computational Geometry, pages 253–262.
Fischer, J. and Taş, Ö. S. (2020). Information particle filter tree: An online algorithm for POMDPs with belief-based rewards on continuous domains. In International Conference on Machine Learning, pages 3177–3187. PMLR.
Kaelbling, L. P., Littman, M. L., and Cassandra, A. R. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1-2):99–134.
Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79–86.
Levine, J. (2017). Monte Carlo Tree Search. https://www.youtube.com/watch?v=UXW2yZndl7U. [Accessed 19-07-2024].
Mao, X.-L., Feng, B.-S., Hao, Y.-J., Nie, L., Huang, H., and Wen, G. (2017). S2JSD-LSH: A locality-sensitive hashing schema for probability distributions. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31.
Mundhenk, M., Goldsmith, J., Lusena, C., and Allender, E. (2000). Complexity of finite-horizon Markov decision process problems. Journal of the ACM (JACM), 47(4):681–720.
Papadimitriou, C. H. and Tsitsiklis, J. N. (1987). The complexity of Markov decision processes. Mathematics of Operations Research, 12(3):441–450.
Pineau, J., Gordon, G., Thrun, S., et al. (2003). Point-based value iteration: An anytime algorithm for POMDPs. In IJCAI, volume 3, pages 1025–1032.
Saffidine, A., Cazenave, T., and Méhat, J. (2012). UCD: Upper confidence bound for rooted directed acyclic graphs. Knowledge-Based Systems, 34:26–33.
Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., Guez, A., Lockhart, E., Hassabis, D., Graepel, T., et al. (2020). Mastering Atari, Go, chess and shogi by planning with a learned model. Nature, 588(7839):604–609.
Shlens, J. (2014). Notes on Kullback-Leibler divergence and likelihood. arXiv preprint arXiv:1404.2000.
Silver, D. and Veness, J. (2010). Monte-Carlo planning in large POMDPs. Advances in Neural Information Processing Systems, 23.
Smallwood, R. D. and Sondik, E. J. (1973). The optimal control of partially observable Markov processes over a finite horizon. Operations Research, 21(5):1071–1088.
Smith, T. and Simmons, R. (2012). Heuristic search value iteration for POMDPs. arXiv preprint arXiv:1207.4166.
Somani, A., Ye, N., Hsu, D., and Lee, W. S. (2013). DESPOT: Online POMDP planning with regularization. Advances in Neural Information Processing Systems, 26.
Sunberg, Z. et al. (2024). BasicPOMCP.jl: The PO-UCT algorithm (aka POMCP) implemented in Julia. https://github.com/JuliaPOMDP/BasicPOMCP.jl. [Accessed 22-07-2024].
Svoreňová, M., Chmelík, M., Leahy, K., Eniser, H. F., Chatterjee, K., Černá, I., and Belta, C. (2015). Temporal logic motion planning using POMDPs with parity objectives: Case study paper. In Proceedings of the 18th International Conference on Hybrid Systems: Computation and Control, pages 233–238.