Berner, C., Brockman, G., Chan, B., Cheung, V., Debiak,
P., Dennison, C., Farhi, D., Fischer, Q., Hashme, S.,
Hesse, C., Józefowicz, R., Gray, S., Olsson, C., Pa-
chocki, J., Petrov, M., Pinto, H. P. d. O., Raiman,
J., Salimans, T., Schlatter, J., Schneider, J., Sidor,
S., Sutskever, I., Tang, J., Wolski, F., and Zhang,
S. (2019). Dota 2 with Large Scale Deep Rein-
forcement Learning. CoRR, abs/1912.06680. arXiv:
1912.06680.
Brockman, G. and others (2016). OpenAI Gym. eprint:
arXiv:1606.01540.
Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S.,
Tan, J., Kumar, V., Zhu, H., Gupta, A., Abbeel, P.,
and Levine, S. (2018). Soft Actor-Critic Algorithms
and Applications. CoRR, abs/1812.05905. arXiv:
1812.05905.
Hamid, O. H. (2014). The role of temporal statistics in the
transfer of experience in context-dependent reinforce-
ment learning. In 2014 14th International Conference
on Hybrid Intelligent Systems, pages 123–128.
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T.,
Tassa, Y., Silver, D., and Wierstra, D. (2016). Con-
tinuous control with deep reinforcement learning. In
Bengio, Y. and LeCun, Y., editors, 4th International
Conference on Learning Representations, ICLR 2016,
San Juan, Puerto Rico, May 2-4, 2016, Conference
Track Proceedings.
Lin, L.-J. (1992). Self-improving reactive agents based on
reinforcement learning, planning and teaching. Ma-
chine Learning, 8(3):293–321.
Lin, L.-J. (1993). Reinforcement learning for robots using neural networks. Technical report, Carnegie Mellon University, School of Computer Science, Pittsburgh, PA.
McClelland, J. L., McNaughton, B. L., and O’Reilly, R. C.
(1995). Why there are complementary learning sys-
tems in the hippocampus and neocortex: insights
from the successes and failures of connectionist mod-
els of learning and memory. Psychological review,
102(3):419. Publisher: American Psychological As-
sociation.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A.,
Antonoglou, I., Wierstra, D., and Riedmiller, M. A.
(2013). Playing Atari with Deep Reinforcement
Learning. CoRR, abs/1312.5602.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A.,
Veness, J., Bellemare, M. G., Graves, A., Ried-
miller, M., Fidjeland, A. K., Ostrovski, G., and others
(2015). Human-level control through deep reinforce-
ment learning. Nature, 518(7540):529–533. Publisher:
Nature Publishing Group.
Omohundro, S. M. (1989). Five balltree construction al-
gorithms. International Computer Science Institute
Berkeley.
O’Neill, J., Pleydell-Bouverie, B., Dupret, D., and
Csicsvari, J. (2010). Play it again: reactivation of wak-
ing experience and memory. Trends in Neurosciences,
33(5):220–229.
Pilar von Pilchau, W. (2019). Averaging rewards as a
first approach towards Interpolated Experience Re-
play. In Draude, C., Lange, M., and Sick, B., edi-
tors, INFORMATIK 2019: 50 Jahre Gesellschaft für Informatik – Informatik für Gesellschaft (Workshop-Beiträge), pages 493–506, Bonn. Gesellschaft für Informatik e.V.
Pilar von Pilchau, W., Stein, A., and Hähner, J. (2020).
Bootstrapping a DQN replay memory with synthetic
experiences. In Merelo, J. J., Garibaldi, J., Wagner, C.,
Bäck, T., Madani, K., and Warwick, K., editors, Pro-
ceedings of the 12th International Joint Conference on
Computational Intelligence (IJCCI 2020), November
2-4, 2020.
Pilar von Pilchau, W., Stein, A., and Hähner, J. (2021).
Synthetic Experiences for Accelerating DQN Perfor-
mance in Discrete Non-Deterministic Environments.
Algorithms, 14(8).
Rolnick, D., Ahuja, A., Schwarz, J., Lillicrap, T. P., and
Wayne, G. (2018). Experience Replay for Con-
tinual Learning. CoRR, abs/1811.11682. eprint: 1811.11682.
Sander, R. M. (2021). Interpolated Experience Replay for
Improved Sample Efficiency of Model-Free Deep Re-
inforcement Learning Algorithms. PhD Thesis, Mas-
sachusetts Institute of Technology.
Schaul, T., Quan, J., Antonoglou, I., and Silver, D. (2015).
Prioritized Experience Replay. arXiv e-prints, page
arXiv:1511.05952. eprint: 1511.05952.
Shepard, D. (1968). A Two-Dimensional Interpolation
Function for Irregularly-Spaced Data. In Proceedings
of the 1968 23rd ACM National Conference, ACM
’68, pages 517–524, New York, NY, USA. Associa-
tion for Computing Machinery.
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I.,
Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M.,
Bolton, A., Chen, Y., Lillicrap, T., Hui, F., Sifre, L.,
van den Driessche, G., Graepel, T., and Hassabis, D.
(2017). Mastering the game of Go without human
knowledge. Nature, 550(7676):354–359.
Stein, A., Menssen, S., and Hähner, J. (2018). What about
Interpolation? A Radial Basis Function Approach to
Classifier Prediction Modeling in XCSF. In Proc.
of the GECCO, GECCO ’18, pages 537–544, New
York, NY, USA. Association for Computing Machin-
ery. event-place: Kyoto, Japan.
Stein, A., Rauh, D., Tomforde, S., and Hähner, J. (2017).
Interpolation in the eXtended Classifier System: An
architectural perspective. Journal of Systems Archi-
tecture, 75:79–94. Publisher: Elsevier.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement learn-
ing: An introduction. MIT press.
Tsitsiklis, J. N. and Roy, B. V. (1997). An analysis of
temporal-difference learning with function approxi-
mation. IEEE Transactions on Automatic Control,
42(5):674–690.
Vinyals, O., Babuschkin, I., Chung, J., Mathieu, M., Jader-
berg, M., Czarnecki, W. M., Dudzik, A., Huang, A.,
Georgiev, P., Powell, R., Ewalds, T., Horgan, D.,
Kroiss, M., Danihelka, I., Agapiou, J., Oh, J., Dal-
ibard, V., Choi, D., Sifre, L., Sulsky, Y., Vezhnevets,
S., Molloy, J., Cai, T., Budden, D., Paine, T., Gul-
cehre, C., Wang, Z., Pfaff, T., Pohlen, T., Wu, Y.,