
Dulac-Arnold, G., Mankowitz, D. J., and Hester, T. (2019). Challenges of Real-World Reinforcement Learning. ArXiv, abs/1904.12901.
Edwards, A. D., Downs, L., and Davidson, J. C. (2018). Forward-Backward Reinforcement Learning. ArXiv, abs/1803.10227.
Fu, J., Kumar, A., Nachum, O., Tucker, G., and Levine, S. (2020). D4RL: Datasets for Deep Data-Driven Reinforcement Learning. ArXiv, abs/2004.07219.
Fujimoto, S. and Gu, S. S. (2021). A Minimalist Approach to Offline Reinforcement Learning. ArXiv, abs/2106.06860.
Fujimoto, S., Meger, D., and Precup, D. (2018). Off-Policy Deep Reinforcement Learning without Exploration. In International Conference on Machine Learning.
Gelada, C. and Bellemare, M. G. (2019). Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift. In AAAI Conference on Artificial Intelligence.
Goyal, A., Brakel, P., Fedus, W., Singhal, S., Lillicrap, T. P., Levine, S., Larochelle, H., and Bengio, Y. (2018). Recall Traces: Backtracking Models for Efficient Reinforcement Learning. ArXiv, abs/1804.00379.
Hasselt, H. V., Hessel, M., and Aslanides, J. (2019). When to use parametric models in reinforcement learning? ArXiv, abs/1906.05243.
He, H. (2023). A Survey on Offline Model-Based Reinforcement Learning. ArXiv, abs/2305.03360.
Holyoak, K. J. and Simon, D. (1999). Bidirectional reasoning in decision making by constraint satisfaction. Journal of Experimental Psychology: General, 128:3–31.
Jafferjee, T., Imani, E., Talvitie, E. J., White, M., and Bowling, M. (2020). Hallucinating Value: A Pitfall of Dyna-style Planning with Imperfect Environment Models. ArXiv, abs/2006.04363.
Jain, V. and Ravanbakhsh, S. (2023). Learning to Reach Goals via Diffusion. ArXiv, abs/2310.02505.
Janner, M., Li, Q., and Levine, S. (2021). Offline Reinforcement Learning as One Big Sequence Modeling Problem. In Neural Information Processing Systems.
Jiang, M., Dennis, M., Parker-Holder, J., Foerster, J. N., Grefenstette, E., and Rocktäschel, T. (2021). Replay-Guided Adversarial Environment Design. In Neural Information Processing Systems.
Kidambi, R., Rajeswaran, A., Netrapalli, P., and Joachims, T. (2020). MOReL: Model-Based Offline Reinforcement Learning. ArXiv, abs/2005.05951.
Kostrikov, I., Nair, A., and Levine, S. (2021). Offline Reinforcement Learning with Implicit Q-Learning. ArXiv, abs/2110.06169.
Kumar, A., Agarwal, R., Ma, T., Courville, A. C., Tucker, G., and Levine, S. (2021). DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization. ArXiv, abs/2112.04716.
Kumar, A., Zhou, A., Tucker, G., and Levine, S. (2020). Conservative Q-Learning for Offline Reinforcement Learning. ArXiv, abs/2006.04779.
Küttler, H., Nardelli, N., Miller, A. H., Raileanu, R., Selvatici, M., Grefenstette, E., and Rocktäschel, T. (2020). The NetHack Learning Environment. ArXiv, abs/2006.13760.
Lai, H., Shen, J., Zhang, W., and Yu, Y. (2020). Bidirectional Model-based Policy Optimization. ArXiv, abs/2007.01995.
Lambert, N., Wulfmeier, M., Whitney, W. F., Byravan, A., Bloesch, M., Dasagi, V., Hertweck, T., and Riedmiller, M. A. (2022). The Challenges of Exploration for Offline Reinforcement Learning. ArXiv, abs/2201.11861.
Laroche, R. and Trichelair, P. (2017). Safe Policy Improvement with Baseline Bootstrapping. In International Conference on Machine Learning.
Lee, K., Seo, Y., Lee, S., Lee, H., and Shin, J. (2020). Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning. ArXiv, abs/2005.06800.
Levine, S., Kumar, A., Tucker, G., and Fu, J. (2020). Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems. ArXiv, abs/2005.01643.
Li, G., Shi, L., Chen, Y., Chi, Y., and Wei, Y. (2022). Settling the Sample Complexity of Model-Based Offline Reinforcement Learning. ArXiv, abs/2204.05275.
Liu, J., Zhang, H., and Wang, D. (2022). DARA: Dynamics-Aware Reward Augmentation in Offline Reinforcement Learning. ArXiv, abs/2203.06662.
Liu, J., Zhang, Z., Wei, Z., Zhuang, Z., Kang, Y., Gai, S., and Wang, D. (2023). Beyond OOD State Actions: Supported Cross-Domain Offline Reinforcement Learning. ArXiv, abs/2306.12755.
Liu, S., See, K. C., Ngiam, K. Y., Celi, L. A., Sun, X., and Feng, M. (2020a). Reinforcement Learning for Clinical Decision Support in Critical Care: Comprehensive Review. Journal of Medical Internet Research, 22.
Liu, Y., Swaminathan, A., Agarwal, A., and Brunskill, E. (2019). Off-Policy Policy Gradient with Stationary Distribution Correction. ArXiv, abs/1904.08473.
Liu, Y., Swaminathan, A., Agarwal, A., and Brunskill, E. (2020b). Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration. In Neural Information Processing Systems.
Lowrey, K., Rajeswaran, A., Kakade, S. M., Todorov, E., and Mordatch, I. (2018). Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control. ArXiv, abs/1811.01848.
Lu, C., Ball, P. J., Parker-Holder, J., Osborne, M. A., and Roberts, S. J. (2021). Revisiting Design Choices in Offline Model-Based Reinforcement Learning. In International Conference on Learning Representations.
Luo, F., Xu, T., Lai, H., Chen, X.-H., Zhang, W., and Yu, Y. (2022). A Survey on Model-based Reinforcement Learning. Science China Information Sciences, 67.
Lyu, J., Li, X., and Lu, Z. (2022). Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination. ArXiv, abs/2206.07989.
Ma, X., Yang, Y., Hu, H., Liu, Q., Yang, J., Zhang, C., Zhao, Q., and Liang, B. (2021a). Offline Reinforcement Learning with Value-based Episodic Memory. ArXiv, abs/2110.09796.