(Mnih et al., 2016b) or A2C (Mnih et al., 2016a), to
see if similar success is achieved. Thirdly, treating the parameters of the guide policy as additional trainable parameters within the Reinforcement Learning algorithm, so that the actions of the guide policy are themselves fine-tuned during training, is another interesting avenue for future research, as sketched below. Lastly, COIL could be evaluated against other Imitation Learning algorithms across a wider range of tasks to gain a broader understanding of how it compares with existing methods.
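To make the third direction concrete, the following is a minimal sketch of how the guide policy's parameters could be exposed to the same optimiser as the agent's, so that policy-gradient updates fine-tune both. It assumes PyTorch and a plain REINFORCE-style surrogate loss; the network architecture, the mixing rule between guide and agent, and all names (MLPPolicy, beta, the learning rates) are illustrative assumptions rather than part of COIL.

# Sketch only: guide-policy parameters treated as additional trainable
# parameters of the RL update. Not the paper's implementation.
import torch
import torch.nn as nn

class MLPPolicy(nn.Module):
    """Small Gaussian policy used here for both the agent and the guide."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, act_dim))
        self.log_std = nn.Parameter(torch.zeros(act_dim))

obs_dim, act_dim = 8, 2
agent = MLPPolicy(obs_dim, act_dim)
guide = MLPPolicy(obs_dim, act_dim)   # stands in for a pre-trained guide, now trainable

# Key point of the proposal: both parameter sets go into one optimiser,
# so RL gradients also adjust the guide (here with a smaller learning rate).
optimizer = torch.optim.Adam([
    {"params": agent.parameters(), "lr": 3e-4},
    {"params": guide.parameters(), "lr": 3e-5},
])

# Dummy rollout data standing in for real environment interaction.
obs = torch.randn(32, obs_dim)
actions = torch.randn(32, act_dim)
returns = torch.randn(32)

beta = 0.5  # illustrative mixing weight between guide and agent action means
mixed_mean = beta * guide.net(obs) + (1.0 - beta) * agent.net(obs)
dist = torch.distributions.Normal(mixed_mean, agent.log_std.exp())
log_prob = dist.log_prob(actions).sum(dim=-1)

# REINFORCE-style surrogate loss; a PPO clipped objective would work equally well.
loss = -(log_prob * returns).mean()
optimizer.zero_grad()
loss.backward()       # gradients flow into both the agent and the guide
optimizer.step()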
REFERENCES
Aslan, M. F., Unlersen, M. F., Sabanci, K., and Durdu, A. (2021). CNN-based transfer learning–BiLSTM network: A novel approach for COVID-19 infection detection. Applied Soft Computing, 98:106912.
Breiman, L. (2001). Random forests. Machine Learning,
45(1):5–32.
Chen, J., Yuan, B., and Tomizuka, M. (2019). Deep imita-
tion learning for autonomous driving in generic urban
scenarios with enhanced safety. In 2019 IEEE/RSJ In-
ternational Conference on Intelligent Robots and Sys-
tems (IROS), pages 2884–2890.
Durrleman, S. and Simon, R. (1989). Flexible regression models with cubic splines. Statistics in Medicine, 8(5):551–561.
Edward, J. (2021). Coarse-to-fine imitation learning: Robot manipulation from a single demonstration. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 4613–4619.
Fang, B., Jia, S., Guo, D., Xu, M., Wen, S., and Sun,
F. (2019). Survey of imitation learning for robotic
manipulation. International Journal of Intelligent
Robotics and Applications, 3(4):362–369.
Halbert, D. (1984). Programming by example. PhD thesis,
University of California, Berkeley.
Ho, J. and Ermon, S. (2016). Generative adversarial imitation learning. In Advances in Neural Information Processing Systems 29.
Hornik, K., Stinchcombe, M., and White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366.
Liu, Y., Liu, Q., Zhao, H., Pan, Z., and Liu, C. (2020). Adaptive quantitative trading: An imitative deep reinforcement learning approach. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 2128–2135.
Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016a). Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pages 1928–1937.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2016b). Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
Osa, T., Pajarinen, J., Neumann, G., Bagnell, J. A., Abbeel,
P., and Peters, J. (2018). An algorithmic perspec-
tive on imitation learning. Foundations and Trends
in Robotics.
Pan, S. J. and Yang, Q. (2009). A survey on transfer learn-
ing. IEEE Transactions on Knowledge and Data En-
gineering, 22(10):1345–1359.
Pomerleau, D. (1998). An autonomous land vehicle in a
neural network. In Advances in Neural Information
Processing Systems.
Ross, S., Gordon, G., and Bagnell, D. (2011). A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 627–635. JMLR Workshop and Conference Proceedings.
Ruder, S., Peters, M. E., Swayamdipta, S., and Wolf, T.
(2019). Transfer learning in natural language process-
ing. In Proceedings of the 2019 Conference of the
North American Chapter of the Association for Com-
putational Linguistics: Tutorials, pages 15–18.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2020). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
Uchendu, I., Xiao, T., Lu, Y., Zhu, B., Yan, M., Simon,
J., and Bennice, M. (2022). Jump-start reinforcement
learning. arXiv preprint arXiv:2204.02372.
Wan, Z., Yang, R., Huang, M., Zeng, N., and Liu, X. (2021). A review on transfer learning in EEG signal analysis. Neurocomputing, 421:1–14.
Wang, J., Zhuang, Z., Wang, Y., and Zhao, H. (2022). Ad-
versarially robust imitation learning. In Conference on
Robot Learning, pages 320–331.
Wang, Y., He, H., and Tan, X. (2020). Truly proximal policy
optimization. In Uncertainty in Artificial Intelligence,
pages 113–122.
Watkins, C. J. and Dayan, P. (1992). Q-learning. Machine Learning, 8(3):279–292.
Yu, C., Velu, A., Vinitsky, E., Wang, Y., Bayen, A., and Wu, Y. (2021). The surprising effectiveness of PPO in cooperative, multi-agent games. arXiv preprint arXiv:2103.01955.
Zhang, S., Cao, Z., Sadigh, D., and Sui, Y. (2021).
Confidence-aware imitation learning from demonstra-
tions with varying optimality. In Advances in Neu-
ral Information Processing Systems 34, pages 12340–
12350.