vides a new research direction for QT-Opt learning on real-world robot tasks.
6 FUTURE WORK
In the future, we plan to use a hybrid action space including both continuous joint control and discrete finger movements. Based on the current results, we expect a significant improvement in the hybrid action space as well; a minimal sketch of such a space is given below. On the engineering side, we also plan to mix simulation and real-robot training, so that the neural network weights can be transferred to the real robot using transfer learning techniques (Torrey and Shavlik, 2010). We expect the agent to learn faster in subsequent real-world training than it would when starting without simulation data; the second sketch below illustrates the intended weight transfer.
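To make the hybrid action space concrete, the following is a minimal sketch using the Gym API (Brockman et al., 2016). The 7-DoF joint bound and the binary finger command are illustrative assumptions, not values taken from our experiments.

```python
import numpy as np
from gym import spaces

# Hypothetical hybrid action space: continuous joint velocities for a
# 7-DoF arm plus a discrete open/close command for the fingers.
# Dimensions and bounds are illustrative placeholders.
hybrid_action_space = spaces.Dict({
    "joint_control": spaces.Box(low=-1.0, high=1.0,
                                shape=(7,), dtype=np.float32),
    "finger_move": spaces.Discrete(2),  # 0 = open, 1 = close
})

# Example: sample a random hybrid action from the space.
action = hybrid_action_space.sample()
print(action["joint_control"], action["finger_move"])
```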
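For the simulation-to-real transfer, the sketch below shows the intended weight handover in PyTorch; the framework choice, the QNetwork architecture, the file name, and the fine-tuning learning rate are all placeholder assumptions rather than details of our system.

```python
import torch

# Placeholder Q-network; the architecture stands in for whatever
# network the simulation agent was trained with.
class QNetwork(torch.nn.Module):
    def __init__(self, obs_dim=32, act_dim=8):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(obs_dim + act_dim, 256),
            torch.nn.ReLU(),
            torch.nn.Linear(256, 1),  # scalar Q-value
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

# Train in simulation and save the learned weights ...
sim_q = QNetwork()
torch.save(sim_q.state_dict(), "sim_qnet.pt")

# ... then initialize the real-robot agent from those weights and
# fine-tune with a reduced learning rate, a common transfer choice.
real_q = QNetwork()
real_q.load_state_dict(torch.load("sim_qnet.pt"))
optimizer = torch.optim.Adam(real_q.parameters(), lr=1e-4)
```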
REFERENCES
Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong,
R., Welinder, P., McGrew, B., Tobin, J., Abbeel, P.,
and Zaremba, W. (2017). Hindsight experience replay.
arXiv preprint arXiv:1707.01495.
Barron, E. and Ishii, H. (1989). The Bellman equation for minimizing the maximum cost. Nonlinear Analysis: Theory, Methods & Applications, 13(9):1067–1090.
Bennett, C. C. and Hauser, K. (2013). Artificial intelligence framework for simulating clinical decision-making: A Markov decision process approach. Artificial Intelligence in Medicine, 57(1):9–19.
Bodnar, C., Li, A., Hausman, K., Pastor, P., and Kalakrishnan, M. (2019). Quantile QT-Opt for risk-aware vision-based robotic grasping. arXiv preprint arXiv:1910.02787.
Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., and Zaremba, W. (2016). OpenAI Gym. arXiv preprint arXiv:1606.01540.
Coumans, E. and Bai, Y. (2016–2021). PyBullet, a Python module for physics simulation for games, robotics and machine learning. http://pybullet.org.
Fujimoto, S., van Hoof, H., and Meger, D. (2018). Ad-
dressing function approximation error in actor-critic
methods. CoRR, abs/1802.09477.
Fujita, Y., Uenishi, K., Ummadisingu, A., Nagarajan, P.,
Masuda, S., and Castro, M. Y. (2020). Distributed
reinforcement learning of targeted grasping with ac-
tive vision for mobile manipulators. arXiv preprint
arXiv:2007.08082.
Gallouédec, Q., Cazin, N., Dellandréa, E., and Chen, L. (2021). Multi-goal reinforcement learning environments for simulated Franka Emika Panda robot.
Garcia, F. and Rachelson, E. (2013). Markov decision pro-
cesses. Markov Decision Processes in Artificial Intel-
ligence, pages 1–38.
Hammersley, J. (2013). Monte Carlo methods. Springer Science & Business Media.
Hasselt, H. (2010). Double Q-learning. Advances in Neural Information Processing Systems, 23:2613–2621.
Kalashnikov, D., Irpan, A., Pastor, P., Ibarz, J., Herzog, A., Jang, E., Quillen, D., Holly, E., Kalakrishnan, M., Vanhoucke, V., et al. (2018). QT-Opt: Scalable deep reinforcement learning for vision-based robotic manipulation. arXiv preprint arXiv:1806.10293.
Li, Y. (2017). Deep reinforcement learning: An overview.
arXiv preprint arXiv:1701.07274.
Puterman, M. L. (1990). Markov decision processes. Handbooks in Operations Research and Management Science, 2:331–434.
Ren, Z., Dong, K., Zhou, Y., Liu, Q., and Peng, J. (2019).
Exploration via hindsight goal generation. arXiv
preprint arXiv:1906.04279.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
Szepesvári, C. (2010). Algorithms for reinforcement learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 4(1):1–103.
Torrey, L. and Shavlik, J. (2010). Transfer learning. In Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques, pages 242–264. IGI Global.
Van Hasselt, H., Guez, A., and Silver, D. (2016). Deep reinforcement learning with double Q-learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 30.
Vinyals, O., Babuschkin, I., Czarnecki, W. M., Mathieu, M., Dudzik, A., Chung, J., Choi, D. H., Powell, R., Ewalds, T., Georgiev, P., et al. (2019). Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782):350–354.
Wiering, M. A. and Van Otterlo, M. (2012). Reinforcement learning. Adaptation, Learning, and Optimization, 12(3).
Zhang, F., Leitner, J., Milford, M., Upcroft, B., and Corke,
P. (2015). Towards vision-based deep reinforcement
learning for robotic motion control. arXiv preprint
arXiv:1511.03791.
Zhao, R. and Tresp, V. (2018). Energy-based hindsight
experience prioritization. In Conference on Robot
Learning, pages 113–122. PMLR.