
Controller to provide low-level control. On top of this, reinforcement learning is integrated to enhance the controller's overall performance. To evaluate the effectiveness of the platform and the proposed methods, we conducted a representative simulated experiment. The results demonstrate the feasibility of the platform and its associated methodologies. Furthermore, the system shows potential for handling more complex tasks in the future.
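In simplified terms, this layered design amounts to an RL policy issuing high-level task-space commands that a whole-body controller converts into joint-level commands. The sketch below illustrates this idea only; the class and method names (WholeBodyController, compute_joint_command, the env accessors, etc.) are assumptions for illustration and do not reflect the actual implementation.

```python
# Minimal sketch (assumed names, not the paper's API): an RL policy on top
# of a whole-body controller (WBC) that handles low-level control.
import numpy as np


class WholeBodyController:
    """Placeholder for the low-level controller (e.g. IK- or QP-based)."""

    def compute_joint_command(self, robot_state: np.ndarray,
                              task_space_target: np.ndarray) -> np.ndarray:
        # In a real system this would solve for base and arm joint commands
        # that realize the commanded task-space motion.
        raise NotImplementedError


def control_step(policy, wbc: WholeBodyController, env):
    """One control cycle: RL policy decides, WBC executes."""
    obs = env.get_observation()             # robot and task state
    task_target = policy.predict(obs)       # high-level action from RL
    robot_state = env.get_robot_state()
    joint_cmd = wbc.compute_joint_command(robot_state, task_target)
    env.apply_joint_command(joint_cmd)      # low-level actuation
```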
In future work, we aim to extend this research to real-world experiments building on the simulated results. Additionally, multi-agent training with critic updates requires further investigation.