section 3) of triangles depending linearly on the angle
α and the remaining volume in the container.
A multi-layer perceptron with two hidden layers and 2000 parameters is trained. To simplify the experiments, only two actions are used: ∆α ∈ {−1, 1}. A single experimental episode of the robot arm is used as the offline training sequence, from which multiple training sequences are created by applying random goal volumes according to Equation (6).
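A minimal sketch of such a policy network is given below in PyTorch. The input features (e.g. remaining volume, goal volume, and current angle α) and the hidden width of 40 are assumptions, chosen only so that the model has roughly the 2000 parameters stated above; the paper does not fix these details here.

import torch
import torch.nn as nn

ACTIONS = (-1.0, 1.0)  # the two discrete angle increments, delta alpha

class PolicyNet(nn.Module):
    def __init__(self, n_features: int = 3, hidden: int = 40):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, len(ACTIONS)),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Probabilities over the two discrete actions {-1, +1}.
        return torch.softmax(self.layers(x), dim=-1)

policy = PolicyNet()
print(sum(p.numel() for p in policy.parameters()))  # roughly 1.9k parameters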
Experiments show fast convergence. The resulting policy network pours liquid within the simulation to match an arbitrarily chosen goal volume.
5 CONCLUSION
A method has been outlined for constructing a controller tasked with pouring a specified quantity of liquid into a receiving container.
The approach comprises two primary steps: a preprocessing phase that extracts relevant image features, followed by a policy network. Importantly, the policy network operates on a low-dimensional input, since the image preprocessing is performed by a separate image-processing tool.
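The resulting control loop is therefore simple. The following is a minimal sketch of one control step, assuming that a function extract_features stands in for the separate image-processing tool and that the policy returns probabilities over the two angle increments; all names and the placeholder feature are hypothetical.

import numpy as np

def extract_features(image):
    # Stand-in for the separate image-processing tool, which reduces the
    # camera image to a few scalar features; a placeholder is used here.
    return np.array([float(image.mean())])

def control_step(policy, image, goal_volume, alpha):
    # Preprocessing keeps the policy input low-dimensional.
    features = extract_features(image)
    x = np.concatenate([features, [goal_volume, alpha]])
    probs = policy(x)                          # probabilities over {-1, +1}
    delta_alpha = (-1.0, 1.0)[int(np.argmax(probs))]
    return alpha + delta_alpha                 # updated pouring angle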
In addition, only ground-truth data measured on the real laboratory setup is used; therefore, no simulation of the setup is required to train the policy network, which is trained offline on this data.
A valuable aspect of the proposed approach is its
capacity to derive multiple training sequences from a
single experimental sequence.
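A minimal sketch of this augmentation step is shown below, assuming the recorded episode is stored as a list of (features, action, poured volume) tuples; the sampling range and the goal-dependent reward are hypothetical stand-ins for Equation (6), which is not reproduced here.

import random

def derive_training_sequences(episode, n_goals=50, v_max=250.0):
    # Relabel one recorded episode with randomly drawn goal volumes to
    # obtain many offline training sequences (cf. Equation (6)).
    sequences = []
    for _ in range(n_goals):
        goal = random.uniform(0.0, v_max)   # assumed goal-volume range in ml
        relabelled = [
            (features + (goal,), action, -abs(poured - goal))  # hypothetical reward
            for features, action, poured in episode
        ]
        sequences.append(relabelled)
    return sequences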
The method presented in this paper is robust to variations in the initial volume in the source container, achieved by controlling relative pouring angles. Moreover, the method measures only the liquid exiting the source container, which makes it applicable in scenarios where the liquid in the target container is not visible or measurable, such as when watering plants.
The method has been implemented within a pouring-liquid simulation and shows fast convergence and performance that is independent of the chosen goal volume. Next, data from the physical system will be included to generate a controller for the UR5 robot arm.