small number of objects, but quickly becomes intractable. In fact, scenarios 2 and 3 described above could not be solved in a reasonable amount of time. Concerning scenario 1, 50 trials were executed and an optimal solution was found in ≈ 60% of cases (100% in the case of the learning techniques). In the remaining 40% of trials, all obstacles were relocated, leading to sub-optimal solutions.
5 CONCLUSION
In this paper, a Reinforcement Learning approach aimed at performing robotic target relocation in cluttered environments has been presented. In detail, the proposed method exploits, at the high level, a Q-learning approach on a dynamic tree structure in order to choose optimal sequences of obstacles to relocate, while, at the low level, a constrained motion planner is adopted to plan feasible trajectories for object relocation. Several exploration strategies of the solution tree, based on a Breadth-First-Search technique, are presented and compared, showing that an ε-Greedy approach with heuristics outperforms the other baseline methods and efficiently solves the problem.
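For illustration, a minimal sketch of the high-level ε-Greedy action selection and tabular Q-learning update over the relocation tree is given below. The class name, the Q-table layout, and the heuristic argument are hypothetical and only indicate the structure of the approach, not the exact implementation adopted in the paper; states are assumed to be hashable representations of the obstacles relocated so far, and actions the obstacles that can be relocated next.

```python
import random
from collections import defaultdict

# Illustrative sketch (names are hypothetical): epsilon-greedy Q-learning
# over a tree of obstacle-relocation sequences. A state can be, e.g., a
# frozenset of already-relocated obstacles; an action is the next obstacle.
class TreeQLearner:
    def __init__(self, epsilon=0.1, alpha=0.5, gamma=0.9):
        self.q = defaultdict(float)   # Q[(state, action)] -> value
        self.epsilon = epsilon        # exploration rate
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor

    def choose(self, state, actions, heuristic=None):
        """Epsilon-greedy selection; exploration can be biased by a
        heuristic (e.g. prefer obstacles closest to the target)."""
        if random.random() < self.epsilon:
            if heuristic is not None:
                return min(actions, key=heuristic)   # heuristic-guided exploration
            return random.choice(actions)            # uniform exploration
        return max(actions, key=lambda a: self.q[(state, a)])  # greedy choice

    def update(self, state, action, reward, next_state, next_actions):
        """Standard one-step Q-learning update along the explored branch."""
        best_next = max((self.q[(next_state, a)] for a in next_actions),
                        default=0.0)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```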
Concerning future work, a pragmatic compromise between computation time and optimality will be sought by also investigating a Depth-First-Search paradigm, as well as different and more elaborate reward functions, so as to take into account energy consumption or other properties of the objects. Similarly, a comparison with a Deep Q-Network approach will be carried out.