
5 CONCLUSION AND PERSPECTIVES
In this work, we consider the autonomous navigation of a single drone using a state-of-the-art reinforcement learning algorithm, proximal policy optimization. We split the navigation goal into two separate tasks: a takeoff task, in which the drone reaches a target takeoff point, and a simple navigation task. We then combine the two tasks to perform complete and autonomous navigation from the ground to a destination point. To model these problems, we propose an adapted Markov Decision Process with a new reward that enables the drone to accomplish each task. Moreover, we improve the reward formulation to encourage the drone to perform smoother and more stable movements.
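For concreteness, the sketch below shows one way such a shaped reward could be written in Python. It is only an illustration of the properties discussed here (non-negative, differentiable, bounded, with a smoothness term); the distance-based exponential terms and the smoothness weight are assumptions, not the exact formulation used in our experiments.

```python
import numpy as np

def shaped_reward(position, target, action, prev_action, smooth_weight=0.1):
    """Illustrative reward sketch: non-negative, differentiable and bounded.

    - exp(-distance) equals 1 at the target and decays smoothly towards 0.
    - exp(-||action - prev_action||) is largest when consecutive actions are
      similar, encouraging smoother and more stable movements.
    The choice of terms and the 0.1 smoothness weight are assumptions made
    for illustration only.
    """
    distance = np.linalg.norm(np.asarray(position) - np.asarray(target))
    tracking_term = np.exp(-distance)  # in (0, 1]
    smooth_term = np.exp(-np.linalg.norm(np.asarray(action)
                                         - np.asarray(prev_action)))
    # Total reward is bounded above by 1 + smooth_weight and is never negative.
    return tracking_term + smooth_weight * smooth_term
```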
The numerical simulations are conducted in the PyBullet simulator for drones, using the well-known reinforcement learning library Stable-Baselines3. The results show that a reward enabling the drone to accomplish both the takeoff and the navigation tasks is non-negative, differentiable and bounded. We also learn from this numerical study that the training time can be significantly decreased by training the drone in several environments in parallel. We conclude that reinforcement learning approaches are promising techniques for drone navigation. However, they have two main drawbacks: (i) the challenge of designing a relevant reward that captures all the objectives in a single formulation, and (ii) the need for millions of interactions with the simulator in order to learn meaningful policies. The latter, known as sample inefficiency, is a well-known problem in reinforcement learning.
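As a minimal sketch of the parallel-training setup described above, the snippet below trains a PPO agent with Stable-Baselines3 on several vectorized copies of a drone environment. The environment id "takeoff-aviary-v0" and the hyperparameter values are assumptions for illustration and do not reproduce the exact configuration of our experiments.

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

# Assumed environment id for a PyBullet drone takeoff task; replace it with
# the actual registered id of the environment used in the experiments.
ENV_ID = "takeoff-aviary-v0"

# Run 8 simulator instances in parallel subprocesses; PPO collects rollouts
# from all of them at once, which shortens wall-clock training time.
vec_env = make_vec_env(ENV_ID, n_envs=8, vec_env_cls=SubprocVecEnv)

model = PPO("MlpPolicy", vec_env, verbose=1)    # default PPO hyperparameters
model.learn(total_timesteps=2_000_000)          # millions of interactions are typical
model.save("ppo_drone_takeoff")
```

The parallelism only affects how experience is gathered: the same number of environment steps is needed overall, but they are collected concurrently, which is what reduces the training time reported in our study.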
Future research directions include extending this work to consider landing on a specific point. Designing a reward formulation that integrates all of these objectives (takeoff, navigation, landing) together with obstacle avoidance is another perspective of this work.
ACKNOWLEDGEMENTS
The authors would like to thank Jimmy Debladis, Besma Khalfoun and Fadi Namour from Capgemini Engineering for the technical discussions that greatly improved the quality of this study.