measures, namely an action variation reward penalty
(AVP) for all methods and regularization of the actor
network through CAPS for the MF-RL methods.
As a result, we retained CAPS as the action regularization
method because of its consistency and ease of tuning
across all MF-RL agents. Another way to smooth
actions for TD-MPC (for which CAPS is not applicable)
is MoDem-V2 (Lancaster et al., 2023), which aims at
learning safe policies by biasing the initial data
distribution towards a desired behavior, in our case
that of a controller producing smooth actions.
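As a rough illustration of the two smoothing measures discussed above, the sketch below computes an action variation penalty on consecutive actions and CAPS-style temporal and spatial smoothness terms for a deterministic actor. The function names, the penalty form (mean absolute action change), the weights, the noise scale and the toy actor are all assumptions made for this example; they are not the settings used in the experiments.

```python
import numpy as np


def action_variation_penalty(prev_action, action, weight=0.1):
    """Reward penalty proportional to the mean absolute action change.

    The exact form and the weight are assumptions for illustration.
    """
    return -weight * float(np.mean(np.abs(action - prev_action)))


def caps_regularizer(actor, states, next_states, rng,
                     lambda_t=1.0, lambda_s=1.0, sigma=0.05):
    """CAPS-style regularizer added to the actor loss (sketch).

    Temporal term: distance between actions at consecutive states.
    Spatial term: distance between actions at a state and at a
    Gaussian-perturbed copy of that state.
    """
    actions = actor(states)
    temporal = np.mean(np.linalg.norm(actions - actor(next_states), axis=-1))
    perturbed = states + rng.normal(0.0, sigma, size=states.shape)
    spatial = np.mean(np.linalg.norm(actions - actor(perturbed), axis=-1))
    return float(lambda_t * temporal + lambda_s * spatial)


# Toy deterministic actor (hypothetical): 3 state dims -> 2 action dims.
W = np.array([[0.5, -0.2], [0.1, 0.3], [0.0, 0.4]])


def toy_actor(states):
    return np.tanh(states @ W)


rng = np.random.default_rng(0)
states = rng.normal(size=(8, 3))
next_states = states + 0.01 * rng.normal(size=(8, 3))

avp = action_variation_penalty(np.array([0.0, 0.0]), np.array([0.2, -0.4]))
caps = caps_regularizer(toy_actor, states, next_states, rng)
print(f"AVP reward term: {avp:.3f}, CAPS regularizer: {caps:.4f}")
```

The penalty acts on the reward and therefore applies to any agent, whereas the regularizer needs gradient access to an actor network, which is consistent with CAPS being restricted to the MF-RL methods.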
We identify several future research directions.
One could experiment with probabilistic
models for TD-MPC in order to better capture the
stochasticity of turbulence dynamics, thus formulating
a more rigorous transition function that outputs
a probability distribution. For both gusts and turbulence,
RL algorithms with recurrent networks appear
to be a good starting point (Ni et al., 2022; Asri and
Trischler, 2019) for solving PO-MDPs. One could also
use ideas from robust RL to make RL controllers
resilient to disturbances (Hsu et al., 2024). Other
directions include learning-based adaptive control as
a family of potential mixed control methods (Shi et al.,
2019; Doukhi and Lee, 2019), where a closed form
of the nominal dynamics is used together with a
feedforward component of the unknown disturbance
dynamics predicted by a learned model.
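The last direction can be illustrated on a toy one-dimensional system. In the sketch below, the nominal model is v_dot = u, the unknown disturbance is a hypothetical drag-plus-gust term, and a least-squares fit stands in for the learned model (the cited works use neural networks instead). All names, gains and constants are assumptions chosen for this example.

```python
import numpy as np


def true_disturbance(v):
    # Unknown to the controller: hypothetical drag plus a constant gust.
    return -0.8 * v + 0.5


def fit_residual(v_samples, accel_samples, u_samples):
    """Fit the residual between measured and nominal acceleration.

    Nominal closed-form model: v_dot = u, so residual = accel - u.
    A linear model in [v, 1] is fitted by least squares.
    """
    residual = accel_samples - u_samples
    features = np.stack([v_samples, np.ones_like(v_samples)], axis=1)
    theta, *_ = np.linalg.lstsq(features, residual, rcond=None)
    return theta


def simulate(theta, v_ref=1.0, gain=2.0, dt=0.01, steps=1000):
    """Track v_ref with proportional feedback plus learned feedforward."""
    v = 0.0
    for _ in range(steps):
        d_hat = theta[0] * v + theta[1]        # predicted disturbance
        u = gain * (v_ref - v) - d_hat         # feedback + feedforward
        v += dt * (u + true_disturbance(v))    # true dynamics (Euler step)
    return abs(v_ref - v)


rng = np.random.default_rng(0)
v_data = rng.uniform(-2.0, 2.0, 200)
u_data = rng.uniform(-1.0, 1.0, 200)
accel_data = u_data + true_disturbance(v_data) + 0.01 * rng.normal(size=200)

theta = fit_residual(v_data, accel_data, u_data)
err_nominal = simulate(np.zeros(2))   # no feedforward term
err_learned = simulate(theta)         # with learned feedforward
print(f"final error without feedforward: {err_nominal:.4f}")
print(f"final error with feedforward:    {err_learned:.4f}")
```

In this toy setting, the feedback-only controller settles with a steady-state offset caused by the unmodeled term, while the learned feedforward largely cancels it.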
(1980). Flying qualities of piloted airplanes, military spec-
ification. Technical report, MIL-F-8785C.
Andrychowicz, O. M., Baker, B., Chociej, M., Jozefowicz,
R., McGrew, B., Pachocki, J., Petron, A., Plappert,
M., Powell, G., Ray, A., et al. (2020). Learning dex-
terous in-hand manipulation. The International Jour-
nal of Robotics Research, 39(1):3–20.
Asri, L. E. and Trischler, A. (2019). A study of state aliasing
in structured prediction with RNNs. arXiv preprint
Beard, R. W. and McLain, T. W. (2012). Small Unmanned
Aircraft: Theory and Practice. Princeton University
Becker-Ehmck, P., Karl, M., Peters, J., and van der Smagt,
P. (2020). Learning to fly via deep model-based rein-
forcement learning. arXiv preprint arXiv:2003.08876.
Berndt, J. (2004). JSBSim: An open source flight dynam-
ics model in C++. In AIAA Modeling and Simulation
Technologies Conference and Exhibit, page 4923.
Bøhn, E., Coates, E. M., Moe, S., and Johansen, T. A.
(2019). Deep reinforcement learning attitude control
of fixed-wing UAVs using proximal policy optimiza-
tion. In International Conference on Unmanned Air-
craft Systems (ICUAS), pages 523–533.
Bøhn, E., Coates, E. M., Reinhardt, D., and Johansen, T. A.
(2023). Data-efficient deep reinforcement learning
for attitude control of fixed-wing UAVs: Field exper-
iments. IEEE Transactions on Neural Networks and
Learning Systems.
De Marco, A., D’Onza, P. M., and Manfredi, S. (2023).
A deep reinforcement learning control approach for
high-performance aircraft. Nonlinear Dynamics,
Doukhi, O. and Lee, D. J. (2019). Neural network-
based robust adaptive certainty equivalent controller
for quadrotor UAV with unknown disturbances. Inter-
national Journal of Control, Automation and Systems,
Gryte, K., Hann, R., Alam, M., Roháč, J., Johansen, T. A.,
and Fossen, T. I. (2018). Aerodynamic modeling of
the Skywalker X8 fixed-wing unmanned aerial vehicle.
In International Conference on Unmanned Aircraft
Systems (ICUAS), pages 826–835.
Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018).
Soft actor-critic: Off-policy maximum entropy deep
reinforcement learning with a stochastic actor. In In-
ternational Conference on Machine Learning, pages
1861–1870. PMLR.
Hafner, D., Pasukonis, J., Ba, J., and Lillicrap, T. (2023).
Mastering diverse domains through world models.
arXiv preprint arXiv:2301.04104.
Hansen, N., Su, H., and Wang, X. (2024). TD-MPC2:
Scalable, robust world models for continuous control.
In International Conference on Learning Representa-
tions (ICLR).
Hansen, N., Wang, X., and Su, H. (2022). Temporal differ-
ence learning for model predictive control. In Inter-
national Conference on Machine Learning.
Hsu, H.-L., Meng, H., Luo, S., Dong, J., Tarokh, V., and
Pajic, M. (2024). REFORMA: Robust reinforcement
learning via adaptive adversary for drones flying un-
der disturbances. In 2024 IEEE International Confer-
ence on Robotics and Automation (ICRA).
Huang, S., Dossa, R. F. J., Ye, C., Braga, J., Chakraborty,
D., Mehta, K., and Araújo, J. G. (2022). CleanRL:
High-quality single-file implementations of deep re-
inforcement learning algorithms. Journal of Machine
Learning Research, 23(274):1–18.
Hwangbo, J., Sa, I., Siegwart, R., and Hutter, M. (2017).
Control of a quadrotor with reinforcement learning.
IEEE Robotics and Automation Letters, 2(4):2096–
Koch, W., Mancuso, R., West, R., and Bestavros, A.
(2019). Reinforcement learning for UAV attitude con-
trol. ACM Transactions on Cyber-Physical Systems,
Lambert, N. O., Drew, D. S., Yaconelli, J., Levine, S., Ca-
landra, R., and Pister, K. S. (2019). Low-level con-
trol of a quadrotor with deep model-based reinforce-
ment learning. IEEE Robotics and Automation Let-
ters, 4(4):4224–4230.
Lancaster, P., Hansen, N., Rajeswaran, A., and Kumar,
V. (2023). MoDem-V2: Visuo-motor world mod-
ICINCO 2024 - 21st International Conference on Informatics in Control, Automation and Robotics