measures, namely an action variation reward penalty (AVP) for all methods and regularization of the actor network through CAPS for the MF-RL methods. As a result, we retained CAPS as the action regularization method due to its consistency and ease of tuning across all MF-RL agents. Another path to smooth actions for TD-MPC (to which CAPS is not applicable) is MoDem-V2 (Lancaster et al., 2023), which aims at learning safe policies by biasing the initial data distribution towards a desired behavior, in our case a controller with smooth actions.
We identify several future research directions: one could experiment with probabilistic models for TD-MPC in order to better capture the stochasticity of turbulence dynamics, thus formulating a more rigorous transition function that outputs a probability distribution over next states. For both gusts and turbulence, RL algorithms with recurrent networks appear to be a good starting point for solving POMDPs (Ni et al., 2022; Asri and Trischler, 2019). One could also use ideas from robust RL to make RL controllers resilient to disturbances (Hsu et al., 2024). Other directions include learning-based adaptive control as a field of potential mixed control methods (Shi et al., 2019; Doukhi and Lee, 2019), where a closed-form model of the nominal dynamics is used together with a feed-forward component for the unknown disturbance dynamics predicted by a learned model.
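As a hedged illustration of the first direction, the snippet below sketches a latent transition model with a Gaussian output distribution; the class name, layer sizes and clamping bounds are assumptions for exposition and do not correspond to the TD-MPC implementation.

    # Minimal sketch of a probabilistic (Gaussian) latent transition model.
    import torch
    import torch.nn as nn

    class GaussianTransition(nn.Module):
        def __init__(self, latent_dim, action_dim, hidden_dim=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(latent_dim + action_dim, hidden_dim), nn.ELU(),
                nn.Linear(hidden_dim, 2 * latent_dim),  # mean and log-std of next latent
            )

        def forward(self, z, a):
            mean, log_std = self.net(torch.cat([z, a], dim=-1)).chunk(2, dim=-1)
            std = log_std.clamp(-5.0, 2.0).exp()
            return torch.distributions.Normal(mean, std)  # p(z' | z, a)

    # usage (schematic): dist = model(z, a); z_next = dist.rsample()
    # training signal:   nll = -dist.log_prob(z_target).sum(-1).mean()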
REFERENCES
(1980). Flying qualities of piloted airplanes, military spec-
ification. Technical report, MIL-F-8785C.
Andrychowicz, O. M., Baker, B., Chociej, M., Jozefowicz,
R., McGrew, B., Pachocki, J., Petron, A., Plappert,
M., Powell, G., Ray, A., et al. (2020). Learning dex-
terous in-hand manipulation. The International Jour-
nal of Robotics Research, 39(1):3–20.
Asri, L. E. and Trischler, A. (2019). A study of state aliasing
in structured prediction with RNNs. arXiv preprint
arXiv:1906.09310.
Beard, R. W. and McLain, T. W. (2012). Small Unmanned
Aircraft: Theory and Practice. Princeton University
Press.
Becker-Ehmck, P., Karl, M., Peters, J., and van der Smagt,
P. (2020). Learning to fly via deep model-based rein-
forcement learning. arXiv preprint arXiv:2003.08876.
Berndt, J. (2004). JSBSim: An open source flight dynam-
ics model in C++. In AIAA Modeling and Simulation
Technologies Conference and Exhibit, page 4923.
Bøhn, E., Coates, E. M., Moe, S., and Johansen, T. A.
(2019). Deep reinforcement learning attitude control
of fixed-wing UAVs using proximal policy optimiza-
tion. In International Conference on Unmanned Air-
craft Systems (ICUAS), pages 523–533.
Bøhn, E., Coates, E. M., Reinhardt, D., and Johansen, T. A.
(2023). Data-efficient deep reinforcement learning
for attitude control of fixed-wing UAVs: Field exper-
iments. IEEE Transactions on Neural Networks and
Learning Systems.
De Marco, A., D’Onza, P. M., and Manfredi, S. (2023).
A deep reinforcement learning control approach for
high-performance aircraft. Nonlinear Dynamics,
111(18):17037–17077.
Doukhi, O. and Lee, D. J. (2019). Neural network-
based robust adaptive certainty equivalent controller
for quadrotor UAV with unknown disturbances. Inter-
national Journal of Control, Automation and Systems,
17(9):2365–2374.
Gryte, K., Hann, R., Alam, M., Roháč, J., Johansen, T. A.,
and Fossen, T. I. (2018). Aerodynamic modeling of
the Skywalker x8 fixed-wing unmanned aerial vehi-
cle. In International Conference on Unmanned Air-
craft Systems (ICUAS), pages 826–835.
Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018).
Soft actor-critic: Off-policy maximum entropy deep
reinforcement learning with a stochastic actor. In In-
ternational Conference on Machine Learning, pages
1861–1870. PMLR.
Hafner, D., Pasukonis, J., Ba, J., and Lillicrap, T. (2023).
Mastering diverse domains through world models.
arXiv preprint arXiv:2301.04104.
Hansen, N., Su, H., and Wang, X. (2024). TD-MPC2:
Scalable, robust world models for continuous control.
In International Conference on Learning Representa-
tions (ICLR).
Hansen, N., Wang, X., and Su, H. (2022). Temporal differ-
ence learning for model predictive control. In Inter-
national Conference on Machine Learning.
Hsu, H.-L., Meng, H., Luo, S., Dong, J., Tarokh, V., and
Pajic, M. (2024). REFORMA: Robust reinforcement
learning via adaptive adversary for drones flying un-
der disturbances. In 2024 IEEE International Confer-
ence on Robotics and Automation (ICRA).
Huang, S., Dossa, R. F. J., Ye, C., Braga, J., Chakraborty,
D., Mehta, K., and Araújo, J. G. (2022). CleanRL:
High-quality single-file implementations of deep re-
inforcement learning algorithms. Journal of Machine
Learning Research, 23(274):1–18.
Hwangbo, J., Sa, I., Siegwart, R., and Hutter, M. (2017).
Control of a quadrotor with reinforcement learning.
IEEE Robotics and Automation Letters, 2(4):2096–
2103.
Koch, W., Mancuso, R., West, R., and Bestavros, A.
(2019). Reinforcement learning for UAV attitude con-
trol. ACM Transactions on Cyber-Physical Systems,
3(2):1–21.
Lambert, N. O., Drew, D. S., Yaconelli, J., Levine, S., Ca-
landra, R., and Pister, K. S. (2019). Low-level con-
trol of a quadrotor with deep model-based reinforce-
ment learning. IEEE Robotics and Automation Let-
ters, 4(4):4224–4230.
Lancaster, P., Hansen, N., Rajeswaran, A., and Kumar,
V. (2023). MoDem-V2: Visuo-motor world mod-