MDRL-BT keeps the hierarchical spirit of behavior trees by decomposing a main problem into several simpler sub-problems, thereby providing a rational alternative solution. As the results show, MDRL-BT does not need elaborate reward design to guarantee training convergence, in contrast to general RL algorithms. In the design of mixed models, it is a better choice to use PPO action nodes with a shared brain together with SAC composite nodes, and to pre-train nodes where possible. For practical applications, general RL algorithms or plain BTs can handle simple tasks, while MDRL-BT is a suitable candidate for complex problems.
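To illustrate this mixed design, the following sketch shows how several PPO action nodes can share a single brain while an SAC-style composite node selects which child to tick. It is only a structural outline under assumed names (SharedPPOBrain, PPOActionNode, and SACCompositeNode are hypothetical), with dummy policies standing in for trained PPO and SAC networks; it is not the implementation used in the experiments.

import random
from typing import List

class SharedPPOBrain:
    """Stand-in for one PPO policy shared by several action nodes."""
    def act(self, observation):
        # A trained PPO policy would map the observation to a primitive action.
        return random.choice(["move", "turn", "wait"])

class PPOActionNode:
    """Leaf node whose behaviour is produced by the shared PPO brain."""
    def __init__(self, name, brain):
        self.name, self.brain = name, brain
    def tick(self, observation):
        action = self.brain.act(observation)
        print(f"{self.name}: executing {action}")
        return "SUCCESS"

class SACCompositeNode:
    """Composite node: an SAC-style high-level policy picks which child to tick."""
    def __init__(self, children: List):
        self.children = children
    def tick(self, observation):
        # A trained SAC policy would score the children from the observation;
        # here a random choice stands in for that selection.
        child = random.choice(self.children)
        return child.tick(observation)

brain = SharedPPOBrain()                                  # one brain, several leaves
root = SACCompositeNode([PPOActionNode("chase", brain),
                         PPOActionNode("evade", brain)])
print(root.tick({"enemy_distance": 3.0}))

Because all PPO leaves query the same brain, experience gathered by any leaf can improve the shared policy, which is one motivation for the shared-brain choice discussed above.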
MDRL-BT is also extensible, owing to its recursive BT framework and its RL foundations. Extending MDRL-BT with additional mechanisms, such as curiosity under sparse reward distributions (as sketched below), is an exciting avenue, although the work involved is substantial and the benefit is not yet obvious. In future work, the theory behind such additional algorithms and the scenarios in which they apply can be investigated for better performance.
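As a hedged sketch of the curiosity idea mentioned above (in the spirit of Burda et al., 2018), the snippet below adds an intrinsic bonus, equal to the prediction error of a simple forward model, to the extrinsic reward so that a learning signal remains available when extrinsic rewards are sparse. The linear forward model and the class name ForwardModelCuriosity are illustrative assumptions, not part of MDRL-BT.

import numpy as np

class ForwardModelCuriosity:
    """Intrinsic reward = prediction error of a (here linear) forward model."""
    def __init__(self, obs_dim, lr=0.01, scale=0.1):
        self.W = np.zeros((obs_dim, obs_dim))   # predicts next_obs from obs
        self.lr, self.scale = lr, scale
    def bonus(self, obs, next_obs):
        pred = self.W @ obs
        error = next_obs - pred
        self.W += self.lr * np.outer(error, obs)  # one gradient step on the model
        return self.scale * float(np.mean(error ** 2))

curiosity = ForwardModelCuriosity(obs_dim=4)
obs, next_obs, extrinsic = np.ones(4), np.full(4, 1.5), 0.0   # sparse extrinsic reward
total_reward = extrinsic + curiosity.bonus(obs, next_obs)     # bonus keeps learning going
print(total_reward)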
REFERENCES
Bacon, P.-L., Harb, J., and Precup, D. (2017). The option-
critic architecture. In Thirty-First AAAI Conference
on Artificial Intelligence.
Burda, Y., Edwards, H., Pathak, D., Storkey, A., Dar-
rell, T., and Efros, A. A. (2018). Large-scale
study of curiosity-driven learning. arXiv preprint
arXiv:1808.04355.
de Pontes Pereira, R. and Engel, P. M. (2015). A framework
for constrained and adaptive behavior-based agents.
arXiv preprint arXiv:1506.02312.
Dey, R. and Child, C. (2013). Ql-bt: Enhancing behaviour
tree design and implementation with q-learning. In
2013 IEEE Conference on Computational Intelligence
in Games (CIG), pages 1–8.
Dromey, R. G. (2003). From requirements to design: for-
malizing the key steps. In International Conference
on Software Engineering and Formal Methods.
Florez-Puga, G., Gomez-Martin, M., Gomez-Martin, P.,
Diaz-Agudo, B., and Gonzalez-Calero, P. (2009).
Query-enabled behavior trees. IEEE Transactions
on Computational Intelligence and AI in Games,
1(4):298–308.
Fu, Y., Qin, L., and Yin, Q. (2016). A reinforcement learn-
ing behavior tree framework for game ai. In 2016 In-
ternational Conference on Economics, Social Science,
Arts, Education and Management Engineering, pages
573–579.
Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018).
Soft actor-critic: Off-policy maximum entropy deep
reinforcement learning with a stochastic actor. In
ICML 2018: International Conference on Machine Learning 2018.
Isla, D. (2005). Gdc 2005 proceeding: Handling complexity
in the halo 2 ai. Retrieved October 21, 2009.
Juliani, A., Berges, V., Vckay, E., Gao, Y., Henry, H., Mat-
tar, M., and Lange, D. (2018). Unity: A general plat-
form for intelligent agents. arXiv preprint arXiv:1809.02627.
Kartasev, M. (2019). Integrating reinforcement learning
into behavior trees by hierarchical composition.
Liessner, R., Schmitt, J., Dietermann, A., and Bäker, B.
(2019). Hyperparameter optimization for deep re-
inforcement learning in vehicle energy management.
In Proceedings of the 11th International Conference
on Agents and Artificial Intelligence - Volume 2:
ICAART, pages 134–144. INSTICC, SciTePress.
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T.,
Tassa, Y., Silver, D., and Wierstra, D. (2015). Contin-
uous control with deep reinforcement learning. arXiv
preprint arXiv:1509.02971.
Mateas, M. and Stern, A. (2002). A behavior language for
story-based believable agents. IEEE Intelligent Sys-
tems, 17(4):39–47.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A.,
Antonoglou, I., Wierstra, D., and Riedmiller, M. A.
(2013). Playing atari with deep reinforcement learn-
ing. arXiv preprint arXiv:1312.5602.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Ve-
ness, J., Bellemare, M. G., Graves, A., Riedmiller,
M., Fidjeland, A. K., Ostrovski, G., Petersen, S.,
Beattie, C., Sadik, A., Antonoglou, I., King, H., Ku-
maran, D., Wierstra, D., Legg, S., and Hassabis, D.
(2015). Human-level control through deep reinforce-
ment learning. Nature, 518(7540):529–533.
Noblega, A., Paes, A., and Clua, E. (2019). Towards adap-
tive deep reinforcement game balancing. In Proceed-
ings of the 11th International Conference on Agents
and Artificial Intelligence - Volume 2: ICAART, pages
693–700. INSTICC, SciTePress.
Sakr, F. and Abdennadher, S. (2016). Harnessing super-
vised learning techniques for the task planning of am-
bulance rescue agents. In Proceedings of the 8th In-
ternational Conference on Agents and Artificial In-
telligence - Volume 1: ICAART, pages 157–164. IN-
STICC, SciTePress.
Schulman, J., Levine, S., Moritz, P., Jordan, M. I., and
Abbeel, P. (2015). Trust region policy optimization.
arXiv preprint arXiv:1502.05477.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and
Klimov, O. (2017). Proximal policy optimization al-
gorithms. arXiv preprint arXiv:1707.06347.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L.,
Den Driessche, G. V., Schrittwieser, J., Antonoglou,
I., Panneershelvam, V., Lanctot, M., et al. (2016).
Mastering the game of go with deep neural networks
and tree search. Nature, 529(7587):484–489.
Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D.,
and Riedmiller, M. (2014). Deterministic policy gra-
dient algorithms. In Proceedings of the 31st In-
ternational Conference on Machine Learning.