Figure 7: Static/dynamic R values in agent collaboration.
6 CONCLUSIONS
The MARAB algorithm (Galichet et al., 2014) is a risk-aware multi-armed bandit algorithm. It provides an alternative to the UCB1 selection formula by treating the conditional value at risk (CVaR) as the measure of branch quality. This approach can deal with risk, but does not offer any decision-making aspect. Other approaches, such as (Liu and Koenig, 2008), solve MDPs in a risk-sensitive manner while taking resource levels into consideration. This is done by making use of a type of non-linear utility function known as "one-switch" utility functions. However, this approach computes policies offline, and thus cannot readily be integrated with the highly dynamic framework of BDI.
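For concreteness, the CVaR statistic that MARAB-style bandits use in place of UCB1's empirical mean can be sketched as follows. This is a minimal illustration of empirical CVaR as a branch-quality measure, not the exact MARAB selection rule; the function names are our own.

```python
import math

def empirical_cvar(rewards, alpha):
    """Mean of the worst alpha-fraction of observed rewards (empirical CVaR).

    A small alpha focuses on the tail of bad outcomes, so a branch with
    rare but severe failures scores low even if its mean reward is high.
    """
    k = max(1, math.ceil(alpha * len(rewards)))
    worst = sorted(rewards)[:k]  # the k lowest rewards = worst outcomes
    return sum(worst) / k

def select_branch(arms, alpha=0.2):
    """Pick the arm with the best empirical CVaR.

    arms: dict mapping an arm identifier to its list of observed rewards.
    """
    return max(arms, key=lambda a: empirical_cvar(arms[a], alpha))
```

Note how this changes the ranking relative to a mean-based rule: an arm with rewards [0, 0, 10, 10, 10] has a higher mean than one with [5, 5, 5, 5, 5], yet at alpha = 0.2 the latter is selected because its worst-case outcomes are better.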
In this paper we presented a novel approach to calculate risk alongside utility in popular MCTS algorithms. We showed how such an online planner can be integrated with a BDI agent. This integration allows an autonomous agent to reason about its risk tolerance level based on its current beliefs (e.g. the availability of resources) and to adjust it dynamically. It also allows such an agent to react to unforeseen events, something impossible for BDI agents that only use predefined plans. Furthermore, our proposed framework adheres to the principles outlined in Section 1, enabling an agent to act appropriately in high-stakes environments. Experimental results underpin our theoretical contributions and show that taking risk into account dynamically can lead to higher success rates with only minimal reductions in utility compared to agents with static risk aversion levels.
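The belief-driven adjustment of the risk tolerance level can be illustrated with a hypothetical schedule; the linear form and all names below are assumptions for illustration only, not the exact rule used in our framework.

```python
def adjust_alpha(resource_fraction, alpha_min=0.05, alpha_max=0.5):
    """Hypothetical schedule mapping a belief about remaining resources
    (a fraction in [0, 1]) to a CVaR level alpha.

    Scarce resources yield a small alpha (strongly risk-averse tail focus);
    plentiful resources relax the agent toward mean-seeking behaviour.
    """
    r = min(max(resource_fraction, 0.0), 1.0)  # clamp the belief to [0, 1]
    return alpha_min + r * (alpha_max - alpha_min)
```

Re-evaluating such a schedule at each deliberation cycle is what lets the agent tighten its risk aversion as resources run low, rather than committing to a single static level.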
ACKNOWLEDGEMENTS
We would like to thank Carles Sierra, Lluís Godo, and Jianbing Ma for their inspiring discussions and comments. This work is partially funded by the EPSRC PACES project (Ref: EP/J012149/1).
REFERENCES
Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–256.
Bauters, K., Liu, W., Hong, J., Sierra, C., and Godo, L. (2014). CAN(PLAN)+: Extending the operational semantics of the BDI architecture to deal with uncertain information. In Proc. of UAI'14, pages 52–61.
Bratman, M. (1987). Intention, Plans, and Practical Reason. Harvard University Press.
Browne, C. B., Powley, E., Whitehouse, D., Lucas, S. M., Cowling, P. I., Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S., and Colton, S. (2012). A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1):1–43.
Chen, Y., Bauters, K., Liu, W., Hong, J., McAreavey, K., Sierra, C., and Godo, L. (2014). AgentSpeak+: AgentSpeak with probabilistic planning. In Proc. of CIMA'14, pages 15–20.
Dastani, M. (2008). 2APL: a practical agent programming language. JAAMAS, 16(3).
Galichet, N., Sebag, M., and Teytaud, O. (2014). Exploration vs exploitation vs safety: Risk-averse multi-armed bandits. JMLR: Workshop and Conference Proceedings, 29:245–260.
Johansen, I. L. and Rausand, M. (2014). Foundations and choice of risk metrics. Safety Science, 62:386–399.
Keller, T. and Eyerich, P. (2012). PROST: Probabilistic planning based on UCT. In Proc. of ICAPS.
Kocsis, L. and Szepesvári, C. (2006). Bandit based Monte-Carlo planning. In Proc. of ECML'06, pages 282–293.
Liu, Y. and Koenig, S. (2008). An exact algorithm for solving MDPs under risk-sensitive planning objectives with one-switch utility functions. In Proc. of AAMAS'08, pages 453–460.
Rao, A. S. (1996). AgentSpeak(L): BDI agents speak out in a logical computable language. In Agents Breaking Away, pages 42–55. Springer.
Rao, A. S. and Georgeff, M. P. (1991). Modeling rational agents within a BDI-architecture. In Proc. of KR'91, pages 473–484.
Sardina, S., de Silva, L., and Padgham, L. (2006). Hierarchical planning in BDI agent programming languages: A formal approach. In Proc. of AAMAS'06, pages 1001–1008.
Welford, B. (1962). Note on a method for calculating corrected sums of squares and products. Technometrics, 4(3):419–420.
Risk-aware Planning in BDI Agents