and the corrected model, we envisage that the abstract
policies for the two models will be similar. Intuitively, therefore,
it should not be necessary to generate an entirely
different abstract policy. Since generating policies and
verifying them is time-consuming, we aim
to reuse those elements of the initial abstract policy
that still match the abstract model.
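One way this reuse step could be realized is sketched below in Python, under the assumption that an abstract policy is stored as a mapping from abstract states to abstract actions; the AMDP interface and the transitions_match predicate are hypothetical placeholders for the actual model representation, not part of the framework described above.

```python
def reuse_abstract_policy(initial_policy, initial_amdp, corrected_amdp,
                          transitions_match):
    """Split an initial abstract policy into reusable and stale entries.

    initial_policy : dict mapping abstract state -> chosen abstract action
    initial_amdp, corrected_amdp : objects exposing .states and the outgoing
        transitions of each abstract state (assumed interface, illustration only)
    transitions_match : predicate returning True when an abstract state's
        local transition structure is unchanged between the two models
    """
    reusable, stale = {}, []
    for state, action in initial_policy.items():
        # A policy entry is kept only if the abstract state still exists and
        # its local transition structure is unchanged in the corrected model.
        if state in corrected_amdp.states and transitions_match(
                initial_amdp, corrected_amdp, state):
            reusable[state] = action
        else:
            stale.append(state)
    # Only the stale abstract states need new policy choices, which are then
    # re-verified together with the reused entries.
    return reusable, stale
```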
To evaluate the algorithm, a series of experiments
will be conducted using extensions of the two case
studies described in the previous section. For each
case study we will produce a series of RL environments,
each differing from the environment of the initial
AMDP in a distinct way. We will then determine
whether, on average, the knowledge revision algorithm
finds a new safe abstract policy faster than an AI
engineer who manually inspects the initial
RL model, reconstructs the high-level model, and generates
safe abstract policies from scratch.
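As a rough illustration of how the algorithmic side of this comparison might be instrumented, the sketch below times the revision step over a set of perturbed environments. The make_variant constructor and the revise_and_verify function are hypothetical hooks standing in for the case-study extensions and the knowledge revision and verification pipeline.

```python
import statistics
import time

def average_revision_time(base_env_config, perturbations,
                          make_variant, revise_and_verify):
    """Time the knowledge revision step over a series of modified environments.

    perturbations : non-empty list of environment modifications, each yielding
        an environment that differs from the initial AMDP's in one distinct way
    make_variant, revise_and_verify : hypothetical hooks for building a
        case-study variant and for revising and verifying its abstract policy
    """
    times = []
    for perturbation in perturbations:
        env = make_variant(base_env_config, perturbation)
        start = time.perf_counter()
        revise_and_verify(env)  # produce and verify a new safe abstract policy
        times.append(time.perf_counter() - start)
    # The mean and spread of these timings can then be compared against the
    # effort of manually reconstructing the high-level model.
    spread = statistics.stdev(times) if len(times) > 1 else 0.0
    return statistics.mean(times), spread
```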
6.3 Future Work
Future work will involve a more in-depth analysis of
the framework's performance, including an evaluation
of how long it takes to learn a safe solution
and of the processing overheads incurred. Additionally,
further experiments will be conducted by
expanding the existing case studies to establish how
well the framework scales. New case studies
will also be developed for different domains to determine
the range of scenarios to which the technique can be applied.
ACKNOWLEDGEMENTS
This paper presents research sponsored by the UK
MOD. The information contained in it should not be
interpreted as representing the views of the UK MOD,
nor should it be assumed it reflects any current or fu-
ture UK MOD policy.