Assured Reinforcement Learning with Formally Verified Abstract Policies

George Mason, Radu Calinescu, Daniel Kudenko, Alec Banks


We present a new reinforcement learning (RL) approach that enables an autonomous agent to solve decision-making problems under constraints. Our assured reinforcement learning approach models the uncertain environment as a high-level, abstract Markov decision process (AMDP), and uses probabilistic model checking to establish AMDP policies that satisfy a set of constraints defined in probabilistic temporal logic. These formally verified abstract policies are then used to restrict the RL agent's exploration of the solution space so as to avoid constraint violations. We validate our RL approach by using it to develop autonomous agents for a flag-collection navigation task and an assisted-living planning problem.
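
The idea of restricting exploration with an abstract policy can be illustrated with a minimal sketch. It is not the authors' implementation: the toy environment, the room names, the reward values, the known deterministic transition table, and the helper names `allowed_actions`, `q_learning` and `greedy_path` are all assumptions made for illustration, and the abstract policy is simply taken as given rather than produced by a model checker. Under those assumptions, tabular Q-learning only ever chooses among actions whose abstract (room-level) effect the abstract policy permits.

```python
import random

# Hypothetical toy environment: low-level state -> {action: next low-level state}.
TRANSITIONS = {
    "s0": {"right": "s1"},
    "s1": {"left": "s0", "up": "r0", "down": "f0"},
    "r0": {"right": "g"},                    # shortcut through the "risky" room
    "f0": {"right": "f1", "left": "s1"},
    "f1": {"right": "g", "left": "f0"},
    "g": {},                                 # terminal state
}

# Hypothetical abstraction: each low-level state belongs to one abstract state (a room).
ROOM = {"s0": "start", "s1": "start", "r0": "risky",
        "f0": "safe", "f1": "safe", "g": "goal"}

# Abstract policy, ASSUMED to have been verified offline against the constraints:
# for each room, the set of rooms the agent is allowed to move into next.
ABSTRACT_POLICY = {"start": {"safe"}, "safe": {"goal"}, "risky": {"goal"}}


def allowed_actions(state):
    """Actions that stay in the current room or follow the abstract policy."""
    room = ROOM[state]
    permitted = ABSTRACT_POLICY.get(room, set())
    return [a for a, nxt in TRANSITIONS[state].items()
            if ROOM[nxt] == room or ROOM[nxt] in permitted]


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2,
               max_steps=100, seed=0):
    """Tabular Q-learning whose exploration is limited to allowed_actions."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in TRANSITIONS for a in TRANSITIONS[s]}
    for _ in range(episodes):
        state = "s0"
        for _ in range(max_steps):
            if state == "g":
                break
            actions = allowed_actions(state)
            if rng.random() < epsilon:
                action = rng.choice(actions)          # explore, but only safely
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            nxt = TRANSITIONS[state][action]
            reward = 10.0 if nxt == "g" else -1.0     # illustrative rewards
            future = max((q[(nxt, a)] for a in allowed_actions(nxt)), default=0.0)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = nxt
    return q


def greedy_path(q, start="s0", max_steps=20):
    """Follow the greedy restricted policy from the start state."""
    path, state = [start], start
    while state != "g" and len(path) <= max_steps:
        action = max(allowed_actions(state), key=lambda a: q[(state, a)])
        state = TRANSITIONS[state][action]
        path.append(state)
    return path


if __name__ == "__main__":
    print(greedy_path(q_learning()))
```

In this sketch the shorter route through the risky room is never explored, because the action leading into it is filtered out before action selection; the agent instead learns the longer route through the safe room. Filtering actions by their abstract effect requires knowing which room an action leads to, which the sketch gets from the assumed transition table.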



Paper Citation

in Harvard Style

Mason G., Calinescu R., Kudenko D. and Banks A. (2017). Assured Reinforcement Learning with Formally Verified Abstract Policies. In Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-220-2, pages 105-117. DOI: 10.5220/0006156001050117

in Bibtex Style

author={George Mason and Radu Calinescu and Daniel Kudenko and Alec Banks},
title={Assured Reinforcement Learning with Formally Verified Abstract Policies},
booktitle={Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},

in EndNote Style

JO - Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Assured Reinforcement Learning with Formally Verified Abstract Policies
SN - 978-989-758-220-2
AU - Mason G.
AU - Calinescu R.
AU - Kudenko D.
AU - Banks A.
PY - 2017
SP - 105
EP - 117
DO - 10.5220/0006156001050117