Scale Complex IT Systems. Development, Operation and Management, pages 303–329. Springer.
Calinescu, R., Rafiq, Y., Johnson, K., and Bakir, M. E. (2014). Adaptive model learning for continual verification of non-functional properties. In 5th Intl. Conf. Performance Engineering, pages 87–98.
Di Castro, D., Tamar, A., and Mannor, S. (2012). Policy gradients with variance related risk criteria. In 29th Intl. Conf. Machine Learning, pages 935–942.
Dearden, R., Friedman, N., and Russell, S. (1998). Bayesian Q-learning. In 15th National Conference on Artificial Intelligence, pages 761–768.
Delage, E. and Mannor, S. (2010). Percentile optimization for Markov decision processes with parameter uncertainty. Operations Research, 58(1):203–213.
Driessens, K. and Džeroski, S. (2004). Integrating guidance into relational reinforcement learning. Machine Learning, 57(3):271–304.
Efthymiadis, K. and Kudenko, D. (2015). Knowledge revision for reinforcement learning with abstract MDPs. In 14th Intl. Conf. Autonomous Agents and Multiagent Systems, pages 763–770.
Gábor, Z., Kalmár, Z., and Szepesvári, C. (1998). Multi-criteria reinforcement learning. In 15th Intl. Conf. Machine Learning, pages 197–205.
García, J. and Fernández, F. (2015). A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research, 16(1):1437–1480.
Geibel, P. (2006). Reinforcement learning for MDPs with constraints. In 17th European Conference on Machine Learning, volume 4212, pages 646–653.
Gerasimou, S., Calinescu, R., and Banks, A. (2014). Efficient runtime quantitative verification using caching, lookahead, and nearly-optimal reconfiguration. In 9th Intl. Symposium on Software Engineering for Adaptive and Self-Managing Systems, pages 115–124.
Gerasimou, S., Tamburrelli, G., and Calinescu, R. (2015). Search-based synthesis of probabilistic models for quality-of-service software engineering. In 30th IEEE/ACM Intl. Conf. Automated Software Engineering, pages 319–330.
Hansson, H. and Jonsson, B. (1994). A logic for reasoning about time and reliability. Formal Aspects of Computing, 6(5):512–535.
Katoen, J.-P., Zapreev, I. S., Hahn, E. M., et al. (2011). The ins and outs of the probabilistic model checker MRMC. Performance Evaluation, 68(2):90–104.
Kober, J., Bagnell, J. A., and Peters, J. (2013). Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, 32(11):1238–1274.
Kwiatkowska, M. (2007). Quantitative verification: Models, techniques and tools. In 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, pages 449–458.
Kwiatkowska, M., Norman, G., and Parker, D. (2007). Stochastic model checking. In 7th Intl. Conf. Formal Methods for Performance Evaluation, volume 4486, pages 220–270.
Kwiatkowska, M., Norman, G., and Parker, D. (2011). PRISM 4.0: Verification of probabilistic real-time systems. In 23rd Intl. Conf. Computer Aided Verification, volume 6806, pages 585–591.
Li, L., Walsh, T. J., and Littman, M. L. (2006). Towards a unified theory of state abstraction for MDPs. In 9th Intl. Symposium on Artificial Intelligence and Mathematics, pages 531–539.
Liu, C., Xu, X., and Hu, D. (2015). Multiobjective reinforcement learning: A comprehensive overview. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 45(3):385–398.
Mannor, S. and Shimkin, N. (2004). A geometric approach to multi-criterion reinforcement learning. Journal of Machine Learning Research, 5:325–360.
Marthi, B. (2007). Automatic shaping and decomposition of reward functions. In 24th Intl. Conf. Machine Learning, pages 601–608.
Mason, G., Calinescu, R., Kudenko, D., and Banks, A. (2016). Combining reinforcement learning and quantitative verification for agent policy assurance. In 6th Intl. Workshop on Combinations of Intelligent Methods and Applications, pages 45–52.
Mihatsch, O. and Neuneier, R. (2002). Risk-sensitive reinforcement learning. Machine Learning, 49(2):267–290.
Van Moffaert, K. and Nowé, A. (2014). Multi-objective reinforcement learning using sets of Pareto dominating policies. Journal of Machine Learning Research, 15(1):3663–3692.
Moldovan, T. M. and Abbeel, P. (2012). Safe exploration in Markov decision processes. In 29th Intl. Conf. Machine Learning, pages 1711–1718.
Ponda, S. S., Johnson, L. B., and How, J. P. (2013). Risk allocation strategies for distributed chance-constrained task allocation. In American Control Conference, pages 3230–3236.
Sutton, R. S., Precup, D., and Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1-2):181–211.
Szita, I. (2012). Reinforcement learning in games. In Reinforcement Learning, pages 539–577. Springer.
Vamplew, P., Dazeley, R., Berry, A., et al. (2011). Empirical evaluation methods for multiobjective reinforcement learning algorithms. Machine Learning, 84(1):51–80.
Watkins, C. J. C. H. and Dayan, P. (1992). Q-learning. Machine Learning, 8(3):279–292.
Wiering, M. and van Otterlo, M. (2012). Reinforcement learning and Markov decision processes. In Reinforcement Learning: State-of-the-Art, volume 12, pages 3–42. Springer.
Xia, L. and Jia, Q.-S. (2013). Policy iteration for parameterized Markov decision processes and its application. In 9th Asian Control Conference, pages 1–6.