MDPs in a healthcare setting. They therefore indicated
that their future work would consider extending the
MDP to a POMDP and introducing the heterogeneity of
doctors and patients through hierarchical modelling.
5.3  IRL or Apprenticeship Learning 
Apprenticeship learning via IRL (Pieter & Ng, 2004)
learns a reward function from observed behaviour
using IRL and then uses the learned reward function
in reinforcement learning. While most studies use IRL
as a bridge to an eventual RL application, as in
apprenticeship learning, in healthcare, where safety is
of paramount importance, it may be helpful to use IRL
alone to extract the reward function. Used this way,
IRL only infers the reward function of a presumably
optimal treatment policy and extracts information
about the essential variables that clinicians should
consider. Providing a sophisticated and realistic
simulation of the healthcare setting in which RL
algorithms can be trained, and creating interpretable
RL solutions that improve the safety, robustness, and
accuracy of learnt policies, remain unresolved topics
that require more research. Until they are resolved,
IRL alone can safely help improve clinicians’
decision-making.
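To make the idea of using IRL only for reward extraction concrete, the following is a minimal sketch, not the method of any surveyed study. It assumes a linear reward R(s) = w · φ(s) and performs only the first step of the projection method from apprenticeship learning, setting w to the normalised difference between the expert's and a baseline policy's discounted feature expectations. The feature names, trajectories, and discount factor are hypothetical and chosen purely for illustration.

```python
import numpy as np

def feature_expectations(trajectories, gamma=0.9):
    """Discounted empirical feature expectations, averaged over trajectories.

    Each trajectory is a list of per-step feature vectors phi(s_t).
    """
    mu = np.zeros(len(trajectories[0][0]))
    for traj in trajectories:
        for t, phi in enumerate(traj):
            mu += (gamma ** t) * np.asarray(phi, dtype=float)
    return mu / len(trajectories)

def first_projection_weights(expert_trajs, baseline_trajs, gamma=0.9):
    """Reward weights w = mu_expert - mu_baseline, scaled to unit length.

    This is only the first iteration of the projection method; a full
    apprenticeship-learning loop would re-solve an RL problem under w
    and repeat, which is exactly the step skipped here.
    """
    diff = (feature_expectations(expert_trajs, gamma)
            - feature_expectations(baseline_trajs, gamma))
    norm = np.linalg.norm(diff)
    return diff / norm if norm > 0 else diff

# Hypothetical state features (assumed for illustration only).
feature_names = ["glucose_in_range", "hypotension_event", "drug_dose"]

# Toy clinician (expert) trajectories: mostly in range, no adverse events.
expert_trajs = [
    [[1, 0, 0.2], [1, 0, 0.2], [1, 0, 0.1]],
    [[0, 0, 0.3], [1, 0, 0.2], [1, 0, 0.2]],
]
# Toy baseline trajectories, e.g. from an arbitrary fixed-dose policy.
baseline_trajs = [
    [[0, 1, 0.5], [0, 0, 0.5], [1, 0, 0.5]],
    [[0, 0, 0.5], [0, 1, 0.5], [0, 0, 0.5]],
]

w = first_projection_weights(expert_trajs, baseline_trajs)

# Rank features by the magnitude of their inferred reward weight.
ranking = sorted(zip(feature_names, w), key=lambda pair: -abs(pair[1]))
for name, weight in ranking:
    print(f"{name}: {weight:+.3f}")
```

The sign and magnitude of each weight indicate which variables the observed behaviour appears to favour or avoid. No policy is trained or deployed, so the output serves only as information for clinicians.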
6  CONCLUSIONS 
Inverse reinforcement learning (IRL) offers a
theoretical and, in many cases, practical way to infer
the reward function, or objective, behind a given
policy. In healthcare domains, that policy is usually
the one followed by a clinician (i.e., the expert).
This paper provides a concise survey of
state-of-the-art applications of IRL techniques in
healthcare, the challenges they face, and some
potential directions for future research.
Although IRL has made tremendous progress in many
other areas in recent years, clinical settings are
uniquely critical and highly risk-sensitive, which
explains the limited literature on IRL applications in
healthcare.
Nevertheless, IRL can be safely and efficiently
exploited to extract meaningful indicators associated
with the learned reward function. These indicators can
help recommend new, effective treatment protocols and
thereby improve clinicians’ decision-making.
However, the risks of IRL in healthcare applications
are most pronounced when it is used as a bridge to an
eventual RL application, where the patient’s health is
placed in the hands of an algorithm that usually
functions as a non-interpretable black box, making
clinicians unlikely to trust it. Moreover, most RL
algorithms in healthcare learn either by trial and
error, which is clearly infeasible and unethical, or
from another policy given as retrospective treatment
data, which, as discussed, is still not reliable enough
and needs further improvement.
Finally, we hope that more researchers from different
disciplines will apply their domain expertise and
collaborate to produce more reliable solutions that
improve decision-making in healthcare domains.
REFERENCES 
Arora,  S.,  &  Doshi,  P.  (2021).  A  survey  of  inverse 
reinforcement  learning:  Challenges,  methods  and 
progress.  Artificial Intelligence,  297,  103500. 
https://doi.org/10.1016/j.artint.2021.103500 
Asoh, H., Shiro, M., Akaho, S., & Kamishima, T. (2013). 
An Application of Inverse Reinforcement Learning to 
Medical Records of Diabetes. European Conference on 
Machine Learning and Principles and Practice of 
Knowledge Discovery in Databases (ECML/PKDD 
’13), 1–8. 
Asoh, H., Shiro, M., Akaho, S., Kamishima, T., Hasida, K., 
Aramaki, E., & Kohro,  T. (2013). Modelling Medical 
Records of Diabetes. Proceedings of the ICML 2013
Workshop on Role of Machine Learning in
Transforming Healthcare, 6.
Carden,  S.  W.,  &  Livsey,  J.  (2017).  Small-sample 
reinforcement  learning:  Improving  policies  using 
synthetic  data.  Intelligent Decision Technologies, 
11(2), 167–175. https://doi.org/10.3233/IDT-170285 
Dimitrakakis,  C.,  &  Rothkopf,  C.  A.  (2012).  Bayesian 
multitask inverse reinforcement learning. Lecture
Notes in Computer Science (Including Subseries 
Lecture Notes in Artificial Intelligence and Lecture 
Notes in Bioinformatics),  7188 LNAI,  273–284. 
https://doi.org/10.1007/978-3-642-29946-9_27 
Goodfellow,  I.,  Pouget-Abadie,  J.,  Mirza,  M.,  Xu,  B., 
Warde-Farley, D., Ozair, S., Courville, A., & Bengio, 
Y.  (2020).  Generative  adversarial  networks. 
Communications of the ACM,  63(11),  139–144. 
https://doi.org/10.1145/3422622 
Gottesman, O., Johansson, F., Komorowski, M., Faisal, A., 
Sontag,  D.,  Doshi-Velez,  F.,  &  Celi,  L.  A.  (2019). 
Guidelines  for  reinforcement  learning  in  healthcare. 
Nature Medicine,  25(1),  16–18. 
https://doi.org/10.1038/s41591-018-0310-5 
Gottesman, O., Johansson, F., Meier, J., Dent, J., Lee, D., 
Srinivasan, S., Zhang, L., Ding, Y., Wihl, D., Peng, X., 
Yao,  J.,  Lage,  I.,  Mosch,  C.,  Lehman,  L.  H., 
Komorowski, M., Faisal, A., Celi, L.
A., Sontag, D.,  & Doshi-Velez, F.  (2018). Evaluating