MDPs in a healthcare setting. Thus, they concluded that
future work should consider extending the MDP to a
POMDP and introducing the heterogeneity of doctors
and patients through hierarchical modelling.
5.3 IRL or Apprenticeship Learning
Apprenticeship learning via IRL (Abbeel & Ng, 2004)
learns a reward function from observed behaviour
using IRL and then uses the learned reward function
in reinforcement learning. While most studies use
IRL as a bridge to an eventual RL application, as in
apprenticeship learning, in healthcare, where safety is
of paramount importance, it may be preferable to use
IRL alone purely to extract the reward function.
Used in this way, IRL only infers the reward function
of a presumably optimal treatment policy and extracts
information about the essential variables for
clinicians to consider and act on. Building
sophisticated, realistic simulations of healthcare
settings in which RL algorithms can be trained, and
creating interpretable RL solutions that improve the
safety, robustness, and accuracy of learnt policies in
healthcare domains, are still unresolved problems that
require more research; until they are solved, IRL on
its own can safely help improve clinicians’
decision-making.
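To make the mechanics concrete, the following is a minimal sketch of the projection variant of apprenticeship learning via IRL (Abbeel & Ng, 2004) on a small tabular MDP. The dynamics, features, starting distribution, and "expert" policy below are invented purely for illustration (a real application would estimate the expert's feature expectations from retrospective treatment trajectories), and helper names such as feature_expectations and greedy_policy are ours, not from any surveyed work.

# Minimal sketch of apprenticeship learning via IRL (the projection
# algorithm of Abbeel & Ng, 2004) on a hypothetical toy MDP. The MDP,
# features, and "expert" are illustrative assumptions, not a clinical model.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.95

# Hypothetical dynamics P[s, a, s'] and one-hot state features phi(s).
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
Phi = np.eye(n_states)
p0 = np.full(n_states, 1.0 / n_states)  # uniform start distribution

def feature_expectations(policy):
    """Discounted feature expectations mu^pi = p0 (I - gamma P_pi)^-1 Phi."""
    P_pi = np.einsum("sa,sat->st", policy, P)
    return p0 @ np.linalg.solve(np.eye(n_states) - gamma * P_pi, Phi)

def greedy_policy(w, iters=500):
    """Value iteration for reward r(s) = w . phi(s); returns the greedy policy."""
    r, V = Phi @ w, np.zeros(n_states)
    for _ in range(iters):
        Q = r[:, None] + gamma * np.einsum("sat,t->sa", P, V)
        V = Q.max(axis=1)
    policy = np.zeros((n_states, n_actions))
    policy[np.arange(n_states), Q.argmax(axis=1)] = 1.0
    return policy

# Assumed "expert": optimal for a hidden reward favouring state 0 (e.g. a
# healthy state). In practice mu_E is estimated from observed trajectories.
expert = greedy_policy(np.array([1.0, 0.2, 0.0, -0.5, -1.0]))
mu_E = feature_expectations(expert)

# Projection algorithm: iteratively match the expert's feature expectations.
# Each intermediate w is an IRL estimate of the reward weights; the output
# policy is a mixture of the greedy policies found along the way.
mu_bar = feature_expectations(np.full((n_states, n_actions), 1.0 / n_actions))
for i in range(50):
    w = mu_E - mu_bar                    # candidate reward weights (up to scale)
    t = np.linalg.norm(w)
    if t < 1e-6:
        break
    mu_i = feature_expectations(greedy_policy(w))
    d = mu_i - mu_bar
    mu_bar = mu_bar + (d @ w) / (d @ d) * d   # project mu_E onto mu_bar -> mu_i

print(f"stopped after {i} iterations; ||mu_E - mu_bar|| = {t:.2e}")

Note that the learned reward lives entirely in the weight vector w over the state features, which is exactly the interpretable artefact one would hand to clinicians when IRL is used alone, without the downstream RL step.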
6 CONCLUSIONS
Inverse reinforcement learning (IRL) presents a
theoretical, and in many cases practical, solution for
inferring the reward function, or objective, behind a
given policy. In healthcare domains, the policy is
usually performed by a clinician (i.e., the expert).
This paper provides a brief yet comprehensive survey
of state-of-the-art applications of IRL techniques in
healthcare, the challenges they face, and some
potential directions for future research.
Although tremendous progress has been made in IRL
in many other areas in recent years, clinical settings
are uniquely critical and risk-sensitive, which
explains the limited literature on IRL applications in
healthcare.
Nevertheless, IRL can be safely and efficiently
exploited to extract meaningful indicators associated
with the learned reward function. This can help
recommend new, effective treatment protocols and
therefore improve clinicians’ decision-making.
However, the risks of IRL in healthcare
applications are most pronounced when it is used as
a bridge to an eventual RL application, where the
patient’s health is placed in the hands of an algorithm
that usually functions as a non-interpretable black
box, making clinicians unlikely to trust it. Moreover,
most RL algorithms in healthcare learn either by trial
and error, which is clearly unfeasible and unethical in
this setting, or from another policy given as
retrospective treatment data, which, as we discussed,
is still not reliable enough and needs further
improvement.
Finally, the hope is that more researchers from
different disciplines will exploit their domain
expertise and collaborate to produce more reliable
solutions that improve decision-making in healthcare
domains.
REFERENCES
Arora, S., & Doshi, P. (2021). A survey of inverse
reinforcement learning: Challenges, methods and
progress. Artificial Intelligence, 297, 103500.
https://doi.org/10.1016/j.artint.2021.103500
Asoh, H., Shiro, M., Akaho, S., & Kamishima, T. (2013).
An Application of Inverse Reinforcement Learning to
Medical Records of Diabetes. European Conference on
Machine Learning and Principles and Practice of
Knowledge Discovery in Databases (ECML/PKDD
’13), 1–8.
Asoh, H., Shiro, M., Akaho, S., Kamishima, T., Hasida, K.,
Aramaki, E., & Kohro, T. (2013). Modelling Medical
Records of Diabetes. Proceedings of the ICML 2013
Workshop on Role of Machine Learning in
Transforming Healthcare, 6.
Carden, S. W., & Livsey, J. (2017). Small-sample
reinforcement learning: Improving policies using
synthetic data. Intelligent Decision Technologies,
11(2), 167–175. https://doi.org/10.3233/IDT-170285
Dimitrakakis, C., & Rothkopf, C. A. (2012). Bayesian
multitask inverse reinforcement learning. Lecture
Notes in Computer Science (Including Subseries
Lecture Notes in Artificial Intelligence and Lecture
Notes in Bioinformatics), 7188 LNAI, 273–284.
https://doi.org/10.1007/978-3-642-29946-9_27
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B.,
Warde-Farley, D., Ozair, S., Courville, A., & Bengio,
Y. (2020). Generative adversarial networks.
Communications of the ACM, 63(11), 139–144.
https://doi.org/10.1145/3422622
Gottesman, O., Johansson, F., Komorowski, M., Faisal, A.,
Sontag, D., Doshi-Velez, F., & Celi, L. A. (2019).
Guidelines for reinforcement learning in healthcare.
Nature Medicine, 25(1), 16–18.
https://doi.org/10.1038/s41591-018-0310-5
Gottesman, O., Johansson, F., Meier, J., Dent, J., Lee, D.,
Srinivasan, S., Zhang, L., Ding, Y., Wihl, D., Peng, X.,
Yao, J., Lage, I., Mosch, C., Lehman, L. H.,
Komorowski, M., Faisal, A., Celi, L.
A., Sontag, D., & Doshi-Velez, F. (2018). Evaluating