ing the disease "other", and 20 sub-tasks have already been solved. For Figure 4, 104 relevant symptoms remain to be checked, there are 18 possible diseases including the disease "other", and 103 sub-tasks have already been solved.
Finally, we were able to learn a good policy for the main task (1), where 220 relevant symptoms remain, there are 82 possible diseases including the disease "other", and all the possible sub-tasks have already been solved. Our DQN-MC-Bootstrap starts with a good policy that needs 45 questions on average to reach a terminal state, and only 40 after some training iterations. In contrast, the strategy learned by DQN-MC, which tries to solve this task from scratch, must ask 117 questions on average to reach a terminal state and does not improve significantly over the 1000 iterations. The Breiman policy on the global task needs 89 questions on average to reach a terminal state (with a variance of 10 questions).
6 CONCLUSION
In this work, we formulated the sequential decision-making task of building a symptom checker for the diagnosis of rare diseases as a stochastic shortest path problem.
We studied several RL algorithms and made them operational in our very high-dimensional environment. To do so, we divided the initial task into several subtasks and learned a strategy for each one. We showed that an appropriate use of the intersections between subtasks can significantly accelerate the learning process. The learned strategies proved to be much better than classic greedy strategies.
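The idea can be conveyed by a minimal, purely schematic sketch (the actual DQN-MC-Bootstrap agent works with a deep Q-network, not the tabular warm start shown here, and every name below is hypothetical): a new task reuses the value estimates of any already-solved sub-task on the states they share, and learns from scratch only on the remaining states.

    # Schematic sketch (not our DQN-MC-Bootstrap implementation): warm-starting a
    # new task from already-solved sub-tasks via their shared states.
    def bootstrap_q_values(new_task_states, solved_subtask_qtables):
        q_init = {}
        for state in new_task_states:
            for q_table in solved_subtask_qtables:
                if state in q_table:                      # state lies in an intersection
                    q_init[state] = dict(q_table[state])  # warm start from the sub-task
                    break
            else:
                q_init[state] = {}                        # unseen state: learn from scratch
        return q_init

    # Hypothetical example: two solved sub-tasks, one sharing the state {"symptom_A"}.
    solved = [
        {frozenset({"symptom_A"}): {"ask_symptom_B": -1.0, "stop": -3.0}},
        {frozenset({"symptom_C"}): {"ask_symptom_D": -2.0}},
    ]
    new_states = [frozenset({"symptom_A"}), frozenset({"symptom_E"})]
    print(bootstrap_q_values(new_states, solved))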
Finally, a preliminary study was carried out internally at Necker Hospital to check the diagnostic performance of our decision support system. This experiment was conducted on a set of 40 rare-disease patients from a fetopathology dataset, which has no connection to the data used to train our algorithms. The results are good: more than 80% of the scenarios led to a correct diagnosis. Note that, in theory, our definition of the stopping rule in equation (2) makes any misdiagnosis impossible as long as ε is chosen sufficiently low. In practice, of course, misdiagnoses are possible because of the inevitable shortcomings of the environmental model (synonyms, omissions, ...).
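The following is a minimal sketch of such an ε-threshold rule; it is only an illustrative reading, not the exact form of equation (2), and the function name is hypothetical. The agent stops asking questions once a single disease concentrates at least 1 − ε of the posterior probability mass.

    import numpy as np

    # Illustrative reading of the epsilon-threshold stopping rule (an assumption;
    # the exact rule is equation (2) in the paper).
    def should_stop(disease_posterior, epsilon=0.05):
        # Stop once one disease (possibly "other") concentrates at least
        # 1 - epsilon of the posterior probability mass.
        posterior = np.asarray(disease_posterior, dtype=float)
        posterior = posterior / posterior.sum()   # normalise defensively
        return bool(posterior.max() >= 1.0 - epsilon)

    print(should_stop([0.60, 0.30, 0.10]))   # False: keep asking questions
    print(should_stop([0.97, 0.02, 0.01]))   # True: confident enough to stop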
Research is currently underway to improve and enrich the environmental model by adding new rare diseases and symptoms. Finally, we are studying several avenues to make our decision support tool more robust in the face of the unavoidable defects of the environmental model. A larger-scale study is underway but faces difficulties in obtaining clinical data.
REFERENCES
Amiranashvili, A., Dosovitskiy, A., Koltun, V., and Brox, T. (2018). TD or not TD: Analyzing the role of temporal differencing in deep reinforcement learning. arXiv.
Besson, R. (2019). Decision making strategy for antenatal echographic screening of foetal abnormalities using statistical learning. PhD thesis, Université Paris-Saclay.
Besson, R., Pennec, E. L., and Allassonnière, S. (2019). Learning from both experts and data. Entropy, 21:1208.
Breiman, L., Friedman, J. H., Olshen, R. A., and Stone,
C. J. (1984). Classification and Regression Trees.
Wadsworth and Brooks, Monterey, CA.
Chen, Y.-E., Tang, K.-F., Peng, Y.-S., and Chang, E. Y. (2019). Effective medical test suggestions using deep reinforcement learning. arXiv, abs/1905.12916.
Cover, T. M. and Thomas, J. A. (2006). Elements of Infor-
mation Theory (Wiley Series in Telecommunications
and Signal Processing). Wiley-Interscience, New
York, NY, USA.
Hart, P. E., Nilsson, N. J., and Raphael, B. (1968). A for-
mal basis for the heuristic determination of minimum
cost paths. IEEE Transactions on Systems Science and
Cybernetics, 4(2):100–107.
Heess, N., Silver, D., and Teh, Y. W. (2013). Actor-critic
reinforcement learning with energy-based policies. In
Proceedings of the Tenth European Workshop on Re-
inforcement Learning, volume 24, pages 45–58.
Köhler, S. et al. (2017). The Human Phenotype Ontology in 2017. Nucleic Acids Research.
Korf, R. E. (1985). Depth-first iterative-deepening: An op-
timal admissible tree search. Artif. Intell., 27(1):97–
109.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. A. (2013). Playing Atari with deep reinforcement learning. CoRR, abs/1312.5602.
Peng, Y.-S., Tang, K.-F., Lin, H.-T., and Chang, E. (2018).
Refuel: Exploring sparse features in deep reinforce-
ment learning for fast disease diagnosis. In NIPS.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, USA, 2nd edition.
Tang, K.-F., Kao, H.-C., Chou, C.-N., and Chang, E. Y. (2016). Inquire and diagnose: Neural symptom checking ensemble using deep reinforcement learning. In NIPS.
Williams, R. J. (1992). Simple statistical gradient-following
algorithms for connectionist reinforcement learning.
Machine Learning, 8:229–256.
Zubek, V. B. and Dietterich, T. G. (2005). Integrating learn-
ing from examples into the search for diagnostic poli-
cies. CoRR, abs/1109.2127.