Authors:
Hajar Hakkoum 1; Ali Idri 1, 2; Ibtissam Abnane 1 and José Luis Fernández-Alemán 3
Affiliations:
1 ENSIAS, Mohammed V University in Rabat, Morocco; 2 Mohammed VI Polytechnic University in Benguerir, Morocco; 3 Department of Computer Science and Systems, University of Murcia, 30100 Murcia, Spain
Keyword(s):
Interpretability, Machine Learning, Breast Cancer, SHAP, Global Surrogate, Categorical Encoding.
Abstract:
The lack of transparency in machine learning black-box models continues to be an impediment to their adoption in critical domains such as medicine, in which human lives are involved. Historical medical datasets often contain categorical attributes that represent the categories or progression levels of a parameter or disease. The literature has shown that the manner in which these categorical attributes are handled in the preprocessing phase can affect accuracy, but little attention has been paid to interpretability. The objective of this study was to empirically evaluate a simple multilayer perceptron (MLP) network trained to diagnose breast cancer with ordinal and one-hot categorical encoding, and interpreted using a decision tree global surrogate and SHapley Additive exPlanations (SHAP). The results obtained on the basis of Spearman fidelity show the poor performance of the MLP with both encodings, but a slight preference for one-hot. Further evaluations are required with more datasets and categorical encodings to analyse their impact on model interpretability.
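As an illustration of the pipeline the abstract describes, the sketch below (not the authors' code; the dataset, attribute names, and model settings are hypothetical placeholders) encodes categorical attributes with ordinal and one-hot encoders, trains an MLP, fits a decision-tree global surrogate on the MLP's predictions, and uses a Spearman correlation between the two models' predicted probabilities as a simple fidelity proxy. The SHAP step of the study is omitted here for brevity.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical categorical breast-cancer attributes (e.g. tumour-size band,
# menopause status) and noisy labels -- placeholders for the real dataset.
sizes = rng.choice(["small", "medium", "large"], size=300)
menopause = rng.choice(["pre", "post"], size=300)
X_cat = np.column_stack([sizes, menopause])
y = ((sizes == "large") | (rng.random(300) < 0.2)).astype(int)


def surrogate_fidelity(encoder):
    """Encode, train the black-box MLP, fit a tree surrogate, return Spearman rho."""
    X = encoder.fit_transform(X_cat)
    X_tr, X_te, y_tr, _ = train_test_split(X, y, test_size=0.3, random_state=0)
    mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                        random_state=0).fit(X_tr, y_tr)
    # Global surrogate: a shallow decision tree trained to mimic the MLP's labels.
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)
    tree.fit(X_tr, mlp.predict(X_tr))
    # Fidelity proxy: rank agreement between MLP and surrogate probabilities.
    rho, _ = spearmanr(mlp.predict_proba(X_te)[:, 1],
                       tree.predict_proba(X_te)[:, 1])
    return rho


for name, enc in [("ordinal", OrdinalEncoder()), ("one-hot", OneHotEncoder())]:
    print(f"{name:8s} Spearman fidelity: {surrogate_fidelity(enc):.3f}")
```

The comparison loop mirrors the study design at a small scale: the only change between runs is the categorical encoder, so any difference in the surrogate's fidelity can be attributed to the encoding choice.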