Does Categorical Encoding Affect the Interpretability of a Multilayer Perceptron for Breast Cancer Classification?

Hajar Hakkoum, Ali Idri, Ibtissam Abnane, José Luis Fernades-Aleman

2023

Abstract

The lack of transparency in black-box machine learning models continues to impede their adoption in critical domains such as medicine, where human lives are at stake. Historical medical datasets often contain categorical attributes that represent the categories or progression levels of a parameter or disease. The literature has shown that the way these categorical attributes are handled during preprocessing can affect accuracy, but little attention has been paid to their effect on interpretability. The objective of this study was to empirically evaluate a simple multilayer perceptron (MLP) network trained to diagnose breast cancer using ordinal and one-hot categorical encoding, and interpreted using a decision tree global surrogate and SHapley Additive exPlanations (SHAP). The results, based on Spearman fidelity, show poor performance of the MLP with both encodings, with a slight preference for one-hot encoding. Further evaluations with more datasets and categorical encodings are required to analyse the impact of encoding on model interpretability.
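The two encodings compared in the abstract, and the rank correlation behind "Spearman fidelity", can be illustrated with a small standard-library sketch. It is not the authors' code: the attribute name, its levels, and all scores below are hypothetical, and the abstract does not say exactly which quantities the paper ranks when computing fidelity, so the example simply compares hypothetical MLP outputs with hypothetical surrogate outputs on the same five inputs.

```python
from math import sqrt
from typing import List, Sequence


def ordinal_encode(values: Sequence[str], order: Sequence[str]) -> List[int]:
    """Replace each category by its position in `order` (one integer column)."""
    index = {category: i for i, category in enumerate(order)}
    return [index[v] for v in values]


def one_hot_encode(values: Sequence[str], order: Sequence[str]) -> List[List[int]]:
    """Replace each category by a 0/1 indicator vector (one column per category)."""
    index = {category: i for i, category in enumerate(order)}
    rows = []
    for v in values:
        row = [0] * len(order)
        row[index[v]] = 1
        rows.append(row)
    return rows


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank vectors of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("Spearman's rho is undefined for a constant input")
    return cov / sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical ordinal attribute (e.g. a grade); not taken from the paper.
    levels = ["low", "intermediate", "high"]
    grades = ["low", "high", "intermediate", "low"]
    print(ordinal_encode(grades, levels))  # [0, 2, 1, 0]
    print(one_hot_encode(grades, levels))  # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]

    # Hypothetical MLP probabilities and surrogate outputs for the same five inputs.
    mlp_scores = [0.91, 0.15, 0.62, 0.08, 0.77]
    surrogate_scores = [0.85, 0.20, 0.55, 0.20, 0.80]
    print(round(spearman(mlp_scores, surrogate_scores), 4))  # 0.9747
```

Ordinal encoding keeps a single column but imposes an order and equal spacing between levels; one-hot encoding avoids that assumption at the cost of one column per category, which also changes the set of input features to which a surrogate tree or SHAP attributes importance.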



Paper Citation


in Harvard Style

Hakkoum H., Idri A., Abnane I. and Luis Fernades-Aleman J. (2023). Does Categorical Encoding Affect the Interpretability of a Multilayer Perceptron for Breast Cancer Classification? In Proceedings of the 12th International Conference on Data Science, Technology and Applications - Volume 1: DATA; ISBN 978-989-758-664-4, SciTePress, pages 351-358. DOI: 10.5220/0012084800003541


in Bibtex Style

@conference{data23,
author={Hajar Hakkoum and Ali Idri and Ibtissam Abnane and José Luis Fernades-Aleman},
title={Does Categorical Encoding Affect the Interpretability of a Multilayer Perceptron for Breast Cancer Classification?},
booktitle={Proceedings of the 12th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2023},
pages={351--358},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012084800003541},
isbn={978-989-758-664-4},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 12th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI  - Does Categorical Encoding Affect the Interpretability of a Multilayer Perceptron for Breast Cancer Classification?
SN  - 978-989-758-664-4
AU  - Hakkoum H.
AU  - Idri A.
AU  - Abnane I.
AU  - Luis Fernades-Aleman J.
PY  - 2023
SP  - 351
EP  - 358
DO  - 10.5220/0012084800003541
PB  - SciTePress
ER  - 