A Comparative Study on the Impact of Categorical Encoding on Black Box Model Interpretability
Hajar Hakkoum, Ali Idri
2024
Abstract
This study explores the challenge of opaque machine learning models in medicine, focusing on Support Vector Machines (SVMs) and comparing their performance and interpretability with Multilayer Perceptrons (MLPs). Using two medical datasets (breast cancer and lymphography) and three encoding methods (ordinal, one-hot, and dummy), we assessed model accuracy and interpretability through a decision tree surrogate and SHAP Kernel explainer. Our findings highlight a preference for ordinal encoding for accuracy, while one-hot encoding excels in interpretability. Surprisingly, dummy encoding effectively balanced the accuracy-interpretability trade-off.
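As a rough illustration of how the three encodings named in the abstract differ, the following minimal sketch applies each of them to a single categorical column using pandas. The column name and its values are hypothetical and are not taken from the paper's datasets; the integer order used for the ordinal codes (alphabetical) is likewise an assumption, not the authors' choice.

```python
import pandas as pd

# Hypothetical toy column; values are illustrative, not the paper's data.
df = pd.DataFrame({"menopause": ["lt40", "ge40", "premeno", "ge40"]})

# Ordinal encoding: each category becomes one integer code.
# Assumption: codes follow alphabetical order of the category labels.
categories = sorted(df["menopause"].unique())
ordinal = df["menopause"].map({c: i for i, c in enumerate(categories)})

# One-hot encoding: one binary column per category (k columns).
one_hot = pd.get_dummies(df["menopause"], prefix="menopause", dtype=int)

# Dummy encoding: one-hot with the first category dropped (k - 1 columns);
# the dropped category is represented by all zeros.
dummy = pd.get_dummies(
    df["menopause"], prefix="menopause", drop_first=True, dtype=int
)

print(ordinal.tolist())
print(list(one_hot.columns))
print(list(dummy.columns))
```

Ordinal encoding keeps a single feature but imposes an order on the categories, whereas one-hot and dummy encoding expand the feature into several binary columns, which changes what a surrogate tree or SHAP explainer attributes importance to.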
Paper Citation
in Harvard Style
Hakkoum H. and Idri A. (2024). A Comparative Study on the Impact of Categorical Encoding on Black Box Model Interpretability. In Proceedings of the 13th International Conference on Data Science, Technology and Applications - Volume 1: DATA; ISBN 978-989-758-707-8, SciTePress, pages 384-391. DOI: 10.5220/0012766300003756
in Bibtex Style
@conference{data24,
author={Hajar Hakkoum and Ali Idri},
title={A Comparative Study on the Impact of Categorical Encoding on Black Box Model Interpretability},
booktitle={Proceedings of the 13th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2024},
pages={384-391},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012766300003756},
isbn={978-989-758-707-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 13th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI - A Comparative Study on the Impact of Categorical Encoding on Black Box Model Interpretability
SN - 978-989-758-707-8
AU - Hakkoum H.
AU - Idri A.
PY - 2024
SP - 384
EP - 391
DO - 10.5220/0012766300003756
PB - SciTePress