A Comparative Study on the Impact of Categorical Encoding on Black Box Model Interpretability

Hajar Hakkoum, Ali Idri

2024

Abstract

This study addresses the opacity of machine learning models in medicine, focusing on Support Vector Machines (SVMs) and comparing their performance and interpretability with those of Multilayer Perceptrons (MLPs). Using two medical datasets (breast cancer and lymphography) and three categorical encoding methods (ordinal, one-hot, and dummy), we assessed model accuracy, and we assessed interpretability with a decision tree surrogate and the SHAP Kernel explainer. Our findings favor ordinal encoding for accuracy and one-hot encoding for interpretability. Surprisingly, dummy encoding effectively balanced the accuracy-interpretability trade-off.
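As a minimal illustration of how the three encodings named above differ, the sketch below encodes one categorical feature in plain Python. The feature, its category names, and the helper functions are hypothetical and are not taken from the paper or its datasets; dummy encoding is assumed here to mean one-hot encoding with the first category dropped, which is the common convention.

```python
def ordinal_encode(values, categories):
    """Map each value to the integer position of its category (one column)."""
    index = {cat: i for i, cat in enumerate(categories)}
    return [[index[v]] for v in values]


def one_hot_encode(values, categories):
    """Emit one binary column per category; each row contains exactly one 1."""
    return [[1 if v == cat else 0 for cat in categories] for v in values]


def dummy_encode(values, categories):
    """One-hot encoding with the first category dropped.

    The dropped category is represented by the all-zero row.
    """
    return [row[1:] for row in one_hot_encode(values, categories)]


# Hypothetical feature with three levels (illustrative only).
categories = ["low", "medium", "high"]
values = ["medium", "low", "high"]

ordinal = ordinal_encode(values, categories)  # 1 column per feature
one_hot = one_hot_encode(values, categories)  # 3 columns per feature
dummy = dummy_encode(values, categories)      # 2 columns per feature

for name, encoded in [("ordinal", ordinal), ("one-hot", one_hot), ("dummy", dummy)]:
    print(name, encoded)
```

The column counts show one side of the trade-off: ordinal encoding keeps a single column but imposes an order on the categories, one-hot encoding gives every category its own column, and dummy encoding sits between the two by using one column fewer than one-hot.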

Paper Citation


in Harvard Style

Hakkoum H. and Idri A. (2024). A Comparative Study on the Impact of Categorical Encoding on Black Box Model Interpretability. In Proceedings of the 13th International Conference on Data Science, Technology and Applications - Volume 1: DATA; ISBN 978-989-758-707-8, SciTePress, pages 384-391. DOI: 10.5220/0012766300003756


in Bibtex Style

@conference{data24,
author={Hajar Hakkoum and Ali Idri},
title={A Comparative Study on the Impact of Categorical Encoding on Black Box Model Interpretability},
booktitle={Proceedings of the 13th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2024},
pages={384-391},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012766300003756},
isbn={978-989-758-707-8},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 13th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI - A Comparative Study on the Impact of Categorical Encoding on Black Box Model Interpretability
SN - 978-989-758-707-8
AU - Hakkoum H.
AU - Idri A.
PY - 2024
SP - 384
EP - 391
DO - 10.5220/0012766300003756
PB - SciTePress