Crawford, L., Flaxman, S. R., Runcie, D. E., and West, M. (2019). Variable prioritization in nonlinear black box methods: A genetic association case study. The Annals of Applied Statistics, 13(2):958–989.
Dua, D. and Graff, C. (2017). UCI machine learning repository. http://archive.ics.uci.edu/ml. Accessed: 2024-09-12.
Fama, E. F. and French, K. R. (2024). Fama/French 5
factors. https://mba.tuck.dartmouth.edu. Accessed:
2024-09-12.
Gevrey, M., Dimopoulos, I., and Lek, S. (2003). Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecological Modelling, 160(3):249–264.
Ish-Horowicz, J., Udwin, D., Flaxman, S., Filippi, S., and Crawford, L. (2019). Interpreting deep neural networks through variable importance. arXiv preprint arXiv:1901.09839.
Kim, B., Wattenberg, M., Gilmer, J., Cai, C., Wexler, J., Viégas, F., and Sayres, R. (2018). Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In Proceedings of the 35th International Conference on Machine Learning, pages 2668–2677. PMLR.
Kumar, I. E., Venkatasubramanian, S., Scheidegger, C., and Friedler, S. (2020). Problems with Shapley-value-based explanations as feature importance measures. In Proceedings of the 37th International Conference on Machine Learning, pages 5491–5500. PMLR.
Liang, D., Tsai, C.-F., and Wu, H.-T. (2015). The effect of feature selection on financial distress prediction. Knowledge-Based Systems, 73:289–297.
Lundberg, S. M. and Lee, S.-I. (2017). A unified approach
to interpreting model predictions. Advances in Neural
Information Processing Systems, 30.
Molnar, C. (2020). Interpretable Machine Learning.
Lulu.com.
Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144.
Shrikumar, A., Greenside, P., and Kundaje, A. (2017). Learning important features through propagating activation differences. In Proceedings of the 34th International Conference on Machine Learning, pages 3145–3153. PMLR.
Štrumbelj, E. and Kononenko, I. (2010). An efficient explanation of individual classifications using game theory. The Journal of Machine Learning Research, 11:1–18.
Sturmfels, P., Lundberg, S., and Lee, S.-I. (2020). Visualizing the impact of feature attribution baselines. Distill. Published: 2020-01-10, Accessed: 2024-09-12.
Sundararajan, M., Taly, A., and Yan, Q. (2017). Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning, pages 3319–3328. PMLR.
Xiaomao, X., Xudong, Z., and Yuanfang, W. (2019). A comparison of feature selection methodology for solving classification problems in finance. In Journal of Physics: Conference Series, volume 1284, pages 12–16. IOP Publishing.
Yeh, C.-K., Hsieh, C.-Y., Suggala, A. S., Inouye, D. I., and Ravikumar, P. (2019). On the (in)fidelity and sensitivity for explanations. arXiv preprint arXiv:1901.09392.
APPENDIX
The appendix contains the experimental results described in the paper. Note that the results are rounded to three decimal places unless greater precision is required for comparison (most often for the infidelity metrics).
Table 7: Mean results (standard deviation) for 100 repeated experiments on the case I data. Best-in-class values are marked with *.

Method      determination    infidelity         sensitivity
IG          0.186 (0.118)    0.0035 (0.0020)    2.782* (0.822)
GSHAP       0.183 (0.118)    0.0037 (0.0021)    21808.902 (41842.360)
LIME        0.107 (0.093)    0.0025* (0.0012)   921.865 (5259.562)
SVS         0.223 (0.144)    0.0028 (0.0013)    142.402 (87.815)
DeepLIFT    0.236* (0.114)   0.0040 (0.0019)    7.512 (1.200)
Table 8: Mean results (standard deviation) for 100 repeated experiments on the case II data. Best-in-class values are marked with *.

Method      determination    infidelity         sensitivity
IG          0.129 (0.108)    0.0033 (0.0018)    2.872* (0.911)
GSHAP       0.126 (0.106)    0.0036 (0.0018)    24776.439 (118520.80)
LIME        0.101 (0.089)    0.0026* (0.0012)   434.697 (545.340)
SVS         0.146 (0.119)    0.0029 (0.0014)    144.623 (79.661)
DeepLIFT    0.174* (0.108)   0.0041 (0.0020)    7.537 (1.441)
Feature Importance for Deep Neural Networks: A Comparison of Predictive Power, Infidelity and Sensitivity