
REFERENCES
Amparore, E. G., Perotti, A., and Bajardi, P. (2021). To trust or not to trust an explanation: Using LEAF to evaluate local linear XAI methods. PeerJ Computer Science, 7.
Bhatt, U., Weller, A., and Moura, J. M. F. (2021). Evaluating and aggregating feature-based model explanations. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence.
Castelvecchi, D. (2016). Can we open the black box of AI? Nature, 538:20–23.
Doshi-Velez, F. and Kim, B. (2017). Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608.
Dwivedi, R., Dave, D., Naik, H., Singhal, S., Omer, R., Patel, P., Qian, B., Wen, Z., Shah, T., Morgan, G., and Ranjan, R. (2023). Explainable AI (XAI): Core ideas, techniques, and solutions. ACM Comput. Surv., 55(9).
Dwork, C., Hardt, M., Pitassi, T., Reingold, O., and Zemel, R. (2012). Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, ITCS ’12, pages 214–226. ACM.
Esteva, A., Robicquet, A., Ramsundar, B., Kuleshov, V., DePristo, M., Chou, K., Cui, C., Corrado, G., Thrun, S., and Dean, J. (2019). A guide to deep learning in healthcare. Nature Medicine, 25.
Holzinger, A., Langs, G., Denk, H., Zatloukal, K., and Müller, H. (2019). Causability and explainability of artificial intelligence in medicine. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 9:e1312.
Irvin, J., Rajpurkar, P., Ko, M., Yu, Y., Ciurea-Ilcus, S., Chute, C., Marklund, H., Haghgoo, B., Ball, R., Shpanskaya, K., Seekins, J., Mong, D. A., Halabi, S. S., Sandberg, J. K., Jones, R., Larson, D. B., Langlotz, C. P., Patel, B. N., Lungren, M. P., and Ng, A. Y. (2019). CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence.
Johnson, A. E. W., Pollard, T. J., Berkowitz, S. J., Greenbaum, N. R., Lungren, M. P., Deng, C.-Y., Mark, R. G., and Horng, S. (2019). MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Scientific Data, 6(1):317.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Commun. ACM, 60(6):84–90.
Le, P. Q., Nauta, M., Nguyen, V. B., Pathak, S., Schlötterer, J., and Seifert, C. (2023). Benchmarking explainable AI - a survey on available toolkits and open challenges. In Elkind, E., editor, Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI-23, pages 6665–6673.
Lundberg, S. M. and Lee, S.-I. (2017). A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, pages 4768–4777, Red Hook, NY, USA. Curran Associates Inc.
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., and Galstyan, A. (2021). A survey on bias and fairness in machine learning. ACM Comput. Surv., 54(6).
Obermeyer, Z., Powers, B., Vogeli, C., and Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366:447–453.
Panigutti, C., Perotti, A., Panisson, A., Bajardi, P., and Pedreschi, D. (2021). FairLens: Auditing black-box clinical decision support systems. Information Processing & Management, 58(5):102657.
Petsiuk, V., Das, A., and Saenko, K. (2018). RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421.
Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, pages 1135–1144. ACM.
Saleiro, P., Kuester, B., Stevens, A., Anisfeld, A., Hinkson, L., London, J., and Ghani, R. (2018). Aequitas: A bias and fairness audit toolkit. arXiv preprint arXiv:1811.05577.
Selbst, A. D., Boyd, D., Friedler, S. A., Venkatasubramanian, S., and Vertesi, J. (2019). Fairness and abstraction in sociotechnical systems. In Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* ’19, pages 59–68. ACM.
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 618–626.
Seyyed-Kalantari, L., Zhang, H., McDermott, M., Chen, I., and Ghassemi, M. (2021). Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations. Nature Medicine, 27.
Shin, H.-C., Roth, H. R., Gao, M., Lu, L., Xu, Z., Nogues, I., Yao, J., Mollura, D. J., and Summers, R. M. (2016). Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Transactions on Medical Imaging, 35:1285–1298.
Sundararajan, M., Taly, A., and Yan, Q. (2017). Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML’17, pages 3319–3328. JMLR.org.
Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., and Summers, R. M. (2017). ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3462–3471.
Xu, C., Greer, C., Joshi, M. N., and Doshi, T. (2020). Fairness indicators demo: Scalable infrastructure for fair ML systems.
Zeiler, M. D. and Fergus, R. (2013). Visualizing and understanding convolutional networks. arXiv preprint arXiv:1311.2901.