
Schnabel, J. A., Davatzikos, C., Alberola-López, C.,
and Fichtinger, G., editors, Medical Image Computing
and Computer Assisted Intervention – MICCAI
2018, pages 841–850, Cham. Springer International
Publishing.
Mishra, N. K. and Celebi, M. E. (2016). An Overview
of Melanoma Detection in Dermoscopy Images
Using Image Processing and Machine Learning.
arXiv:1601.07843 [cs, stat].
Nachbar, F., Stolz, W., Merkle, T., Cognetta, A. B., Vogt,
T., Landthaler, M., Bilek, P., Braun-Falco, O., and
Plewig, G. (1994). The ABCD rule of dermatoscopy.
High prospective value in the diagnosis of doubtful
melanocytic skin lesions. Journal of the American
Academy of Dermatology, 30(4):551–559.
Pacheco, A. G., Lima, G. R., Salomão, A. S., Krohling,
B., Biral, I. P., de Angelo, G. G., Alves Jr, F. C.,
Esgario, J. G., Simora, A. C., Castro, P. B., Rodrigues,
F. B., Frasson, P. H., Krohling, R. A., Knidel, H.,
Santos, M. C., do Espírito Santo, R. B., Macedo, T. L.,
Canuto, T. R., and de Barros, L. F. (2020). PAD-UFES-20:
A skin lesion dataset composed of patient data and
clinical images collected from smartphones. Data in
Brief, 32:106221.
Penzel, N., Reimers, C., Bodesheim, P., and Denzler, J.
(2022). Investigating neural network training on a fea-
ture level using conditional independence. In ECCV
Workshop on Causality in Vision (ECCV-WS), pages
383–399, Cham. Springer Nature Switzerland.
Perez, F., Vasconcelos, C., Avila, S., and Valle, E. (2018).
Data augmentation for skin lesion analysis.
Ramachandran, P., Parmar, N., Vaswani, A., Bello, I., Lev-
skaya, A., and Shlens, J. (2019). Stand-alone self-
attention in vision models. Advances in Neural Infor-
mation Processing Systems, 32.
Reimers, C., Penzel, N., Bodesheim, P., Runge, J., and Den-
zler, J. (2021). Conditional dependence tests reveal
the usage of ABCD rule features and bias variables in
automatic skin lesion classification. In CVPR ISIC
Skin Image Analysis Workshop (CVPR-WS), pages
1810–1819.
Reimers, C., Runge, J., and Denzler, J. (2020). Determin-
ing the relevance of features for deep neural networks.
In European Conference on Computer Vision, pages
330–346. Springer.
Runge, J. (2018). Conditional independence testing based
on a nearest-neighbor estimator of conditional mutual
information. In International Conference on Artificial
Intelligence and Statistics. PMLR.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S.,
Ma, S., Huang, Z., Karpathy, A., Khosla, A.,
Bernstein, M., et al. (2015). ImageNet large scale visual
recognition challenge. International Journal of
Computer Vision, 115:211–252.
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and
Chen, L.-C. (2018). MobileNetV2: Inverted residuals
and linear bottlenecks. In Proceedings of the IEEE
Conference on Computer Vision and Pattern
Recognition, pages 4510–4520.
Scope, A., Marchetti, M. A., Marghoob, A. A., Dusza,
S. W., Geller, A. C., Satagopan, J. M., Weinstock,
M. A., Berwick, M., and Halpern, A. C. (2016). The
study of nevi in children: Principles learned and impli-
cations for melanoma diagnosis. Journal of the Amer-
ican Academy of Dermatology, 75(4):813–823.
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R.,
Parikh, D., and Batra, D. (2017). Grad-CAM: Visual
explanations from deep networks via gradient-based
localization. In Proceedings of the IEEE international
conference on computer vision, pages 618–626.
American Cancer Society (2022). Cancer facts & figures 2022. Last
accessed 02 August 2022.
Strobl, E. V., Zhang, K., and Visweswaran, S. (2019).
Approximate kernel-based conditional independence
tests for fast non-parametric causal discovery. Jour-
nal of Causal Inference.
Tan, M. and Le, Q. (2019). EfficientNet: Rethinking model
scaling for convolutional neural networks. In Interna-
tional conference on machine learning, pages 6105–
6114. PMLR.
Tolstikhin, I. O., Houlsby, N., Kolesnikov, A., Beyer, L.,
Zhai, X., Unterthiner, T., Yung, J., Steiner, A., Key-
sers, D., Uszkoreit, J., Lucic, M., and Dosovitskiy, A.
(2021). MLP-Mixer: An all-MLP architecture for vision.
CoRR, abs/2105.01601.
Trockman, A. and Kolter, J. Z. (2022). Patches are all you
need? CoRR, abs/2201.09792.
Tschandl, P., Rosendahl, C., and Kittler, H. (2018). The
HAM10000 dataset, a large collection of multi-source
dermatoscopic images of common pigmented skin le-
sions. Sci. Data, 5(1):180161.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I.
(2017). Attention is all you need. Advances in neural
information processing systems, 30.
Wang, X., Girshick, R., Gupta, A., and He, K. (2018). Non-
local neural networks. In Proceedings of the IEEE
conference on computer vision and pattern recogni-
tion, pages 7794–7803.
Welch, B. L. (1947). The generalization of ‘student’s’ prob-
lem when several different population variances are
involved. Biometrika, 34(1/2):28–35.
Xiao, G., Tian, Y., Chen, B., Han, S., and Lewis, M. (2023).
Efficient streaming language models with attention
sinks. arXiv preprint arXiv:2309.17453.
Yang, G., Luo, S., and Greer, P. (2023). A novel vision
transformer model for skin cancer classification. Neu-
ral Processing Letters, pages 1–17.
When Medical Imaging Met Self-Attention: A Love Story That Didn’t Quite Work out