Authors:
David Gaviria 1; Md Saker 2 and Petia Radeva 3, 4
Affiliations:
1 Facultat d’Informatica de Barcelona, Universitat Politècnica de Catalunya, Carrer de Jordi Girona 31, Barcelona, Spain
2 Department of Engineering Science, University of Oxford, Headington OX3 7DQ, Oxford, England, U.K.
3 Department of Mathematics and Computer Science, Universitat de Barcelona, Gran Via de les Corts Catalanes 585, Barcelona, Spain
4 Computer Vision Center, Bellaterra, Barcelona, Spain
Keyword(s):
Skin Cancer, Melanoma, ISIC Challenge, Vision Transformers.
Abstract:
Vision Transformers (ViTs) are deep learning architectures that have gained popularity in recent years. In this work, we study the performance of ViTs and Convolutional Neural Networks (CNNs) on skin lesion classification tasks, specifically melanoma diagnosis. We show that, regardless of the individual performance of each architecture, an ensemble of the two can improve generalization. We also present an adaptation of the Gram-OOD* method (detecting out-of-distribution (OOD) samples using Gram matrices) to skin lesion images. Moreover, the integration of super-convergence was critical to building models under strict computing and training time constraints. Our ensemble of ViTs and CNNs placed first in the 2019 and third in the 2020 ISIC Challenge Live Leaderboards (available at https://challenge.isic-archive.com/leaderboards/live/), which supports the claim that the ensemble generalizes better.
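
To illustrate how an ensemble of a ViT and a CNN might combine their outputs, the following is a minimal sketch that averages per-model class probabilities. The abstract does not state how the authors combine their models, so the averaging rule, the function names (`softmax`, `ensemble_probabilities`), the optional `weights` parameter, and the example logits are all assumptions made for illustration only.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_probabilities(per_model_logits, weights=None):
    """Weighted average of each model's softmax output.

    per_model_logits: one list of class logits per model (hypothetical input).
    weights: optional per-model weights; defaults to a uniform average.
    """
    n_models = len(per_model_logits)
    if n_models == 0:
        raise ValueError("at least one model output is required")
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    weight_sum = sum(weights)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")

    n_classes = len(per_model_logits[0])
    combined = [0.0] * n_classes
    for logits, w in zip(per_model_logits, weights):
        if len(logits) != n_classes:
            raise ValueError("all models must predict the same classes")
        for i, p in enumerate(softmax(logits)):
            combined[i] += (w / weight_sum) * p
    return combined


# Illustrative logits for two classes (e.g. benign, melanoma); not real data.
vit_logits = [2.0, 0.0]
cnn_logits = [0.0, 2.0]
averaged = ensemble_probabilities([vit_logits, cnn_logits])
```

Averaging probabilities rather than raw logits keeps each model's contribution on the same scale, which matters when the architectures produce logits of different magnitudes. Passing `weights` lets one model count more than the other.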