
ISIC Archive [1]: The ISIC Archive, hosted by the In-
ternational Skin Imaging Collaboration (ISIC), con-
tains a large publicly available collection of skin im-
ages. The majority are dermoscopic images of pig-
mented skin lesions, which are not suitable for our
dataset. Instead, we filtered the archive images for to-
tal body photographs (TBPs), which yielded 36 im-
ages showing the posterior torso. Based on these im-
ages, we extracted multiple smaller images of compa-
rable sizes. This resulted in 160 normal images and 11
abnormal images showing scars or imprints of cloth-
ing.
ArsenicSkinImagesBD [2]: The ArsenicSkinIm-
agesBD dataset (Emu et al., 2024) contains 741
images of 37 arsenic-affected and 741 images of 76
non-arsenic-affected individuals from Bangladesh,
captured by smartphone cameras. Of the 741 non-
affected images, 175 were used as normal images.
The remaining images were excluded for various
reasons (e.g., duplicates, images showing hands or
fingers, or potential skin conditions).
SD-198 [3]: SD-198 (Sun et al., 2016) is a benchmark
dataset for clinical skin diseases containing 6,584 im-
ages from 198 classes. We selected 158 images from
the classes acne vulgaris, allergic contact dermatitis,
eczema, erythema annulare centrifugum, erythema
multiforme, factitial dermatitis, guttate psoriasis, pso-
riasis and tinea corporis, and used them as abnormal
images. In addition, 11 healthy skin patches were ex-
tracted and added to the normal image dataset.
Google Image Search: Another 21 images contain-
ing erythema or hematoma were collected using a
Google Image Search and added to the abnormal
dataset.
Table 1 shows the number of normal and abnormal
images by source dataset. In total, 346 normal and
190 anomalous images were collected. The dataset
was split into three subsets for training, validation
and evaluation. Models were trained on 250 normal im-
ages. A validation set of 62 images (20 normal and
42 abnormal) was utilized to optimize hyperparame-
ters and to save the best model for evaluation. The
test set for final evaluation contains 224 images (76
normal and 148 abnormal).
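For illustration, such a fixed split could be reproduced as in the following sketch; the directory layout, file extension, and random seed are hypothetical and not specified by our pipeline, only the split sizes follow the description above.

```python
from pathlib import Path
import random

# Hypothetical directory layout and file extension; not specified in the paper.
normal_paths = sorted(Path("data/normal").glob("*.jpg"))      # 346 normal images
abnormal_paths = sorted(Path("data/abnormal").glob("*.jpg"))  # 190 abnormal images

rng = random.Random(0)  # fixed seed, assumed for reproducibility
rng.shuffle(normal_paths)
rng.shuffle(abnormal_paths)

# Split sizes as reported above:
# training: 250 normal; validation: 20 normal + 42 abnormal; test: 76 normal + 148 abnormal.
train_set = normal_paths[:250]
val_set = normal_paths[250:270] + abnormal_paths[:42]
test_set = normal_paths[270:] + abnormal_paths[42:]
```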
For the autoencoder, all images were resized to
128 × 128 and pixel values were scaled into a range
of [0, 1]. For SimpleNet, all images were resized to
224 × 224 and pixel values were first scaled into a
range of [0, 1] and then normalized according to the
mean and standard deviation of ImageNet as in Liu
et al. (2023). No data augmentation was applied.
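The sketch below shows one way this preprocessing could be expressed with torchvision transforms; it is an illustrative assumption rather than our exact implementation, and uses the standard ImageNet normalization statistics.

```python
from torchvision import transforms

# Autoencoder input: resize to 128x128; ToTensor() scales pixel values to [0, 1].
ae_transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
])

# SimpleNet input: resize to 224x224, scale to [0, 1], then normalize with the
# standard ImageNet mean and standard deviation, following Liu et al. (2023).
simplenet_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```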
[1] https://www.isic-archive.com/
[2] https://data.mendeley.com/datasets/x4hgnjj5gv/2
[3] https://huggingface.co/datasets/resyhgerwshshgdfghsdfgh/SD-198
3.2 Model Architectures, Training and
Evaluation
In the following, we describe the two anomaly detec-
tion models, the training procedure and the evaluation
metrics used in this study.
SimpleNet: SimpleNet was introduced by Liu et al.
(2023) for the task of detecting and localizing anoma-
lies in industrial images. The authors argue that ex-
isting approaches (e.g., reconstruction-based and
feature-based methods) have drawbacks and there-
fore proposed SimpleNet, which combines ideas from
several of these approaches and adds further improve-
ments. SimpleNet con-
sists of four components. The first component is the
feature extractor, a pretrained neural network used for
extracting local image features. Since pretrained net-
works are usually trained on natural image datasets
such as ImageNet and not on industrial or medical images, a
simple neural network called feature adaptor is uti-
lized to map the extracted features into the target do-
main. The third component is an anomalous feature
generator, which artificially generates anomalous fea-
tures by adding random Gaussian noise to normal fea-
tures. Last, a simple discriminator network is trained
to discriminate the normal and the artificially gener-
ated anomalous features. In contrast to Shen et al.
(2020), the discrimination is performed on individ-
ual local feature vectors, not on whole images. Sim-
pleNet with all its components can be trained in an
end-to-end fashion. During inference, the generation
of anomalous features is omitted. Local features are
extracted and adapted from the input image and then
mapped to an anomaly score by the discriminator net-
work. Arranging all local anomaly scores in a 2D-
grid yields an anomaly map, highlighting anomalous
areas in the input image. Based on the anomaly map,
an image-level anomaly score can be computed. In
the original publication of SimpleNet, the maximum
anomaly score is used.
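To make the scoring step concrete, the following minimal sketch illustrates this inference procedure; the module names, tensor shapes, and function signature are assumptions for illustration, not the reference implementation of Liu et al. (2023).

```python
import torch

def simplenet_inference(image, feature_extractor, feature_adaptor, discriminator):
    """Hypothetical sketch of SimpleNet-style inference.

    feature_extractor: pretrained backbone returning local features (B, C, H', W')
    feature_adaptor:   small network mapping features into the target domain
    discriminator:     maps each local feature vector to a scalar anomaly score
    """
    with torch.no_grad():
        feats = feature_adaptor(feature_extractor(image))        # (B, C, H', W')
        b, c, h, w = feats.shape
        local = feats.permute(0, 2, 3, 1).reshape(b * h * w, c)  # one vector per location
        scores = discriminator(local).reshape(b, h, w)           # 2D anomaly map per image
        image_score = scores.reshape(b, -1).max(dim=1).values    # image-level score: maximum
    return scores, image_score
```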
For our experiment, we used the same hyperparam-
eter configuration as in Liu et al. (2023). We trained
SimpleNet for 160 epochs with a batch size of 8 and
saved the best model based on validation anomaly de-
tection performance.
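A minimal sketch of this model-selection loop is given below; the training-step and validation-metric callables are hypothetical placeholders, and only the epoch count and the checkpointing of the best model follow the procedure described above.

```python
import copy

def train_with_model_selection(model, train_loader, val_loader,
                               train_one_epoch, evaluate, epochs=160):
    """Sketch: train for a fixed number of epochs and keep the checkpoint with
    the best validation anomaly-detection performance.

    train_one_epoch and evaluate are placeholder callables for the actual
    training step and validation metric (e.g. image-level AUROC).
    """
    best_score, best_state = float("-inf"), None
    for _ in range(epochs):
        train_one_epoch(model, train_loader)
        score = evaluate(model, val_loader)
        if score > best_score:  # save the best model for final evaluation
            best_score = score
            best_state = copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return model, best_score
```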
Autoencoder: As a baseline model, we implemented
a convolutional autoencoder (AE), consisting of an
encoder and a symmetrical decoder. The encoder
compresses an input image $x \in \mathbb{R}^{H \times W \times C}$ into a latent
feature vector $z \in \mathbb{R}^{d}$. Based on this feature vector, the
decoder reconstructs the original image. The encoder
consists of four convolutional layers, each downsampling
the image resolution to $\frac{H_{\mathrm{in}}}{2} \times \frac{W_{\mathrm{in}}}{2}$. The first con-