network cannot learn enough representative
instances of each class label (i.e., skin cancer type)
to extract the distinctive image features needed for
classification. In addition, heterogeneous datasets
may exhibit data variations and characteristics for
which hand-crafted CNNs cannot provide promising
detection performance.
To address these issues, ensemble CNN
architectures have been developed. Specifically, these
hybrid CNN architectures (Aldwgeri, A., 2019;
Mahbod, A., 2019; Al Mamun, Md., 2021)
combine different single CNN models that have
been pre-trained on a large number of images and
adapted to their diverse variations, so that more
distinctive features can be extracted from the
domain-specific images. Models that conduct
ensemble learning thus deliver better classification
performance than single CNN models.
Currently, there are several well-known
pre-trained CNN models, such as VGGNet (Simonyan,
K., 2015), GoogLeNet (Szegedy, C., 2015), and ResNet
(He, K., 2016), which are pre-trained on ImageNet
(ImageNet, 2021) and CIFAR (Krizhevsky, A., 2009).
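As an illustration of how such ensembles combine their members, one common fusion strategy is to average the per-class probabilities produced by each model and predict the class with the highest mean probability (the cited works vary in their exact fusion schemes; this is a minimal sketch, not their implementation):

```python
import numpy as np

def ensemble_predict(model_probs):
    """Average the per-class probabilities of several models and
    return the most likely class index for each image."""
    avg = np.mean(model_probs, axis=0)  # shape: (n_images, n_classes)
    return np.argmax(avg, axis=1)

# toy example: two "models", three images, two classes
p1 = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
p2 = np.array([[0.6, 0.4], [0.7, 0.3], [0.1, 0.9]])
print(ensemble_predict([p1, p2]))  # -> [0 0 1]
```

Note that averaging can flip a single model's decision: the second image is class 1 under the first model alone, but the ensemble mean (0.55 vs. 0.45) yields class 0.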
However, identifying the best possible combination of
these pre-trained CNN models is challenging because
of the large number of possible model combinations
and the high computational cost of learning. In addition,
even though learning the best model combination
among all possible pre-trained CNNs is a dynamic
process, the pre-trained models themselves have
static architectures that may lack adaptability to
diverse image variations (e.g., skin cancer images).
These pre-trained CNN models may perform
significantly worse on heterogeneous data sources
not encountered before.
To bridge the above gaps, Sun et al. (Sun, Y., 2020)
developed an automatically evolving genetic-based
CNN (AE-CNN) framework to dynamically design
and construct an optimal CNN architecture for any
available image dataset without requiring any manual
intervention. The experimental results show that the
CNN architecture generated by the framework
outperforms the state-of-the-art CNN peer
competitors listed above in terms of classification
accuracy. However, the AE-CNN framework that
generates an optimal architecture is not designed
specifically for skin cancer detection and classification,
and it does not consider two crucial components: (1)
pre-processing the raw images (e.g., lesion
segmentation and image augmentation) and (2)
selecting the best CNN model based on the entire
training dataset rather than a separate validation
dataset only. To mitigate these shortcomings, we
enhance and customize AE-CNN to develop and
implement an auto-designed CNN (AutoCNN)
framework that enables domain users to dynamically
generate an optimal CNN architecture from their
available datasets, assisting physicians in the early
detection of multiple skin cancer diseases (MSCD)
from dermatoscopic images.
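As a toy illustration of the image-augmentation part of pre-processing mentioned above (the actual pipeline is presented in Section 5), random flips and rotations can be sketched with numpy; the `augment` helper below is a hypothetical stand-in, not our pipeline code:

```python
import numpy as np

def augment(image, rng):
    # hypothetical augmentation step: a random horizontal flip
    # followed by a random multiple-of-90-degree rotation
    if rng.random() < 0.5:
        image = image[:, ::-1]       # flip left-right
    k = int(rng.integers(0, 4))      # rotate by 0, 90, 180, or 270 degrees
    return np.rot90(image, k)

rng = np.random.default_rng(0)
img = np.arange(16).reshape(4, 4)    # stand-in for a dermatoscopic image
aug = augment(img, rng)
```

Such label-preserving transformations diversify the training set without collecting new images, which matters when some cancer classes have few examples.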
Specifically, the contributions of this work are three-
fold: (1) we integrate a pre-processing module into
AE-CNN to sanitize and diversify dermatoscopic
images, (2) we enhance the evaluation algorithm of
AE-CNN to improve the model selection process by
applying k-fold cross-validation (Sanjay, M., 2018)
to the entire training dataset, and (3) we conduct an
experimental study, using the 25,331 dermatoscopic
images provided by the 2019 International Skin
Imaging Collaboration (ISIC, 2019), to measure the
classification accuracy. The results show that the
CNN model constructed by AutoCNN outperforms
the model constructed by AE-CNN in detecting and
classifying MSCD. The source code will be made
publicly available after acceptance.
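The k-fold model selection of contribution (2) can be sketched as follows. Here `score_fn` is a stand-in for training a candidate CNN on one split and returning its validation accuracy; this is a simplified illustration of the evaluation scheme, not our training code:

```python
import numpy as np

def kfold_score(n_samples, k, score_fn, seed=0):
    """Estimate a candidate model's quality with k-fold cross-validation
    over the *entire* training set: shuffle, split into k folds, and
    average the per-fold validation scores."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(score_fn(train_idx, val_idx))
    return float(np.mean(scores))

# toy usage: a dummy score function; with real CNN candidates, the one
# with the highest mean fold score would be selected
mean_score = kfold_score(n_samples=100, k=5,
                         score_fn=lambda tr, va: len(tr) / 100)
```

Because every sample serves as validation data exactly once, this estimate uses the whole training set rather than a single held-out validation split.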
The remainder of the paper is organized as
follows. First, we briefly describe the AE-CNN
framework in Section 2. In Section 3, we illustrate our
enhanced AutoCNN framework and its workflow.
We also demonstrate and explain our pipelines for
skin cancer image segmentation and augmentation
in Sections 4 and 5, respectively. In
Section 6, we discuss and summarize the
experimental study and results. In Section 7, we
conclude and briefly outline our future work.
2 AE-CNN FRAMEWORK
Fig. 1 shows the AE-CNN framework. First, the
population size N, i.e., the total number of individual
CNN architectures in each generation, is predefined.
Note that in our experimental study, each CNN
individual in each generation is trained on 80% of the
ISIC-2019 image dataset and validated on 10% of the
dataset in the fitness evaluation module. Each CNN
individual in the population takes part in the
evolutionary process of the genetic algorithm, which
runs for a maximum of T generations.
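One generation of this evolutionary process (binary tournament selection of parents, crossover and mutation to produce an offspring, then environmental selection of survivors) can be sketched as follows. This is a simplified illustration, not the AE-CNN implementation: individuals are abstract list encodings, fitness is a dummy score, and environmental selection is approximated by keeping the N fittest individuals:

```python
import random

def binary_tournament(population, fitness, rng):
    # pick two distinct individuals at random; the fitter one becomes a parent
    a, b = rng.sample(range(len(population)), 2)
    return population[a] if fitness[a] >= fitness[b] else population[b]

def next_generation(population, fitness, rng, crossover, mutate, evaluate):
    # one generation: select parents, produce an offspring, evaluate it,
    # then keep the N fittest of (current individuals + offspring)
    p1 = binary_tournament(population, fitness, rng)
    p2 = binary_tournament(population, fitness, rng)
    child = mutate(crossover(p1, p2))
    pool = population + [child]
    pool_fitness = fitness + [evaluate(child)]
    order = sorted(range(len(pool)),
                   key=lambda i: pool_fitness[i], reverse=True)
    survivors = order[:len(population)]
    return ([pool[i] for i in survivors],
            [pool_fitness[i] for i in survivors])

# toy demo: lists of ints stand in for CNN architecture encodings
rng = random.Random(0)
pop = [[1], [2], [3], [4]]
fit = [1.0, 2.0, 3.0, 4.0]
new_pop, new_fit = next_generation(pop, fit, rng,
                                   crossover=lambda a, b: a + b,
                                   mutate=lambda c: c,
                                   evaluate=lambda c: float(sum(c)))
```

In AE-CNN, evaluating an offspring means actually training the encoded CNN, which is why the fitness evaluation module dominates the framework's cost.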
During the evolutionary process, a new CNN
offspring is generated from its selected parents with
the crossover and mutation operations, where the
parents are chosen by binary tournament selection.
After the fitness of the generated offspring has been
evaluated, a new population is selected by the
environmental selection operation from the current
population, which contains the current individuals
and the generated offspring; the surviving individuals
move into the next generation of the evolutionary
process. Towards the end, the framework