selected and the rest are discarded. Drawing the
selected contour line into an array of the same size as
the image, initially containing zero values, results in
a mask containing binary 1 values only at the location
of the pixels representing the lesion. Using a bit-level
AND operation, the mask just created is combined with the
original photograph to obtain the localized lesion on a
black background.
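This masking step can be sketched as follows, assuming an OpenCV-based implementation; the function and variable names are illustrative and not taken from the original code.

```python
import cv2
import numpy as np

def extract_lesion(image_bgr, contour):
    """Keep only the pixels inside the selected contour; everything else becomes black."""
    # Array of the same size as the image, initially containing zero values.
    mask = np.zeros(image_bgr.shape[:2], dtype=np.uint8)
    # Drawing the selected contour filled produces the binary lesion mask.
    cv2.drawContours(mask, [contour], contourIdx=-1, color=255, thickness=cv2.FILLED)
    # The bit-level AND keeps the original pixel values only where the mask is set.
    return cv2.bitwise_and(image_bgr, image_bgr, mask=mask)
```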
In order to use the samples just processed for deep
learning purposes, they need to be formatted
according to the expected input of the neural network.
In practice, this involves two simple steps. Photographs in
upright (portrait) orientation are rotated clockwise by 90
degrees so that orientation differences do not provide
false information to the CNN; whether this rotation is
needed is determined simply from the ratio of the side
lengths. Since the
neural network used in this project receives inputs of
248×248 pixels, the last step in the processing is
rescaling. To do this, a black square background image
whose side length equals the longest side of the
photograph is created, and the content of the photo to be
reduced is copied onto it starting from the top left
corner. Because the content of the resulting image is
anchored to the top left corner, it can be resized without
loss of data (Kalouche, 2016).
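A possible sketch of these formatting steps (orientation check, top-left padding and rescaling to 248×248), again assuming OpenCV; the helper name format_sample is hypothetical.

```python
import cv2
import numpy as np

def format_sample(image_bgr, target=248):
    h, w = image_bgr.shape[:2]
    # The side-length ratio tells whether the photo is in upright (portrait) orientation.
    if h > w:
        image_bgr = cv2.rotate(image_bgr, cv2.ROTATE_90_CLOCKWISE)
        h, w = image_bgr.shape[:2]
    # Black square background whose side equals the longest side of the photo.
    side = max(h, w)
    canvas = np.zeros((side, side, 3), dtype=image_bgr.dtype)
    # The content is copied into the top left corner, so downscaling loses no lesion pixels.
    canvas[:h, :w] = image_bgr
    return cv2.resize(canvas, (target, target))
```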
2.3 The InceptionV3-based CNN Architecture
Since during the preprocessing phase all samples are
converted to 248×248 pixels and the network operates
with RGB color space photographs, the base model is
initialized with input parameters of dimension
(248×248×3). Because transfer learning is to be applied,
the top layers of the model are not imported. The imported
layers keep their original weight values, trained on the
ImageNet database (Russakovsky, 2015).
All the layers of the resulting model are then
frozen to ensure that their weight parameters are not
altered during the learning process.
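Assuming a Keras/TensorFlow implementation (the paper does not name the framework), the base-model setup described above could look like this:

```python
from tensorflow.keras.applications import InceptionV3

base_model = InceptionV3(
    input_shape=(248, 248, 3),  # preprocessed sample size, RGB
    include_top=False,          # the top layers are not imported
    weights="imagenet",         # original ImageNet weight values
)

# Freeze every imported layer so its weights are not altered during training.
for layer in base_model.layers:
    layer.trainable = False
```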
In order to use the network as intended, additional layers
have to be added: first, the output of the last imported
("mixed7") layer is flattened to one dimension and a layer
of 1024 neurons with a ReLU activation function is
attached to it. To produce the network output, another
layer is added, in this case with a single neuron
activated by a softmax function. Between the two attached
layers, a dropout regularisation layer with a 20% dropout
rate is placed to promote better generalisation.
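A sketch of the added classification head under the same Keras assumption. The text specifies a softmax on the single output neuron; the sketch below uses a sigmoid instead, the usual single-neuron choice when binary cross-entropy is the loss.

```python
from tensorflow.keras import Model, layers

last_output = base_model.get_layer("mixed7").output

x = layers.Flatten()(last_output)             # convert the feature maps to one dimension
x = layers.Dense(1024, activation="relu")(x)  # 1024-neuron ReLU layer
x = layers.Dropout(0.2)(x)                    # 20% dropout for better generalisation
# Single output neuron; sigmoid used here as the single-neuron counterpart
# of the softmax mentioned in the text, matching binary cross-entropy.
output = layers.Dense(1, activation="sigmoid")(x)

model = Model(base_model.input, output)
```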
The model uses the RMSProp optimizer to minimize the value
of the loss function; it operates similarly to the
momentum-based solutions, but applies its changes at the
level of individual parameters. Described by mathematical
equations, the operation of RMSProp is as follows:
$$E[g^2]_t = \beta\, E[g^2]_{t-1} + (1-\beta)\left(\frac{\partial C}{\partial w}\right)^{2}, \qquad w_t = w_{t-1} - \frac{\eta}{\sqrt{E[g^2]_t}}\,\frac{\partial C}{\partial w} \tag{1}$$

where $E[g^2]$ is the moving average of squared gradients, $w$ is the weight, $\partial C/\partial w$ is the gradient of the cost function with respect to the weight, $\eta$ is the learning rate and $\beta$ is the moving average parameter (a good default value is 0.9).
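For illustration, equation (1) translates into the following plain-NumPy update step; the function name and the epsilon safeguard are additions for the sketch.

```python
import numpy as np

def rmsprop_step(w, grad, avg_sq_grad, eta=1e-4, beta=0.9, eps=1e-8):
    # E[g^2]_t = beta * E[g^2]_(t-1) + (1 - beta) * (dC/dw)^2
    avg_sq_grad = beta * avg_sq_grad + (1.0 - beta) * grad ** 2
    # w_t = w_(t-1) - eta / sqrt(E[g^2]_t) * dC/dw   (eps guards the division)
    w = w - eta / (np.sqrt(avg_sq_grad) + eps) * grad
    return w, avg_sq_grad
```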
The learning rate is initialized at a lower-than-usual
value of 0.0001 (one ten-thousandth).
The actual value of the loss function is calculated using
the binary cross-entropy function, which is widely used in
the deep learning domain. To measure the performance of
the resulting network, several metrics are computed:
accuracy, validation accuracy and the mean absolute error.
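Under the Keras assumption, the corresponding compile call could be sketched as follows; "mae" denotes the mean absolute error metric, and the validation accuracy is reported automatically once validation data is supplied during training.

```python
from tensorflow.keras.optimizers import RMSprop

model.compile(
    optimizer=RMSprop(learning_rate=1e-4),  # lower-than-usual initial learning rate
    loss="binary_crossentropy",
    metrics=["accuracy", "mae"],            # "mae" = mean absolute error
)
```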
As a further regularization step, a callback function
implementing early stopping was added to stop the training
in time. It monitors the value of the validation loss
function during learning and halts the process if that
value keeps increasing persistently over more than a
predefined number of epochs. For this network, this
tolerance is three epochs. Training therefore runs until
it is stopped early, or otherwise for 500 epochs.
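A sketch of the early-stopping setup and the training call under the same Keras assumption; train_data and val_data are hypothetical placeholders for the prepared datasets.

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop training once the validation loss has not improved for three epochs.
early_stop = EarlyStopping(monitor="val_loss", patience=3)

history = model.fit(
    train_data,                 # hypothetical training dataset
    validation_data=val_data,   # hypothetical validation dataset
    epochs=500,                 # upper bound if early stopping never triggers
    callbacks=[early_stop],
)
```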
The complete network configuration is
summarised in Table 1.
Table 1: Parameter values of the network.
Parameter        Value
Learning rate    0.001–0.0001
Epoch count      500
Color mode       RGB, 3 channels
In