3.1 Augmentation
The augmentation stage applies two types of transformations. Although data
augmentation is commonly performed through simple alterations of the
original images (rotation, flipping, etc.), the proposed methodology
follows a different path, employing a contrast enhancement and a denoising
algorithm. This choice is based on experiments that demonstrated improved
performance of the classification algorithm on images that were first
subjected to contrast enhancement and subsequently to denoising. To obtain
the first set of images, Contrast Limited Adaptive Histogram Equalization
(CLAHE) is performed. CLAHE (Zuiderveld, 1994) is a variant of Adaptive
Histogram Equalization: it computes localized histograms for regions of
the image with differing brightness levels and uses them to redistribute
intensity values, increasing contrast particularly at the points where
edges are located.
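The per-tile equalization with clipping can be sketched as follows. This is a simplified NumPy-only illustration, assuming non-overlapping tiles and omitting the bilinear interpolation between tile mappings that a full CLAHE implementation (e.g. OpenCV's `cv2.createCLAHE`) performs; the tile size and clip limit are illustrative, not the settings used in the paper.

```python
import numpy as np

def clahe_tile(tile: np.ndarray, clip_limit: int) -> np.ndarray:
    """Clipped histogram equalization for a single tile."""
    hist = np.bincount(tile.ravel(), minlength=256)
    excess = np.maximum(hist - clip_limit, 0).sum()
    # Clip the histogram and redistribute the clipped mass uniformly.
    hist = np.minimum(hist, clip_limit) + excess // 256
    cdf = np.cumsum(hist).astype(np.float64)
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1) * 255.0
    return cdf[tile].astype(np.uint8)  # map each pixel through the CDF

def clahe_simplified(img: np.ndarray, tile: int = 64,
                     clip_limit: int = 40) -> np.ndarray:
    """Apply clipped equalization independently over a grid of tiles."""
    out = np.empty_like(img)
    for y in range(0, img.shape[0], tile):
        for x in range(0, img.shape[1], tile):
            block = img[y:y + tile, x:x + tile]
            out[y:y + tile, x:x + tile] = clahe_tile(block, clip_limit)
    return out

# Toy grayscale gradient standing in for an RCM image.
img = np.tile(np.arange(128, dtype=np.uint8), (64, 1))
enhanced = clahe_simplified(img, tile=32, clip_limit=40)
```

The clip limit bounds how much any single intensity bin can contribute to the mapping, which is what prevents CLAHE from over-amplifying noise in homogeneous regions.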
For the generation of the second set of images, a Non-Local Means
denoising algorithm is applied to the contrast-enhanced image. The NL
Means algorithm (Buades et al., 2011) reduces noise by replacing each
pixel with a weighted mean of the values of all the pixels in the image
(instead of only the adjacent pixels), where each pixel's weight reflects
how similar its surrounding patch is to the patch around the target pixel.
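The weighting scheme can be illustrated with a naive NumPy sketch. Production implementations (e.g. OpenCV's `cv2.fastNlMeansDenoising`) restrict the comparison to a search window and use optimized patch distances; the window sizes and filtering strength `h` below are illustrative assumptions.

```python
import numpy as np

def nl_means(img: np.ndarray, patch: int = 3, search: int = 7,
             h: float = 30.0) -> np.ndarray:
    """Naive non-local means: each output pixel is a weighted average of
    pixels in a search window, weighted by patch similarity."""
    pad = patch // 2
    padded = np.pad(img.astype(np.float64), pad, mode="reflect")
    H, W = img.shape
    half = search // 2
    out = np.zeros((H, W), dtype=np.float64)
    for y in range(H):
        for x in range(W):
            ref = padded[y:y + patch, x:x + patch]  # patch around (y, x)
            weights, vals = [], []
            for cy in range(max(0, y - half), min(H, y + half + 1)):
                for cx in range(max(0, x - half), min(W, x + half + 1)):
                    cand = padded[cy:cy + patch, cx:cx + patch]
                    d2 = np.mean((ref - cand) ** 2)  # patch distance
                    weights.append(np.exp(-d2 / (h * h)))
                    vals.append(img[cy, cx])
            out[y, x] = np.dot(weights, vals) / np.sum(weights)
    return np.clip(out, 0, 255).astype(np.uint8)
```

Because the weights depend on patch similarity rather than spatial proximity alone, repeating structures elsewhere in the image contribute to the average, which preserves edges better than a plain Gaussian blur.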
The data augmentation procedure results in tripling the dataset size,
which is essential for training the neural network in the predictive
model. Figure 2 shows the initial RCM image of an Acral Nevus together
with the two synthetic copies produced by the augmentation procedure.
Figure 2: Data Augmentation. The initial image on the left,
the contrast enhanced image in the centre and the denoised
image on the right.
3.2 Feature Extraction
Each image is processed for the extraction of visual features using the
SURF and Haralick algorithms. The SURF algorithm (Bay et al., 2008) is
applied locally to each image, at the interest points detected by a fast
Hessian detector. This operation results in the extraction of a large
number of 64-dimensional vectors, each representative of the information
depicted around an interest point. Haralick features (Haralick, 1979), on
the other hand, are extracted globally from each image, producing a single
14-dimensional vector. At the end of this procedure, a set of
64-dimensional vectors and one 14-dimensional vector are assigned to each
image. The combination of these two techniques has proven effective in the
classification of colorectal histopathology images (Kallipolitis and
Maglogiannis, 2019), which exhibit similar patterns.
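The global texture part of this step can be sketched in NumPy. Haralick features are statistics computed over a gray-level co-occurrence matrix (GLCM); the sketch below builds a GLCM for one horizontal offset and computes four of the fourteen statistics (contrast, energy, homogeneity, entropy) as an illustration. The SURF descriptors are not reproduced here, since SURF requires an OpenCV build with the non-free `xfeatures2d` module; this is only the Haralick side of the pipeline, with an assumed quantization to 8 gray levels.

```python
import numpy as np

def glcm(img: np.ndarray, levels: int = 8) -> np.ndarray:
    """Normalized, symmetric gray-level co-occurrence matrix for the
    horizontal offset (0, 1)."""
    q = img.astype(np.int64) * levels // 256  # quantize to `levels` bins
    left, right = q[:, :-1].ravel(), q[:, 1:].ravel()
    m = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(m, (left, right), 1)
    m += m.T                      # symmetrize, as Haralick does
    return m / m.sum()

def haralick_subset(img: np.ndarray) -> np.ndarray:
    """Four of the 14 Haralick statistics over one GLCM."""
    p = glcm(img)
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    energy = np.sum(p ** 2)                     # angular second moment
    homogeneity = np.sum(p / (1.0 + (i - j) ** 2))
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log2(nz))
    return np.array([contrast, energy, homogeneity, entropy])
```

A flat image yields zero contrast and maximal energy, while a high-frequency texture pushes mass away from the GLCM diagonal and raises contrast, which is why these statistics discriminate texture patterns.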
3.3 Modelling
To model the information extracted from the RCM images, a visual
vocabulary is created by K-Means clustering of the whole set of
64-dimensional vectors from the augmented dataset. The appropriate number
of clusters is determined by elbow analysis: at a certain number of
clusters (k = 345 for this system), the slope of the curve becomes
shallow, and the k values lying on the shallow part of the curve are
excluded to avoid the curse of dimensionality. The K-Means clustering thus
leads to the formation of a 345-word visual vocabulary, where each word
corresponds to the centroid of one of the k clusters. To feed the next
step (classification), each image needs to be represented as a single
vector. Because the local feature extractor (SURF) produces multiple
vectors per image, a structure (the visual vocabulary) is needed that can
map them into one. This mapping is performed by associating the interest
points of each image with the visual words of the vocabulary, measured by
the Euclidean distance between interest-point descriptors and visual
words. The completion of this procedure yields a 345-dimensional
representation of each image (the vocabulary vector). To form the final
vector, the vocabulary vector is concatenated with the 14-dimensional
Haralick vector. However, the values produced by the Haralick algorithm
are far greater than those produced by the mapping; therefore, the
Haralick values are normalized according to the minimum and maximum values
of the vocabulary vector.
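The bag-of-visual-words pipeline above can be sketched end to end in NumPy. Here k = 8 and random 64-dimensional descriptors are toy stand-ins for the paper's k = 345 vocabulary and actual SURF descriptors, and the min-max rescaling of the Haralick vector to the vocabulary vector's range is one plausible reading of the normalization step.

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Minimal Lloyd's K-Means; returns the k centroids (visual words)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members):                 # skip empty clusters
                centroids[c] = members.mean(axis=0)
    return centroids

def vocabulary_vector(descriptors: np.ndarray,
                      centroids: np.ndarray) -> np.ndarray:
    """Assign each descriptor to its nearest (Euclidean) visual word and
    return the word-count histogram for the image."""
    dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :],
                           axis=2)
    words = dists.argmin(axis=1)
    return np.bincount(words, minlength=len(centroids)).astype(np.float64)

def final_vector(vocab_vec: np.ndarray,
                 haralick_vec: np.ndarray) -> np.ndarray:
    """Rescale the Haralick vector to the vocabulary vector's [min, max]
    range, then concatenate into the combined feature vector."""
    lo, hi = vocab_vec.min(), vocab_vec.max()
    span = max(np.ptp(haralick_vec), 1e-12)
    scaled = lo + (haralick_vec - haralick_vec.min()) / span * (hi - lo)
    return np.concatenate([vocab_vec, scaled])
```

With k = 345 words and 14 Haralick values, `final_vector` produces the 359-dimensional input described in the next subsection; the histogram entries sum to the number of interest points in the image.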
3.4 Classification
The 359-dimensional feature vector is the input to a
simple neural network which consists of three fully
connected layers. A simple fully connected neural
network approach is selected instead of a deep