and the decoder of the DCAE will be extracted and
used as the final high-level features of the system.
The DCAE helps to encode the geometrical details of
the cells contained in the original images. The
discriminative power carried by the extracted features
allows us to feed them as the inputs of a shallow
nonlinear classifier, which can then separate the cell
classes. The proposed method was tested on the
SNPHEp-2 cell dataset (Wiliem et al.), and the results
show that the proposed features clearly outperform
the conventional and popular handcrafted features
and perform at least as well as state-of-the-art
supervised deep-learning-based methods.
2 PROPOSED METHODOLOGY
Auto-encoders (Hinton et al.) are unsupervised
learning methods used for feature extraction and
dimensionality reduction. A neural-network-based
auto-encoder consists of an encoder and a decoder.
The encoder takes an input x of dimension d and
maps it to a hidden representation h of dimension r,
using a deterministic mapping function f such that:
h = f(Wx + b) (1)
where the parameters W and b are the weights and
biases associated with the encoder. The decoder then
takes the output h of the encoder and uses the same
mapping function to produce a reconstruction z that
has the same shape as, and is ideally close to, the
original input signal x. Following equation (1), the
output of the decoder is given by:
z = f(W’h + b’) (2)
where the parameters W’ and b’ are the weights and
biases associated with the decoder layer. Finally, the
network must learn the parameters W, W’, b and b’
so that z is as close as possible to x. In other words,
the network learns to minimize the difference
between the encoder’s input x and the decoder’s
output z.
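To make equations (1) and (2) concrete, a minimal NumPy sketch of the encoding-decoding pass follows. The dimensions d and r, the sigmoid activation and the random initialization are illustrative assumptions, not the configuration used in our experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 16                                           # input and hidden dimensions (assumed)

W, b = 0.01 * rng.normal(size=(r, d)), np.zeros(r)      # encoder weights and biases
W_p, b_p = 0.01 * rng.normal(size=(d, r)), np.zeros(d)  # decoder weights and biases (W', b')

def f(a):
    # Deterministic mapping function; a sigmoid is chosen here as an example.
    return 1.0 / (1.0 + np.exp(-a))

x = rng.random(d)              # input of dimension d
h = f(W @ x + b)               # equation (1): hidden representation of dimension r
z = f(W_p @ h + b_p)           # equation (2): reconstruction of the same shape as x

print(np.mean((x - z) ** 2))   # the quantity the network learns to minimize
```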
This encoding-decoding process can be carried out
with convolutional neural networks, giving what we
call the deep convolutional auto-encoder (DCAE).
Unlike conventional neural networks, where the size
of the output can be set freely, convolutional neural
networks are characterized by a down-sampling
process, accomplished by the pooling layers
incorporated in their architecture. This sub-sampling
progressively discards the input’s spatial information
as we go deeper into the network.
To tackle this problem, we can use a DCAE instead
of a conventional convolutional neural network. In
the DCAE, after the down-sampling performed by the
encoder, the decoder up-samples the representation
back to the original size. This is achieved by
backwards convolutions, often called “deconvolution”
operations.
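As a concrete illustration, here is a minimal PyTorch sketch of such a DCAE: the encoder down-samples with convolution and pooling, and the decoder up-samples back to the original size with transposed (“backwards”) convolutions. The channel counts, kernel sizes and the single-channel 64x64 input are assumptions made for the example, not the architecture used in our experiments.

```python
import torch
import torch.nn as nn

class DCAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                   # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                   # 32x32 -> 16x16
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2), nn.ReLU(),    # 16x16 -> 32x32
            nn.ConvTranspose2d(16, 1, kernel_size=2, stride=2), nn.Sigmoid(),  # 32x32 -> 64x64
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

x = torch.rand(8, 1, 64, 64)   # dummy batch of single-channel cell images
z = DCAE()(x)                  # reconstruction recovers the original spatial size
assert z.shape == x.shape
```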
The final solution of the network can be written in the
form:
Ŵ, Ŵ’, b̂, b̂’ = argmin_{W, W’, b, b’} L(x, z) (3)
where z denotes the decoder’s output and x is the
original image. The function L in equation (3)
measures the difference between x and z. The
solution of equation (3) is therefore the set of
parameter values that minimizes the difference
between the input x and the reconstruction z.
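In practice, this minimization is carried out by gradient descent. The sketch below assumes the DCAE module from the previous example, mean squared error as the loss L, and placeholder choices for the optimizer, learning rate, batch source and number of epochs.

```python
import torch

model = DCAE()                                         # DCAE module sketched above
criterion = torch.nn.MSELoss()                         # stands in for L(x, z) in equation (3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

loader = [torch.rand(8, 1, 64, 64) for _ in range(4)]  # stand-in for a real image loader

for epoch in range(10):
    for x in loader:
        z = model(x)                                   # reconstruction
        loss = criterion(z, x)                         # difference between x and z
        optimizer.zero_grad()
        loss.backward()                                # gradients w.r.t. W, W', b, b'
        optimizer.step()                               # step towards the argmin of (3)
```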
In our experiments, the feature vectors extracted
from the DCAE contain 4096 elements. The second
part of the method consists of feeding this feature
vector to a shallow artificial neural network (ANN).
Finally, in order to predict the cell type, a supervised
learning process is conducted using the features
extracted from the DCAE as the inputs and a
two-layer ANN as the classifier.
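A minimal sketch of this classification stage is given below. Only the 4096-dimensional input, the two-layer structure and the five cell types come from the text; the hidden width of 256 is an assumption for illustration.

```python
import torch.nn as nn

classifier = nn.Sequential(
    nn.Linear(4096, 256), nn.ReLU(),   # layer 1: nonlinear hidden layer (width assumed)
    nn.Linear(256, 5),                 # layer 2: one logit per cell type
)
```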
3 RESULTS AND DISCUSSION
The SNPHEp-2 dataset (Wiliem et al., 2016) contains
1,884 cellular images, all extracted from 40 different
specimen images. Different specimens were used for
constructing the training and testing image sets, so
that the two sets never contain images from the same
specimen. Of the 40 specimens, 20 were used for the
training sets and the remaining 20 for the testing sets.
In total there are 905 and 979 cell images in the
training and testing sets, respectively. Each set
(training and testing) contains five-fold validation
splits of randomly selected images; in each set, the
different splits are used for cross-validating the
different models, each split containing approximately
450 images. Figure 1 shows example images of the
five cell types, randomly selected from the dataset.
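This splitting protocol can be sketched as follows; the images_by_specimen mapping is a placeholder structure invented for the example, not the actual dataset layout.

```python
import random

# Placeholder: 40 specimens, each with a list of cell-image identifiers.
images_by_specimen = {s: [f"cell_{s}_{i}.png" for i in range(47)] for s in range(40)}

specimens = list(images_by_specimen)
random.shuffle(specimens)
train_specimens, test_specimens = specimens[:20], specimens[20:]  # specimen-disjoint sets

def five_fold_splits(specimen_ids, n_splits=5):
    images = [img for s in specimen_ids for img in images_by_specimen[s]]
    random.shuffle(images)
    return [images[i::n_splits] for i in range(n_splits)]  # roughly equal random splits

train_splits = five_fold_splits(train_specimens)
test_splits = five_fold_splits(test_specimens)
```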
As previously mentioned, the feature vectors
extracted from the DCAE contain 4096 elements. So,
our network will have 4096 neurons in