extract semantic features for image classification and segmentation (Huang et al., 2017; He et al., 2016; Szegedy et al., 2016; Krizhevsky et al., 2012). A CNN is a bio-inspired network that uses the concept of receptive fields to exploit spatial correlations in the image, so that it is capable of transforming and reducing image information, thus obtaining a meaningful representation of its content.
Any CNN has its main structure defined by three types of layers: convolutional, pooling, and fully connected. The convolutional layer uses the convolution operation to emulate a receptive field and its response to a visual stimulus. It is the primary layer of a CNN, since it acts as a feature detector in the image. Pooling layers help to reduce the dimensionality of the image as well as the CNN's sensitivity to image distortions and shifts. A few fully connected layers, followed by a softmax layer (Krizhevsky et al., 2012), are located at the end of the CNN and are used for classification, outputting the most probable class for a given image.
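The three layer types above can be sketched in a minimal, framework-free form. This is an illustrative NumPy sketch of one channel of each operation (a real CNN stacks many such layers with learned kernels); the function names are ours, not from the paper:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution: slide the kernel (a receptive field) over the image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: downsamples the feature map and adds
    tolerance to small shifts and distortions."""
    h, w = fmap.shape
    h, w = h - h % size, w - w % size
    return fmap[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

def softmax(logits):
    """Softmax over class scores, as applied after the fully connected layers."""
    e = np.exp(logits - logits.max())
    return e / e.sum()
```

For instance, convolving a 4 × 4 image with a 2 × 2 kernel yields a 3 × 3 feature map, which a 2 × 2 max pool then halves in each dimension.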
Due to the difficulty of modeling, training, and testing different network models, large computing companies (such as Google and Microsoft) have developed CNN models and trained them on large image datasets. These pre-trained networks can be used to learn generic features of new datasets, while we fine-tune the output of the network to the new problem. In this work, we used InceptionV3, ResNet, SqueezeNet, and DenseNet models pre-trained on the 2012 ImageNet dataset, which contains 1000 classes:
• InceptionV3: Google’s research team proposed this network model, which introduces the inception module as an approach to reduce the computational load of CNNs while maintaining their performance (Szegedy et al., 2016). The inception module is based on factorized and asymmetric convolutions, whose main goal is to reduce the number of connections/parameters of the network without decreasing its efficiency. InceptionV3 is a large CNN containing 23.8 million parameters.
• ResNet: Microsoft Research (He et al., 2016) proposed this network model, which uses residual learning to improve network accuracy. ResNet learns residuals by using skip connections to propagate information across layers. This scheme enables the creation of deeper networks, as it mitigates the problem of vanishing gradients. Depending on its structure, ResNet can range from 18 to 152 layers and up to 100.11 million parameters.
• DenseNet: unlike other CNN models, DenseNet (Huang et al., 2017) is considered a small network, having 8 million parameters. Similar to ResNet, it uses the concept of residual connections as building blocks of its model. However, DenseNet concatenates the outputs of previous layers instead of summing them. Additionally, DenseNet presents more group connections than other networks, so that the feature maps of all predecessor layers are used as input to all subsequent layers.
• SqueezeNet: this model (Iandola et al., 2016) was designed to be a smaller network model, yet still capable of achieving results similar to those of bigger models. It introduces the concept of Fire modules, with squeeze convolution layers (only 1 × 1 filters) that feed into an expand layer. This design results in a network with 50× fewer parameters than AlexNet, making it ideal for hardware with limited memory.
Table 1 shows the number of parameters, input size, and number of convolutions of each network model evaluated.
Table 1: CNN models specifications.

CNN model    | # of parameters | Input size | # of conv.
-------------|-----------------|------------|-----------
DenseNet     | 8.0M            | 224 × 224  | 120
ResNet       | 25.6M           | 224 × 224  | 104
InceptionV3  | 23.8M           | 299 × 299  | 197
SqueezeNet   | 1.2M            | 224 × 224  | 24
3 IMAGE DATASET
To study the variability of CNNs when subjected to a cross-validation scheme, we opted to use an image dataset presenting high inter-class and low intra-class similarity, as it poses an additional challenge for classification tasks. To accomplish that, we used the virus dataset available at www.cb.uu.se/~gustaf/virustexture/. Further details on how the images were obtained can be found in (Kylberg et al., 2012).
This dataset contains 1500 Transmission Electron Microscopy (TEM) images representing 15 different types of viruses: Adenovirus, Astrovirus, CCHF, Cowpox, Dengue, Ebola, Influenza, Lassa, Marburg, Norovirus, Orf, Papilloma, Rift Valley, Rotavirus, and West Nile. Each virus type is represented by 100 images of 41 × 41 pixels. Although this database is available in 8-bit and 16-bit formats, we used only the 8-bit format in our experiments to avoid normalization problems during the training of the CNNs.
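Since the 41 × 41 8-bit images are much smaller than the network input sizes in Table 1, they must be rescaled and replicated to three channels before being fed to the CNNs. This is an illustrative NumPy sketch of such preprocessing (the paper does not specify its interpolation method; nearest-neighbour sampling is our assumption here):

```python
import numpy as np

def prepare(img_u8, target=224):
    """Illustrative preprocessing for a square 8-bit grayscale virus image:
    scale intensities to [0, 1], resize to the CNN input size by
    nearest-neighbour sampling, and replicate the gray channel to 3
    channels (the actual interpolation used in the paper is not stated)."""
    img = img_u8.astype(np.float32) / 255.0              # 8-bit -> [0, 1]
    idx = (np.arange(target) * img.shape[0] / target).astype(int)
    resized = img[idx][:, idx]                           # nearest-neighbour resize
    return np.stack([resized] * 3, axis=0)               # (3, H, W) tensor layout
```

For InceptionV3 the same function would be called with `target=299`, matching its larger input size.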
Variability Evaluation of CNNs using Cross-validation on Viruses Images