The remainder of this article is organized as follows: Section 2 reviews related work in the field of deep learning. Section 3 introduces the established deep CNN model and describes in detail its internal architecture and the Adam optimization function used by the model. Section 4 presents the model's classification results on the dataset and provides a comprehensive performance assessment. Section 5 summarizes the main work of this article and discusses the limitations of the research.
2 RELATED WORKS
With the development of machine learning, related algorithms face new opportunities and challenges in computer vision. In image recognition, deep learning algorithms classify images with increasingly high accuracy. Relevant developments are summarized below:
R. Chauhan, K. K. Ghanshala, and R. C. Joshi developed two distinct CNN architectures for the MNIST and CIFAR-10 datasets, relying exclusively on CPU-based computation (Chauhan et al., 2018). The CNN performed well on the MNIST dataset, achieving an accuracy of 99.6% after 10 epochs. On the CIFAR-10 dataset, because too few training epochs were used, the accuracy was only 80.17%. The authors therefore suggested increasing the number of training epochs to further improve the model's accuracy.
Deep learning in medicine helps to diagnose epidemics effectively. Traore et al. used an efficient CNN to recognize and classify pathogen images of cholera and malaria in microscopic images, ultimately achieving an accuracy of 94% (Traore et al., 2018). They further proposed that integrating this pathogen image recognition method into medical microscopes could help diagnose and prevent crises caused by epidemics. Yoo et al. applied a ResNet-based deep CNN to the recognition of chest X-ray images and showed that it can effectively diagnose cardiac hypertrophy, with a recognition accuracy close to 80% (Yoo et al., 2021). In addition, that work evaluated and compared the classification results obtained with four optimization functions, SGD, Adam, AdaGrad, and RMSProp, in neural networks; when SGD was used as the optimizer, the model performed best in diagnosing cardiac hypertrophy.
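As an illustration of such a comparison, the following sketch swaps the four optimizers on the same small CNN in Keras; the architecture, learning rates, and data are hypothetical and are not the configuration used in the cited study.

    # Sketch: comparing SGD, Adam, AdaGrad, and RMSProp on the same small CNN.
    # Hypothetical architecture and learning rates; not the setup of Yoo et al. (2021).
    from tensorflow import keras

    def build_cnn(input_shape=(64, 64, 1), num_classes=2):
        # Small CNN used only to illustrate swapping the optimizer.
        return keras.Sequential([
            keras.layers.Input(shape=input_shape),
            keras.layers.Conv2D(16, 3, activation="relu"),
            keras.layers.MaxPooling2D(),
            keras.layers.Flatten(),
            keras.layers.Dense(num_classes, activation="softmax"),
        ])

    optimizers = {
        "SGD": keras.optimizers.SGD(learning_rate=0.01),
        "Adam": keras.optimizers.Adam(learning_rate=0.001),
        "AdaGrad": keras.optimizers.Adagrad(learning_rate=0.01),
        "RMSProp": keras.optimizers.RMSprop(learning_rate=0.001),
    }

    for name, optimizer in optimizers.items():
        model = build_cnn()
        model.compile(optimizer=optimizer,
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        # model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=10)
        # Training is omitted here; in practice, compare validation accuracy per optimizer.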
Weather recognition stands as a pivotal
application in the field of computer vision. Bin,
Xuelong, Xiaoqiang et al. assigned multiple weather
condition labels to each weather image in two
datasets, and completed the multi label classification
task based on a special CNN-RNN network model
(Zhao et al, 2018). In this model, CNN is extended
to a channel attention model. This model not only
effectively identifies weather, but also explores the
interrelationships within different weather conditions.
This study has markedly enhanced the model's
precision when contrasted with the conventional
approach of treating weather recognition as a
single-label classification task. Furthermore, we
conducted a comparative analysis involving AlexNet,
multi-label versions of VGGNet, ML-KNN,
ML-ARAM, and various other network models
using two distinct weather recognition datasets.
Finally, it was found that the CNN-RNN performed
best in multi label classification tasks on this dataset.
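For context, the core idea of multi-label classification can be sketched as a sigmoid output per label trained with binary cross-entropy, so that one image may carry several weather labels at once. The label count and feature size below are hypothetical, and this is a generic output head rather than the CNN-RNN model of Zhao et al.

    # Sketch: multi-label output head (sigmoid + binary cross-entropy), assuming
    # a CNN backbone has already produced a feature vector. Hypothetical sizes;
    # not the CNN-RNN architecture of Zhao et al. (2018).
    from tensorflow import keras

    num_labels = 5                        # e.g. sunny, cloudy, rainy, snowy, foggy
    features = keras.Input(shape=(256,))  # feature vector from some CNN backbone

    # Sigmoid (not softmax): each label is predicted independently,
    # so an image can be assigned several weather labels at the same time.
    outputs = keras.layers.Dense(num_labels, activation="sigmoid")(features)

    head = keras.Model(features, outputs)
    head.compile(optimizer="adam", loss="binary_crossentropy",
                 metrics=[keras.metrics.BinaryAccuracy()])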
3 CLASSIFIER MODEL
3.1 CNN Model
CNNs typically consist of several layers, including input layers, convolutional layers, pooling layers, and dense layers (commonly referred to as fully connected layers), among others (Gu et al., 2018).
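A minimal sketch of such a layer stack, assuming a Keras/TensorFlow implementation; the layer sizes are illustrative and are not the configuration of the model introduced later in this paper.

    # Sketch of a typical CNN layer stack: input -> convolution -> pooling -> dense.
    # Layer counts and sizes are illustrative only.
    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Input(shape=(32, 32, 3)),          # input layer (e.g. a 32x32 RGB image)
        keras.layers.Conv2D(32, 3, activation="relu"),  # convolutional layer: feature extraction
        keras.layers.MaxPooling2D(pool_size=2),         # pooling layer: downsampling
        keras.layers.Flatten(),
        keras.layers.Dense(10, activation="softmax"),   # dense (fully connected) layer: classification
    ])
    model.summary()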
Convolutional layers are adept at extracting pivotal features from the input image data. Within the network, these layers perform their computations with multiple convolutional kernels; each element of a kernel corresponds to a weight coefficient and a bias term of the network, drawing inspiration from the feedforward neural connections found in biological organisms. Each location on a convolutional layer's output feature map corresponds to a region of the pre-convolution input, which defines the portion of the input image that the CNN feature perceives. The size of this region, commonly referred to as the "receptive field", depends on the dimensions of the convolutional kernels used in the layer's correlation operations (Gu et al., 2018).
The pooling layer downsamples the output of the convolutional layers, thereby reducing the number of data points. Two pooling techniques are most frequently employed: average pooling and max pooling.
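The two operations can be illustrated on a toy 4x4 feature map (values chosen arbitrarily) with a minimal NumPy sketch:

    # Sketch: 2x2 max pooling vs. average pooling on a toy feature map (NumPy only).
    import numpy as np

    feature_map = np.array([[1, 3, 2, 0],
                            [5, 6, 1, 2],
                            [7, 2, 9, 4],
                            [3, 1, 0, 8]], dtype=float)

    # Split the 4x4 map into non-overlapping 2x2 windows.
    windows = feature_map.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(2, 2, 4)

    max_pooled = windows.max(axis=-1)    # keeps the strongest activation per window
    avg_pooled = windows.mean(axis=-1)   # keeps the average activation per window

    print(max_pooled)  # [[6. 2.] [7. 9.]]
    print(avg_pooled)  # [[3.75 1.25] [3.25 5.25]]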
The dense layer is used to classify the extracted
features mentioned above (similar to the fully