such as how to improve the structure of Convolutional
Neural Networks to increase their learning ability, and
how to adapt Convolutional Neural Networks to new
application models.
The outline of the paper is as follows. Section
two describes the background of the CNN, including
the theoretical proposition stage, the model
realization stage, and the extensive research stage.
Section three describes the structure of the CNN,
which is mainly composed of an input layer,
convolution layers, subsampling (pooling) layers,
fully connected layers, and an output layer. Section
four gives two examples of Convolutional Neural
Network applications and discusses future
applications.
2 BACKGROUND
The history of Convolutional Neural Networks can
be roughly divided into three stages: the theoretical
proposition stage (Yandong, 2016), the model
realization stage (Fukushima, 1988), and the
extensive research stage.
2.1 The theoretical proposition stage
In the 1960s, Hubel showed that visual
information from the retina to the brain is processed
through multiple levels of receptive fields. In 1980,
Fukushima proposed the Neocognitron (Ballester,
2016), the first model based on the theory of
receptive fields. The Neocognitron is a
self-organizing multi-layer neural network model in
which the response of each layer is obtained from the
local receptive fields of the layer above. Its
recognition is not affected by position, small shape
changes, or scale. The unsupervised learning used by
the Neocognitron was also the dominant learning
method in early studies of convolutional neural
networks.
2.2 The model realization stage
In 1998, LeCun proposed LeNet-5, which uses a
gradient-based backpropagation algorithm to train the
network in a supervised manner. The trained network
converts the original image into a series of feature
maps through alternately connected convolutional and
down-sampling layers. Finally, the feature
representation of the image is classified by the fully
connected layers. The convolutional kernel implements
the receptive field function: it passes local-area
information to a higher level through the kernel's
activations. The successful application of LeNet-5 to
handwritten character recognition drew the attention
of academia to convolutional neural networks. In the
same period, research on Convolutional Neural
Networks in speech recognition, object detection,
face recognition, and other areas was gradually
carried out.
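The receptive field role of the convolutional kernel described above can be sketched as a plain 2-D convolution: each output value summarizes one local area of the input through a shared kernel. This is an illustrative sketch, not code from the paper; the kernel values and image are arbitrary examples.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid (no-padding) 2-D convolution of a single-channel image."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Excite a local receptive field of the input with the shared
            # kernel and pass the summed response up to the next layer.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])  # a simple diagonal-difference kernel
feature_map = conv2d(image, kernel)
print(feature_map.shape)  # (4, 4): a 5x5 input shrinks under a 2x2 kernel
```

Because the same kernel slides over the whole image, the layer detects the same local pattern regardless of position, which is the weight-sharing property LeNet-5 exploits.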
2.3 The extensive research stage
In 2012, AlexNet, proposed by Krizhevsky, won the
championship in the image classification contest of
ImageNet, a large image database, outperforming the
second-place entry by a margin of about 11% and
making the Convolutional Neural Network an academic
focus. After AlexNet, new convolutional neural
network models were proposed, such as the Visual
Geometry Group's VGG, Google's GoogLeNet, and
Microsoft's ResNet. These networks successively broke
the record AlexNet set on ImageNet. Furthermore,
convolutional neural networks have been continuously
merged with traditional algorithms, and with the
introduction of transfer learning, the applications
of Convolutional Neural Networks have expanded
rapidly. Some typical applications include:
convolutional neural networks combined with
Recurrent Neural Networks (RNNs) for image captioning
and image question answering; significant accuracy
gains on small-sample image recognition databases;
and video-oriented behavioral recognition models such
as 3D Convolutional Neural Networks.
3 STRUCTURE
As shown in Figure 1, a typical Convolutional
Neural Network is mainly composed of an input layer,
convolution layers, subsampling (pooling) layers,
fully connected layers, and an output layer.
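The layer sequence above can be sketched end to end with NumPy. This is a minimal illustration under assumed sizes (an 8x8 input, one 3x3 kernel, a 10-class output) and random weights, not a trained or recommended configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, k):
    """Valid 2-D convolution (single channel, illustrative)."""
    kh, kw = k.shape
    return np.array([[np.sum(x[i:i + kh, j:j + kw] * k)
                      for j in range(x.shape[1] - kw + 1)]
                     for i in range(x.shape[0] - kh + 1)])

def max_pool(x, s=2):
    """Subsampling (pooling) layer: keep the strongest response per window."""
    h, w = x.shape
    return x[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s).max(axis=(1, 3))

# Input layer: one 8x8 grayscale image (random, for illustration).
image = rng.standard_normal((8, 8))

# Convolution layer: one 3x3 kernel -> 6x6 feature map, ReLU activation.
feature = np.maximum(conv2d(image, rng.standard_normal((3, 3))), 0)

# Subsampling (pooling) layer: 6x6 -> 3x3.
pooled = max_pool(feature)

# Fully connected layer: flatten the 9 features into 10 class scores.
scores = rng.standard_normal((10, 9)) @ pooled.ravel()

# Output layer: softmax over the class scores.
probs = np.exp(scores - scores.max())
probs /= probs.sum()
print(probs.shape)  # (10,)
```

Real networks stack several convolution-pooling pairs and learn the kernel and fully connected weights by backpropagation; the data flow, however, follows exactly this sequence.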