vector for identity recognition, which was the
prototype of the face recognition method based on
geometric features, though it was not yet truly
automatic face recognition.
The earliest research on automatic face
recognition originated in the 1960s; a representative
result was published by Chan at Panoramic Research
Incorporated in 1965 (Bledsoe W. W., 1965).
Domestic face recognition research started later. In
1979, the Journal of Automation published "A
Review of Artificial Intelligence at Home and
Abroad" (Li, 1979), the first retrievable appearance
of the concept of "face recognition" in domestic
journals. In 1992, Hong published "Image Algebra
Feature Extraction for Image Recognition" in the
Journal of Automation (Zi-Quan Hong, 1992), and
Zheng Jianping published "Standard Frontal Face
Recognition" in Computer Engineering (Zheng J,
1992); these are the earliest academic papers
retrieved in the field of face recognition research.
In the past decades, face recognition technology
has attracted increasing attention from domestic and
foreign researchers. Especially in the 21st century,
with the rapid development of artificial intelligence,
the use of advanced algorithms for face recognition
has been pushed to the forefront of research.
Although face recognition technology has received
extensive attention and study, it remains a
challenging task because of variations in
illumination, pose, facial expression, occlusion and
other factors.
Convolutional Neural Networks are inspired by
the structure of biological neural networks and
visual systems. In 1962, Hubel and Wiesel, through
research on the cat's visual cortical cells, proposed
the concept of the receptive field (Hubel D H,
1962). In 1980, Fukushima proposed the
Neocognitron, the first theoretical model based on
the receptive field (Fukushima K, 1987). The
Neocognitron was a self-organizing multi-layer
neural network model. In 1998, Yann LeCun used
gradient descent optimization and the
back-propagation algorithm to train a convolutional
neural network on handwritten digits, achieving the
best results in the world at that time (Lecun Y,
1998). In 2012, Geoffrey Hinton and his colleagues
applied a Convolutional Neural Network model to
the well-known ImageNet challenge and obtained
the best results in the world; the results were far
better than those of the runner-up, which drew even
greater attention to CNNs (Krizhevsky A, 2012).
3 THE STRUCTURE OF CNN
CNN is an artificial neural network specially
designed to process two-dimensional input data;
each layer in the network consists of multiple planes,
and each plane consists of multiple independent
neurons. CNN was inspired by the early Time-Delay
Neural Network (TDNN) (Waibel A, 1990). TDNN
reduces the computational complexity of network
training by sharing weights in the time dimension,
and it is suitable for processing speech and other
time-series signals. CNN adopts a weight-sharing
network structure, which makes it more similar to a
biological neural network. Compared with a network
that is fully connected in each layer, CNN can
effectively reduce the learning complexity of the
model, with fewer network connections and weight
parameters, and is thus easier to train.
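To make the parameter savings from weight sharing concrete, the following sketch compares the weight count of a fully connected layer with that of a shared-weight convolutional layer on the same input. The layer sizes (28x28 input, 100 output units or feature maps, 5x5 kernel) are illustrative assumptions, not figures from this paper.

```python
# Parameter counts for one layer on a 28x28 single-channel input
# (layer sizes are illustrative assumptions, not from the paper).

in_h, in_w = 28, 28           # input image size
n_inputs = in_h * in_w        # 784 input units

# Fully connected layer: every output unit has its own weight
# for every input unit, plus one bias per output unit.
n_fc_outputs = 100
fc_params = n_inputs * n_fc_outputs + n_fc_outputs

# Convolutional layer: each feature map shares a single 5x5
# kernel and one bias across all spatial positions.
k = 5                         # kernel size
n_feature_maps = 100
conv_params = n_feature_maps * (k * k + 1)

print(fc_params)    # 78500
print(conv_params)  # 2600
```

Even with the same number of output maps as fully connected units, weight sharing cuts the trainable parameters by more than an order of magnitude, which is the reduction in learning complexity described above.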
The basic structure of CNN consists of an input
layer, convolution layers, pooling layers, a fully
connected layer and an output layer: a convolution
layer is followed by a pooling layer, the pooling
layer is followed by another convolution layer, and
so on. Each neuron in the output feature map of a
convolutional layer is locally connected to its input;
its input value is obtained as the weighted sum of the
local inputs with the corresponding weights, plus a
bias. This process is equivalent to convolution,
which is why CNN is so called (Lecun Y, 1998).
In the convolution layer of a CNN, each neuron
of a feature map is connected to a local receptive
field of the previous layer, and local features are
extracted through the convolution operation. A
convolutional layer contains many feature maps,
each of which extracts one feature. When extracting
features, neurons in the same feature map share one
set of convolution kernel weights, while different
feature maps have different weights. The weight
parameters are continually adjusted during training
so that feature extraction moves in a favorable
direction.
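The convolution operation described above, with one shared kernel sliding over the input, can be sketched in plain Python as follows (function and variable names are illustrative, not from the paper):

```python
# Valid 2D convolution (no padding, stride 1) with a single kernel.
# Every output neuron reuses the same kernel weights and bias,
# which is the weight sharing within one feature map.

def conv2d(image, kernel, bias=0.0):
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1      # output feature-map size
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            # Weighted sum over the local receptive field, plus bias.
            s = bias
            for u in range(kh):
                for v in range(kw):
                    s += image[i + u][j + v] * kernel[u][v]
            out[i][j] = s
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]  # sums each pixel with its lower-right neighbour
print(conv2d(image, kernel))  # [[6.0, 8.0], [12.0, 14.0]]
```

A real convolutional layer would apply many such kernels in parallel, producing one feature map per kernel, each with its own shared weights.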
A pooling layer follows the convolutional layer.
Because the sliding-window convolution of the
previous layer involves a large amount of overlap,
there is redundancy in the convolution values, and a
pooling layer is needed to simplify the output of the
convolution layer. The pooling layer retains the
main information of the convolutional layer while
reducing the number of parameters and the amount
of computation, which helps prevent over-fitting.
The most common pooling is max-pooling, which
takes the largest feature value in each
neighbourhood: only the largest value is passed on,
and the others are discarded.
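The max-pooling operation above can be sketched for the common 2x2 window with stride 2 (a minimal pure-Python illustration; the window size is an assumption, as the text does not fix one):

```python
# 2x2 max-pooling with stride 2: each non-overlapping 2x2 window
# of the feature map forwards only its largest value and discards
# the rest, halving both spatial dimensions.

def max_pool_2x2(fmap):
    rows, cols = len(fmap), len(fmap[0])
    out = []
    for i in range(0, rows - 1, 2):
        row = []
        for j in range(0, cols - 1, 2):
            row.append(max(fmap[i][j],     fmap[i][j + 1],
                           fmap[i + 1][j], fmap[i + 1][j + 1]))
        out.append(row)
    return out

fmap = [[1, 3, 2, 4],
        [5, 6, 1, 0],
        [7, 2, 9, 8],
        [0, 1, 3, 6]]
print(max_pool_2x2(fmap))  # [[6, 4], [7, 9]]
```

The 4x4 map shrinks to 2x2 while keeping the strongest response in each neighbourhood, which is the information-retention and parameter-reduction effect described above.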