However, these methods are only suitable for optimizing simple networks, for example a network with one hidden layer and a small number of nodes. Recently, genetic algorithms have been widely used to optimize network architectures and weights (Fiszelew et al., 2007), but their slow convergence rate is a major limitation. Therefore, for complex deep learning models with many hidden layers, new methods need to be developed for architecture optimization.
The architecture optimization method proposed in this paper is specifically aimed at complex deep learning models with many hidden layers and a large number of nodes, and it can adjust the number of nodes in multiple hidden layers. To improve efficiency, the optimization is based on a correlation analysis of the node weights initialized using the Restricted Boltzmann Machine (RBM) (Hinton and Salakhutdinov, 2006). Unlike traditional methods, the proposed architecture optimization method avoids the time-consuming network training process.
The proposed method has the following advantages. First, it applies well to complex network architecture optimization: because the computing unit of the method is a set of multiple nodes rather than a single node, the method is easily extended to networks with many layers. Second, the proposed architecture optimization is based on the correlation between the weights of the nodes after initialization rather than on the error function after training, so its efficiency is greatly improved.
The rest of the paper is organized as follows. In Section 2, we first briefly introduce the workflow of the deep learning model and the role of the RBM in network initialization. We then discuss the calculation of the correlation coefficient in detail in Subsection 2.1, and Subsection 2.2 presents the detailed steps of the method for optimizing the nodes of multiple hidden layers. In Section 3, we present experimental results on different datasets. Finally, Section 4 draws conclusions and discusses future research directions.
2 ARCHITECTURE
OPTIMIZATION METHOD
The autoencoder, a kind of deep learning model, is often used for dimensionality reduction (van der Maaten et al., 2009). Our proposed method is based on this model framework. An autoencoder consists of an encoder module, which transforms high-dimensional input data into a low-dimensional output (the code), and a decoder module, which reconstructs the high-dimensional data from the code. The construction of the autoencoder concentrates mainly on the encoder module because the decoder module can be approximately regarded as the transposition of the encoder module; our method therefore optimizes the structure of the encoder module. In the training of a deep learning model, the RBM is used to initialize the weights of the nodes. The use of the RBM was a significant improvement to the deep learning model because it is difficult to optimize the weights of the nodes of multiple hidden layers without good initial weights (Hinton and Salakhutdinov, 2006).
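To make the framework concrete, the following is a minimal sketch of such a tied-weight autoencoder in NumPy; the layer sizes, sigmoid activations, and variable names are illustrative assumptions rather than the paper's exact configuration. The decoder simply reuses the transposed encoder weights.

```python
# Minimal sketch of a tied-weight autoencoder (illustrative assumptions:
# one hidden layer, sigmoid activations, example dimensions).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_visible, n_hidden = 784, 100              # example dimensions
W = rng.normal(0, 0.01, (n_visible, n_hidden))
b_enc = np.zeros(n_hidden)
b_dec = np.zeros(n_visible)

def encode(x):
    # encoder: high-dimensional input -> low-dimensional code
    return sigmoid(x @ W + b_enc)

def decode(code):
    # decoder approximated as the transposed encoder weights (W.T)
    return sigmoid(code @ W.T + b_dec)

x = rng.random(n_visible)                   # dummy input
reconstruction = decode(encode(x))
```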
The RBM is a stochastic two-layer network containing a visible layer corresponding to the input and a hidden layer corresponding to the output. The two layers are fully connected, that is, each node of the hidden layer connects to all the nodes of the visible layer, but nodes within the same layer are not connected to each other. In the initialization process, the network is divided into multiple RBMs, and the output of the previous RBM becomes the input of the next RBM, so that information is extracted from the input data layer by layer.
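The layer-by-layer initialization can be sketched as follows; the one-step contrastive divergence (CD-1) update, learning rate, epoch count, and layer sizes are illustrative assumptions, and biases are omitted for brevity.

```python
# Toy sketch of layer-wise RBM pre-training with CD-1 updates
# (illustrative hyperparameters; biases omitted for brevity).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.1, epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = rng.normal(0, 0.01, (n_visible, n_hidden))
    for _ in range(epochs):
        h = sigmoid(data @ W)               # positive phase: visible -> hidden
        v_recon = sigmoid(h @ W.T)          # negative phase: one reconstruction
        h_recon = sigmoid(v_recon @ W)
        W += lr * (data.T @ h - v_recon.T @ h_recon) / len(data)
    return W, sigmoid(data @ W)             # weights and hidden activations

# the output of the previous RBM becomes the input of the next RBM
data = np.random.default_rng(1).random((500, 784))
weights = []
for n_hidden in (256, 64, 16):              # example layer sizes
    W, data = train_rbm(data, n_hidden)
    weights.append(W)
```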
The second step is weight fine-tuning after the RBM initialization. Traditional fine-tuning methods such as backpropagation (BP) fine-tune the weights by minimizing the data reconstruction error.
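As a sketch, one gradient step of such fine-tuning on the tied-weight autoencoder above could look as follows; the squared-error loss, sigmoid derivatives, and learning rate are assumptions for illustration.

```python
# One backpropagation step minimizing squared reconstruction error
# for a tied-weight autoencoder (illustrative assumptions).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def finetune_step(W, x, lr=0.01):
    h = sigmoid(x @ W)                      # encode
    r = sigmoid(h @ W.T)                    # decode (tied weights)
    err = r - x                             # reconstruction error
    d_r = err * r * (1 - r)                 # delta at the output layer
    d_h = (d_r @ W) * h * (1 - h)           # delta at the hidden layer
    grad = np.outer(x, d_h) + np.outer(d_r, h)  # tied-weight gradient
    return W - lr * grad, 0.5 * float(err @ err)
```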
The proposed network architecture optimization method, which works on the above framework, is depicted in Figure 1. In the initial stage, a simple network with a very small number of nodes in each hidden layer is created, corresponding to the architecture of the nodes connected by solid lines in Figure 1. The architecture optimization is achieved by dynamically growing the number of nodes in the hidden layers, where the added nodes correspond to the nodes connected by dotted lines in Figure 1. At each step, the same number of nodes (N_i nodes) is added to the target layer, and then a correlation analysis is carried out on the weights of all the nodes in the layer (the weights of a node are the weights between all the nodes of the previous layer and that node). In our method, the N_i nodes that have the smallest correlation with the rest of the nodes are selected from all the nodes, and the correlation coefficients between the selected nodes and the rest of the nodes are computed (see Subsection 2.1 for details).
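A possible reading of this growth test is sketched below; since the exact correlation coefficient is defined in Subsection 2.1, the Pearson correlation between incoming weight vectors is used here as a stand-in, and n_i and the threshold (which governs the stopping rule described next) are illustrative assumptions.

```python
# Sketch of the correlation-based growth test (Pearson correlation is
# an illustrative stand-in for the coefficient of Subsection 2.1;
# n_i and threshold are assumed values).
import numpy as np

def growth_should_stop(W, n_i, threshold=0.9):
    # W: (n_prev, n_nodes); column j holds the incoming weights of node j
    corr = np.abs(np.corrcoef(W.T))         # pairwise |correlation| of nodes
    np.fill_diagonal(corr, 0.0)
    per_node = corr.max(axis=0)             # each node's strongest correlation
    selected = np.argsort(per_node)[:n_i]   # the n_i least-correlated nodes
    # stop growing once even these nodes are too correlated with the rest
    return bool(per_node[selected].max() > threshold)
```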
Finally, when the correlation coefficient is greater than a given threshold, the dynamic growth of the number of nodes is stopped and the number of