by identifying the maximum learning rate to use and
then employs a decreasing maximum cyclic learning
rate during training. Once the convolution layers have
converged to the maximum extent possible, the fully
connected layers are then adjusted cyclically.
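The decreasing-maximum cyclic schedule described above can be sketched as a triangular cycle whose upper bound decays each cycle. This is a minimal illustration; the cycle length, decay factor, and learning-rate bounds below are assumptions for the sketch, not values taken from the paper.

```python
def cyclic_lr(step, base_lr=1e-4, max_lr=1e-2, cycle_len=2000, decay=0.5):
    """Triangular cyclic learning rate whose maximum shrinks each cycle.

    The rate rises linearly from base_lr to the current maximum over the
    first half of a cycle, then falls back to base_lr over the second half.
    Each completed cycle multiplies the maximum's amplitude by `decay`,
    giving the decreasing maximum cyclic schedule described in the text.
    """
    cycle = step // cycle_len                       # which cycle we are in
    peak = base_lr + (max_lr - base_lr) * decay ** cycle  # decayed maximum
    pos = (step % cycle_len) / cycle_len            # position within the cycle
    frac = 2 * pos if pos < 0.5 else 2 * (1 - pos)  # triangular wave in [0, 1]
    return base_lr + (peak - base_lr) * frac
```

For example, the rate starts at `base_lr`, reaches the current (decayed) maximum at mid-cycle, and returns to `base_lr` at the cycle boundary.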
The learning agents are allowed to run through the
hyperparameters as previously discussed. The algorithm achieves an optimal accuracy of 99.1% over the available range of hyperparameters for the MNIST data set.
Table 2 shows the convergence of this algorithm on the MNIST data set. For entries in the ‘Filter Dimen-
sions’ column, the list on each row shows the final
filter dimensions selected for each convolution layer.
[[5, 5], [5, 5]], for example, indicates that the first
and second convolution layer filters have dimensions
of 5x5. Each entry in the ‘Filters’ column shows the
number of filters for each convolution layer. ‘FCLs’
shows the number of neurons in each fully connected
layer. ‘Acc’ indicates the final accuracy upon com-
pletion. The architecture settled on filter dimensions
of [[5, 5], [5, 5]] for the first two convolutional layers, [256, 32] for the number of filters for the first two convolutional layers, and [128, 10] for the number of
neurons in the fully connected layers.
Table 2: MNIST algorithm progression summary.

Filter Dims        Filters     FCLs        Acc
[[3, 3]]           [32]        [64, 10]    97.65%
[[3, 3]]           [32]        [128, 10]   98.12%
[[5, 5]]           [32]        [128, 10]   98.55%
[[5, 5]]           [256]       [128, 10]   98.8%
[[5, 5], [5, 5]]   [256, 32]   [128, 10]   98.85%
[[5, 5], [5, 5]]   [256, 64]   [128, 10]   99.1%
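The final architecture in Table 2 can be illustrated by tracing feature-map sizes through the selected layers for MNIST's 28x28 input. The 'same' padding and a 2x2 max-pooling step after each convolution are assumptions made for this sketch; the paper does not specify them here.

```python
def shapes(input_hw=28, conv_filters=(256, 32), fc_neurons=(128, 10)):
    """Trace feature-map sizes through the final MNIST architecture.

    Assumes each 5x5 convolution uses 'same' padding and is followed by
    2x2 max pooling, so only the pooling step changes the spatial size.
    """
    trace = []
    hw = input_hw
    for n in conv_filters:
        hw //= 2                       # 2x2 pooling halves height/width
        trace.append((hw, hw, n))      # feature map after conv + pool
    flat = hw * hw * conv_filters[-1]  # flatten before the dense layers
    trace.append((flat,))
    for n in fc_neurons:
        trace.append((n,))
    return trace
```

Under these assumptions the trace is 28x28 input, then 14x14x256 and 7x7x32 feature maps, a flattened vector of 1568 values, and fully connected layers of 128 and 10 neurons.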
Table 3 shows CIFAR-10 progression of network
architecture parameters and the corresponding im-
provement in accuracy. The columns have the same
format as table 2. As with the MNIST data set, we can
see that the CIFAR-10 accuracy goes from 62.99% to
82.56% through the lifetime of the algorithm.
There is a 16.5% difference in accuracy between
the MNIST and CIFAR-10 data sets. This is due to
the increased complexity within the CIFAR-10 data
set, relative to MNIST.
6 DISCUSSION
One of the intriguing results of this paper is that the algorithm can be pointed at a directory
of images, and without any prior knowledge of im-
age details, network architectures, or hyperparameter
nuances, it can construct and train a deep CNN from scratch.
As demonstrated in Tables 2 and 3, the algorithm
builds network architectures with progressively better
accuracy. There are several important pieces that en-
able this to come together. One is appropriate weight
initialization. Timely convergence is more difficult
with deep networks when weights are not properly
initialized. Another key element is identifying ideal
learning rates. Cyclic learning rates allow for quick
convergence.
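The weight-initialization point above can be made concrete with He-normal initialization, a common scheme for ReLU-based CNNs that draws weights with standard deviation sqrt(2 / fan_in) to keep activation variance roughly constant across layers. This is an illustration only; the paper does not state which initialization scheme PnP uses.

```python
import math
import random

def he_init(fan_in, fan_out, rng=None):
    """He-normal initialization for a fully connected layer's weight matrix.

    Draws each weight from a Gaussian with std sqrt(2 / fan_in), which
    preserves activation variance layer to layer under ReLU and helps
    deep networks converge in reasonable time.
    """
    rng = rng or random.Random(0)      # seeded here for reproducibility
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]
```

For a layer with 800 inputs, for example, the weights are drawn with standard deviation sqrt(2/800) = 0.05.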
This novel approach to hyperparameter tuning is
significant because it brings the ability to generate a
trained network structure to those who do not have
a deep understanding of architecture design. It al-
lows one to gather a data set and then jump to using
a trained neural network while letting the algorithm
work through network architecture design details.
Table 4 contains a comparison of results for hy-
perparameter tuning for the MNIST data set. In this
table, Random Grid, TPE, and SMAC (all from (Thornton et al., 2012)), PnP, and RL (Baker et al., 2016) are
compared against each other. (Thornton et al., 2012)
ran a series of automated hyperparameter tuning al-
gorithms using their suite of tools with the MNIST
data set. They allowed Random Grid search to run
for 400 hours while TPE and SMAC ran for 30 hours
each. The reinforcement learning algorithm obtained
an impressive 99.56%, but at the cost of 192 hours
of training. The PnP algorithm was able to achieve
99.1% with 4 hours of training. PnP was able to
achieve very good results at a substantial time savings
relative to the other algorithms.
Table 5 contains results of the PnP algorithm compared to other algorithms on the CIFAR-10 data set. The Weka Random Grid
(Thornton et al., 2012) approach was allowed to run
for 400 hours to reach an accuracy of 35.46%. No in-
put from the user was required. TPE and SMAC were
run by (Domhan et al., 2015). They did have to set
the upper and lower bounds for their algorithm. Fol-
lowing 33 hours of training, they achieved accuracies of 82.53% and 81.92%, respectively. The reinforce-
ment learning algorithm was able to get an impres-
sive 92.68% accuracy, but after 192 hours of training.
The PnP algorithm was able to achieve 82.56% accu-
racy after 7 hours of training. Again, PnP was able to
achieve competitive results at a substantial time sav-
ings relative to the other algorithms.
Another aspect of consideration in automatic hy-
perparameter tuning is the level of expertise needed.
Setting up a reinforcement learning approach to ob-
tain optimal hyperparameter values is a non-trivial
task, and one that most neural network practitioners
are not likely to take on. In theory, it is an interest-