ability of CNN to capture patterns in context by adding recurrent connections so that units can be modulated by other units in the same layer (Liang, 2015). Dense convolutional networks (DenseNets) were proposed to alleviate overfitting in deep learning models. As the number of parameters grows, DenseNets consistently improve accuracy by directly connecting any two layers with the same feature-map size, showing no signs of overfitting or performance degradation (Huang, 2017). To capture the nonlinear relationships in the data, regularization methods have also been improved. Drop-Activation introduces randomness into the activation function by randomly removing nonlinear activations from the network during training, while using deterministic networks with modified nonlinearities for prediction (Liang, 2020).
The efficacy of image classification extends
beyond merely the model itself; other factors wield
significant influence. Dataset availability, model
design, and researchers' expertise play pivotal roles in
determining model effectiveness (Lu, 2007). In the
realm of CNN models, factors like optimizer
selection, learning rate, epoch count, batch size, and
activation function profoundly impact accuracy
(Nazir, 2018). For example, when using CNN models
to extract spatial features for hyperspectral image
(HSI) classification, several optimizers perform
differently: stochastic gradient descent (SGD),
adaptive moment estimation (Adam), adaptive
gradient (Adagrad), root mean square propagation (RMSprop), and Nesterov-accelerated adaptive moment estimation (Nadam) (Bera, 2020).
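For illustration, the sketch below shows how these optimizers could be instantiated in PyTorch; the placeholder model and the learning-rate values are assumptions chosen for demonstration and are not taken from Bera (2020).

```python
# A minimal sketch (not the configuration used in Bera, 2020): instantiating
# the optimizers listed above in PyTorch for a placeholder model.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))  # placeholder classifier

# Learning rates here are illustrative defaults, not tuned values.
optimizers = {
    "SGD": torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9),
    "Adam": torch.optim.Adam(net.parameters(), lr=0.001),
    "Adagrad": torch.optim.Adagrad(net.parameters(), lr=0.01),
    "RMSprop": torch.optim.RMSprop(net.parameters(), lr=0.001),
    "Nadam": torch.optim.NAdam(net.parameters(), lr=0.001),
}
```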
This article's primary objective is to construct image
classification models and delve into the ramifications
of diverse model architectures through the lens of
deep learning. Notably, the optimizer emerges as a crucial determinant of how model parameters are updated at each iteration.
Furthermore, the study meticulously scrutinizes and
contrasts the effects of parameters such as learning
rate and epoch count on model performance. By
juxtaposing the accuracy of multilayer perceptron
(MLP) and CNN models in addressing image
classification challenges, the differential impact of
various model structures is succinctly summarized.
Both CNN and MLP stand as formidable models in
the realm of image analysis, adept at effectively
representing and modeling data. Additionally,
through deliberate manipulation of individual
parameters and subsequent observation of accuracy
shifts, this article delineates the nuanced impact of
each parameter on the CNN model. Such insights not
only foster a deeper comprehension of parameter
influences but also furnish valuable reference points
for future model optimization endeavors.
2 METHODOLOGIES
2.1 Dataset Description and
Preprocessing
The dataset chosen to train the models is CIFAR-10, which comes from the Department of Computer Science at the University of Toronto (Krizhevsky, 2009). It contains sixty thousand 32×32 color images divided into ten classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The dataset is split into five training batches and one test batch, each containing 10,000 images, and it has been widely used in image classification problems. Images in the dataset are low-resolution (32 × 32), which requires less computing power and allows the models to be trained quickly.
Another reason for selecting this dataset is to test the models' ability to classify creatures and objects from the real world. Because the structures of the MLP model and the CNN model are different, the dataset needs to be preprocessed differently for each. For the MLP model, the data is first converted into a tensor format accepted by PyTorch, and the images are then normalized by scaling the pixel values to between -1 and 1. For the CNN model, the images are first stored as 32 × 32 matrices, the labels are converted into a binary matrix (one-hot encoding), and finally the data is normalized by scaling the pixel values to between 0 and 1.
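A minimal sketch of these two preprocessing pipelines is given below. The use of torchvision for the MLP data and of Keras utilities for the CNN data, as well as the specific normalization constants, are assumptions; the text only specifies the target value ranges, the tensor format, and the one-hot encoding.

```python
# A minimal sketch of the two preprocessing pipelines described above.
# Assumptions: torchvision loads CIFAR-10 for the MLP, and Keras utilities
# are used for the CNN; only the value ranges and formats come from the text.
import torchvision
import torchvision.transforms as transforms
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

# MLP pipeline: convert images to PyTorch tensors, then scale pixels to [-1, 1].
mlp_transform = transforms.Compose([
    transforms.ToTensor(),                                    # pixels in [0, 1]
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),   # rescale to [-1, 1]
])
mlp_train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=mlp_transform)

# CNN pipeline: keep images as 32x32x3 arrays, one-hot encode the labels,
# and scale pixels to [0, 1].
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
```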
2.2 Proposed Approach
This paper's principal goal is to compare the MLP
model's accuracy to that of the CNN model, while
also scrutinizing the effects of different optimizers
and regularization techniques during training.
Subsequently, the aim is to identify the optimal combination of parameters with which to build the model and make predictions. To compare the two models, both are constructed using the RMSprop optimizer, each is trained for 100 epochs, and their respective accuracies are plotted for comparative analysis (a training protocol of this kind is sketched below). Once the superior model structure has been determined through this comparison, the appropriate optimizer and regularization method are selected.
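The comparison protocol can be illustrated with a minimal PyTorch-style training loop; the function name, data loaders, and learning rate below are placeholders rather than the exact experimental setup.

```python
# A minimal sketch of the comparison protocol: train a model with RMSprop for
# 100 epochs and record test accuracy per epoch. Placeholders, not the exact setup.
import torch
import torch.nn as nn

def train_and_track(model, train_loader, test_loader, epochs=100, lr=0.001):
    """Train a model with RMSprop and record its test accuracy after each epoch."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.RMSprop(model.parameters(), lr=lr)
    accuracies = []
    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for images, labels in test_loader:
                preds = model(images).argmax(dim=1)
                correct += (preds == labels).sum().item()
                total += labels.size(0)
        accuracies.append(correct / total)
    return accuracies  # per-epoch accuracies, plotted for the MLP-vs-CNN comparison
```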
Three types of regularization techniques are
employed to construct models, and their respective
accuracies are evaluated. Additionally, four distinct
optimizers are utilized to build models, with their