Figure 7: Accuracies on the test sets of the datasets (Picture credit: Original).
LeNet-5: LeNet-5 exhibits the lowest accuracy on all datasets. The performance of VGG16 and ResNet50 is comparable, with ResNet50 handling the more complex datasets better. Since LeNet-5 is inherently designed for simpler tasks, its architecture is relatively shallow; its underperformance on the more complex datasets, whose nuances it struggles to capture, is therefore predictable and acceptable.
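To make the architectural gap concrete, the following is a minimal PyTorch-style sketch of a LeNet-5-like network (the 3-channel 32x32 input and 10-way output are assumptions made for this illustration, not a description of the exact configuration used in the experiments); it shows how few convolutional layers the model contains compared with VGG16 or ResNet50.

```python
import torch.nn as nn

class LeNet5(nn.Module):
    """LeNet-5-style network: only two convolutional layers."""
    def __init__(self, num_classes=10, in_channels=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 6, kernel_size=5),  # 32x32 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                           # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),           # 14x14 -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                           # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```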
VGG16 and ResNet50: It is evident that the
more complex CNNs, VGG16 and ResNet50,
outperform LeNet-5 across all datasets.
Of the three networks, ResNet50 achieves the highest accuracy on both CALTECH-101 and STL-10. These results underscore the significance of architectural complexity in enhancing image classification performance across diverse datasets. The superior performance of VGG16 and ResNet50 on CALTECH-101 and STL-10 can be attributed to their deeper architectures, and in particular to the skip connections of ResNet50, which allow them to capture the intricate features in the diverse images these datasets contain. The complexity of CALTECH-101 and STL-10 plays to ResNet50's strengths, while VGG16's uniform architecture helps it maintain relatively high accuracy even with a large number of instances. However, it is important to consider the computational demands of VGG16, particularly in resource-constrained scenarios; in this experiment, VGG16 required the longest training time on every dataset. In summary, the ability of these architectures to capture both low-level and high-level features is evident in their superior performance on CALTECH-101 and STL-10.
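For reference, the sketch below shows the residual (skip) connection that distinguishes ResNet50 from plain stacked architectures such as VGG16: the block outputs the sum of a learned transformation and its own input, which is what allows very deep networks to train stably and capture intricate features. The channel sizes here are illustrative only, not the exact ResNet50 configuration.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Simplified residual block: output = F(x) + x (the skip connection)."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # The identity shortcut lets gradients bypass the convolutional stack.
        return self.relu(self.body(x) + x)
```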
It is worth noting that the characteristics of a dataset can have a significant impact on the observed performance. CIFAR-10, CALTECH-101, and STL-10 present a set of challenges that reflect real-world scenarios. On the one hand, for all three CNNs the accuracies on STL-10 are the highest, which may be attributed to this dataset containing the largest number of instances spread over a limited set of categories. On the other hand, on the easiest dataset, CIFAR-10, the more complex CNNs, VGG16 and ResNet50, do not achieve the best accuracies, contrary to expectation. This suggests that although more complex networks can adapt to more complex settings, they do not always perform best in simple ones.
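As an illustration of how the three benchmarks differ in size and category structure, the following snippet loads them through torchvision and prints their sizes and class counts. The split names and the resize to a common resolution are choices made for this sketch, not a description of the original experimental setup.

```python
from torchvision import datasets, transforms

# Resize to a common resolution so all three datasets can feed the same network.
tfm = transforms.Compose([transforms.Resize((96, 96)), transforms.ToTensor()])

cifar10 = datasets.CIFAR10(root="data", train=True, download=True, transform=tfm)
stl10 = datasets.STL10(root="data", split="train", download=True, transform=tfm)
caltech = datasets.Caltech101(root="data", download=True, transform=tfm)

for name, ds in [("CIFAR-10", cifar10), ("STL-10", stl10), ("CALTECH-101", caltech)]:
    # CIFAR-10/STL-10 expose `classes`; Caltech101 exposes `categories`.
    n_classes = len(getattr(ds, "classes", getattr(ds, "categories", [])))
    print(f"{name}: {len(ds)} images, {n_classes} classes")
```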
5 CONCLUSION
This study conducted a comprehensive analysis of image classification algorithms using diverse datasets, CIFAR-10, STL-10, and CALTECH-101, and CNN architectures, LeNet-5, VGG16, and ResNet50. The datasets and network architectures were carefully selected so that the results cover a broad range of scenarios. This paper aims to determine how architectural complexity impacts performance across varying datasets. The results highlight the importance of architecture in addressing dataset challenges. Across the datasets, depth and skip connections were key: the depth of VGG16 and the skip connections of ResNet50 excelled on complex datasets, capturing intricate features. In conclusion, this study informs architectural decisions for diverse image classification scenarios, bridging CNN design and dataset specifics.
Future research can explore more intricate designs, transfer learning, and hybrid models. These efforts will advance image classification, producing models with better performance and generalization.