Comparative Analysis of Image Classification Algorithms Based

on Traditional and Advanced Convolutional Neural Networks

Taimingwang Liu

School of AI and Advanced Computing, Xi'an Jiaotong-Liverpool University, Taicang, Jiangsu, China

Keywords: Image Classification, Convolutional Neural Networks, Architectural Depth, Dataset Characteristics,

Comparative Analysis.

Abstract: This study presents a comprehensive comparative analysis of image classification algorithms across diverse

datasets and distinct convolutional neural network (CNN) architectures. The datasets considered—CIFAR-

10, CALTECH-101, and STL-10—embody varying complexities characteristic of real-world scenarios. They

span scenarios of limited categories and low-resolution images to challenges involving diverse instances with

fewer categories and high-resolution demands. The selected CNN architectures—LeNet5, VGG16, and

ResNet50—exhibit varying depths and design philosophies, offering a diverse landscape for evaluation.

Systematic experimentation and evaluation unveil the intricate interplay between architectural complexity and

dataset characteristics. The findings underscore the pivotal role of architectural depth in addressing diverse

dataset challenges. Notably, VGG16 and ResNet50 consistently outperform LeNet5 across all datasets,

emphasizing the importance of deeper architectures in image classification tasks. These insights provide

valuable guidance for architectural choices in image classification, ensuring alignment with specific dataset

characteristics. Additionally, the study lays the foundation for future research endeavors aimed at refining

architectural designs and enhancing image classification algorithm performance across various real-world

scenarios.

1 INTRODUCTION

Image classification, a foundational task in the realm

of computer vision, holds a central role in a multitude

of real-world applications, ranging from object

recognition and medical imaging to enabling

autonomous vehicles. Throughout the years, CNN has

emerged as the predominant approach for image

classification, achieving remarkable success due to its

innate capability to autonomously extract pertinent

features from raw image data. Moreover, it is worth

noting that Deep Learning (DL) stands as an effective

solution for addressing various image processing

challenges, such as facial recognition (Yu et al, 2019).

The ever-evolving landscape of CNN architectures,

encompassing both traditional and advanced models,

presents a compelling opportunity to assess their

comparative performance across diverse datasets

(Bhatt et al, 2021).

This paper aims to conduct a comparative analysis

of image classification algorithms, focusing on both

traditional and advanced CNN architectures.

Specifically, the performance of LeNet-5, ResNet50,

and VGG16 was investigated across three diverse

benchmark datasets: CALTECH-101, CIFAR-10, and

STL-10. This study sheds light on the architectures'

suitability for handling various image classification

challenges, while also offering insights into the

significance of employing advanced networks.

2 RELATED WORK

The landscape of image classification has been

significantly shaped by the emergence of CNNs,

which have revolutionized the field and demonstrated

exceptional performance in a variety of applications

(Pei et al, 2019). This section provides an overview of

the pertinent literature surrounding image

classification algorithms and their comparative

evaluations.

2.1 Image Classification and CNNs

Image classification involves assigning a label to an

image from a predefined set of categories. CNNs,

218

Liu, T.

Comparative Analysis of Image Classiﬁcation Algorithms Based on Traditional and Advanced Convolutional Neural Networks.

DOI: 10.5220/0012798600003885

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 1st International Conference on Data Analysis and Machine Learning (DAML 2023), pages 218-223

ISBN: 978-989-758-705-4

inspired by the organization of the visual cortex in

animals, have shown remarkable performance in

image classification tasks. LeNet-5, introduced by

LeCun et al., marked a significant step in the evolution

of CNNs, demonstrating the potential of deep learning

for feature extraction (Lecun et al, 1998 & Liu et al,

2022). Subsequent architectures, such as VGG16 and

ResNet50, introduced deeper architectures with skip

connections and improved accuracy (Simonyan and

Zisserman, 2015 & He et al, 2016).

2.2 Comparative Analysis of CNN

Architectures

Comparative analyses of CNN architectures have

become essential to understand the strengths and

weaknesses of different models. Previous studies have

focused on benchmark datasets to assess architecture

performance. For instance, Krizhevsky et al. evaluated

different CNN architectures on the ImageNet dataset,

demonstrating the effectiveness of deep networks in

large-scale image classification (Krizhevsky et al,

2012). More recent work by Tan & Le introduced

EfficientNet, showcasing superior performance while

being computationally efficient (Tan and Le, 2019).

2.3 Challenges and Dataset Selection

Dataset selection plays a pivotal role in shaping the

outcomes of comparative studies. It is crucial to

choose diverse datasets with distinct characteristics to

provide a comprehensive evaluation of the capabilities

of different architectural models. This study selected

three widely used benchmark datasets, each with its

unique features and challenges.

 CIFAR-10: CIFAR-10 is a classic dataset that

contains samples from ten major image classes

shown in Fig. 1. This dataset focuses on the

basic class classification of objects and is

suitable for evaluating the performance of

image classification algorithms in identifying

common objects.

Figure 1: Images from CIFAR-10 (Picture credit: Original).

 STL-10: The composition of STL-10 is similar

to CIFAR-10. It also includes ten categories

with natural scenes and objects common in

life, which are almost the same as the previous

one (Coates et al, 2021) . Fig. 2 shows

examples of each class. STL-10 images are

often complex, which can be recognized as an

upgraded version of CIFAR-10 with higher

resolution and more instances. It can better

reflect real-world image diversity.

Figure 2: Images from STL-10 (Picture credit: Original).

 CALTECH-101: The CALTECH-101 dataset

is more challenging rather the others, which

includes 101 different object categories. These

categories cover a wide variety of objects,

including animals, food, tools, and more (Li et

al, 2004). Fig. 3 shows some examples of part

of the classes in this dataset, where the images

have not been preprocessed. CALTECH-101

provides a rigorous test of object recognition

algorithms because it requires the algorithms

to be able to handle a variety of different

classes of objects.

Figure 3: Images from CALTECH-101 (without

preprocessing) (Picture credit: Original).

These dataset selections ensure that the

comparative analysis encompasses a spectrum of

image classification challenges, ranging from small,

simple images to high-resolution, complex scenes, and

diverse object categories.

2.4 Significance of Comparative Studies

Comparative analyses guide researchers and

practitioners in selecting appropriate architectures for

specific tasks. Such studies facilitate a deeper

Comparative Analysis of Image Classiﬁcation Algorithms Based on Traditional and Advanced Convolutional Neural Networks

219

understanding of architectural behavior across

different scenarios, aiding informed decision-making.

The insights gained from these analyses contribute to

the design of more effective models and the

advancement of the field.

3 METHODOLOGY

The methodology applied in this research underscores

a meticulous and strategic approach to conducting a

comprehensive and rigorous comparative analysis of

image classification algorithms. The primary

objective of this section is to delve into the key aspects

of the methodology, highlighting the preprocessing

steps and the rationale behind the selection of

convolutional neural network (CNN) architectures.

This strategic approach ensures holistic evaluation of

algorithmic performance, taking into consideration

both data preparation and architectural considerations.

3.1 Preprocessing

Not only the quality of data but also the quality of

preprocessing can affect the performance of the model

(Gulati and Raheja, 2021). To ensure rigorous

evaluation, each dataset was divided into three

subsets: training, validation, and test set with a ratio of

7:2:1. The training set was used for model parameter

optimization, the validation set aided in

hyperparameter tuning, and the test set, consisting of

unseen data, served as the final performance

evaluation metric. Table 1 shows how the datasets are

split.

Table 1: Split of the datasets.

Dataset Train set Validation set Test set

CIFAR-10 49,000 14,000 7,000

STL-10 9,100 2,600 1,300

CALTECH-101 6,400 18,28 916

Minimal resizing was performed on the images to

retain their original dimensions. However, since the

images in the CALTECH-101 dataset have varying

sizes, they were uniformly resized to 224x224 pixels.

For data augmentation, horizontal flipping was

applied. Additionally, the images were normalized

with 0.485, 0.456, 0.406, and 0.229, 0.224, 0.225

respectively for the mean and standard deviation

values for the RGB channels. These values are from

the famous ImageNet dataset.

3.2 CNN Architectures

Choosing the right CNN architectures is crucial for

this study as it ensures a comprehensive evaluation of

image classification algorithms at different

developmental stages. Three significant architectures

were selected that have played a pivotal role in

computer vision. By comparing these architectures,

the results on various aspects of image classification

can be obtained clearer and more meaningful.

 LeNet-5: Introduced by LeCun et al. in 1998,

LeNet-5 holds a paramount place in the

history of convolutional neural networks

(Lecun et al, 1998). Fig. 4. illustrates the

classic architecture of LeNet-5. As one of the

earliest CNNs, its “Convolutional +

Sampling(Pooling) + Fully Connection”

structure laid the foundation for modern image

classification techniques (Lecun et al, 1998).

LeNet-5's significance lies in its utilization of

convolutional and subsampling layers,

showcasing the effectiveness of hierarchical

feature learning. Despite its relatively simple

structure, its inclusion in this study allows for

a benchmark evaluation of fundamental model

performance.

Figure 4: Architecture of LeNet-5.

 VGG16: The year 2014 witnessed the

introduction of VGG16, a seminal architecture

crafted by Simonyan and Zisserman.

Renowned for its uniformity and depth,

VGG16's distinctive trait is the consistent use

of 3x3 convolutional filters across its layers

(shown in fig. 5) (Simonyan and Zisserman,

2015) This architectural simplicity contributes

to its ease of implementation and

interpretation. Although resource-intensive

due to its 16-layer depth, VGG16's deep

structure empowers it to capture intricate

features within images. Its selection in this

study enables the exploration of the impact of

architectural complexity on classification

outcomes.

DAML 2023 - International Conference on Data Analysis and Machine Learning

220

Figure 5: Architecture of VGG16.

 ResNet50: Introduced by He et al. in 2015,

ResNet50 represents a breakthrough in

addressing the challenges posed by vanishing

gradients in deep neural networks (shown in

fig. 6) (He et al, 2016). This architecture's

innovative use of skip connections allows for

the training of remarkably deep networks

without encountering diminishing gradient

magnitudes. ResNet50's 50-layer depth

showcases its ability to capture intricate image

features while maintaining gradient flow. Its

inclusion in this study offers insights into the

advantages conferred by skip connections and

depth in the field of image classification.

Figure 6: Architecture of ResNet50.

In conclusion, the methodology of this study

strategically selects and explores the historical

significance and unique characteristics of three

distinct CNN architectures. These selections span

pivotal moments in the timeline of deep learning,

allowing for a comprehensive evaluation of image

classification algorithms. The careful consideration of

these architectures ensures a robust foundation for the

ensuing comparative analysis.

4 EXPERIMENT

4.1 Dataset Selection Rationale and

Training Settings

 Dataset Selection: The selection of datasets in

this study is based on a deliberate strategy to

encompass a range of challenges that

progressively increase algorithmic demands.

The details are shown in Table II. In detail,

four factors were considered, number of

classes, number of images, image size, and

color channel. What all data sets have in

common is the RGB color channel.

Specifically, CIFAR-10 and STL-10 have

limited categories, while the former has more

instances with lower resolution. Moreover, the

CALTECH-101 dataset is much more

challenging where all the attributes are larger

or higher, except the number of instances. This

selection ensures a systematic evaluation of

CNN architectures across varying

complexities.

Table 2: Description of the datasets.

Dataset

# of

classes

# of

images

Image

size(pixel)

Color

channel

CIFAR-10 10 70,000 32x32 RGB

STL-10 10 130,000 96x96 RGB

CALTECH-101 101 9,144

variable

(higher than

224x224

in average)

RGB

 Training Settings: The models were trained for

120 epochs using the stochastic gradient

descent (SGD) optimizer with a momentum of

0.9. The criterion used for training was the

cross-entropy loss function.

4.2 Experiment Results

The results section presents the performance of

LeNet-5, VGG16, and ResNet50 across the three

chosen datasets: CIFAR-10, CALTECH-101, and

STL-10. The performance metrics are summarized in

Fig. 7 providing a concise overview of each

architecture's accuracy on each dataset.

Comparative Analysis of Image Classiﬁcation Algorithms Based on Traditional and Advanced Convolutional Neural Networks

221

Figure 7: Accuracies on the test sets of the datasets (Picture

credit: Original).

 LeNet-5: LeNet-5 exhibits the lowest accuracy

on all datasets. The performance on VGG16

and ResNet50 is comparable, while ResNet50

shows better processing of more complex data

sets. Furthermore, Since LeNet-5 is inherently

designed for simpler tasks, its architecture is

relatively shallower. Hence the under-

performance of LeNet-5 which struggles to

capture the nuances presented in more

complex datasets is predictable and acceptable.

 VGG16 and ResNet50: It is evident that the

more complex CNNs, VGG16 and ResNet50,

outperform LeNet-5 across all datasets.

Among all of them, ResNet50 achieves the

highest accuracy on both CALTECH-101 and

STL-10. These results underscore the

significance of architectural complexity in

enhancing image classification performance

across diverse datasets. The advanced

performance of VGG16 and ResNet50 on

CALTECH-101 and STL-10 can be attributed

to their deeper architectures, especially the

skip connections of ResNet50, which allow

them to capture intricate features in the diverse

images present in these datasets. The

complexity of CALTECH-101 and STL-10

aligns with ResNet50's strengths, while

VGG16's consistent architecture is advanta-

geous in maintaining a relatively good level in

the face of a large number of instances.

However, it is important to consider the

computational demands of VGG16,

particularly in resource-constrained scenarios.

In this experiment, VGG16 always took the

longest training time on all the datasets. In

summary, the ability of these architectures to

capture both low and high-level features is

evident in their superior performance on

CALTECH-101 and STL-10.

It is worth noting that the characteristics of a

dataset can have a significant impact on the observed

performance. In the case of CIFAR-10, CALTECH-

101, and STL-10, the selection of datasets presents a

set of challenges that reflect real-world scenarios. On

the one hand, among all the CNNs, the accuracies on

STL-10 are always the highest, which may be

attributed to this dataset containing the largest number

of instances with limited categories. On the other

hand, the accuracies of the easiest dataset, CIFAR-10,

on the more complex CNNs, VGG16 and ResNet50,

are not the best as expected. This may point out that

although more complex networks can adapt to more

complex environments, they do not always perform

very well in simple environments.

5 CONCLUSION

This study conducted a comprehensive analysis of

image classification algorithms using diverse datasets:

CIFAR-10, STL-10, and CALTECH-10, and CNN

architectures: LeNet-5, VGG16, and ResNet50. The

data set and network structure are carefully selected so

that the results can cover a large scenario. This paper

aims to figure out how architectural complexity

impacts performance across varying datasets. Results

highlighted the importance of architecture in

addressing dataset challenges. Across the datasets,

depth and skip connections were key. The depth of

VGG16 and the application of skip connections in

ResNet50 excelled on complex datasets, capturing

intricate features. In conclusion, this study informs

architectural decisions for diverse image classification

scenarios, bridging CNN design and dataset specifics.

Future research can explore intricate designs,

transfer learning, and hybrid models. These efforts

will advance image classification, producing

enhanced performance and generalization models.

REFERENCES

H. Yu, J. Zhao, and Y. Zhu, “Research on Face Recognition

Method Based on Deep Learning,” IEEE Xplore, Oct.

01, 2019.

D. Bhatt et al., “CNN variants for Computer Vision:

history, architecture, application, challenges and future

scope,” Electronics, vol. 10, no. 20, p. 2470, Oct. 2021.

Y. Pei, Y. Huang, Q. Zou, X. Zhang, and S. Wang, “Effects

of Image Degradation and Degradation Removal to

CNN-based Image Classification,” IEEE Transactions

on Pattern Analysis and Machine Intelligence, pp. 1–1,

2019.

DAML 2023 - International Conference on Data Analysis and Machine Learning

222

Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-

based learning applied to document

recognition,” Proceedings of the IEEE, vol. 86, no. 11,

pp. 2278–2324, 1998.

R. Liu, Y. Liu, Z. Wang, and H. Tian, “Research on face

recognition technology based on an improved LeNet-5

system,” IEEE Xplore, Jan. 01, 2022.

K. Simonyan and A. Zisserman, “Very Deep Convolutional

Networks for Large-Scale Image

Recognition,” arXiv.org, Apr. 10, 2015.

K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual

Learning for Image Recognition,” Thecvf.com, pp.

770–778, 2016.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet

Classification with Deep Convolutional Neural

Networks,” Advances in Neural Information

Processing Systems, vol. 25, 2012.

M. Tan and Q. Le, “EfficientNet: Rethinking Model

Scaling for Convolutional Neural

Networks,” proceedings.mlr.press, May 24, 2019.

A. Coates, A. Ng, and H. Lee, “An Analysis of Single-

Layer Networks in Unsupervised Feature

Learning,” proceedings.mlr.press, Jun 2021.

Li Fei-Fei, R. Fergus, and P. Perona, “Learning Generative

Visual Models from Few Training Examples: An

Incremental Bayesian Approach Tested on 101 Object

Categories,” 2004 Conference on Computer Vision and

Pattern Recognition Workshop.

V. Gulati and N. Raheja, “Efficiency Enhancement of

Machine Learning Approaches through the Impact of

Preprocessing Techniques,” IEEE Xplore, Oct. 01,

2021.

Comparative Analysis of Image Classiﬁcation Algorithms Based on Traditional and Advanced Convolutional Neural Networks

223