The Comprehensive Investigation of Lung Disease Classification Based on SGD

Yixiang Fan

Information Management and Information Systems, Zhejiang Gongshang University, Zhejiang Province, China

Keywords: Deep Learning, Medical Science and Health, SGD Algorithm.

Abstract: Lung disease classification is an important research topic in medical imaging. This paper explores the use of the stochastic gradient descent (SGD) algorithm for classifying lung diseases. It first details the principles of the SGD algorithm and its application to lung disease classification. It then summarizes existing research on childhood pneumonia and introduces an approach named Stochastic Gradient Descent with Warm Restarts Ensemble (SGDRE). This method combines an ensemble technique, stochastic gradient descent, and a warm restart mechanism to address prevalent issues in deep learning and improve the accuracy of early diagnosis. For automated pneumonia detection, researchers have used deep transfer learning to simplify the detection process, improve accuracy, and classify pneumonia as bacterial or viral. Finally, the paper discusses future research directions and challenges, including how interpretability algorithms, transfer learning, and federated learning can improve the interpretability of the model, its applicability to different datasets, and the protection of patient privacy. This paper aims to provide researchers with a comprehensive understanding of lung disease classification using the SGD algorithm.

1 INTRODUCTION

Lung disease has long been a major global health problem. Conditions such as chronic obstructive pulmonary disease (COPD), asthma, and pulmonary fibrosis seriously affect patients' health and quality of life, causing difficulty in breathing, coughing, chest pain, and shortness of breath. They not only cause suffering for patients but also place a heavy burden on the health care system and society. Accurate identification and classification of pulmonary diseases is essential for prevention, diagnosis, and treatment.

Traditional diagnosis usually depends on the physician's subjective judgement and experience, which can lead to a high misdiagnosis rate and limits diagnostic accuracy and effectiveness (Qiu, 2022). Using machine learning algorithms to classify lung diseases has therefore become an active research topic. Stochastic Gradient Descent (SGD) is suitable for large datasets because each update uses only a small part of the data. This efficiency allows the model to learn patterns and features more rapidly, which can improve classification accuracy.

Author ORCID: https://orcid.org/0009-0008-1735-6122

Moreover, combining machine learning with SGD helps models learn key features from lung images and clinical data. Through training, the model extracts the most representative characteristics of the data, which can improve its validity and generalization capability. With the continuous development and application of deep learning, many research teams and medical institutions have begun to explore how optimization algorithms such as SGD can be used to train efficient lung disease classification models. Vrbančič and Podgorelec proposed an alternative ensemble method, SGDRE, which uses convolutional neural networks (CNNs) and SGD with warm restarts. SGDRE is a collection of CNN models built in a manner that does not increase training time (Vrbančič, 2022). Pneumonia can be identified from chest X-ray images efficiently

and effectively using this method.

Fan, Y. The Comprehensive Investigation of Lung Disease Classification Based on SGD. DOI: 10.5220/0012939700004508
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 1st International Conference on Engineering Management, Information Technology and Intelligence (EMITI 2024), pages 435-439
ISBN: 978-989-758-713-9

More valuable information about a lung cancer diagnosis can be obtained through a CT scan. To enhance diagnosis

and treatment procedures, CT scan images are used as input to various Machine Learning (ML) and Deep Learning (DL) algorithms (Gopinath, 2023). Manickam et al. proposed a method for detecting pneumonia from chest X-ray images using a combination of different optimizers and transfer learning; the deep transfer learning models were trained on an open benchmark dataset of chest X-ray images (Manickam, 2021).

This article aims to systematically review and summarize recent research progress in using SGD for lung disease classification and to explore the challenges and future directions of this field. It first introduces the impact of lung diseases on patient health and the limitations of traditional diagnostic methods, emphasizing the importance of machine learning in lung disease classification. It then reviews and analyzes research on the SGD algorithm in lung disease classification, explores the advantages and disadvantages of different models, and summarizes how existing methods perform in practice. Finally, it discusses the challenges and open issues in current research and proposes future research directions to support the further development and application of lung disease classification technology.

2 INTRODUCTION TO THE APPLICATION OF SGD IN PULMONARY DISEASES

Stochastic Gradient Descent (SGD) is a commonly used optimization algorithm in machine learning and deep learning. It is a variant of gradient descent that is well suited to large datasets.

The main idea behind SGD is to update the model parameters using only a subset (mini-batch) of the training data in each iteration (Ruder, 2016). In each iteration, the algorithm computes the gradient of the loss on a randomly drawn mini-batch and moves the parameters in the opposite direction of that gradient to reduce the loss. This process is repeated many times, with the goal of finding parameters that minimize the loss and improve the performance of the model.
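The mini-batch update described above can be illustrated with a minimal sketch. It assumes a toy linear regression problem with a mean squared error loss rather than an image classifier; the function name `sgd_linear_regression`, the learning rate, the batch size, and the synthetic data are illustrative choices and are not taken from the reviewed studies.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.1, batch_size=8, epochs=200, seed=0):
    """Fit y ~ X @ w + b with mini-batch SGD on the mean squared error."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        order = rng.permutation(n_samples)           # reshuffle every epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]    # one random mini-batch
            err = X[idx] @ w + b - y[idx]            # residuals on the batch
            grad_w = 2.0 * X[idx].T @ err / len(idx)
            grad_b = 2.0 * err.mean()
            w -= lr * grad_w                         # step against the gradient
            b -= lr * grad_b
    return w, b

# Synthetic, noise-free data generated from known parameters (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(64, 2))
y = X @ np.array([3.0, -2.0]) + 0.5
w, b = sgd_linear_regression(X, y)
```

Each parameter update touches only `batch_size` samples, which is why the cost per step stays constant as the dataset grows.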

In lung disease classification, SGD is used to train machine learning models that classify medical images, such as X-rays and CT scans, into various types of pulmonary disease. The rationale for using SGD in this context is that it provides an efficient optimization approach that can cope with the complexity and scale of medical imaging data.

2.1 SGDRE

Early diagnosis of childhood pneumonia is essential for early treatment. Stochastic Gradient Descent with Warm Restarts Ensemble (SGDRE) was developed by Vrbančič and Podgorelec for this task. SGDRE addresses the generalization problem with an average ensemble, and it uses the Stochastic Gradient Descent with Warm Restarts (SGDR) mechanism to obtain the diverse classifiers needed to build that ensemble. SGDR is designed to cope with the multimodal nature of the cost function: because training may settle into a local minimum, the learning rate is abruptly increased at each restart so that the search for a better minimum can continue (Vrbančič, 2022). SGDRE has four phases: an initial training phase, SGD restart 1, SGD restart 2, and an ensemble phase. Different learning rate annealing functions (cosine annealing, linear reduction, and sine-based annealing) are used to obtain as many different models as possible. Within a limited training budget, the first part is spent on the initial training phase and the rest on SGD restart 1 and SGD restart 2 (Loshchilov, 2017). Finally, the collected models are evaluated in the ensemble phase, and the three best-performing models are chosen to construct the final ensemble.
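The two ingredients of SGDRE, a learning rate that is annealed and then restarted, and an average over several classifiers, can be sketched as follows. This is a simplified illustration, not the authors' implementation: it shows only cosine annealing (one of the three annealing functions mentioned above), assumes a fixed cycle length, and uses the hypothetical helper names `sgdr_learning_rate` and `average_ensemble`.

```python
import math

def sgdr_learning_rate(epoch, cycle_length, lr_max=0.1, lr_min=0.0):
    """Cosine-annealed learning rate that restarts every `cycle_length` epochs."""
    t_cur = epoch % cycle_length                   # epochs since the last restart
    cosine = math.cos(math.pi * t_cur / cycle_length)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)

def average_ensemble(prob_lists):
    """Average the class-probability vectors predicted by several models."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    return [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]

# Two cycles of five epochs: the rate decays, then jumps back to lr_max.
schedule = [sgdr_learning_rate(e, cycle_length=5) for e in range(10)]

# Three hypothetical models voting on (pneumonia, normal) for one image.
ensemble = average_ensemble([[0.9, 0.1], [0.6, 0.4], [0.6, 0.4]])
```

In a snapshot-style ensemble, the model weights saved at the end of each cycle supply the individual classifiers, so the ensemble members come from a single training run.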

2.2 Automated Pneumonia Detection

Researchers have developed a deep learning technique for the automated identification of pneumonia that uses deep transfer learning to streamline detection and improve accuracy. The study preprocesses the input chest X-ray images, uses segmentation based on the U-Net architecture to identify the presence of pneumonia, and classifies pneumonia as bacterial or viral using models pre-trained on the ImageNet dataset (e.g., ResNet50, InceptionV3, InceptionResNetV2) (Manickam, 2021).

Two optimizers were employed to extract useful

features and raise the pre-trained model's accuracy.

Adam computes individual adaptive learning rates for

various parameters by combining the benefits of two

SGD extensions: adaptive gradient algorithm

(AdaGrad) and root mean square propagation

(RMSProp). Despite Adam's widespread use, research indicates that it may fail to converge to the optimal solution in some situations. AdaBound, an optimizer proposed by Luo et al. (Luo, 2019), applies dynamic bounds to the learning rate so that it behaves like Adam early in training and gradually transitions to SGD. It is reported to generalize better than Adam while retaining fast convergence. Both optimizers were used in this work, and the performance of each was examined separately at batch sizes of 16 and 32. The performance of the pre-trained models was then compared with that of other convolutional neural network (CNN) models based on the values obtained.
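The difference between Adam and AdaBound can be made concrete with a scalar sketch. It is a hedged illustration rather than the code used in the reviewed study: the bound functions and the default values of `final_lr` and `gamma` follow the form given for AdaBound as recalled from Luo et al. (2019) and should be checked against that paper, and the quadratic test function is an assumption chosen only to make the example runnable.

```python
import math

def adabound_bounds(step, final_lr=0.1, gamma=1e-3):
    """Lower and upper learning-rate bounds; both approach final_lr as step grows."""
    lower = final_lr * (1.0 - 1.0 / (gamma * step + 1.0))
    upper = final_lr * (1.0 + 1.0 / (gamma * step))
    return lower, upper

def adabound_step(param, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
                  eps=1e-8, final_lr=0.1, gamma=1e-3):
    """One scalar update: Adam's moment estimates with a clipped step size."""
    m = beta1 * m + (1.0 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad    # second-moment estimate
    bias1 = 1.0 - beta1 ** step
    bias2 = 1.0 - beta2 ** step
    adam_rate = lr * math.sqrt(bias2) / bias1 / (math.sqrt(v) + eps)
    lower, upper = adabound_bounds(step, final_lr, gamma)
    rate = min(max(adam_rate, lower), upper)       # clip into [lower, upper]
    return param - rate * m, m, v

# Minimise f(x) = (x - 2)^2 starting from x = 0 (toy problem, illustrative only).
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    x, m, v = adabound_step(x, 2.0 * (x - 2.0), m, v, t)
```

Early in training the bounds are wide, so the step size is the adaptive Adam rate; as `step` grows the bounds tighten around `final_lr` and the update approaches SGD with a fixed learning rate.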

2.3 Automated LUS Scoring of COVID-19 Pneumonia Patients

To analyze the entire dataset thoroughly, the researchers used 5-fold cross-validation together with a secondary selection approach based on the ResNet-50 model (He, 2016). For the secondary selection of lung ultrasound (LUS) images, a combination of five deep neural network models based on ResNet-50 was applied, together with a SoftMax classifier (Luo, 2021) and the SGD with momentum (SGDM) optimizer (Jayalakshmy, 2020).
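As an illustration of how 5-fold cross-validation partitions a dataset, the sketch below generates index splits in plain Python. The helper name `k_fold_indices` is hypothetical, and the sketch omits shuffling and patient-level grouping, both of which a real LUS study would need to consider.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(12, k=5))
```

Training one model per fold yields five models, which matches the number of ResNet-50-based models combined in the secondary selection step.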

The main components of this model are convolutional blocks and identity blocks. The former primarily adjust the dimensions of the network, while the latter increase its depth. In networks with fewer layers, normalizing the data in the intermediate layers is enough to allow training with stochastic gradient descent during backpropagation. In deeper networks, a residual module lets the input be passed directly to the output of the block, bypassing the convolutional layers. This module retains the original information and effectively counters the vanishing gradient problem during backpropagation, which makes deep network training and feature extraction possible. Adding a zero-padding layer before the convolutional layer ensures that the dimensions of the input image and of the feature map after the convolution remain consistent (Xing, 2022).
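The skip connection and the effect of zero padding can be sketched in a few lines. This is a simplified illustration under stated assumptions: dense matrix products stand in for the convolutional layers of ResNet-50, and the names `identity_block` and `conv_output_size` are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def identity_block(x, w1, w2):
    """Two-layer transform F(x) plus the unchanged input: relu(F(x) + x)."""
    hidden = relu(x @ w1)
    return relu(hidden @ w2 + x)       # skip connection adds x back

def conv_output_size(n, kernel, padding, stride=1):
    """Spatial size of a feature map after a convolution."""
    return (n + 2 * padding - kernel) // stride + 1

# With all-zero weights the block still passes its (non-negative) input through.
x = np.array([[1.0, 2.0]])
zeros = np.zeros((2, 2))
passed_through = identity_block(x, zeros, zeros)

# A 3x3 convolution with one pixel of zero padding keeps a 224-pixel side length.
same_size = conv_output_size(224, kernel=3, padding=1)
```

Because the input is added back unchanged, the gradient has a direct path through the block even when the learned transform contributes little, which is the property that counters vanishing gradients.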

3 DISCUSSIONS

When using SGD for lung disease classification, there

are several limitations and challenges to consider,

including interpretability, applicability, and privacy

issues.

In terms of interpretability, an SGD-based neural network model can be considered a "black box", meaning that its inner workings lack transparency. Such a model involves numerous layers and parameters, making it difficult to interpret how each parameter contributes to the final prediction. This lack of transparency can undermine trust in the model's reliability and robustness, especially in critical healthcare applications where transparency and accountability are essential.

In terms of applicability, it is challenging in medical image analysis to use the same classification algorithm on different datasets because of poor generalization ability, the need for large datasets, and the time complexity of the learning process. These challenges make it difficult to achieve consistent and accurate results when one algorithm is applied to different subsets of medical image data (Vrbancic, 2019).

In terms of privacy, training machine learning models on patient data raises data privacy and security issues, including the risk of unintentional disclosure of confidential information. It is crucial to apply strong privacy protection techniques, such as data anonymization and encryption, to protect patient privacy.

These limitations and challenges must be addressed through careful model selection, validation, and ethical considerations to ensure responsible deployment of machine learning models in healthcare applications. Neural network models trained with SGD, a classic optimization algorithm, nevertheless have broad application prospects in lung disease classification, particularly when combined with techniques such as Grad-CAM, SHAP, transfer learning, and federated learning.

The decision-making process of convolutional neural networks (CNNs) in image classification tasks can be visualized and understood using Gradient-weighted Class Activation Mapping (Grad-CAM). Applied to lung disease classification, Grad-CAM highlights the regions of a lung image that contribute most to the model's prediction, which makes a deep learning model more explainable and interpretable (Panwar, 2020). SHapley Additive exPlanations (SHAP) is a method for interpreting predictions that assigns an importance score to every feature in a complex machine learning model. The transparency and explanatory power of SHAP increase trust and confidence in classification results, which can support better decision-making and more accurate treatment strategies (Nahiduzzaman, 2024).
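The core Grad-CAM computation can be sketched as follows, assuming that the feature maps of the last convolutional layer and the gradients of the target class score with respect to those maps have already been extracted from a trained network. The function name `grad_cam` and the toy arrays are illustrative and do not come from the cited study.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from feature maps and class-score gradients.

    Both inputs have shape (channels, height, width) and come from the last
    convolutional layer for one image and one target class.
    """
    weights = gradients.mean(axis=(1, 2))               # one weight per channel
    cam = np.tensordot(weights, activations, axes=1)    # weighted sum of maps
    cam = np.maximum(cam, 0.0)                          # keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                           # scale to [0, 1]
    return cam

# Toy example: channel 0 supports the class, channel 1 argues against it.
activations = np.array([[[1.0, 0.0], [0.0, 0.0]],
                        [[0.0, 0.0], [0.0, 1.0]]])
gradients = np.array([[[1.0, 1.0], [1.0, 1.0]],
                      [[-1.0, -1.0], [-1.0, -1.0]]])
heatmap = grad_cam(activations, gradients)
```

In practice the heatmap is upsampled to the input resolution and overlaid on the chest image so that a clinician can see which regions drove the prediction.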

Transfer learning involves leveraging knowledge gained from solving one problem and applying it to a different but related problem. In lung disease classification, the available dataset may be small or unbalanced, and transfer learning can mitigate this problem by using information from other, larger datasets. If a well-trained model already exists for classifying one lung disease, it can serve as a pre-trained model and be fine-tuned to classify a new lung disease.

Federated learning is a machine learning method designed to train models without sending raw data from devices to a central server. Instead, the model is trained on each local device, and only model updates or gradients are sent to the central server, which aggregates them to update the global model. In this way, federated learning helps protect the privacy of user data such as medical records and personal preferences. In addition, the developed algorithms could rely on more advanced hardware or transmission mechanisms to achieve higher processing speeds and more accurate identification capabilities (Deng, 2023; Sugaya, 2019).
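The aggregation step on the central server can be sketched with a weighted average of client parameters, in the style of federated averaging. The text above does not name a specific aggregation rule, so the weighting by local dataset size and the function name `federated_average` are assumptions made for illustration.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Two hypothetical hospitals: the second holds three times as many images.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only the parameter vectors leave each site; the images themselves stay local, which is the privacy property described above.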

4 CONCLUSIONS

This study has provided a systematic summary and analysis of the use of the SGD algorithm for lung disease classification. Across the cases reviewed, namely SGDRE, automated pneumonia detection, and automated LUS scoring, models trained with SGD have shown good performance and effectiveness in lung disease classification tasks.

The SGD algorithm scales well and generalizes well: it can adapt to lung disease datasets of different types and sizes and has a degree of noise resistance and robustness. It can achieve high accuracy and stability on medical imaging datasets, providing strong support for the accurate diagnosis of lung diseases. Compared with optimization methods that process the full dataset at every step, SGD has clear advantages in computational efficiency and convergence speed on large datasets, which makes it an important choice for processing large-scale medical imaging data.

Although SGD-based methods have made significant progress in lung disease classification, they still face challenges and limitations, such as the quality of data annotations and the protection of user privacy, which require further improvement and exploration. Future research can focus on improving the accuracy and interpretability of SGD-trained models in lung disease classification and on promoting their widespread application in clinical practice.

REFERENCES

Deng, X., Oda, S., Kawano, Y., 2023. Graphene-based midinfrared photodetector with bull's eye plasmonic antenna. Optical Engineering, 62(9), p. 097102.

Gopinath, A., et al. 2023. Computer aided model for lung

cancer classification using cat optimized convolutional

neural networks. Measurement: Sensors.

He, K. M., et al. 2016. Deep Residual Learning for Image

Recognition. 2016 IEEE Conference on Computer

Vision and Pattern Recognition.

Jayalakshmy, S., & Sudha, G. F. 2020. Scalogram based

prediction model for respiratory disorders using

optimized convolutional neural networks. Artificial

Intelligence in Medicine, 103, 10.

Loshchilov, I., & Hutter, F. 2017. SGDR: Stochastic

Gradient Descent with Warm Restarts. In International

Conference on Learning Representations (ICLR).

Luo, J., et al. 2021. Improving the performance of

multisubject motor imagery-based BCIs using twin

cascaded softmax CNNs. Journal of Neural

Engineering, 18.

Luo, L., Xiong, Y., Liu, Y., & Sun, X. 2019. Adaptive

Gradient Methods with Dynamic Bound of Learning

Rate. arXiv preprint arXiv:1902.09843.

Manickam, A., et al. 2021. Automated pneumonia detection

on chest X-ray images: A deep learning approach with

different optimizers and transfer learning architectures.

Measurement.

Nahiduzzaman, M., et al. 2024. A novel framework for lung

cancer classification using lightweight convolutional

neural networks and ridge extreme learning machine

model with SHapley Additive exPlanations (SHAP).

Expert Systems with Applications, 248.

Panwar, H., et al. 2020. A deep learning and grad-CAM

based color visualization approach for fast detection of

COVID-19 cases using chest X-ray and CT-Scan

images. Chaos, Solitons & Fractals, 140.

Qiu, Y., et al. 2022. Pose-guided matching based on deep

learning for assessing quality of action on rehabilitation

training. Biomedical Signal Processing and Control, 72,

103323.

Ruder, S. 2016. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747.

Sugaya, T., Deng, X., 2019. Resonant frequency tuning of

terahertz plasmonic structures based on solid

immersion method. 2019 44th International Conf. on

Infrared, Millimeter, and Terahertz Waves, p.1-2.

Vrbancic, G., et al. 2019. Automatic detection of heartbeats

in heart sound signals using deep convolutional neural

networks. Elektronika Ir Elektrotechnika, 25(3).

Vrbančič, G., & Podgorelec, V. 2022. Efficient ensemble

for image-based identification of Pneumonia utilizing

deep CNN and SGD with warm restarts. Expert

Systems with Applications, 187.

Xing, W., et al. 2022. Automated lung ultrasound scoring

for evaluation of coronavirus disease 2019 pneumonia

using two-stage cascaded deep learning model.

Biomedical Signal Processing and Control.
