The Comprehensive Investigation of Lung Disease Classification Based on SGD
Yixiang Fan
a
Information Management and Information Systems, Zhejiang Gongshang University, Zhejiang Province, China
Keywords: Deep Learning, Medical Science and Health, SGD Algorithm.
Abstract: Lung disease classification is an important research topic in the field of medical imaging. This paper explores the use of the stochastic gradient descent (SGD) algorithm for classifying lung diseases. It first explains the principles of the SGD algorithm and its application to lung disease classification. It then summarizes existing research on childhood pneumonia and reviews an approach named Stochastic Gradient Descent with Warm Restarts Ensemble (SGDRE), which combines an ensemble technique, stochastic gradient descent, and a warm restart mechanism to address common problems in deep learning and improve the accuracy of early diagnosis. For the automated detection of pneumonia, it reviews a deep learning method that uses deep transfer learning to simplify the detection process, improve accuracy, and classify pneumonia as bacterial or viral. Finally, the paper discusses future research directions and challenges, including how interpretability algorithms, transfer learning, and federated learning can improve the interpretability of the models, their applicability across different datasets, and the protection of patient privacy. This paper aims to give researchers a comprehensive understanding of lung disease classification using the SGD algorithm.
1 INTRODUCTION
Lung diseases have long been a major problem in global health and seriously affect people's health. Common examples include chronic obstructive pulmonary disease (COPD), asthma, and pulmonary fibrosis. These conditions have a serious impact on patients' health and quality of life, causing difficulty in breathing, coughing, chest pain, and shortness of breath. Beyond the suffering they cause patients, they also place a heavy burden on the health care system and on society. Accurate identification and classification of pulmonary diseases is therefore essential for prevention, diagnosis, and treatment.
The traditional diagnostic method usually depends on the physician's subjective judgement and experience, which can result in a high misdiagnosis rate and limits diagnostic accuracy and effectiveness (Qiu, 2022). Using machine learning algorithms to classify lung diseases has therefore become a research hot spot. Stochastic Gradient Descent (SGD), a widely used algorithm for training such models, is well suited to large datasets because each parameter update uses only a small subset of the training data. This efficiency allows the model to learn patterns and features more rapidly, which makes it practical to train more accurate classifiers on large image collections. Moreover, combining machine learning models with SGD makes it possible to learn key features from lung images and clinical data. Through training, the model extracts the most representative characteristics from the data, which can improve its validity and generalization capability.
a https://orcid.org/0009-0008-1735-6122
Combining machine learning models with SGD for lung disease classification has many advantages. With the continuous development and application of deep learning technology, many research teams and medical institutions have begun to explore how to use optimization algorithms such as SGD to train efficient lung disease classification models. Vrbančič and Podgorelec proposed an alternative ensemble method, SGDRE, which combines convolutional neural networks (CNNs) with SGD with warm restarts. SGDRE builds an ensemble of CNN models in a way that does not increase training time (Vrbančič, 2022), and it can identify pneumonia from chest X-ray images efficiently and effectively. For lung cancer, CT scans provide more valuable diagnostic information, and CT images are used as input to various Machine Learning (ML) and Deep Learning (DL) algorithms to improve diagnosis and treatment (Gopinath, 2023). Another study proposed a method for detecting pneumonia from chest X-ray images that combines different optimizers with transfer learning; the proposed deep transfer learning method is trained on a benchmark open dataset of chest X-ray images (Manickam, 2021).
Fan, Y.
The Comprehensive Investigation of Lung Disease Classification Based on SGD.
DOI: 10.5220/0012939700004508
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 1st International Conference on Engineering Management, Information Technology and Intelligence (EMITI 2024), pages 435-439
ISBN: 978-989-758-713-9
Proceedings Copyright © 2024 by SCITEPRESS – Science and Technology Publications, Lda.
This article systematically reviews and summarizes recent research progress on using SGD for lung disease classification and explores the challenges and future development directions in this field. It first introduces the impact of lung diseases on patient health and the limitations of traditional diagnostic methods, emphasizing the importance of machine learning for lung disease classification. It then reviews and analyzes the research achievements of the SGD algorithm in lung disease classification, examines the advantages and disadvantages of different algorithms and models, and summarizes how well existing methods work in practice. Finally, it discusses the challenges and open issues in current research and proposes future research trends and directions, as a reference for further developing and applying lung disease classification technology.
2 INTRODUCTION TO THE APPLICATION OF SGD IN PULMONARY DISEASES
Stochastic Gradient Descent (SGD) is a commonly used optimization algorithm in machine learning and deep learning. It is a variant of the gradient descent algorithm that is well suited to large datasets.
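The difference between full-batch gradient descent and SGD can be written as two update rules. This is a standard textbook formulation, not one taken from the reviewed papers; the notation (parameters θ, learning rate η, per-example loss ℓ, model f, a training set of N examples, and a randomly sampled mini-batch B_t at step t) is assumed here for illustration.

```latex
\begin{aligned}
\text{Gradient descent:}\quad
\theta_{t+1} &= \theta_t - \eta\,\nabla_\theta \frac{1}{N}\sum_{i=1}^{N} \ell\bigl(f(x_i;\theta_t),\, y_i\bigr) \\
\text{Mini-batch SGD:}\quad
\theta_{t+1} &= \theta_t - \eta\,\nabla_\theta \frac{1}{|B_t|}\sum_{i \in B_t} \ell\bigl(f(x_i;\theta_t),\, y_i\bigr),
\qquad B_t \subset \{1,\dots,N\}
\end{aligned}
```

Each gradient descent step processes all N examples, whereas an SGD step processes only |B_t| of them, which is why SGD stays practical when N is large. When B_t is sampled uniformly, the mini-batch gradient is an unbiased but noisy estimate of the full gradient.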
The main idea behind SGD is to update the model parameters using only a subset (mini-batch) of the training data in each iteration (Ruder, 2016). In each iteration, the algorithm computes the gradient of the loss on a randomly sampled mini-batch and updates the parameters in the direction opposite to that gradient, which reduces the loss. This process is repeated many times; the goal of SGD is to find a set of parameters that minimizes the loss and thereby improves the performance of the model.
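A minimal sketch of this loop, assuming binary logistic regression on a few hand-made two-dimensional feature vectors (a stand-in for features extracted from lung images, not real data). The function names and hyperparameters (learning rate, batch size, number of epochs) are illustrative choices, not values from the reviewed studies.

```python
import math
import random

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def minibatch_sgd_logreg(X, y, lr=0.5, batch_size=4, epochs=200, seed=0):
    """Fit binary logistic regression with mini-batch SGD.
    X: list of feature lists, y: list of 0/1 labels."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    idx = list(range(n))
    for _ in range(epochs):
        rng.shuffle(idx)                              # new random order each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]     # one random mini-batch
            gw, gb = [0.0] * d, 0.0
            for i in batch:
                # gradient of the cross-entropy loss for one example
                err = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b) - y[i]
                gw = [g + err * xj for g, xj in zip(gw, X[i])]
                gb += err
            m = len(batch)
            # step against the mini-batch average gradient
            w = [wj - lr * g / m for wj, g in zip(w, gw)]
            b -= lr * gb / m
    return w, b

def predict_label(w, b, x):
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0
```

Only `batch_size` examples are touched per update, so the cost of one step does not grow with the dataset size.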
In lung disease classification, SGD is used to train machine learning models that classify medical images, such as X-rays and CT scans, into different types of pulmonary disease. SGD is chosen in this context because it provides an efficient optimization approach that can cope with the complexity and scale of medical imaging data.
2.1 SGDRE
Early diagnosis of childhood pneumonia is essential for early treatment. Stochastic Gradient Descent with Warm Restarts Ensemble (SGDRE) was developed by Vrbančič and Podgorelec for this purpose. SGDRE addresses the generalization problem with an average ensemble, and it uses the SGDR (Stochastic Gradient Descent with Warm Restarts) mechanism to obtain the various classifiers needed to assemble the ensemble. SGDR is designed to cope with the multimodal character of the cost function: the learning rate is abruptly increased at each restart so that the optimizer can search for a global minimum, although between restarts SGDR may still settle into a local minimum during training (Vrbančič, 2022). SGDRE has four phases: an initial training phase, SGD restart 1, SGD restart 2, and an ensembling phase. Using different learning rate annealing functions (cosine annealing, linear reduction, and sine-based annealing) makes it possible to obtain as many different models as possible. Within a limited training budget, the initial training phase is completed first, and the rest of the budget is spent on SGD restart 1 and SGD restart 2 (Loshchilov, 2017). Finally, the collected models are evaluated in the ensembling phase, and the three best-performing models are chosen to construct the final ensemble model.
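A minimal sketch of the two SGDRE ingredients described above, under stated assumptions. The cosine formula is the one defined for SGDR by Loshchilov and Hutter (2017); the linear and sine-based forms are assumptions, because the text names these annealing functions without giving formulas. A fixed cycle length stands in for SGDRE's phase boundaries, and `top_k_average_ensemble` averages the predicted probabilities of the three best snapshots, matching the "average ensemble" and "three best" description. All function names are hypothetical.

```python
import math

def cosine_anneal(t, t_max, lr_max, lr_min=0.0):
    # SGDR cosine annealing: lr_min + 0.5*(lr_max - lr_min)*(1 + cos(pi*t/t_max))
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / t_max))

def linear_anneal(t, t_max, lr_max, lr_min=0.0):
    # Assumed form: straight-line decay from lr_max to lr_min over one cycle.
    return lr_max - (lr_max - lr_min) * t / t_max

def sine_anneal(t, t_max, lr_max, lr_min=0.0):
    # Hypothetical sine-based decay; the exact SGDRE formula is not given in the text.
    return lr_min + (lr_max - lr_min) * (1 - math.sin(0.5 * math.pi * t / t_max))

def warm_restart_schedule(total_steps, cycle_len, lr_max, anneal):
    # The step counter inside a cycle resets every cycle_len steps, so the
    # learning rate jumps back to lr_max: this jump is the "warm restart".
    return [anneal(s % cycle_len, cycle_len, lr_max) for s in range(total_steps)]

def top_k_average_ensemble(snapshots, k=3):
    # snapshots: list of (validation_score, predicted_probabilities), one per model
    # collected at the end of a cycle. Keep the k best and average their outputs.
    best = sorted(snapshots, key=lambda s: s[0], reverse=True)[:k]
    n = len(best[0][1])
    return [sum(probs[i] for _, probs in best) / len(best) for i in range(n)]
```

For example, `warm_restart_schedule(30, 10, 0.1, cosine_anneal)` decays from 0.1 toward 0 three times, returning to 0.1 at steps 10 and 20. One could run the initial phase with `cosine_anneal` and the two restarts with the other annealing functions, saving a snapshot at the end of each cycle; how SGDRE actually assigns functions to phases is not specified here.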
2.2 Automated Pneumonia Detection
Researchers have developed a new deep learning technique for the automated identification of pneumonia that uses deep transfer learning to streamline detection and improve its accuracy. The study preprocesses the input chest X-ray images, classifies pneumonia as bacterial or viral using models pre-trained on the ImageNet dataset (e.g., ResNet50, InceptionV3, InceptionResNetV2), and uses segmentation based on the U-Net architecture to identify the presence of pneumonia (Manickam, 2021).
Two optimizers were used to train the pre-trained models so that they extract useful features and reach higher accuracy. Adam computes individual adaptive learning rates for different parameters by combining the benefits of two extensions of SGD: the adaptive gradient algorithm (AdaGrad) and root mean square propagation (RMSProp). Despite Adam's widespread popularity, recent research indicates that it may fail to "converge to the optimal solution" in certain situations. AdaBound, a well-defined and well-structured optimizer proposed by Luo et al. (Luo, 2019), applies dynamic bounds to the learning rate so that it gradually transitions to SGD; it can generalize more effectively and converges more quickly. Both optimizers were used in the study, and the performance of each was examined separately at batch sizes of 16 and 32. The performance of the pre-trained models was then compared with that of other convolutional neural network (CNN) models based on the obtained results.
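To show how the two optimizers differ, here is a scalar sketch that minimizes a one-dimensional quadratic loss (a toy stand-in for a network's loss). The Adam update uses the usual bias-corrected moment estimates. The AdaBound bound functions, lower(t) = final_lr·(1 − 1/(γt + 1)) and upper(t) = final_lr·(1 + 1/(γt)), are written as we understand them from Luo et al. (2019); the hyperparameter values (lr, final_lr, gamma) are assumptions chosen for this toy problem, not settings from the reviewed study.

```python
import math

def adam(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Minimize a 1-D function with Adam, given its gradient `grad`."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # momentum-like first moment
        v = beta2 * v + (1 - beta2) * g * g    # RMSProp-like second moment
        m_hat = m / (1 - beta1 ** t)           # bias corrections for the zero init
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

def adabound(grad, x0, lr=0.1, final_lr=0.1, gamma=1e-3,
             beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Adam-style update whose per-step learning rate is clipped to dynamic bounds."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        base = lr * math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)  # bias-corrected step
        lower = final_lr * (1 - 1 / (gamma * t + 1))   # rises toward final_lr
        upper = final_lr * (1 + 1 / (gamma * t))       # falls toward final_lr
        rate = min(max(base / (math.sqrt(v) + eps), lower), upper)
        # once the bounds squeeze together, this behaves like SGD with momentum
        # and a fixed step size
        x -= rate * m
    return x
```

With `grad = lambda x: 2.0 * (x - 3.0)`, both functions started at `x0 = 0.0` end near the minimizer 3.0. Early on, AdaBound's wide bounds leave the Adam-style rate untouched; later the bounds narrow toward `final_lr`, which is the Adam-to-SGD transition described above.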
2.3 Automated LUS Scoring of COVID-19 Pneumonia Patients
To analyze the entire dataset thoroughly, the researchers used 5-fold cross-validation together with a secondary selection approach based on the ResNet-50 model (He, 2016). For the secondary selection of lung ultrasound (LUS) images, five deep neural network models based on ResNet-50 were combined, together with a softmax classifier (Luo, 2021) and the SGD with momentum (SGDM) optimizer (Jayalakshmy, 2020).
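A minimal sketch of a softmax classifier trained with SGD with momentum (SGDM), assuming tiny two-dimensional feature vectors in place of ResNet-50 image features and three hypothetical classes; the real LUS scoring classes and features are not reproduced. The momentum form used here (v ← μv + g, w ← w − ηv) is one common convention; other implementations scale the gradient by (1 − μ).

```python
import math
import random

def softmax(z):
    m = max(z)                                  # subtract the max for numerical stability
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

def train_softmax_sgdm(data, n_classes, lr=0.05, momentum=0.9, epochs=50, seed=0):
    """Multinomial logistic regression trained with SGDM.
    data: list of (feature_list, class_index) pairs."""
    rng = random.Random(seed)
    d = len(data[0][0]) + 1                     # +1 for a bias feature
    W = [[0.0] * d for _ in range(n_classes)]
    V = [[0.0] * d for _ in range(n_classes)]   # velocity (momentum) buffers
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            xb = list(x) + [1.0]
            p = softmax([sum(w * xi for w, xi in zip(W[c], xb)) for c in range(n_classes)])
            for c in range(n_classes):
                err = p[c] - (1.0 if c == y else 0.0)    # dLoss/dlogit for cross-entropy
                for j in range(d):
                    V[c][j] = momentum * V[c][j] + err * xb[j]   # accumulate velocity
                    W[c][j] -= lr * V[c][j]
    return W

def predict_class(W, x):
    xb = list(x) + [1.0]
    scores = [sum(w * xi for w, xi in zip(Wc, xb)) for Wc in W]
    return scores.index(max(scores))
```

The velocity buffer carries a decaying sum of past gradients, so consecutive steps that agree in direction reinforce each other while oscillating components partly cancel.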
The main components of this model are convolutional blocks and identity blocks. The former mainly adjust the network dimensions, while the latter increase the network's depth. In networks with fewer layers, normalizing the data in intermediate layers makes it possible to train them with stochastic gradient descent and backpropagation. The residual module adds a shortcut connection that passes the input directly to the output, bypassing the convolutional layers. This module retains the original information and effectively mitigates the vanishing gradient problem during backpropagation, which makes it possible to train deep networks and extract features. A zero-padding layer placed before the convolutional layer ensures that the dimensions of the input image and of the feature map after convolution remain consistent (Xing, 2022).
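A minimal one-dimensional sketch of the two ideas in this paragraph, assuming tiny hand-written kernels in place of learned 2-D filters. `conv1d_same` zero-pads the input by (k − 1)/2 on each side, so a stride-1 convolution with an odd kernel size k keeps the output the same length as the input; `residual_block` adds the identity shortcut and computes ReLU(F(x) + x). As in deep learning layers, the "convolution" here is a cross-correlation. Batch normalization and the dimension-changing (projection) block are omitted, and the function names are hypothetical.

```python
def zero_pad(x, p):
    return [0.0] * p + list(x) + [0.0] * p

def conv1d_same(x, kernel):
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("use an odd kernel size so padding is symmetric")
    p = (k - 1) // 2
    xp = zero_pad(x, p)
    # stride-1 cross-correlation; output length equals len(x) thanks to the padding
    return [sum(kernel[j] * xp[i + j] for j in range(k)) for i in range(len(x))]

def relu(v):
    return [max(0.0, a) for a in v]

def residual_block(x, k1, k2):
    h = relu(conv1d_same(x, k1))     # first conv + ReLU
    f = conv1d_same(h, k2)           # second conv: this is F(x)
    # identity shortcut: the input skips both convolutions and is added back
    return relu([fi + xi for fi, xi in zip(f, x)])
```

If both kernels are all zeros, F(x) = 0 and the block returns ReLU(x), so the original information passes through unchanged; the same shortcut also gives gradients a direct path back to earlier layers.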
3 DISCUSSIONS
When using SGD for lung disease classification, there
are several limitations and challenges to consider,
including interpretability, applicability, and privacy
issues.
In terms of interpretability, neural network models trained with SGD can be considered "black box" models, meaning that their inner workings lack transparency. Such models involve numerous layers and parameters, making it difficult to interpret how each parameter contributes to the final prediction. This lack of transparency can undermine trust in the model's reliability and robustness, especially in critical healthcare applications where transparency and accountability are essential.
In terms of applicability, it is challenging in medical image analysis to apply the same type of classification algorithm to different subsets of datasets because of poor generalization ability, the need for large datasets, and the time complexity of the learning process. These challenges make it difficult to achieve consistent and accurate results when the same algorithm is applied to different subsets of medical image datasets (Vrbancic, 2019).
In terms of privacy, training machine learning models on patient data raises data privacy and security issues, including the risk of unintentionally disclosing confidential information. It is crucial to implement strong privacy protection technologies, such as data anonymization and encryption, to protect patient privacy.
These limitations and challenges must be addressed through careful model selection, validation, and ethical consideration to ensure the responsible deployment of machine learning models in healthcare applications. SGD is a classic optimization algorithm, and SGD-based neural network models have broad application prospects in lung disease classification tasks, especially when combined with techniques such as Grad-CAM, SHAP, transfer learning, and federated learning.
Gradient-weighted Class Activation Mapping (Grad-CAM) can be used to visualize and understand the decision-making process of convolutional neural networks (CNNs) in image classification tasks. When Grad-CAM is applied to lung disease classification, it shows the regions of the lung image that contribute most to the model's prediction; for example, Grad-CAM has been used to enhance the explainability and interpretability of a deep learning model for detecting COVID-19 from chest X-ray and CT images (Panwar, 2020). Shapley Additive Explanations (SHAP) is a powerful method for interpreting predictions that assigns an importance score to every feature of a complex machine learning model. The transparency and explanatory power of SHAP enhance trust and confidence in classification results, which leads to better decision-making and more accurate treatment strategies (Nahiduzzaman, 2024).
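Two small sketches of these interpretability ideas. Grad-CAM weights each feature map A^k of a chosen convolutional layer by α_k, the spatial average of the gradient of the class score with respect to that map, and takes ReLU(Σ_k α_k A^k). The sketch assumes the activations and gradients have already been obtained from a trained network (here they are small hypothetical arrays) and omits upsampling the map to the image size. The second function computes exact Shapley values by enumerating feature subsets, with absent features replaced by a baseline value; this is the quantity SHAP approximates, and exact enumeration is only feasible for a handful of features. The toy models used with it are hypothetical.

```python
from itertools import combinations
from math import factorial

def grad_cam(activations, gradients):
    """activations, gradients: K feature maps, each H x W (lists of lists).
    gradients[k][i][j] is d(class score)/d(activations[k][i][j])."""
    K = len(activations)
    H, W = len(activations[0]), len(activations[0][0])
    # alpha_k: global-average-pooled gradient for feature map k
    alphas = [sum(sum(row) for row in gradients[k]) / (H * W) for k in range(K)]
    # weighted sum of feature maps, then ReLU keeps only positive evidence
    return [[max(0.0, sum(alphas[k] * activations[k][i][j] for k in range(K)))
             for j in range(W)] for i in range(H)]

def exact_shapley(model, x, baseline):
    """Exact Shapley value of each feature of x for model(z) -> float."""
    n = len(x)

    def value(subset):
        # features in `subset` keep their real values; the rest use the baseline
        return model([x[i] if i in subset else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                s = set(s)
                total += weight * (value(s | {i}) - value(s))   # marginal contribution
        phi.append(total)
    return phi
```

For a linear model the Shapley values reduce to coefficient × (feature − baseline), and in general they sum to model(x) − model(baseline), which is what makes them usable as additive feature importance scores.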
Transfer learning leverages knowledge gained from solving one problem and applies it to a different but related problem. In lung disease classification, the available datasets may be small or imbalanced, and transfer learning can address this problem by using information from other large datasets. If a well-trained model already exists for classifying a certain lung disease, it can be used as a pre-trained model and then fine-tuned to solve a new lung disease classification problem.
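A minimal sketch of the fine-tuning idea, assuming a frozen "pre-trained" feature extractor (a hypothetical fixed function standing in for, e.g., an ImageNet-trained CNN) and a new logistic-regression head trained with SGD on a small labelled set. Only the head is updated here; full fine-tuning would also update some or all backbone weights, typically with a smaller learning rate. All names and data are illustrative.

```python
import math
import random

def frozen_backbone(x):
    # HYPOTHETICAL stand-in for a pre-trained feature extractor. It is never
    # updated below, which is what "freezing" the backbone means.
    return [x[0] + x[1], x[0] - x[1]]

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fine_tune_head(data, lr=0.1, epochs=200, seed=0):
    """Train only a new logistic-regression head on top of frozen features.
    data: list of (raw_input, 0/1 label) pairs."""
    rng = random.Random(seed)
    w, b = [0.0, 0.0], 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            f = frozen_backbone(x)                   # features from the frozen model
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]   # SGD step on head only
            b -= lr * err
    return w, b

def head_predict(w, b, x):
    f = frozen_backbone(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b >= 0 else 0
```

Because only a few head parameters are learned, a small dataset can be enough, which is the advantage transfer learning offers when lung disease datasets are small or imbalanced.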
Federated learning is a machine learning method designed to train models without sending raw data from devices to a central server. Instead, the model is trained on each local device, and only the model updates or gradients are sent to the central server, which aggregates them to update the global model. By training models on local devices and aggregating only their updates, federated learning offers a way to protect the privacy of user data such as medical records and personal preferences. In addition, the algorithms developed should make use of more advanced hardware or transmission mechanisms to achieve higher processing speeds and more accurate identification (Deng, 2023; Sugaya, 2019).
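The train-locally-then-aggregate loop can be sketched with federated averaging (FedAvg), one common aggregation rule; the text does not say which rule the cited work uses. Assumptions: a one-parameter linear model y ≈ w·x, three hypothetical clients holding their own (x, y) pairs, and local training with plain SGD. Only weights leave each client, and the server averages them weighted by each client's number of examples.

```python
def local_sgd(w, data, lr=0.05, epochs=5):
    # One client's local training: plain SGD on squared error for y ≈ w * x.
    # The raw (x, y) pairs never leave this function; only w is returned.
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x
            w -= lr * grad
    return w

def fedavg(clients, rounds=20, w0=0.0):
    """clients: list of per-client datasets, each a list of (x, y) pairs."""
    w = w0
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        # server broadcasts w; each client trains locally and sends back its weight
        local = [local_sgd(w, d) for d in clients]
        # server aggregation: average weighted by local dataset size
        w = sum(len(d) * wk for d, wk in zip(clients, local)) / total
    return w

clients = [
    [(1.0, 2.0), (2.0, 4.0)],               # hypothetical hospital A
    [(0.5, 1.0)],                           # hypothetical hospital B
    [(1.5, 3.0), (3.0, 6.0), (0.2, 0.4)],   # hypothetical hospital C
]
```

Every client's data here follows y = 2x, so `fedavg(clients)` converges to a global weight of about 2.0 even though no client ever shares its examples. Real deployments add secure aggregation or differential privacy, because model updates can still leak information.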
4 CONCLUSIONS
This research systematically summarizes and analyzes the use of the SGD algorithm for lung disease classification. A comprehensive evaluation of multiple cases and research results, including SGDRE, automated pneumonia detection, and automated LUS scoring, shows that the SGD algorithm performs well and is effective in lung disease classification tasks.
The SGD algorithm has strong scalability and generalization ability, can adapt to lung disease datasets of different types and scales, and has a certain degree of noise resistance and robustness. In the reviewed studies, SGD-trained models achieve high accuracy and stability on medical imaging datasets, providing strong support for the accurate diagnosis of lung diseases. Compared with full-batch gradient descent, SGD has significant advantages in computational efficiency and convergence speed on large datasets, which makes it an important choice for processing large-scale medical imaging data.
Although the SGD algorithm has made significant progress in lung disease classification, it still faces challenges and limitations, such as the quality of data annotations and patient privacy, which require further improvement and exploration. Future research can focus on improving the accuracy and interpretability of SGD-based models for lung disease classification and on promoting their widespread application in clinical practice.
REFERENCES
Deng, X., Oda, S., Kawano, Y., 2023. Graphene-based
midinfrared photodetector with bull’s eye plasmonic
antenna. Optical Engineering, 62(9), p. 097102-
097102.
Gopinath, A., et al. 2023. Computer aided model for lung
cancer classification using cat optimized convolutional
neural networks. Measurement: Sensors.
He, K. M., et al. 2016. Deep Residual Learning for Image
Recognition. 2016 IEEE Conference on Computer
Vision and Pattern Recognition.
Jayalakshmy, S., & Sudha, G. F. 2020. Scalogram based
prediction model for respiratory disorders using
optimized convolutional neural networks. Artificial
Intelligence in Medicine, 103, 10.
Loshchilov, I., & Hutter, F. 2017. SGDR: Stochastic
Gradient Descent with Warm Restarts. In International
Conference on Learning Representations (ICLR).
Luo, J., et al. 2021. Improving the performance of
multisubject motor imagery-based BCIs using twin
cascaded softmax CNNs. Journal of Neural
Engineering, 18.
Luo, L., Xiong, Y., Liu, Y., & Sun, X. 2019. Adaptive
Gradient Methods with Dynamic Bound of Learning
Rate. arXiv preprint arXiv:1902.09843.
Manickam, A., et al. 2021. Automated pneumonia detection
on chest X-ray images: A deep learning approach with
different optimizers and transfer learning architectures.
Measurement.
Nahiduzzaman, M., et al. 2024. A novel framework for lung
cancer classification using lightweight convolutional
neural networks and ridge extreme learning machine
model with SHapley Additive exPlanations (SHAP).
Expert Systems with Applications, 248.
Panwar, H., et al. 2020. A deep learning and grad-CAM
based color visualization approach for fast detection of
COVID-19 cases using chest X-ray and CT-Scan
images. Chaos, Solitons & Fractals, 140.
Qiu, Y., et al. 2022. Pose-guided matching based on deep
learning for assessing quality of action on rehabilitation
training. Biomedical Signal Processing and Control, 72,
103323.
Ruder, S. 2016. An overview of gradient descent
optimization algorithms. arXiv preprint
arXiv:1609.04747.
Sugaya, T., Deng, X., 2019. Resonant frequency tuning of
terahertz plasmonic structures based on solid
immersion method. 2019 44th International Conf. on
Infrared, Millimeter, and Terahertz Waves, p.1-2.
Vrbancic, G., et al. 2019. Automatic detection of heartbeats
in heart sound signals using deep convolutional neural
networks. Elektronika Ir Elektrotechnika, 25(3).
Vrbančič, G., & Podgorelec, V. 2022. Efficient ensemble
for image-based identification of Pneumonia utilizing
deep CNN and SGD with warm restarts. Expert
Systems with Applications, 187.
Xing, W., et al. 2022. Automated lung ultrasound scoring
for evaluation of coronavirus disease 2019 pneumonia
using two-stage cascaded deep learning model.
Biomedical Signal Processing and Control.