the SVM model with a linear kernel. The importance
of each feature in classification is shown in Fig. 3. One
feature stands out as more significant than others:
"Diabetes Pedigree Function." Lastly, a performance
evaluation is carried out to assess the predictive
ability, stability, and discriminative power of the
trained models.
Figure 3: Feature Importance Plot (Picture credit: Original).
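A ranking like the one in Fig. 3 can be read from the weights of a linear-kernel SVM. The sketch below is illustrative only. It assumes scikit-learn, uses synthetic data from `make_classification` rather than the dataset used in this study, and gives the features hypothetical placeholder names. It ranks features by the absolute value of the learned coefficients, which is one common way to interpret a linear SVM once the features are standardized; the paper does not state exactly how its importance scores were computed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: NOT the dataset used in this paper).
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

# Standardize so that coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)

# With a linear kernel, coef_ holds one weight per feature
# (shape (1, n_features) for a binary problem).
model = SVC(kernel="linear").fit(X_scaled, y)
importance = np.abs(model.coef_).ravel()

# Sort features from most to least important.
ranking = sorted(zip(feature_names, importance), key=lambda pair: pair[1], reverse=True)
for name, weight in ranking:
    print(f"{name}: {weight:.3f}")
```

Coefficient magnitudes are only meaningful for the linear kernel; the RBF and polynomial kernels do not expose per-feature weights in this way.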
Table 1 presents the comparison results of the
SVM models under the three kernels. The RBF
kernel achieves the highest accuracy (76.62%), while the
linear kernel has the highest cross-validation score
(81.04%). Based on the
analysis above, it is evident that the "Diabetes
Pedigree Function" feature significantly influences the
accuracy of the SVM model's prediction of the
likelihood of mortality in DM patients. The specific
values of this feature serve as valuable indicators and
help determine whether a DM patient is likely to die.
These findings hold practical significance for the
medical industry. Collecting the family medical
history of DM patients can provide insights into
whether a patient faces a higher risk of mortality
compared to others. By understanding the health status
and basic information of DM patients, doctors can
take early measures to reduce the likelihood of patient
mortality.
Table 1: Classification Report.
                    Performance
Model         Accuracy    Highest Cross-Validation
RBF           76.62%      79.08%
Linear        75.32%      81.04%
Polynomial    75.97%      79.08%
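A comparison of the kind reported in Table 1 might be set up as follows. This is a minimal sketch under stated assumptions: it assumes scikit-learn, the data are synthetic, and the 80/20 split and 10-fold cross-validation are illustrative choices that the text does not confirm. "Highest cross-validation" is interpreted here as the best single-fold score. The numbers it prints will not match Table 1.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: NOT the dataset used in this paper).
X, y = make_classification(n_samples=400, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

results = {}
for kernel in ("rbf", "linear", "poly"):
    # Scaling inside the pipeline keeps each cross-validation fold free of leakage.
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    # cv=10 is an assumed fold count; cross_val_score fits clones of clf.
    fold_scores = cross_val_score(clf, X_train, y_train, cv=10)
    clf.fit(X_train, y_train)
    results[kernel] = {
        "accuracy": clf.score(X_test, y_test),  # held-out test accuracy
        "highest_cv": fold_scores.max(),        # best single-fold score
    }

for kernel, scores in results.items():
    print(f"{kernel}: accuracy={scores['accuracy']:.2%}, "
          f"highest CV={scores['highest_cv']:.2%}")
```

Reporting the mean of the fold scores alongside the maximum would give a less optimistic picture of each kernel's stability.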
4 CONCLUSION
This paper uses an SVM to construct the
analysis model. First, EDA is employed to determine
which features might impact diabetes prediction and
how to prepare the data in the required format for
training machine learning models. In this study,
multiple data visualization techniques such as bar
charts and pie charts are used to understand the data
needed for the research. Second, SVM is employed to
predict the presence of diabetes. Linear SVM is used
to make classification decisions by learning
relationships between different features. Last, the
accuracy, cross-validation scores, and highest
cross-validation score obtained are used to identify
the significant factors influencing diabetes. The results
indicate that the "Diabetes Pedigree Function" feature
has a significant impact on the mortality rate of
patients with diabetes. Using this model, researchers
have gained a clear understanding of the main
elements contributing to the mortality rate in diabetes
patients. Future work will study the impact of dietary
habits on the susceptibility of the general population
to diabetes. This type of disease risk
assessment could provide valuable assistance for the
advancement of the medical industry and the
implementation of preventive measures for patients.