disease. The table reported the precision, recall, F1
score, and support for each class and each method.
According to the table, KNN had a high average F1
score across both classes and performed better on the
positive class than on the negative class, so it was
sensitive in identifying patients with heart disease and
tended to avoid false negatives. SVM had the highest
average F1 score across both classes and performed
better on the negative class than on the positive class,
so it identified healthy people accurately and tended to
avoid false positives. AdaBoost was the weakest method:
it had the lowest average F1 score across both classes,
and its performance was similar on the two classes. It
could not distinguish between the two classes or avoid
errors as effectively as the other methods.
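The per-class quantities discussed above can be illustrated with a minimal sketch. It assumes binary labels (1 = heart disease, 0 = healthy); the function name `per_class_report` and the label lists are hypothetical and are not taken from the study's data or code.

```python
def per_class_report(y_true, y_pred, labels=(0, 1)):
    """Return precision, recall, F1 score, and support for each class."""
    report = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": tp + fn,  # number of true samples in this class
        }
    return report


# Hypothetical labels for illustration only (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
report = per_class_report(y_true, y_pred)
for label, scores in report.items():
    print(label, {k: round(v, 4) for k, v in scores.items()})
```

In this toy example the positive class has higher recall than the negative class, which is the kind of asymmetry the comparison between KNN and SVM describes.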
If the goal was to minimize the false negative rate,
that is, the probability of misclassifying a patient
with heart disease as healthy, a classification method
with high recall on the positive class was preferred.
According to the table, KNN had a recall of 0.8056 on
the positive class, which was 0.0470 higher than SVM
and 0.1159 higher than AdaBoost. This indicated that
KNN identified patients with heart disease more
reliably and thus produced fewer false negatives.
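The link between recall and the false negative rate can be made explicit with a short calculation. The sketch below uses only the recall of KNN and the two differences quoted above; the SVM and AdaBoost recalls are derived from those figures by subtraction, and the variable names are illustrative.

```python
# Positive-class recall of KNN and the reported gaps to the other methods.
knn_recall = 0.8056
gap_to_svm = 0.0470
gap_to_adaboost = 0.1159

recalls = {
    "KNN": knn_recall,
    "SVM": round(knn_recall - gap_to_svm, 4),            # 0.7586
    "AdaBoost": round(knn_recall - gap_to_adaboost, 4),  # 0.6897
}

# recall = TP / (TP + FN), so the false negative rate is FN / (TP + FN) = 1 - recall.
false_negative_rates = {name: round(1 - r, 4) for name, r in recalls.items()}

# The method with the lowest false negative rate is the one with the highest recall.
best = min(false_negative_rates, key=false_negative_rates.get)
print(recalls)
print(false_negative_rates)
print(best)
```

Under these figures, KNN would miss roughly 19% of patients with heart disease, compared with about 24% for SVM and 31% for AdaBoost.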
4 CONCLUSION
This research explored the prediction of heart disease
using several machine learning methods and data
preprocessing techniques, and compared those methods
to identify the better predictor. The dataset contained
303 patient observations with 14 variables.
Several key findings emerged from the analysis. Age
and sex played significant roles in disease assessment.
Female patients were more likely to have a higher risk
of heart disease. However, the risk of heart attack did
not exhibit a straightforward trend with age. Different
chest pain types were associated with different
probabilities of heart attack: patients with non-anginal
pain (type 2) were more prone to heart attacks. Patients
with higher maximum heart rates during exercise had a
higher risk of heart attack, especially at younger ages.
The slope of the peak exercise ST segment and the
exercise-induced ST depression relative to rest were
both indicative of heart disease risk: a type 2 slope
(decreasing trend) and lower ST depression values
correlated with higher risk. Patients without
exercise-induced angina were more likely to suffer from
heart attacks. Moreover, the more major vessels
checked, the lower the risk of heart attack.
The study then evaluated the performance of three
machine learning algorithms (KNN, SVM, and
AdaBoost) on both the original and processed datasets.
SVM emerged as the best-performing algorithm based
on accuracy. KNN, on the other hand, had a higher
sensitivity (recall on the positive class), making it
suitable for minimizing false negatives, which is
crucial in heart disease prediction.
In conclusion, the choice of machine learning
algorithm depends on the specific goals of the
prediction model. SVM excelled in overall accuracy,
while KNN showed a higher sensitivity for identifying
patients with heart disease. Further research could
explore ensemble methods and hybrid models that
combine the strengths of different algorithms to
improve predictive accuracy while minimizing false
negatives. Such models could ultimately support the
early diagnosis and treatment of heart disease and
increase patient survival rates.
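One simple form of the ensemble idea mentioned above is hard voting over the labels predicted by several classifiers. The following is a minimal sketch under stated assumptions, not a method evaluated in this study: the function names and the prediction lists are hypothetical, and labels are assumed to be binary (1 = heart disease, 0 = healthy).

```python
def majority_vote(*model_predictions):
    """Combine binary predictions sample by sample using a strict majority."""
    combined = []
    for votes in zip(*model_predictions):
        combined.append(1 if 2 * sum(votes) > len(votes) else 0)
    return combined


def any_positive(*model_predictions):
    """Flag a sample as positive if any model does (favors fewer false negatives)."""
    return [1 if any(votes) else 0 for votes in zip(*model_predictions)]


# Hypothetical predictions from three models for five patients.
knn_pred = [1, 1, 0, 1, 0]
svm_pred = [1, 0, 0, 1, 0]
ada_pred = [0, 1, 0, 0, 1]

voted = majority_vote(knn_pred, svm_pred, ada_pred)
screened = any_positive(knn_pred, svm_pred, ada_pred)
print(voted)     # [1, 1, 0, 1, 0]
print(screened)  # [1, 1, 0, 1, 1]
```

The majority rule balances the three models, whereas the any-positive rule trades more false positives for fewer false negatives; which rule is appropriate would depend on the goal of the prediction model.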
4.1 Authors' Contribution
All the authors contributed equally, and their names
are listed in alphabetical order.
Machine Learning-Based Heart Disease Prediction: Insights and Comparative Analysis