experiments are run with different values of k,
and Table 2 shows the results.
Table 2: Experimental Results Based on Cross Validation.
Folds (k)  Accuracy  Precision  Recall  F1 score
3          0.983     0.974      0.989   0.981
5          0.994     0.994      1.000   0.994
7          0.997     0.989      0.991   0.995
10         0.997     0.994      1.000   0.997
The results vary with the number of folds (k).
Accuracy increases with the number of folds and
remains stable once the number of folds reaches 7
(0.997 for both 7 and 10 folds). Precision fluctuates
as the number of folds grows and reaches its maximum,
0.994, with 5 or 10 folds. Recall behaves similarly:
no monotonic trend is found, and the best value, 1.000,
occurs with 5 or 10 folds. The F1 score rises with
the number of folds, and the highest F1 score in the
experiments, 0.997, is reached
when there are 10 folds. By the definition of
cross validation, more folds mean that a larger
proportion of the data, (k-1)/k, is used for training
in each round, which lets the model learn more
thoroughly and often leads to better performance.
Some metrics occasionally decline as the number of
folds increases, which is possibly due to overfitting.
Overall, all metric values in the experiments are
greater than 0.97, which indicates strong performance.
These results indicate that the method for predicting
heart disease in this paper is effective.
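The link between the number of folds and the share of data used for training can be made concrete with a short sketch. It assumes n samples are split into k nearly equal folds, with one fold held out for testing in each round. The helper names `train_fraction` and `fold_sizes` are illustrative and do not come from the paper; the sample count 1025 is the dataset size reported in the conclusion.

```python
from fractions import Fraction
from typing import List


def train_fraction(k: int) -> Fraction:
    """Share of the data used for training in each round of k-fold cross validation."""
    if k < 2:
        raise ValueError("k-fold cross validation needs at least 2 folds")
    return Fraction(k - 1, k)


def fold_sizes(n_samples: int, k: int) -> List[int]:
    """Sizes of k nearly equal folds; the first n_samples % k folds get one extra sample."""
    base, extra = divmod(n_samples, k)
    return [base + 1 if i < extra else base for i in range(k)]


n_samples = 1025  # dataset size reported in the paper
for k in (3, 5, 7, 10):
    sizes = fold_sizes(n_samples, k)
    largest_test_fold = max(sizes)
    print(
        f"k={k:2d}: training share {float(train_fraction(k)):.3f}, "
        f"test fold of up to {largest_test_fold} samples, "
        f"at least {n_samples - largest_test_fold} training samples per round"
    )
```

The training share grows from 2/3 at k = 3 to 9/10 at k = 10, while each test fold shrinks, so per-fold metrics are computed on fewer samples as k increases.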
4 CONCLUSION
In this work, a prediction system centered on the
random forest algorithm is implemented for
high-accuracy prediction of heart disease. The
research uses a heart disease dataset from Kaggle
that contains physical indicators of 1025 people and
labels indicating whether each of them has heart
disease. The original data are preprocessed, and
dimensionality is reduced with PCA. The core
prediction model is a random forest whose parameters
are tuned with the grid search method. In the
evaluation experiments, percentage split and cross
validation are applied to test the model. According
to the results, the prediction method is effective,
with all metrics greater than 0.9 in all
experiments. It is concluded that random forest is a
promising technique with considerable potential for
application in heart disease prediction.
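The workflow summarized above (preprocessing, PCA, random forest, grid search, then percentage split and cross validation) might be sketched with scikit-learn as follows. This is a minimal illustration, not the implementation used in the paper. The synthetic data from `make_classification` stands in for the Kaggle dataset, and the scaling step, the number of PCA components, the parameter grid, and the 70/30 split are all assumptions chosen only to make the example run.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_validate, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the heart disease data (assumption: 13 numeric features).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=8, random_state=0
)

# Preprocessing, dimensionality reduction, and the random forest in one pipeline.
pipeline = Pipeline([
    ("scale", StandardScaler()),       # assumed preprocessing step
    ("pca", PCA(n_components=8)),      # assumed number of components
    ("forest", RandomForestClassifier(random_state=0)),
])

# Small illustrative grid; the paper's actual grid is not given here.
param_grid = {
    "forest__n_estimators": [50, 100],
    "forest__max_depth": [None, 5],
}

# Percentage split: tune on the training part, score on the held-out part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
holdout_accuracy = search.score(X_test, y_test)
print(f"best parameters: {search.best_params_}")
print(f"hold-out accuracy: {holdout_accuracy:.3f}")

# Cross validation with the fold counts used in the experiments.
scoring = ["accuracy", "precision", "recall", "f1"]
cv_scores = {}
for k in (3, 5, 7, 10):
    result = cross_validate(search.best_estimator_, X, y, cv=k, scoring=scoring)
    cv_scores[k] = {m: result[f"test_{m}"].mean() for m in scoring}
    print(k, {m: round(v, 3) for m, v in cv_scores[k].items()})
```

Placing the scaler and PCA inside the pipeline means they are refit on the training folds only in each round, which keeps information from the test fold out of the fitted transforms.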
In the future, this research could be developed
further into an intelligent diagnostic system. By
connecting to conventional medical examination
equipment, the system could import new real-time
data and make predictions automatically. Moreover,
although the random forest method performs well on
the selected dataset, there is still room for
improvement. Prediction performance may improve
further if other supervised learning algorithms are
combined with it through ensemble learning.