The experiment conducted by Adedayo Ogunpola et al. (Ogunpola, 2024) offers a useful comparison for examining the effect of differences in sample size and database. Their preprocessing covers a range of steps, including the handling of missing data, the encoding of categorical variables, the normalization of feature values, and the separation of the data into training and testing sets. Both the sample sizes and the databases differ from those of the previous two experiments: the data were retrieved from Mendeley (blue) and Kaggle (orange). The results are shown in Figure 5.
Figure 5: Performance of various ML methods reported by Adedayo Ogunpola et al. (Ogunpola, 2024).
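The preprocessing steps listed for the Ogunpola et al. study (handling missing data, encoding variables, normalizing features, and splitting into training and testing sets) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy records and column meanings are hypothetical, missing values are filled with the column mean, categorical values are one-hot encoded, and numeric columns are min-max scaled to [0, 1].

```python
import random
from statistics import mean

# Hypothetical toy records: (age, cholesterol, chest_pain_type, label).
# None marks a missing value. Columns and values are illustrative only.
RECORDS = [
    (63, 233.0, "typical", 1),
    (37, None, "atypical", 0),
    (41, 204.0, "non-anginal", 0),
    (56, 236.0, "typical", 1),
    (57, 354.0, "asymptomatic", 1),
    (44, None, "atypical", 0),
    (52, 199.0, "non-anginal", 0),
    (60, 270.0, "asymptomatic", 1),
]


def impute_mean(values):
    """Replace missing (None) entries with the mean of the observed entries."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Encode a categorical column as one binary column per category."""
    categories = sorted(set(values))
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return rows, categories


def min_max(values):
    """Rescale a numeric column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against a constant column
    return [(v - lo) / span for v in values]


def preprocess(records):
    """Impute, encode, and normalize; return feature rows and labels."""
    ages = min_max([float(r[0]) for r in records])
    chol = min_max(impute_mean([r[1] for r in records]))
    pain, _ = one_hot([r[2] for r in records])
    X = [[a, c, *p] for a, c, p in zip(ages, chol, pain)]
    y = [r[3] for r in records]
    return X, y


def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle row indices with a fixed seed, then hold out a test fraction."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(len(X) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


X, y = preprocess(RECORDS)
X_train, y_train, X_test, y_test = train_test_split(X, y)
print(len(X_train), len(X_test), len(X[0]))
```

For brevity the sketch computes the imputation and scaling statistics on the full dataset before splitting; in practice these statistics are usually fitted on the training split only, so that no information from the test set leaks into training.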
In this experiment, RF achieves a precision of 98.63% on the Mendeley dataset and 94.44% on the Kaggle (Cleveland) dataset. These values appear higher than those obtained in the two previous experiments. Moreover, the results differ between the two databases for every ML method in this report, with the Mendeley database yielding noticeably higher precision than the Kaggle database.
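For reference, precision, the metric quoted above, is conventionally defined from confusion-matrix counts, where TP is the number of true positives and FP the number of false positives:

```latex
\text{Precision} = \frac{TP}{TP + FP}
```

Unlike accuracy, this quantity ignores true and false negatives, so a model can attain high precision while still missing many positive cases.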
From the results above, it is reasonable to conclude that several ML methods already achieve satisfactory performance from a statistical point of view. However, it is also clear that performance depends heavily on the choice of database, attributes, sample size, and other implementation details of each ML method. In addition, the studies use several different statistical metrics to describe performance. This gives a fuller description but makes comparison across studies even harder. Moreover, the same ML method performs differently across the studies discussed above. Systematic research into the influence of attribute selection, sample size, and other implementation choices is therefore needed.
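To illustrate why the use of different metrics complicates comparison, the following minimal Python sketch computes accuracy, precision, recall (sensitivity), specificity, and F1 from confusion-matrix counts. The two sets of counts are hypothetical, chosen so that both models share the same accuracy while ranking in opposite order on precision and recall; they are not taken from any of the reviewed studies.

```python
def ratio(num, den):
    """Safe division: return 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical counts on a 100-case test set (illustrative only).
model_a = classification_metrics(tp=40, fp=5, fn=15, tn=40)
model_b = classification_metrics(tp=50, fp=15, fn=5, tn=30)

for name, m in (("A", model_a), ("B", model_b)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Both hypothetical models are 80% accurate, yet model A has the higher precision and specificity while model B has the higher recall and F1, so a study reporting only one of these numbers could favour either model.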
4 CONCLUSIONS
This paper provides an extensive overview of AI-based methods for predicting CVDs, focusing particularly on RF, NB, and LVQ. Notable disparities are observed across the relevant literature, highlighting the significant impact of data preprocessing methods, including feature selection and the handling of missing data, on ML outcomes. Sample size and database selection also play pivotal roles in ML performance.
The HRFLM method presents a promising advancement over RF, while LVQ outperformed RF in one study by Saravanan Srinivasan et al., although RF performed better in other reports. Consequently, it remains difficult to distinguish the performance of these ML methods with high confidence. Nevertheless, taken together, these ML techniques demonstrate effectiveness in supporting CVD prediction. Despite the abundance of literature guiding the application of ML methods to CVD prediction, substantial gaps persist regarding data preprocessing, sample sizes, and database usage. Future research should therefore examine the impact of each of these factors more closely. Researchers are also encouraged to reproduce existing studies to further deepen understanding in this domain.
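The difficulty of distinguishing methods with high confidence can be made concrete with a paired percentile bootstrap of the accuracy difference between two classifiers evaluated on the same test set. This is a standard resampling technique offered as an illustration, not a method used in the reviewed studies, and the per-case correctness vectors below are hypothetical. If the resulting interval contains zero, the data do not clearly separate the two models.

```python
import random


def paired_bootstrap_ci(correct_a, correct_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy(A) - accuracy(B).

    correct_a[i] and correct_b[i] are 1 if model A or B classified test case i
    correctly, else 0. Test cases are resampled with replacement, and both
    models are scored on the same resampled cases, preserving the pairing.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be scored on the same test cases")
    n = len(correct_a)
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(correct_a[i] - correct_b[i] for i in idx) / n)
    diffs.sort()
    lo = diffs[round(n_boot * alpha / 2)]
    hi = diffs[round(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi


# Hypothetical per-case results on a shared 60-case test set:
# 48 cases both correct, 4 only A correct, 2 only B correct, 6 both wrong.
correct_a = [1] * 48 + [1] * 4 + [0] * 2 + [0] * 6
correct_b = [1] * 48 + [0] * 4 + [1] * 2 + [0] * 6

observed = (sum(correct_a) - sum(correct_b)) / len(correct_a)
lo, hi = paired_bootstrap_ci(correct_a, correct_b)
print(f"accuracy difference {observed:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Here model A is more accurate on the sample (52 versus 50 correct out of 60), yet the interval spans zero, so a difference of this size on a test set this small would not support ranking one method above the other.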
REFERENCES
Altman, D. G., & Bland, J. M. (1994). Diagnostic tests. 1:
Sensitivity and specificity. BMJ: British Medical
Journal, 308(6943), 1552.
Anderies, A., Tchin, J. A. R. W., Putro, P. H., Darmawan,
Y. P., & Gunawan, A. A. S. (2022). Prediction of heart
disease UCI dataset using machine learning algorithms.
Engineering, Mathematics and Computer Science
Journal (EMACS), 4(3), 87-93.
CHERNGS. (2020). Heart Disease Cleveland UCI.
https://www.kaggle.com/datasets/cherngs/heart-disease-cleveland-uci
Damen, J. A., Hooft, L., Schuit, E., Debray, T. P., Collins,
G. S., Tzoulaki, I., ... & Moons, K. G. (2016).
Prediction models for cardiovascular disease risk in the
general population: systematic review. BMJ, 353.
Gaziano, T., Reddy, K. S., Paccaud, F., Horton, S., &
Chaturvedi, V. (2006). Cardiovascular disease. In Disease
Control Priorities in Developing Countries (2nd ed.).
Gour, S., Panwar, P., Dwivedi, D., & Mali, C. (2022). A
machine learning approach for heart attack prediction.
In Intelligent Sustainable Systems: Selected Papers of
WorldS4 2021, Volume 1 (pp. 741-747). Springer
Singapore.