The experiment conducted by Adedayo Ogunpola et al. can serve as a comparison for the effect of differences in sample size and database (Ogunpola, 2024). Their preprocessing encompasses a range of steps, such as handling missing data, encoding variables, normalizing the values of different features, and splitting the data into training and testing sets. Both the sample size and the databases differ from those of the previous two experiments: the data were retrieved from Mendeley (blue) and Kaggle (orange). The results are shown in Figure 5.
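The preprocessing steps listed above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the pipeline used by Ogunpola et al.: the toy records, the column names (`age`, `chol`, `cp`), mean imputation, one-hot encoding, and min-max scaling are all hypothetical choices. Unlike the order in which the steps are listed, the sketch splits the data first and learns the imputation and scaling parameters from the training rows only, so that no test-set statistics leak into training.

```python
import random
from statistics import mean

# Hypothetical toy records (not from any real dataset): "age" and "chol" are
# numeric, "cp" (chest-pain type) is categorical, and None marks a missing value.
RECORDS = [
    {"age": 63, "chol": 233, "cp": "typical", "target": 1},
    {"age": 37, "chol": 250, "cp": "non-anginal", "target": 1},
    {"age": None, "chol": 204, "cp": "atypical", "target": 1},
    {"age": 56, "chol": None, "cp": "atypical", "target": 0},
    {"age": 57, "chol": 354, "cp": "asymptomatic", "target": 0},
    {"age": 62, "chol": 268, "cp": "asymptomatic", "target": 0},
    {"age": 44, "chol": 263, "cp": "non-anginal", "target": 1},
    {"age": 52, "chol": 199, "cp": "typical", "target": 0},
]
NUMERIC = ["age", "chol"]
CATEGORICAL = ["cp"]


def split(rows, test_fraction=0.25, seed=0):
    """Shuffle copies of the rows and return (train, test)."""
    shuffled = [dict(r) for r in rows]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit(train):
    """Learn imputation, scaling, and encoding parameters from training rows only."""
    numeric = {}
    for col in NUMERIC:
        observed = [r[col] for r in train if r[col] is not None]
        numeric[col] = (mean(observed), min(observed), max(observed))
    categories = {
        col: sorted({r[col] for r in train if r[col] is not None}) for col in CATEGORICAL
    }
    return numeric, categories


def transform(rows, numeric, categories):
    """Mean-impute and min-max scale numeric columns, then one-hot encode categoricals."""
    encoded = []
    for r in rows:
        features = []
        for col in NUMERIC:
            fill, lo, hi = numeric[col]
            value = fill if r[col] is None else r[col]
            span = (hi - lo) or 1  # guard against a constant column
            features.append((value - lo) / span)
        for col in CATEGORICAL:
            # Categories unseen in training map to an all-zero block.
            features.extend(1.0 if r[col] == c else 0.0 for c in categories[col])
        encoded.append((features, r["target"]))
    return encoded


train_rows, test_rows = split(RECORDS)
numeric_params, category_params = fit(train_rows)
train = transform(train_rows, numeric_params, category_params)
test = transform(test_rows, numeric_params, category_params)
print(f"{len(train)} training rows, {len(test)} test rows")
print("first training vector:", [round(v, 3) for v in train[0][0]])
```

Test rows are scaled with the training minimum and maximum, so their numeric features may fall slightly outside [0, 1]; this is expected behavior for min-max scaling fitted on training data.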
 
 
Figure 5: Graphical illustration of the performance of various ML methods by Adedayo Ogunpola et al. (Ogunpola, 2024).
In this experiment, RF achieves a precision of 98.63% on the Mendeley dataset and 94.44% on the Kaggle (Cleveland) dataset. This precision appears higher than that obtained in the two previous experiments. Moreover, the results obtained from the two databases differ for all of the ML methods used in this study, and the Mendeley database yields noticeably higher precision than the Kaggle database.
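Precision is defined as TP / (TP + FP), the share of predicted positives that are truly positive, so it depends on the proportion of positive cases in the test data as well as on the classifier itself. The sketch below assumes a hypothetical classifier whose sensitivity and specificity are both fixed at 0.90 and applies Bayes' rule to show how expected precision shifts with class balance alone. These numbers do not establish that class balance explains the Mendeley/Kaggle gap reported above; they only illustrate one factor worth checking when precision is compared across databases.

```python
def expected_precision(sensitivity, specificity, prevalence):
    """Expected precision (positive predictive value) via Bayes' rule.

    Per unit of population, true positives = sensitivity * prevalence and
    false positives = (1 - specificity) * (1 - prevalence).
    """
    tp = sensitivity * prevalence
    fp = (1.0 - specificity) * (1.0 - prevalence)
    return tp / (tp + fp)


# Hypothetical classifier: sensitivity and specificity both fixed at 0.90.
for prevalence in (0.3, 0.5, 0.7):
    p = expected_precision(0.90, 0.90, prevalence)
    print(f"prevalence of disease = {prevalence:.1f} -> expected precision = {p:.3f}")
```

With the classifier held fixed, expected precision rises as the share of positive cases in the test data increases.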
From the results above, it is reasonable to conclude that several ML methods already achieve satisfactory performance as measured by the reported statistical metrics. However, it is also clear that performance is strongly influenced by the choice of database, attributes, sample size, and other implementation details of each ML method. In addition, several different statistical parameters are used to describe the performance of ML methods. This provides richer descriptions but also makes comparison harder. Moreover, even the same ML method performs differently across the studies discussed above. Therefore, systematic research into the influence of the choice of attributes, sample size, and other details is needed.
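To illustrate why multiple statistical parameters complicate comparison, the sketch below computes precision, recall (sensitivity), specificity, accuracy, and F1 from two confusion matrices for the same 200-case test set. The counts for `model_a` and `model_b` are invented for illustration and do not come from any of the studies discussed; they simply show that different metrics can rank the same two classifiers in opposite orders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts."""
    tp: int
    fp: int
    tn: int
    fn: int

    def precision(self):
        return self.tp / (self.tp + self.fp)

    def recall(self):  # also called sensitivity
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        return self.tn / (self.tn + self.fp)

    def accuracy(self):
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def f1(self):
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)


# Hypothetical counts: 100 diseased and 100 healthy cases in both matrices.
model_a = Confusion(tp=80, fp=5, tn=95, fn=20)
model_b = Confusion(tp=95, fp=15, tn=85, fn=5)

for name in ("precision", "recall", "specificity", "accuracy", "f1"):
    a = getattr(model_a, name)()
    b = getattr(model_b, name)()
    print(f"{name:12s} A={a:.3f} B={b:.3f} higher={'A' if a > b else 'B'}")
```

Here model A is better on precision and specificity, while model B is better on recall, accuracy, and F1, so a study reporting only one of these metrics could favor either model.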
4 CONCLUSIONS 
This paper provides an extensive overview of various AI-based methods employed for predicting CVDs, focusing particularly on RF, NB, and LVQ. Notable disparities are observed across the relevant literature, highlighting the significant impact of data preprocessing methods, including feature selection and the handling of missing data, on ML outcomes. Sample size and database selection also play pivotal roles in determining ML performance.
The HRFLM method presents a promising advancement over RF, while LVQ outperformed RF in one study by Saravanan Srinivasan et al., although RF performed better in other reports. Consequently, it remains challenging to distinguish the performance of these ML methods with high confidence. Nevertheless, taken together, these ML techniques are effective in supporting CVD prediction. Despite the abundance of literature guiding the application of ML methods to CVD prediction, substantial gaps persist regarding data preprocessing, sample sizes, and database use. Hence, future research should examine the impact of each of these factors in more depth. Researchers are also encouraged to reproduce existing studies to further improve understanding in this domain.
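One reason it is hard to separate ML methods with high confidence is that a precision value measured on a small test set carries wide uncertainty. The sketch below computes a 95% Wilson score interval for a proportion; all counts (for example 38 correct out of 40 predicted positives versus 36 out of 40) are hypothetical. Overlapping intervals are only an informal signal, not a formal significance test.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return center - half, center + half


# Hypothetical: the same observed precision of 0.95 at three test-set sizes.
for n in (20, 100, 1000):
    lo, hi = wilson_interval(round(0.95 * n), n)
    print(f"n={n:4d}: 95% interval [{lo:.3f}, {hi:.3f}], width {hi - lo:.3f}")

# Hypothetical: two methods with 0.95 vs 0.90 precision on 40 predicted positives each.
a_lo, a_hi = wilson_interval(38, 40)
b_lo, b_hi = wilson_interval(36, 40)
print(f"method A [{a_lo:.3f}, {a_hi:.3f}] vs method B [{b_lo:.3f}, {b_hi:.3f}]")
print("intervals overlap:", a_lo <= b_hi)
```

The interval narrows roughly in proportion to one over the square root of the sample size, which is one reason larger and more comparable test sets would help the systematic studies called for above.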
REFERENCES 
Altman, D. G., & Bland, J. M. (1994). Diagnostic tests. 1: 
Sensitivity and specificity. BMJ: British Medical 
Journal, 308(6943), 1552. 
Anderies, A., Tchin, J. A. R. W., Putro, P. H., Darmawan, 
Y. P., & Gunawan, A. A. S. (2022). Prediction of heart 
disease UCI dataset using machine learning algorithms. 
Engineering, Mathematics and Computer Science
Journal (EMACS), 4(3), 87-93. 
CHERNGS. (2020). Heart Disease Cleveland UCI. https://www.kaggle.com/datasets/cherngs/heart-disease-cleveland-uci
Damen, J. A., Hooft, L., Schuit, E., Debray, T. P., Collins, 
G. S., Tzoulaki, I., ... & Moons, K. G. (2016). 
Prediction models for cardiovascular disease risk in the 
general population: systematic review. BMJ, 353.
Gaziano, T., Reddy, K. S., Paccaud, F., Horton, S., & 
Chaturvedi, V. (2006). Cardiovascular disease. Disease 
Control Priorities in Developing Countries. 2nd edition. 
Gour, S., Panwar, P., Dwivedi, D., & Mali, C. (2022). A 
machine learning approach for heart attack prediction. 
In Intelligent Sustainable Systems: Selected Papers of 
WorldS4 2021, Volume 1 (pp. 741-747). Springer 
Singapore.