Joint SMOTE and Random Forest for Heart Disease Prediction and Characterization
Yi Lu
2023
Abstract
Heart disease has been the leading cause of death worldwide for decades. Understanding its risk factors and predicting its onset is therefore essential to helping individuals protect their health proactively. This study aims to predict the occurrence of heart disease and to explore the factors associated with it, using the Personal Key Indicators of Heart Disease dataset from Kaggle. After exploratory data analysis (EDA), the study addresses the class imbalance in the data by integrating the Synthetic Minority Over-sampling Technique (SMOTE) into the initial Random Forest (RF) model. The resulting model achieves an accuracy of 93.39%, a precision of 94.25%, a recall of 92.42%, and an F1 score of 93.33%. RF feature analysis shows that Body Mass Index (BMI), overall health status, and age are the three features with the greatest influence on the model’s predictions. This finding offers guidance for heart disease prevention and supports the development of more precise interventions targeting individual risk factors.
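To illustrate the oversampling idea named in the abstract, the following is a minimal sketch of SMOTE-style interpolation using only the Python standard library. It is not the author's implementation: the function name `smote`, its parameters (`n_new`, `k`, `seed`), and the toy two-feature points are assumptions made for illustration, and the sketch handles numeric features only.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points (hypothetical helper).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Candidate neighbours: every other minority sample, nearest first.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # gap in [0, 1) sets the position along the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two numeric features (made-up values).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region the minority class already occupies. The abstract does not state which implementation was used; in practice a library such as imbalanced-learn (`imblearn.over_sampling.SMOTE`) is a common choice, and oversampling is usually applied to the training split only so that the evaluation data remain untouched.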
Paper Citation
in Harvard Style
Lu Y. (2023). Joint SMOTE and Random Forest for Heart Disease Prediction and Characterization. In Proceedings of the 1st International Conference on Data Analysis and Machine Learning - Volume 1: DAML; ISBN 978-989-758-705-4, SciTePress, pages 127-132. DOI: 10.5220/0012816100003885
in Bibtex Style
@conference{daml23,
author={Yi Lu},
title={Joint SMOTE and Random Forest for Heart Disease Prediction and Characterization},
booktitle={Proceedings of the 1st International Conference on Data Analysis and Machine Learning - Volume 1: DAML},
year={2023},
pages={127-132},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012816100003885},
isbn={978-989-758-705-4},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 1st International Conference on Data Analysis and Machine Learning - Volume 1: DAML
TI - Joint SMOTE and Random Forest for Heart Disease Prediction and Characterization
SN - 978-989-758-705-4
AU - Lu Y.
PY - 2023
SP - 127
EP - 132
DO - 10.5220/0012816100003885
PB - SciTePress