advancing the clinical identification of individuals at
high risk of developing CHD, thus facilitating early
intervention strategies. This paper offers a
comprehensive analysis that aims to enrich current
understanding in medical data science and to lay the
groundwork for subsequent research and practical
strategies to address CHD. In summary, this research
is not merely an academic exercise but a timely
contribution to understanding and predicting a disease
with widespread implications for public health.
2 RELATED WORK
Ramya G. Franklin and B. Muthukumar introduced a
detailed framework for analyzing cardiovascular
conditions through advanced analytic approaches
(Franklin and Muthukumar 2020). Their multi-stage
approach aimed both at effective early diagnosis, by
examining various risk parameters, and at data
security, which it ensured through the Advanced
Encryption Standard (AES). A. Lakshmanarao, A.
Srisaila, and T. Srinivasa Ravi Kiran addressed the
pressing global issue of cardiovascular diseases by
introducing an ensemble classifier model designed
specifically for heart disease prediction and
validating it on two datasets, one from Kaggle and
one from UCI (Lakshmanarao et al. 2021). The study
reported that the ensemble model notably
outperformed existing solutions. Priyanka Gupta and
D.D. Seth focused on the early detection of
cardiovascular diseases (CVDs) (Gupta and Seth
2022). To this end, the authors examined the efficacy
of various machine learning classifiers. The research
ultimately produced a system intended to streamline
medical care by saving physicians' time and reducing
treatment costs.
Meghavi Rana, Mohammad Zia Ur Rehman, and
Srishti Jain used the growing volume of medical data
to apply artificial intelligence techniques to the
analysis of heart conditions (Rana et al. 2022).
The authors noted the importance of their work for
medical practitioners and researchers seeking to
predict heart disease based on a patient's age.
Ignatious K Pious, K Antony Kumar, Y. Cephas
Soulwin, and E. Nipun Reddy examined heart disease
(Pious et al. 2022) and concluded that early diagnosis
is crucial to managing it, particularly because the
condition has been exacerbated by today's sedentary
lifestyles and stress. Reldean Williams, Thokozani
Shongwe, Ali N. Hasan, and Vikash Rameshar focused
on heart disease as a critical global health issue
(Williams et al. 2021). The paper underscored the
vital role of early prediction in enabling preventive
measures and suggested that incorporating additional
variables, such as family history, could further
improve model performance. C. Ordonez applied
association rules to predicting cardiovascular
disorders (Ordonez 2006). The paper addressed two
primary challenges: the generation of an excessive
number of medically irrelevant rules and the lack of
validation on an independent test set. The study
showed that search constraints, combined with
validation on a held-out test set, significantly reduced
the number of irrelevant or poorly generalizing rules
and yielded a set of high-accuracy predictive rules.
Xiaoming Yuan, Jiahui Chen, Kuan Zhang, Yuan
Wu, and Tingting Yang addressed a limitation of
existing heart disease prediction models, which often
determined only the presence of disease and not its
severity (Yuan et al. 2022). The authors reported that
their Bagging-Fuzzy-GBDT model achieved high
accuracy and consistency both in detecting the
presence of heart disease and in determining its
severity. Senthilkumar Mohan, Chandrasegar
Thirumalai, and Gautam Srivastava tackled the
prediction of cardiovascular diseases, a leading cause
of death globally (Mohan et al. 2019), proposing a
novel machine learning-based model that emphasized
feature selection to improve prediction accuracy.
Azam Mehmood Qadri, Ali Raza, Kashif Munir, and
Mubarak S. Almutairi introduced a novel feature
engineering approach for selecting the most significant
patient health parameters (Qadri et al. 2023) and
validated the performance of all applied methods
through cross-validation.
3 METHODS
Our methodology for predicting the onset of coronary
heart disease (CHD) applies five machine learning
models to the Framingham Heart Study dataset. The
experiment proceeds in six steps: data collection,
data preprocessing, exploratory data analysis (EDA),
feature selection, model training and evaluation, and
comparative analysis. In the data collection phase, we
import the Framingham dataset, which contains
variables such as age, sex, cholesterol level, and
smoking status, among others, each of which
contributes differently to CHD risk. The data
preprocessing step handles missing values, by either
imputation or deletion, and normalizes or scales
variables where needed, so that the dataset is suitable
for further analysis.
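The preprocessing step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the toy records, the column names (`age`, `totChol`, `glucose`), and the helper functions `drop_incomplete`, `impute_median`, and `min_max_scale` are all hypothetical. Median imputation and min-max scaling are shown only as one common choice among the imputation and scaling options described above.

```python
from statistics import median

# Hypothetical toy records; None marks a missing value.
# Column names are illustrative assumptions, not taken from this study.
records = [
    {"age": 39, "totChol": 195.0, "glucose": 77.0},
    {"age": 46, "totChol": 250.0, "glucose": None},
    {"age": 48, "totChol": None, "glucose": 70.0},
    {"age": 61, "totChol": 225.0, "glucose": 103.0},
]


def drop_incomplete(rows):
    """Deletion strategy: keep only rows that have no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def impute_median(rows, columns):
    """Imputation strategy: replace None with the median of the observed values.

    Assumes every column has at least one observed (non-None) value.
    """
    medians = {c: median(r[c] for r in rows if r[c] is not None) for c in columns}
    return [{c: (medians[c] if r[c] is None else r[c]) for c in columns} for r in rows]


def min_max_scale(rows, columns):
    """Rescale each column to [0, 1]; a constant column maps to 0.0.

    Assumes missing values have already been handled (no None entries).
    """
    lo = {c: min(r[c] for r in rows) for c in columns}
    hi = {c: max(r[c] for r in rows) for c in columns}
    return [
        {c: (r[c] - lo[c]) / (hi[c] - lo[c]) if hi[c] > lo[c] else 0.0 for c in columns}
        for r in rows
    ]


if __name__ == "__main__":
    cols = ["age", "totChol", "glucose"]
    # Deletion keeps only the two fully observed rows.
    print(len(drop_incomplete(records)))
    # Imputation followed by scaling keeps all four rows.
    for row in min_max_scale(impute_median(records, cols), cols):
        print(row)
```

In practice, the medians and the per-column minimum and maximum would be computed on the training split only and then applied unchanged to the test split, so that no information from the test data leaks into preprocessing.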