Prediction of Heart Attack on Random Forest and Logistic Regression
Xinyi Huang
Applied Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, China
Keywords: Heart Attack Prediction, Random Forest, Logistic Regression.
Abstract: The heart is one of the most important organs. There are various kinds of heart diseases, with coronary artery
disease (CAD) being the most prevalent. CAD refers to a group of diseases caused by damage to the
coronary arteries that supply oxygen and blood, resulting in ischemia, hypoxia or necrosis. About 2% of the
world's population suffers from CAD, which causes 17.5 million deaths worldwide every year. Therefore,
monitoring and prevention of heart disease is essential. This study builds models with two algorithms, Random
Forest Classifier (RF) and Logistic Regression (LR), which achieve accuracies of 0.9798 and 0.8965, respectively.
RF is found to be more effective than LR in this small classification problem. Future research should focus
on integrating larger, more diverse datasets and introducing other advanced machine learning algorithms in
conjunction with the RF algorithm to explore hidden patterns in the data. These models can help predict the
risk of heart disease.
1 INTRODUCTION
Trans fats have been added to a wide variety of foods.
However, trans fats raise low-density
lipoprotein (LDL) cholesterol, which clogs arteries
and contributes to heart attacks (Ganguly and Pierce
2015). According to newly released data from the
World Health Organization, individuals with
pre-existing heart disease face a higher risk of
severe illness or death from COVID-19. Therefore,
predicting heart disease is
important for both individuals and society. The
traditional methods of detecting heart disease are
expensive and time-consuming. As applied in the
medical field, machine learning algorithms can
autonomously learn and extract valuable information
and patterns from historical data, enabling
automated decision-making and prediction (Litjens
et al 2017).
Some factors contributing to a heart attack can be
detected through daily measurements and blood
tests. This paper therefore uses machine learning to
predict heart attacks. The selection of suitable machine
learning algorithms based on the dataset
characteristics is crucial for achieving accurate results.
Harshit Jindal employed the K-nearest neighbors
(KNN) algorithm as the primary model to improve
prediction accuracy on a collected dataset of
medical attributes. Additionally, Ashir Javeed
developed a novel approach, the Random Search
Algorithm - Random Forest (RSA-RF) model, which
effectively discovers the optimal subset of features
and reduces the feature dimensionality, thus reducing
the time complexity.
Machine learning techniques are widely used in
predicting heart diseases, with methods such as
Support Vector Machines (SVM) (Raju et al 2018),
Random Forest Classifier (RF) and Logistic
Regression (LR) receiving significant attention.
RF is an ensemble learning method known for its
high accuracy and robustness. The construction of an
RF involves two stages: training and prediction.
During the training stage, multiple decision trees are
trained by randomly sampling data instances and
selecting subsets of features (Speiser et al 2019). Each
decision tree is generated based on different subsets of
data and features. In the prediction stage, input
samples are classified through each decision tree, and
the final classification result is determined by voting
or averaging the outcomes (Breiman 1984). However,
the predictive performance of RF may sometimes fall
short of expectations. To achieve optimal
performance, the number of trees can be adjusted
using cross-validation techniques.
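The two stages and the tuning step described above can be illustrated with a minimal sketch. It makes several assumptions that are not from this paper: one-split decision stumps stand in for full decision trees, the data are synthetic rather than the heart attack dataset, and all names (fit_stump, fit_forest, predict, cv_accuracy) and the candidate tree counts are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_label = majority(left)  # never empty: t is an observed value
            right_label = majority(right) if right else left_label
            errors = sum(lab != left_label for lab in left)
            errors += sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]  # (feature, threshold, left_label, right_label)


def fit_forest(X, y, n_trees, n_features=2, seed=0):
    """Training stage: one stump per bootstrap sample and random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw row indices with replacement.
        rows = [rng.randrange(len(X)) for _ in range(len(X))]
        # Random subset of the feature indices for this stump.
        feats = rng.sample(range(len(X[0])), n_features)
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    return forest


def predict(forest, row):
    """Prediction stage: every stump votes and the majority label wins."""
    votes = [ll if row[f] <= t else rl for f, t, ll, rl in forest]
    return majority(votes)


def cv_accuracy(X, y, n_trees, k=5, seed=0):
    """Estimate accuracy for a given number of trees with k-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        forest = fit_forest([X[i] for i in train], [y[i] for i in train],
                            n_trees, seed=seed)
        correct += sum(predict(forest, X[i]) == y[i] for i in fold)
    return correct / len(X)


# Synthetic stand-in data (NOT the paper's dataset): three features, and the
# label depends on the first two only.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(120)]
y = [1 if row[0] + row[1] > 0 else 0 for row in X]

# Compare a few candidate forest sizes and keep the best cross-validated one.
scores = {n: cv_accuracy(X, y, n) for n in (1, 5, 15)}
best_n = max(scores, key=scores.get)
print(scores, best_n)
```

In practice a library implementation with full decision trees would normally be used; the sketch only makes the bootstrap sampling, feature subsetting, voting, and cross-validated choice of the number of trees explicit.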
LR is a common binary classification algorithm
used to predict the probability of an event occurring. It
passes a linear combination of the input variables
through a sigmoid function, mapping continuous
real values to probability values between 0 and 1,
Huang, X.
Prediction of Heart Attack on Random Forest and Logistic Regression.
DOI: 10.5220/0012816000003885
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 1st International Conference on Data Analysis and Machine Learning (DAML 2023), pages 466-470
ISBN: 978-989-758-705-4
Proceedings Copyright © 2024 by SCITEPRESS – Science and Technology Publications, Lda.