2 RELATED WORKS
Over the last 10 years, a growing number of people
have died of heart disease (Jaymin Patel et al., 2016).
The researchers analysed classification tree
techniques in data mining using WEKA, an open-
source software tool that provides various machine
learning algorithms for data mining applications.
The objective of the research was to compare how
different classification techniques perform on a
heart disease dataset. The J48, logistic model tree,
and random forest algorithms were used to perform
classification on a dataset from the UC Irvine (UCI)
repository. J48 achieved a train error of 0.1423221
and a test error of 0.1666667, the logistic model
tree algorithm achieved a train error of 0.1656716
and a test error of 0.237931, and the random forest
algorithm achieved a train error of 0 and a test
error of 0.2. J48 turned out to be the best
classifier for predicting heart disease: its build
time was much lower and it achieved higher
accuracy.
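The comparison above can be restated as a short script. This is a minimal sketch that only reuses the error rates quoted in this paragraph; the dictionary layout and the function names are illustrative assumptions and are not taken from the cited study or from WEKA.

```python
# Train and test error rates as quoted above (Jaymin Patel et al., 2016).
# The data structure and helper names are hypothetical, for illustration only.
reported_errors = {
    "J48": {"train": 0.1423221, "test": 0.1666667},
    "Logistic Model Tree": {"train": 0.1656716, "test": 0.237931},
    "Random Forest": {"train": 0.0, "test": 0.2},
}


def best_by_test_error(errors):
    """Return the classifier name with the lowest test error."""
    return min(errors, key=lambda name: errors[name]["test"])


def generalisation_gap(errors):
    """Return (test error - train error) for each classifier."""
    return {name: e["test"] - e["train"] for name, e in errors.items()}


best = best_by_test_error(reported_errors)
gaps = generalisation_gap(reported_errors)

print("Lowest test error:", best)
for name, gap in gaps.items():
    print(f"{name}: test - train = {gap:.4f}")
```

On these figures J48 has the lowest test error, while the random forest shows the widest gap between train and test error, which would be consistent with over-fitting on the training data.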
Another study applied machine learning algorithms
to heart disease prediction (Prerana et al., 2015).
The work was presented in five sections: the first
described the theory behind reducing the attributes
of a dataset; the second covered the implementation
of the machine learning algorithms, Naïve Bayes and
the PAC algorithm, for predicting heart disease; the
third processed big data using Hadoop MapReduce
programming; the fourth deployed a centralized
system on a cloud platform; and the fifth gave the
conclusion along with the future scope. The UCI
dataset was used for the experiment, with 13
attributes. A big data file containing patient
records was used as input, and the dataset was fed
into the classification model. Two models were
used, namely the Naïve Bayes classifier and
Probabilistic Analysis and Classification (PAC).
Both were implemented to determine heart disease
risk, and the results were compared graphically.
The experimental analysis showed that Naïve Bayes
with continuous variables achieved 89.80% accuracy,
Naïve Bayes with discrete variables achieved 95.21%
accuracy, and Probabilistic Analysis achieved the
best accuracy of 97.48%.
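To illustrate what a Naïve Bayes classifier over discrete variables does, the following is a minimal sketch, not the cited authors' implementation and without any Hadoop component. The toy records, the two attributes (blood pressure level, chest pain), the class labels, and the use of Laplace smoothing are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-attribute value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, class_label)][value] -> count
    value_counts = defaultdict(Counter)
    # Distinct values seen per attribute, needed for Laplace smoothing.
    distinct = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            distinct[i].add(value)
    return class_counts, value_counts, distinct


def predict(model, row):
    """Return the class with the highest log-posterior for one record."""
    class_counts, value_counts, distinct = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        log_p = math.log(n / total)  # class prior
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Attributes are treated as independent given the class;
            # add-one smoothing avoids zero probabilities.
            log_p += math.log((count + 1) / (n + len(distinct[i])))
        scores[label] = log_p
    return max(scores, key=scores.get)


# Hypothetical toy records: (blood pressure level, chest pain).
rows = [
    ("high", "yes"), ("high", "no"), ("normal", "no"),
    ("normal", "no"), ("high", "yes"), ("normal", "yes"),
]
labels = ["disease", "disease", "healthy", "healthy", "disease", "healthy"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("high", "yes")))
print(predict(model, ("normal", "no")))
```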
A heart disease prediction system built on different
classifier techniques was also introduced (Sonam
Nikhar et al., 2015). The article focuses on
analysing the algorithms and comparing their
accuracy. The techniques used were the ID3 decision
tree algorithm, the Naïve Bayes classifier, and
K-means clustering. A decision tree handles missing
values and removes outliers, so it can be built even
when the data is not cleaned. The main disadvantages
of the ID3 algorithm are over-fitting and difficulty
of implementation. The Naïve Bayes classifier treats
the variables as independent and predicts without
taking the relationships between them into account.
K-means clustering groups the dataset on the
nearest-neighbor principle according to data
similarity. The authors used the R tool for the
experiment. They observed that decision trees
produce inaccurate results, whereas Naïve Bayes is
accurate if the data is cleaned and well maintained.
ID3 can clean the dataset but is unable to produce
accurate results, whereas the combination of Naïve
Bayes and K-means produces accurate results.
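ID3 chooses the attribute to split on by information gain, that is, the reduction in label entropy after the split. The following is a minimal sketch of that selection step only, not a full tree builder and not the cited authors' R code. The toy records, attribute meanings, and function names are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (c / total) * math.log2(c / total) for c in Counter(labels).values()
    )


def information_gain(rows, labels, attribute_index):
    """Entropy reduction obtained by splitting on one attribute."""
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute_index], []).append(label)
    # Weighted entropy of the subsets produced by the split.
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels):
    """Index of the attribute with the highest information gain."""
    return max(
        range(len(rows[0])),
        key=lambda i: information_gain(rows, labels, i),
    )


# Hypothetical toy records: (blood pressure level, chest pain).
rows = [
    ("high", "yes"), ("high", "no"), ("normal", "no"),
    ("normal", "no"), ("high", "yes"), ("normal", "yes"),
]
labels = ["disease", "disease", "healthy", "healthy", "disease", "healthy"]

for i in range(len(rows[0])):
    print(f"attribute {i}: gain = {information_gain(rows, labels, i):.4f}")
print("split on attribute", best_attribute(rows, labels))
```

In this toy data the first attribute separates the classes perfectly, so it is selected. Because ID3 keeps splitting until the leaves are pure, it can fit noise in the training data, which is the over-fitting weakness noted above.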
Another article focused on different algorithms in
which combinations of several target attributes were
predicted (K Srinivas et al., 2011). Effective heart
attack prediction methods were presented using data
mining techniques. The authors provided an
efficient approach for extracting significant
patterns from a heart disease data warehouse in
order to predict heart attacks, based on a
calculated significance weightage. Frequent
patterns whose value exceeded a predefined
threshold were selected. The study used National
Behavioural Risk Factor Surveillance System
(BRFSS) data to assess CVD rates. The research was
performed in coal mining areas before and after
controlling for individual-level covariates,
including smoking, obesity, alcohol consumption,
and others. The authors tested the hypothesis that
CVD rates would be significantly elevated among
residents of coal mining regions after controlling
for covariates.
In this article (Niti Guru et al., 2007), the authors
proposed a system based on neural networks, trained
using the back-propagation algorithm. The proposed
system was trained on 78 patient records. The doctor
provided the patient data, and the system then
generated a list of all possible diseases from which
the patient might be suffering. The system assists
doctors in avoiding human mistakes.
This article (Sellappan Palaniappan, 2008)
proposed a prototype Intelligent Heart Disease
Prediction System (IHDPS) using Decision Tree,
Naïve Bayes, and Neural Network models. It had six
major phases: business understanding, data
understanding, data preparation, modeling,
evaluation, and deployment. Data mining
classification modelling techniques were used in
developing the system. The DMX query language and
its functions were used for building and accessing
the models, namely for model creation, model
training, prediction, and content access.