The earliest work using neural networks to predict
hypertension can be traced back to the paper by
Poli, R. et al., which used Artificial Neural
Networks (ANNs) to explore the predictive performance of
feedforward models built as 2-layer, 3-3-layer and
6-layer networks (Poli et al 1991). Memory data and
diastolic and systolic blood pressure readings taken
throughout the day were used as input
values, and antihypertensive drug doses were used as
output values. By exploring models of different levels of
complexity, the study simulated the reasoning of doctors
using different diagnostic modalities (Poli et al 1991).
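As an illustration only, the kind of setup described, a small feedforward network regressing drug dose on daily blood-pressure readings, can be sketched with scikit-learn on synthetic data. This is not Poli et al.'s data, architecture, or code; the number of daily readings, the hidden-layer sizes, and the dose formula below are all invented:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in: 8 blood-pressure readings per day per subject (assumed),
# and a dose that loosely tracks the mean reading (invented relationship).
rng = np.random.default_rng(0)
X = rng.normal(120, 15, size=(200, 8))                # daily BP readings
y = 0.05 * X.mean(axis=1) + rng.normal(0, 0.5, 200)   # synthetic dose

# A small feedforward regressor; (3, 3) echoes the "3-3-layer" wording,
# but the actual architectures in Poli et al. (1991) are not reproduced here.
model = MLPRegressor(hidden_layer_sizes=(3, 3), max_iter=2000, random_state=0)
model.fit(X, y)
print(model.predict(X).shape)
```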
As mentioned earlier, the data mostly came from
the subjects' memory data and blood drug
concentrations, variables that can be
directly linked to the prediction of high blood pressure.
In real life, however, some patients do not know they
have hypertension because they take no drugs, so
such methods are of little use to them. This has led
some scholars to explore predicting high
blood pressure from other angles.
Nematollahi, M.A. and others took a
different approach, focusing on body composition
indices to see whether they can predict high blood pressure
(Nematollahi et al 2023). The study applied more than
ten machine learning algorithms to classify the data
individually and to identify, among all the features, those
most relevant to high blood pressure
(Nematollahi et al 2023).
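The feature-ranking idea can be sketched with a single classifier trained on invented data; the feature names, the data-generating process, and the choice of a random forest below are illustrative assumptions, not details taken from the study:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Invented body-composition-style data: two informative, correlated features
# and one pure-noise feature, with a synthetic hypertension label.
rng = np.random.default_rng(1)
n = 500
bmi = rng.normal(27, 4, n)
waist = 2.5 * bmi + rng.normal(0, 3, n)            # correlated with BMI
noise = rng.normal(0, 1, n)                        # carries no signal
y = (bmi + 0.1 * waist + rng.normal(0, 2, n) > 34).astype(int)

# Train one classifier and rank features by its importance scores;
# the study's approach compares many algorithms, not just one.
X = np.column_stack([bmi, waist, noise])
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = sorted(zip(["bmi", "waist", "noise"], clf.feature_importances_),
                 key=lambda t: -t[1])
print(ranking)
```

Here the uninformative `noise` column ends up with the smallest importance, which is the property such a ranking is meant to expose.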
Although many factors can be used as inputs in
machine learning, this is not always reasonable: some
factors, such as the patient's ID number, age and
gender, contribute little to the predicted
classification, and a few or even one crucial indicator
may be enough for a doctor to infer whether the patient
is ill. Too many input variables can lead to problems
such as overfitting when the trained algorithm is
deployed, and they also sideline the role of doctors in
clinical diagnosis, resulting in a waste of resources
(Filho et al 2021).
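One common way to screen out such uninformative inputs before training is a univariate relevance measure. The sketch below uses mutual information on invented data, with a randomly permuted patient ID standing in for an irrelevant feature; the feature names and label rule are assumptions for illustration:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Invented data: a random patient ID (no signal) and a systolic reading
# that fully determines the synthetic hypertension label.
rng = np.random.default_rng(2)
n = 1000
patient_id = rng.permutation(n).astype(float)   # carries no information
systolic = rng.normal(130, 15, n)
y = (systolic > 140).astype(int)                # label depends on BP only

# Mutual information with the label: near zero for the ID, high for BP.
X = np.column_stack([patient_id, systolic])
mi = mutual_info_classif(X, y, random_state=0)
print(dict(zip(["patient_id", "systolic"], mi.round(3))))
```

Features whose score is indistinguishable from zero can be dropped before the model ever sees them, which is one way to reduce the overfitting risk described above.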
3 METHOD: NEURAL NETWORK MODEL
The input, hidden, and output layers are the three
primary components of the Multi-Layer Perceptron
neural network. As seen in Fig. 1, the input layer is the
first column, the output layer is the last, and the hidden
layer is everything in between (Rivas and Montoya
2020). According to the content of the dataset, the
input layer has a total of 13 input variables, that is, 13
factors related to hypertension. The number of hidden
layers is not fixed here, because varying it is part of
the optimization procedure and influences the model's
accuracy. Finally, the output layer has only two values,
0 and 1: a value of 0 indicates the absence of
hypertension, whereas 1 indicates hypertension. Adjacent
columns are linked by weights w_ij^h, where i denotes
the ith neuron in the next layer of the network, j the
jth neuron in the previous layer, and h the layer to
which the weight belongs (Rivas and Montoya 2020).
Figure 1: MLP signal transmission between layers (Picture
credit: Original).
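The signal transmission between layers described above can be sketched as a plain forward pass. The 13 inputs and 2 outputs follow the text; the single hidden layer of width 8, the random weights, and the sigmoid activation are assumptions for illustration only:

```python
import numpy as np

# w[h][i, j] is the weight connecting neuron j in layer h to neuron i in
# layer h + 1, matching the w_ij^h notation in the text.
rng = np.random.default_rng(0)
sizes = [13, 8, 2]   # 13 inputs, one assumed hidden layer, 2 outputs
w = [rng.normal(size=(sizes[h + 1], sizes[h])) for h in range(len(sizes) - 1)]
b = [np.zeros(sizes[h + 1]) for h in range(len(sizes) - 1)]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    a = x
    for wh, bh in zip(w, b):
        a = sigmoid(wh @ a + bh)   # weighted sum over layer h, then activation
    return a

x = rng.normal(size=13)   # one sample with 13 hypertension-related inputs
probs = forward(x)
print(probs.shape)        # two output units, for the classes 0 and 1
```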
4 RESULT AND DISCUSSION
4.1 Data Set
4.1.1 Introduction to Data Sets
The dataset used in this article is from the Centers for
Disease Control and Prevention (CDC) BRFSS
Survey Data from 2015 (Hypertension data set 2023).
Within the larger “Diabetes,
Hypertension and Stroke Prediction” data framework, the
hypertension_data.csv file was chosen as the dataset
for this study.
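The inspection performed in this subsection can be sketched with pandas; a tiny synthetic frame stands in for hypertension_data.csv here, and the column names and values are assumed for illustration:

```python
import numpy as np
import pandas as pd

# Small synthetic stand-in for hypertension_data.csv (columns assumed).
df = pd.DataFrame({
    "age": [63, 54, 70, 45],
    "sex": [1.0, 0.0, np.nan, 1.0],   # 0 = female, 1 = male, one entry missing
    "target": [1, 0, 1, 0],
})
print(df.describe())        # "sex" shows a lower count than the other columns
print(df.isnull().sum())    # pinpoints the missing entries by column
```

A column whose `count` row in `describe()` falls short of the others is exactly the signature of missing values discussed below.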
Python is used to read the imported hypertension data
and produce a numerical statistical description of the
resulting data frame, checking whether any values are
missing and whether each feature is reasonably
distributed. From Table I, the count of the Sex item in
the first row is 25 less than the count of the other
items (206058), meaning the data has missing values.
After performing Python operations, it is found
that the missing items are all sex column data. The
data in the sex column follow a 0-1 distribution
(Bernoulli distribution), with 0 representing females
and 1 representing males. The value returned is half of
DAML 2023 - International Conference on Data Analysis and Machine Learning