variables as input to forecast the probability of a desired
result. Logistic regression builds on the principle of
multiple linear regression, which first determines the
best-fitting regression line to represent the connection
between the independent variable (x) and the dependent
variable (y) (Wang, 2022). The model form is related to
the linear equation; if the input consists of several
independent variables, the equation can be written as:
$$y = ax + b$$
$$z = a(1)x(1) + a(2)x(2) + a(3)x(3) + \cdots + a(n)x(n) + b \tag{1}$$
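As a minimal sketch of Eq. (1), assuming NumPy is available (the coefficients a, intercept b, and observation x below are hypothetical values chosen only for illustration):

```python
import numpy as np

# Hypothetical coefficients a(1)..a(n), one observation x(1)..x(n), intercept b
a = np.array([0.8, -0.3, 1.2])
x = np.array([1.0, 2.0, 0.5])
b = 0.1

# z = a(1)x(1) + a(2)x(2) + ... + a(n)x(n) + b, per Eq. (1)
z = np.dot(a, x) + b
print(z)  # approximately 0.9 for these values
```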
Logistic regression maps the linear equation to a
probability p, determining the class of the dependent
variable from the values of p and 1 − p. The dependent
variable of logistic regression can be dichotomous or
have multiple classes, and the independent variables
can be continuous or discrete.
There are three types of logistic regression:
ordinal, multinomial, and binary.
After data processing, the results of this paper are
divided into obesity and non-obesity, so binomial
logistic regression is adopted. The dependent variable
of binomial logistic regression is essentially
dichotomous, that is, it takes only the values 0 or 1,
and its probability distribution is as follows:
$$P(Y=1 \mid x) = \frac{e^{w^{T}x}}{1 + e^{w^{T}x}}, \qquad P(Y=0 \mid x) = \frac{1}{1 + e^{w^{T}x}} \tag{2}$$
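A minimal sketch of Eq. (2), assuming NumPy; the weight vector w and observation x are hypothetical values used only to show the computation:

```python
import numpy as np

def binomial_logistic(w, x):
    """Return P(Y=1|x) and P(Y=0|x) per Eq. (2)."""
    z = np.dot(w, x)                   # w^T x
    p1 = np.exp(z) / (1 + np.exp(z))   # P(Y=1|x)
    p0 = 1 / (1 + np.exp(z))           # P(Y=0|x)
    return p1, p0

w = np.array([0.4, -1.1, 0.7])  # hypothetical learned weights
x = np.array([1.0, 0.5, 2.0])   # one observation
p1, p0 = binomial_logistic(w, x)
print(p1, p0, p1 + p0)  # the two probabilities sum to 1
```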
2.2 Decision Tree
The decision tree is a prediction technique that builds
a categorization scheme for a target variable based on
several input variables. The algorithm can effectively
handle large data sets. Common uses of the decision
tree model include variable selection, evaluation of
variable importance, handling of missing values, and
prediction (Song, 2015). The binary nature of the
results makes decision trees well suited to obesity
prediction in this experimental study.
The main parts of the decision tree model are the
nodes and branches, and the construction of the model
includes splitting, stopping, and pruning (Song, 2015).
The nodes of a decision tree can be divided into three
types: (1) root nodes, also called decision nodes;
(2) internal nodes; and (3) leaf nodes, sometimes
referred to as end nodes, which represent the decision
tree's ultimate outcome.
The decision tree is a sequential model that
combines a series of tests, comparing a feature
value to a threshold value in each test (Navada,
Ansari, Patil, 2011). Each node in the decision tree
corresponds to a test on a data attribute, and each
branch denotes the outcome of that test; the decision
tree model thereby links the dataset's observations
to the conclusions of the pertinent target values
(Sharma, Kumar, 2016).
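As a sketch of how such a tree could be fitted in practice, assuming scikit-learn; the features (height, weight) and labels below are hypothetical and stand in for the paper's actual obesity dataset:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical observations: [height_m, weight_kg]; label 1 = obese, 0 = not obese
X = [[1.70, 95], [1.80, 72], [1.60, 88], [1.75, 68], [1.65, 90], [1.85, 80]]
y = [1, 0, 1, 0, 1, 0]

# Each internal node compares one feature value against a learned threshold;
# the leaf nodes carry the final class (the "end nodes" described above).
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

print(tree.predict([[1.72, 92]]))  # predicted class for a new observation
```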
2.3 Random Forest
The classification and regression tree model is further
improved by the random forest technique, which is
composed of a large number of decision trees created
by randomization and can be used for prediction once
constructed. Because the validity of the decision tree
for binary classification applies to the prediction of
obesity, the random forest model was adopted in this
experiment. The outputs of the many decision trees in
a random forest are aggregated by averaging into a
single reference output (Rigatti, 2017). Formally, a
model made up of several random base regression
trees {r_n(x, Θ_m, D_n), m ≥ 1} is called a random
forest, where Θ_1, Θ_2, ... are outputs of the
randomizing variable Θ. Combining these random
trees forms an aggregated regression estimate
(Biau, 2016). The formula is as follows:
$$\bar{r}_n(X, D_n) = \mathbb{E}_{\Theta}\left[r_n(X, \Theta, D_n)\right] \tag{3}$$
Each decision tree produces a result based on the
input data, and the final output of the random forest
is obtained after integrating these multiple results
(Liu, 2014).
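A minimal sketch of this aggregation in the spirit of Eq. (3), assuming NumPy and scikit-learn: several randomized base trees are fitted (here the randomizing variable Θ_m is realized as a bootstrap resampling of the training set, one common choice) and the expectation over Θ is approximated by averaging their predictions. The synthetic data are purely illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical training set D_n: features X, continuous target y
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=100)

# Build M randomized base trees r_n(x, Theta_m, D_n): each tree sees a
# bootstrap sample of D_n, playing the role of Theta_m.
M = 50
trees = []
for m in range(M):
    idx = rng.integers(0, len(X), size=len(X))  # bootstrap resampling
    t = DecisionTreeRegressor(max_depth=4, random_state=m)
    t.fit(X[idx], y[idx])
    trees.append(t)

# Aggregated estimate of Eq. (3): average over the randomized trees
x_new = np.zeros((1, 3))
prediction = np.mean([t.predict(x_new) for t in trees])
print(prediction)
```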
The advantage of the random forest is that it can
capture interactions between the predictor variables
as well as non-linear relationships, but it is difficult
to judge which variables have a greater impact on the
prediction results.
2.4 GBDT Model
GBDT, whose full name is gradient boosting decision
tree, is an iterative decision tree algorithm that is
likewise applicable to the binary outcome of this study.
To obtain the ultimate prediction, the model
aggregates the outcomes of several decision trees.
Each successive weak learner fits the residual, that is,
the difference between the current predicted value and
the true value, and the principle is that the final
prediction is the sum of the outputs of all the weak
learners. The decision tree is the common base learner
in the GBDT model, which is an ensemble learning
model (Zhang, 2021).
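A minimal sketch of this residual-fitting principle, assuming NumPy and scikit-learn; shallow regression trees serve as the weak learners, squared loss is assumed (so the residual is simply the difference between true and predicted values), and the data are hypothetical:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.sin(X[:, 0]) + 0.3 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Each round: fit a weak learner to the residual (true value minus the
# current prediction), then add its scaled output to the running prediction.
learning_rate = 0.1
prediction = np.zeros(len(y))
learners = []
for _ in range(100):
    residual = y - prediction          # what remains to be explained
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residual)
    prediction += learning_rate * tree.predict(X)
    learners.append(tree)

print(np.mean((y - prediction) ** 2))  # training error shrinks round by round
```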