logistic function transforms linear combinations of
input variables into probabilities.
In LR, the input variables are multiplied by
regression coefficients (weights) and added to a
constant term (intercept) to create a linear predictor.
This linear predictor is then passed through the
logistic (sigmoid) function to generate a probability
value between 0 and 1. The output of the logistic regression
model is the predicted probability of an event
occurring for a given input.
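As a minimal sketch of this prediction step (the
weights, intercept, and input values below are
hypothetical, chosen only for illustration):

import math

def predict_probability(x, weights, intercept):
    # Linear predictor: intercept plus weighted sum of the inputs.
    linear_predictor = intercept + sum(w * xi for w, xi in zip(weights, x))
    # The logistic (sigmoid) function maps the predictor into (0, 1).
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical coefficients and inputs:
p = predict_probability(x=[1.2, -0.7], weights=[0.8, 1.5], intercept=-0.3)
print(p)  # predicted probability of the event occurring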
One of the key advantages of LR is its simplicity
and interpretability. The model can be expressed as a
linear equation in the log-odds, making it easy to
understand and visualize. Additionally, LR is
relatively robust and can handle both categorical and
continuous input variables.
In summary, LR is a simple and interpretable
statistical method for binary classification problems.
It transforms linear combinations of input variables
into probabilities using the logistic function and can
handle various types of input variables.
2.2.3 DT
DT is a flowchart-like tree structure in which each
internal node is typically drawn as a rectangle and
each leaf (terminal) node as an ellipse. DT induction
can be performed sequentially or in parallel,
depending on the volume of data, the memory
available in the computing resources, and the
scalability of the algorithm (Priyam et al. 2013).
In DT training, decision trees are induced from
labeled training cases, each represented as a tuple of
attribute values and a class label. Training usually
starts with an empty tree and all of the training data
and proceeds recursively from top to bottom: the
algorithm selects the attribute that best partitions the
data, uses it as the root attribute, and then divides the
training data into non-overlapping subsets
corresponding to the values of the split attribute.
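As a minimal sketch of this top-down induction
(using scikit-learn on hypothetical synthetic data):

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical synthetic data, for illustration only.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# fit() performs the recursive top-down process described above: at each
# node it selects the attribute and threshold that best split the data,
# then partitions the cases into non-overlapping subsets.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree))  # textual view of the learned splits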
DT has many attractive features, such as
simplicity, ease of interpretation, and the ability to
handle mixed data types with relative ease. This
makes DT learning one of the most successful
machine learning algorithms in use today (Song &
Ying 2015). Compared with other classification
methods, decision trees are relatively fast to
construct, and decision tree classifiers achieve
similar, sometimes even better, accuracy (LaValley
2018). Trees can also be easily converted into SQL
statements for efficient access to databases.
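As a rough sketch of that conversion, assuming the
tree was fitted with scikit-learn (the column names
passed in are hypothetical):

from sklearn.tree import DecisionTreeClassifier

def tree_to_sql(tree, feature_names):
    t = tree.tree_

    def recurse(node):
        if t.children_left[node] == -1:  # leaf: emit the predicted class
            return str(int(t.value[node].argmax()))
        name = feature_names[t.feature[node]]
        threshold = t.threshold[node]
        return (f"CASE WHEN {name} <= {threshold:.4f} "
                f"THEN {recurse(t.children_left[node])} "
                f"ELSE {recurse(t.children_right[node])} END")

    return recurse(0)

# Usage: after clf = DecisionTreeClassifier().fit(X, y), calling
# tree_to_sql(clf, ["age", "income"]) yields a nested CASE expression
# that a database can evaluate directly in a SELECT clause.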
2.2.4 RF
RF is a classification and regression tree technique
that grows many trees to achieve high predictive
accuracy. Breiman's RF uses randomization to
generate many DTs and combines their outputs into a
single prediction, either by majority voting for
classification problems or by averaging for
regression problems.
There are two sources of randomization. First, the
dataset is sampled with replacement (bootstrap
sampling). Aggregating models built on samples
drawn in this way is called "bootstrap aggregation" or
"bagging". There is no guarantee that every subject
will appear in a given bootstrap sample, and some
subjects may appear more than once. In a large
dataset with n subjects, the probability of a subject
being excluded from a bootstrap sample of size n
converges to 1/e, or approximately 37%. These
omitted, or "out-of-bag", subjects constitute a useful
test set for the decision trees grown from the sampled
subjects.
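A minimal sketch of bootstrap sampling and the
resulting out-of-bag fraction (the sample size below
is hypothetical):

import random

n = 10_000  # hypothetical number of subjects
# Sample n subjects with replacement (bootstrap sampling).
bootstrap = [random.randrange(n) for _ in range(n)]
# Out-of-bag subjects: those never drawn into the bootstrap sample.
oob = set(range(n)) - set(bootstrap)
print(len(oob) / n)  # converges to 1/e, about 0.368, as n grows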
The second randomization occurs at each decision
node. At every node, a certain number of predictors
is selected at random. For a set of p predictors, a
typical number is the rounded square root of p,
although this parameter can be chosen by the analyst.
The algorithm then tests all possible thresholds for all
selected variables and chooses the variable-threshold
combination that produces the best split, for example,
the split that most effectively separates cases from
controls. This random selection of variables and
threshold testing continues until a "pure" node is
reached (containing only cases or only controls) or
some other predefined stopping criterion is met. The
entire tree-growing process is repeated (usually 100
to 1000 times) to grow an RF.
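A minimal sketch of growing such a forest with
scikit-learn (the synthetic data and parameter values
are hypothetical):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=16, random_state=0)

forest = RandomForestClassifier(
    n_estimators=500,     # repeat the tree-growing process many times
    max_features="sqrt",  # rounded square root of p predictors per node
    bootstrap=True,       # sample subjects with replacement (bagging)
    oob_score=True,       # evaluate each tree on its out-of-bag subjects
    random_state=0,
).fit(X, y)

print(forest.oob_score_)  # out-of-bag accuracy estimate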
The biggest advantage of RF is that interactions and
nonlinear effects do not need to be specified in
advance, as is required by parametric survival
models (Biau & Scornet 2016).
2.2.5 GBDT
GBDT is a combination of the gradient boosting
algorithm and the decision tree algorithm. The weak
learners chosen by GBDT are decision trees fit to
optimize the loss function. Boosting, a technique that
iteratively combines weak learners into a strong
learner, is the primary ensemble learning method of
GBDT. The gradient boosting algorithm differs from
other boosting methods in that each new learner is fit
to the negative gradient of the loss function to carry
out the learning process.
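A minimal sketch of this idea for a squared-error
loss, where the negative gradient reduces to the
residuals (the data and learning rate below are
hypothetical):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

prediction = np.full_like(y, y.mean())  # start from a constant model
learning_rate = 0.1

for _ in range(100):
    residuals = y - prediction  # negative gradient of the squared loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)

print(np.mean((y - prediction) ** 2))  # training loss after boosting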
The decision tree algorithm has a tree structure that