placement. Through this systematic approach,
enterprises can gain a more precise understanding of
advertising effectiveness, optimize advertising
strategies, increase ad click-through rates, and
achieve more effective advertising placement.
2.2.1 Logistic Regression
Logistic regression is widely used for binary classification tasks. Although its name contains the word "regression," it is actually a classification algorithm designed to predict the probability of an outcome, which always lies between 0 and 1. The algorithm uses the logistic function to map a linear combination of the features into a probability, and then performs binary classification by applying a threshold to that probability. Because logistic regression is simple and efficient, it is often used as a baseline model for classification problems or for ranking the importance of features.
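To make this mapping concrete, the short Python sketch below applies the logistic (sigmoid) function to a linear combination of features and thresholds the resulting probability at 0.5. The weights, bias, and feature values are made-up illustrative numbers, not parameters from the experiments in this paper; in practice they would be learned from the training data.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real value into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical learned parameters and a single ad-impression feature vector.
weights = np.array([0.8, -1.2, 0.5])   # one weight per feature (illustrative values)
bias = -0.3
features = np.array([1.0, 0.4, 2.0])   # e.g. user / ad / context features

# Linear combination of the features, mapped to a click probability.
z = np.dot(weights, features) + bias
p_click = sigmoid(z)

# Binary classification by thresholding the probability at 0.5.
predicted_label = int(p_click >= 0.5)
print(f"predicted click probability = {p_click:.3f}, label = {predicted_label}")
```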
2.2.2 Decision Tree
A decision tree is a tree-based classification algorithm that recursively splits the data into nodes according to selected features. The nodes form a hierarchical structure in which each internal node represents a decision on a specific feature value. Decision trees are easy to understand and interpret, but they are prone to overfitting.
Firstly, information entropy is a widely used measure of the purity of a sample set: the smaller its value, the purer the dataset. Let p_k denote the proportion of samples of the k-th class in the set D (k = 1, 2, ..., |Y|). The information entropy of D is defined as:

Ent(D) = -\sum_{k=1}^{|Y|} p_k \log_2 p_k    (1)
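As a worked illustration of Eq. (1), the Python sketch below computes the entropy of a small made-up label set; the data are purely illustrative.

```python
from collections import Counter
from math import log2

def entropy(labels):
    # Ent(D) = -sum_k p_k * log2(p_k), with p_k the proportion of class k in D.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# Toy label set: 6 positive (click) and 4 negative (no click) samples.
D = ["click"] * 6 + ["no_click"] * 4
print(entropy(D))   # -0.6*log2(0.6) - 0.4*log2(0.4) ≈ 0.971
```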
Secondly, information gain takes into account that different branch nodes contain different numbers of samples and weights them accordingly. For a discrete attribute a with V possible values {a^1, a^2, ..., a^V}, let D^v denote the subset of samples in D whose value on attribute a is a^v. A larger information gain means that splitting on attribute a yields a greater increase in purity, so attributes with higher information gain are preferred in the decision tree, as:

Gain(D, a) = Ent(D) - \sum_{v=1}^{V} \frac{|D^v|}{|D|} Ent(D^v)    (2)
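The following sketch illustrates Eq. (2) on a made-up attribute: the samples are grouped by attribute value, the entropy of each subset is weighted by its relative size, and the weighted sum is subtracted from the entropy of the whole set. The attribute name and values are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    # Gain(D, a) = Ent(D) - sum_v |D^v|/|D| * Ent(D^v),
    # where D^v groups the samples whose value of attribute a is a^v.
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    weighted = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

# Toy data: attribute "device" and the click label of each sample.
device  = ["mobile", "mobile", "desktop", "desktop", "mobile", "desktop"]
clicked = ["yes",    "yes",    "no",      "no",      "yes",    "yes"]
print(information_gain(device, clicked))   # ≈ 0.459
```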
Moving on, in contrast to the information gain criterion, which favors attributes with many possible values, the gain ratio criterion prefers attributes with fewer values. The C4.5 algorithm therefore does not directly select the candidate partition attribute with the highest gain ratio. Instead, it first singles out the candidate attributes whose information gain is above average and then selects, from among them, the attribute with the highest gain ratio, as:

Gain_ratio(D, a) = \frac{Gain(D, a)}{IV(a)}    (3)

IV(a) = -\sum_{v=1}^{V} \frac{|D^v|}{|D|} \log_2 \frac{|D^v|}{|D|}    (4)

where IV(a) is the intrinsic value of attribute a, which tends to be large when a has many possible values.
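A small sketch of Eqs. (3) and (4): the intrinsic value IV(a) grows with the number of distinct attribute values, so dividing the information gain by it penalizes attributes with many values. The gain value 0.459 is taken from the previous sketch, and the attribute is again hypothetical.

```python
from collections import Counter
from math import log2

def intrinsic_value(values):
    # IV(a) = -sum_v |D^v|/|D| * log2(|D^v|/|D|), Eq. (4).
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def gain_ratio(gain, values):
    # Gain_ratio(D, a) = Gain(D, a) / IV(a), Eq. (3).
    return gain / intrinsic_value(values)

device = ["mobile", "mobile", "desktop", "desktop", "mobile", "desktop"]
print(gain_ratio(0.459, device))   # gain taken from the previous sketch
```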
In conclusion, the Gini index evaluates the purity of dataset D as the probability that two samples drawn at random from D carry different class labels; a lower Gini(D) therefore indicates a purer dataset, as:

Gini(D) = 1 - \sum_{k=1}^{|Y|} p_k^2    (5)

Accordingly, the Gini index of attribute a is used for attribute partitioning: the attribute that minimizes the Gini index after partitioning is selected as the optimal partitioning attribute, as:

Gini_index(D, a) = \sum_{v=1}^{V} \frac{|D^v|}{|D|} Gini(D^v)    (6)
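The sketch below illustrates Eqs. (5) and (6) on the same made-up data: Gini(D) for the label set, and the Gini index of a candidate attribute, whose minimization selects the splitting attribute.

```python
from collections import Counter, defaultdict

def gini(labels):
    # Gini(D) = 1 - sum_k p_k^2, Eq. (5): probability that two random samples differ in class.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_index(values, labels):
    # Gini_index(D, a) = sum_v |D^v|/|D| * Gini(D^v), Eq. (6); the attribute
    # with the smallest value is chosen as the splitting attribute.
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in groups.values())

device  = ["mobile", "mobile", "desktop", "desktop", "mobile", "desktop"]
clicked = ["yes",    "yes",    "no",      "no",      "yes",    "yes"]
print(gini(clicked), gini_index(device, clicked))
```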
2.2.3 Random Forest
Random forest is an ensemble learning technique built from many decision trees. By randomly selecting features and data samples, it constructs a number of decision trees and then combines their outputs through averaging or voting to improve accuracy and robustness. Random forests are suitable for large datasets and perform well with high-dimensional data and numerous features.
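As a brief illustration, the following scikit-learn sketch trains a random forest on synthetic binary-classification data; the library, dataset, and parameter values (100 trees, square-root feature subsets) are illustrative assumptions rather than the configuration used in this study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an ad click dataset (the real features are not reproduced here).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Each tree is trained on a bootstrap sample and considers a random feature
# subset at every split; predictions are combined by majority voting.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                bootstrap=True, random_state=42)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```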
Here are the specific steps for a random forest.
Firstly, random forests can solve both classification and regression problems: in classification the input is a training dataset with class labels, while in regression it is a training dataset with target variables. Secondly, bootstrap sampling randomly draws samples from the training data with replacement to create a new dataset of the same size as the original, so some samples may appear several times in the new dataset while others may not appear at all. Thirdly, random feature selection takes place at each decision tree node, where a subset of the features is randomly chosen as split candidates; this prevents any single tree from relying too heavily on particular features and adds diversity to the model. Next, as the base learner, decision trees are