generated and construct a probabilistic model that incorporates these assumptions. They use a set of classified training examples to estimate the model parameters. New examples are then classified with Bayes' rule, by selecting the class that is most likely to have generated each example (McCallum and Nigam, 1998).
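In the usual formulation (standard notation, not reproduced from the cited work), this corresponds to choosing the class $\hat{y} = \arg\max_{c} P(c)\,P(x \mid c)$, where $P(c)$ is the prior probability of class $c$ and $P(x \mid c)$ is the probability of example $x$ under the model for that class.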
Naive Bayes is a probabilistic algorithm based on Bayes' theorem and is the simplest of these classifiers, since it assumes that all attributes are independent given the class. Although this assumption is false for most real-world data, the classifier performs well most of the time. Thanks to this assumption, the parameters of each attribute can be learned separately, which greatly simplifies learning, especially when there are many attributes.
This method works with several probabilities for each class: the conditional probability of each attribute value given the class, as well as the prior probability of the class itself (Langley, Iba, and Thompson, 1992).
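As an illustrative sketch only (not taken from the cited works), such a classifier can be trained with scikit-learn's GaussianNB; the dataset below is synthetic and all parameter choices are assumptions:

# Minimal Gaussian Naive Bayes sketch (synthetic data, hypothetical parameters).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GaussianNB()                # assumes attributes are independent given the class
model.fit(X_train, y_train)         # parameters of each attribute estimated separately
print(model.predict(X_test[:5]))    # class with the highest posterior probability
print(model.score(X_test, y_test))  # accuracy on held-out examples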
2.5.2 Random Forest
Random Forest is a supervised learning algorithm that, as the name implies, creates a forest and makes it somewhat random. The “forest” is an ensemble of Decision Trees (Loh and Shin, 1997), most of the time trained with the “bagging” method. The general idea of this method is that combining learning models improves the overall result. Simply put, the algorithm builds multiple decision trees and merges them to obtain a more accurate and stable prediction.
This method adds extra randomness to the model while growing the trees. Instead of searching for the most important feature when splitting a node, it searches for the best feature among a random subset of features. This produces greater diversity among the trees, which generally leads to a better model.
This algorithm is a collection of Decision Trees, but there are some differences. If a training dataset with features and labels is fed into a single decision tree, it formulates a set of rules, which are then used to make predictions. In comparison, Random Forest randomly selects observations and features to build several decision trees and then averages their results.
One of the advantages of this algorithm is that, most of the time, it prevents overfitting (simply put, the situation in which a model learns too much noise) (Technopedia, n.d.) by creating random subsets of the features, building smaller trees from them, and then combining the subtrees. With a single decision tree, the deeper the tree grows, the more likely it is to overfit (Donges, 2018).
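As an illustrative sketch only (not part of the original study), a Random Forest of this kind can be trained with scikit-learn; the data is synthetic and the parameter values are assumptions:

# Minimal Random Forest sketch (synthetic data, hypothetical parameters).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

forest = RandomForestClassifier(
    n_estimators=100,     # number of decision trees in the ensemble
    max_features="sqrt",  # best split chosen from a random subset of features
    max_depth=None,       # limiting the depth can further reduce overfitting
    random_state=1,
)
forest.fit(X_train, y_train)         # each tree is trained on a bootstrap sample (bagging)
print(forest.score(X_test, y_test))  # predictions are aggregated over all trees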
2.5.3 Logistic Regression
Logistic regression is used in classification problems in which the attributes are numerical; it is an adaptation of linear regression methods. Considering a dataset whose target is a binary categorical variable, the values 0 and 1 are assigned to the two categories, and instead of modelling the response directly, the regression models the probability that the response belongs to a given category (0 or 1).
If the model is built following the linear regression approach, attributes with values close to zero produce negative probabilities, and very high values produce probabilities that exceed 1 (James, Witten, Hastie, and Tibshirani, 2013). These predictions are not valid because a true probability, regardless of the attribute value, must lie between 0 and 1. Whenever a straight line is fitted to a binary response coded as 0 or 1, it will always be possible to predict p(X) < 0 for some values and p(X) > 1 for others (unless the range of X is limited). To avoid this problem, the probability must be modelled with a function that produces outputs between 0 and 1 for all values of X.
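A standard choice, used by logistic regression (James, Witten, Hastie, and Tibshirani, 2013), is the logistic function, $p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$, whose output lies between 0 and 1 for every value of X.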
2.5.4 Support Vector Machine
The objective of the support vector machine algorithm is to find a hyperplane (a decision boundary that helps classify the data points) in an N-dimensional space, where N is the number of features, that distinctly separates the data points. To separate the two classes of data points, there are many possible hyperplanes that could be chosen. The main objective is to find the plane with the maximum margin, that is, the maximum distance between data points of both classes. Maximizing this margin provides some reinforcement so that future data points can be classified with more confidence.
Support vectors are the data points closest to the hyperplane; they influence its position and orientation, and it is through these support vectors that the margin of the classifier is maximized. Hyperplanes and support vectors are the core of building an SVM algorithm (Gandhi, 2018).
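As an illustrative sketch only (not part of the original study), a linear SVM can be trained with scikit-learn's SVC; the data is synthetic and the parameter values are assumptions:

# Minimal linear SVM sketch (synthetic data, hypothetical parameters).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=5, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

svm = SVC(kernel="linear", C=1.0)   # C trades margin width against misclassification
svm.fit(X_train, y_train)           # finds the maximum-margin hyperplane
print(svm.support_vectors_.shape)   # the support vectors that define the hyperplane
print(svm.score(X_test, y_test))    # accuracy on held-out examples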
2.6 Evaluation Metrics
Finally, it is necessary to evaluate the performance of the created model. For this purpose, there are several evaluation metrics (Sunasra, 2017):