guidance: we provide the input data while knowing exactly what the output should be. We can identify two types of data, the first being numerical data, which is the most commonly used, and the second being categorical data, which contains characters (labels) rather than numbers.
For supervised learning, we have classification or regression, which will be explained later on. For unsupervised learning, we do not supervise the model; in other words, we let the model work on its own to discover the information. It uses machine learning algorithms that draw conclusions from unlabelled data. For this type of learning, we have clustering, which finds patterns and groupings in unlabelled data. Unsupervised learning relies on more difficult algorithms than supervised learning, which is logical since we know little to no information about the outcome. With unsupervised learning, we are also often looking to perform dimensionality reduction.
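As a brief illustration of unsupervised learning, the following is a minimal sketch, assuming the scikit-learn library and a small synthetic two-dimensional data set (both used purely for illustration and not part of this work), of grouping unlabelled points with k-means clustering:

    import numpy as np
    from sklearn.cluster import KMeans

    # Synthetic, unlabelled 2-D data: two loose groups of points (assumed example data).
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

    # Ask k-means to discover 2 groupings without ever seeing labels.
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_[:10])      # cluster assignment of the first ten points
    print(kmeans.cluster_centers_)  # the two discovered group centres

The model is given no labels at any point; the groupings emerge from the structure of the data alone.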
As already mentioned, classification and
regression are part of supervised learning.
Regression is about continuous values, mapping the
input to some real number as an output.
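As a minimal sketch of this mapping, assuming scikit-learn and toy data (both hypothetical and used only for illustration), a regression model predicts a real-valued output:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Toy data: y is roughly 2*x + 1 with a little noise (assumed example data).
    X = np.arange(10).reshape(-1, 1).astype(float)
    y = 2 * X.ravel() + 1 + np.random.default_rng(0).normal(0, 0.1, 10)

    model = LinearRegression().fit(X, y)
    print(model.predict([[12.0]]))  # a continuous (real-number) prediction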
Classification is the process of taking an input and mapping it to an output, which will be some discrete label. The main goal of classification is to identify the category or class the new data will fall under. There are six types of classification algorithms (a combined usage sketch follows the descriptions below):
KNN (K-nearest neighbours), also known as a lazy learner, is the laziest algorithm in machine learning; it requires little or no prior knowledge about the distribution of the data.
Decision Tree: Starts with a single node, which branches into possible outcomes forming additional nodes that lead to other possibilities; this gives it a tree-like shape. Decision trees can be used to map out an algorithm that mathematically predicts the best choice.
Random Forest: Uses many decision trees, each one different from the others. When we get new data, we take the majority vote of the ensemble to get the result.
Naïve Bayes: Works on the principle of Bayes
Theorem and finds the probability of an event
occurring given the probability of another event that
has already happened.
Support Vector Machine (SVM): Builds on the maximal-margin classifier, a hypothetical classifier; in SVM, a hyperplane is selected to best separate the points in the input variable space by their class [5].
Logistic Regression: Input values are combined linearly using weights or coefficient values to predict an output. It differs from linear regression because the output being modelled is binary instead of continuous.
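To make the six classifiers above concrete, the following is a minimal comparison sketch. It assumes the scikit-learn library and its built-in Iris data set, both chosen purely for illustration; they are not the intrusion detection data discussed later.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    from sklearn.linear_model import LogisticRegression

    # Illustrative data set only (Iris); real experiments would use network traffic data.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    classifiers = {
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "Naive Bayes": GaussianNB(),
        "SVM": SVC(kernel="rbf"),
        "Logistic Regression": LogisticRegression(max_iter=1000),
    }

    # Each classifier maps an input to a discrete class label; accuracy is one simple measure.
    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)
        print(name, clf.score(X_test, y_test))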
We can take as an example intrusion detection systems using the hybrid method DSSVM, which combines SVM (Support Vector Machine) with the distance sum; this approach is discussed by Chun Guo, Yajian Zhou, Yuan Ping, Zhongkun Zhang, Guole Liu and Yixian Yang in the paper A distance sum-based hybrid method for intrusion detection. DSSVM is based on integrating two techniques: the first is used to optimise the learning performance, and the second to predict. For the implementation, they used the KDD'99 dataset to demonstrate that the detection rate of K-NN is lower than that of SVM, and that both are lower than the detection rate of DSSVM. We can also cite the paper DDoS Attack Detection Based On Neural Network by Jin Li, Yong Liu and Lin Gu, where they proposed a DDoS detection method using a Learning Vector Quantisation neural network (LVQ NN) and achieved a 99.732% recognition rate for host anomaly detection, compared with an 89.9% recognition rate for a BP neural network. They defined two result categories: the first was normal and the second attack. They identified five implementation phases: collecting the data set, pre-processing the data set, determining the LVQ NN, training the system, and testing the system. The results were obtained by repeating the same experiment ten times for both the LVQ neural network and the BP neural network to improve the authenticity of the results.
4 PROPOSED APPROACH
Principal Component Analysis (PCA) is used to reduce the number of variables. The main idea of PCA is to identify patterns in a data set so that it can be transformed into another data set of lower dimension without losing any vital information; this is done by transforming the original variables into new variables, known as Principal Components (PCs), which are orthogonal. They are ordered so that the retention of variation present in the original variables decreases as we move down the order. So, in general, PCA is a tool used to reduce features and to lower dimensions while retaining most of the information and finding patterns in high-dimensional data. PCA's key advantages are its low sensitivity to noise and its reduced capacity and memory requirements.
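As a minimal sketch of this idea, assuming NumPy and a small random data matrix (both used here purely for illustration), the principal components can be obtained from the eigenvectors of the data's covariance matrix, which is formalised below:

    import numpy as np

    # Assumed example data: n = 200 samples of dimensionality d = 5.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))

    # Centre the data, then form the covariance matrix of the features.
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)

    # Eigenvectors of the covariance matrix are the principal components;
    # sort them by decreasing eigenvalue (retained variance).
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    components = eigvecs[:, order]

    # Project onto the first two components: a lower-dimensional representation.
    X_reduced = Xc @ components[:, :2]
    print(X_reduced.shape)  # (200, 2)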
Let us consider a data set X = [x_1, x_2, ..., x_n], where d is the dimensionality of the data and n is the number of training samples. The covariance matrix