portunities and resources management (Farooq et al.,
2015). Nskh et al. (2016) employed a dimensionality reduction
technique with a Support Vector Machine (SVM) classifier for
intrusion detection based on the KDD 99 data set. Pajouh et al.
(2016) proposed a similar method, but based on the NSL-KDD 99 data
set, and described a theoretical approach for determining
computational complexity. Fekade et al. (2018) and Lopez-Martin
et al. (2017) implemented IoT data recovery methods for intrusion
detection. Reducing the number of features within the data set
improved performance. Their scheme was capable of reducing the
memory requirements of sensors at the architecture level. Memos
et al. (2018) also proposed an algorithm for IoT security.
An enhancement of an AIS algorithm that works without generating
detectors was proposed in (Liśkiewicz and Textor, 2010), reducing
the worst-case run-time complexity from exponential to polynomial.
In (Nskh
et al., 2016), there is neither an experimental record of the
computational complexity nor a theoretical description. Most
previous implementations used the older KDD 99 data set, which is
now regarded as outdated. In the literature, only a few researchers
have tested the AIS algorithm on the full set of KDD 99 records,
owing to the implementation complexity. In this paper, we focus on
reducing the overall computational cost of running monitoring
algorithms, using AIS as a case study. The integrated resource
reduction techniques are capable of reducing the memory resources
and processor running time required in embedded IoT devices.
3 THEORETICAL BACKGROUND
Recent developments in IoT cyber security, together with the higher
dimensionality of data that results from increases in volume,
velocity, and variety, require careful deployment of feature
reduction techniques. Feature reduction can improve the efficiency
of ML algorithms.
3.1 Artificial Immune System
Computer scientists have been inspired by biological systems in
developing techniques for solving problems. Pamukov and Poulkov
(2017) applied the Negative Selection Algorithm (NSA) from AIS to
IoT intrusion detection, while Zhu et al. (2017) employed NSA for a
classification task. This algorithm trains a population of
antibodies, called detectors, using a normal sample from the
population. A Real Value Negative Selection Algorithm (RNSA)
generates random detectors and tests them against the samples of
the self class using an affinity measure. Affinity is measured with
a distance such as Euclidean, Manhattan, or cosine. No particular
antibody shape is required, as long as it can be implemented;
however, RNSA has been implemented using a hypersphere antibody. In
this work, we employed RNSA as the selected AIS algorithm.
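The detector generation and matching steps described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: it assumes hyperspherical detectors, Euclidean distance as the affinity measure, and data already scaled to [0, 1]. The function names, the radii, and the synthetic self samples are all hypothetical choices made for the example.

```python
import numpy as np


def generate_detectors(self_samples, n_detectors, self_radius, rng, max_tries=10_000):
    """Sample random detector centres in [0, 1]^d and keep only those
    that do not cover any self sample (negative selection)."""
    dim = self_samples.shape[1]
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:
            break
        candidate = rng.random(dim)
        # Euclidean distance from the candidate to every self sample.
        distances = np.linalg.norm(self_samples - candidate, axis=1)
        if distances.min() > self_radius:
            detectors.append(candidate)
    return np.array(detectors).reshape(-1, dim)


def is_anomalous(x, detectors, detector_radius):
    """Flag x as non-self if it lies inside any detector hypersphere."""
    if len(detectors) == 0:
        return False
    distances = np.linalg.norm(detectors - x, axis=1)
    return bool((distances <= detector_radius).any())


rng = np.random.default_rng(0)
# Hypothetical "normal" records clustered near the centre of the unit square.
self_samples = np.clip(0.5 + 0.05 * rng.standard_normal((200, 2)), 0.0, 1.0)
detectors = generate_detectors(self_samples, n_detectors=300, self_radius=0.1, rng=rng)
```

With the detector radius set equal to the self radius, no training sample can be flagged, because every surviving detector lies farther than that radius from all self samples. Swapping the norm for a Manhattan or cosine distance would change only the two distance computations.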
When RNSA is implemented on real-valued data sets, it makes sense
to view every vector as a location within the shape space. Each
element of a vector corresponds to a specific feature in the data
set. Normalizing the values of each feature to the range [0, 1]
therefore associates each feature vector with a point in the shape
space. Because the RNSA algorithm handles numerical data, the shape
space (as well as the feature vector values) is continuous.
Formally, Eq. 1 gives the notation used for RNSA.
$$X = \mathbb{R}^{d} \tag{1}$$

where $X \in \{x_1, x_2, x_3, \ldots, x_d\}$ is the total sample,
$\mathbb{R}$ is the real-valued data field, and $d$ is the number of
dimensions. Moreover, $Y \in \{y_1, y_2, y_3, \ldots, y_n\}$
represents the class label of the sample in a space having $n$
dimensions.
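The mapping of feature vectors to points in the [0, 1] shape space can be illustrated with a short min-max normalization sketch. The text does not state which normalization was used, so min-max scaling is an assumption here, and the function name and the sample records are hypothetical.

```python
import numpy as np


def min_max_normalize(X):
    """Rescale every feature (column) of X to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Constant columns would divide by zero; they are mapped to 0 instead.
    col_range[col_range == 0] = 1.0
    return (X - col_min) / col_range


# Hypothetical records with three features on very different scales.
records = np.array([
    [0.0, 181.0, 5450.0],
    [2.0, 239.0, 486.0],
    [1.0, 235.0, 1337.0],
])
points = min_max_normalize(records)
```

Each row of `points` is then a location in the d-dimensional unit hypercube, which is the form the distance-based affinity measures expect.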
3.2 Resource Reduction
Resource reduction is important before passing the data to an ML
algorithm. The rationale is to extract only the useful features
from a large amount of available data, in order to alleviate
over-fitting and noise.
(i) Principal Component Analysis (PCA)
PCA, also known as the Karhunen-Loève transform, is a statistical
procedure that transforms an observed set of possibly correlated
variables into a set of values of linearly uncorrelated variables,
called principal components. The number of principal components is
less than or equal to the original number of variables. The
rationale for PCA is to identify the subspace in which the data
clusters. For instance, an n-dimensional data observation might be
confined to n − 1 distinct principal components. This capability
for data reduction, while retaining most of the variation present
in the original data, has made PCA useful. Hoang and Nguyen (2018)
applied PCA to a substantial data sample for IoT anomaly detection.
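The transformation described above can be sketched with a singular value decomposition of the centred data. This is one standard way to compute PCA and is not necessarily the procedure used in the cited work; the function name and the synthetic data are hypothetical.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X onto its first n_components principal components and
    return the projected data with the explained-variance ratios."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    components = Vt[:n_components]
    return X_centered @ components.T, explained[:n_components]


rng = np.random.default_rng(0)
# Hypothetical data: 3 features, the third almost a linear mix of the first two.
base = rng.standard_normal((500, 2))
third = base @ np.array([0.7, -0.3]) + 0.01 * rng.standard_normal(500)
X = np.column_stack([base, third])

Z, ratio = pca_reduce(X, n_components=2)
```

Here the three-dimensional observations are confined almost entirely to a two-dimensional subspace, so two components retain nearly all of the variation, and the columns of `Z` are linearly uncorrelated.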
SECRYPT 2019 - 16th International Conference on Security and Cryptography