3. Proposing the “HMM-SGD” approach to model the sequence data samples.
The remainder of this paper is structured as follows:
In Section 2, we review related work on insider threats. In Section 3, we explain how we implement and train our models to detect insiders. Section 4 provides a brief description of the CERT dataset. Sections 5 and 6 present the final results of the two models along with the evaluation analysis. Section 7 provides a case study similar to the one in (Rashid et al., 2016). Finally, we briefly wrap up our work with the limitations and conclusion sections.
2 LITERATURE SURVEY
HMMs have been used in intrusion detection modeling for years. The authors in (Jain and Abouzakhar, 2012) used HMMs to model TCP network data from the KDD Cup 1999 dataset and proposed an intrusion detection system. They used Baum-Welch training (BWT) to learn the model parameters. To evaluate the model, they applied the Forward and Backward algorithms to calculate the likelihood of each sample, and used Receiver Operating Characteristic (ROC) curves to measure overall model effectiveness. Furthermore, the authors in (Lee et al., 2008) proposed a multi-stage intrusion detection system using HMMs. They evaluated their system on the first section of the “DARPA 2000 intrusion detection” dataset.
This dataset provides five different stages, or scenarios. They applied an HMM to each of these scenarios independently to create their multi-stage intrusion detection system. The authors in (Rashid et al., 2016) claim to be the first to adapt the Hidden Markov Model to the domain of insider threat detection. In addition to applying the original HMM, they proposed the new concept of combining a moment of inertia with the HMM to improve accuracy. To train and test their work, they used the same CERT division dataset as in (Bose et al., 2017), but an updated version, r4.2. To evaluate their work, they used the ROC curve method. Their highest accuracy using the original HMM was 0.797, while the accuracy of their proposed approach was 0.829.
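As a concrete illustration of the likelihood-based evaluation these works describe, the sketch below computes the log-likelihood of an observation sequence under a discrete HMM with the scaled forward algorithm. This is a minimal, self-contained example, not code from any of the cited systems; the function name and toy parameters are ours.

```python
import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    pi  -- (N,)   initial state distribution
    A   -- (N, N) transition probabilities, A[i, j] = P(state j at t+1 | state i at t)
    B   -- (N, M) emission probabilities,   B[i, k] = P(symbol k | state i)
    obs -- sequence of integer symbol indices
    """
    alpha = pi * B[:, obs[0]]          # forward variables at t = 0
    c = alpha.sum()                    # scaling factor (avoids underflow)
    alpha /= c
    log_like = np.log(c)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # forward recursion, one step
        c = alpha.sum()
        alpha /= c
        log_like += np.log(c)
    return log_like

# Toy check: a fair coin modelled as a single-state HMM; any
# three-symbol sequence then has probability 0.5**3 = 0.125.
pi = np.array([1.0])
A = np.array([[1.0]])
B = np.array([[0.5, 0.5]])
ll = forward_log_likelihood(pi, A, B, [0, 1, 0])   # = log(0.125)
```

Summing the scaled factors in log space, rather than multiplying raw probabilities, is what lets this evaluation run on long event sequences without numerical underflow.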
3 MACHINE LEARNING BASED
MODELS
In the presented work, we used sequence-based data samples. Section 4.3 shows how we reformed and generated our data samples, or event sequences. We modeled the data samples using the Hidden Markov Model in two different ways, i.e. the base HMM and the HMM-SGD approach.
3.1 Training of Hidden Markov Model
This section illustrates how we train the proposed approach. An HMM has three parameters that need to be prepared: the initial probability vector (π), the transition matrix (A), and the emission matrix (B). We use the Baum-Welch algorithm to train these parameters. Baum-Welch is the form that the expectation-maximization (EM) algorithm takes in the HMM context; details of the EM algorithm can be found in (Bilmes, 1998). The training process is set according to the structure of the adopted model. For example, Figure 1 illustrates a four-state HMM. We need to find the initial distribution over the four states and the transition distribution between them; the distribution of the observed symbols at each state must be determined as well. The list below shows how the model parameters are trained:
1. Initialize the model parameters π, A, and B with positive random numbers between 0 and 1, where:
• (π) : the initial distribution over the states, i.e. which state the model is most likely to start in.
• (A) : the initial distribution of the transitions between states.
• (B) : the initial distribution of the observed symbols at each state.
2. The Baum-Welch algorithm is applied to learn the HMM parameters; details of the Baum-Welch algorithm are presented in (Rabiner, 1989).
3. To make sure that there are no zeros among the trained HMM parameters, we add a small number to each parameter, followed by a scaling step that restores the probability condition, i.e. each distribution sums to one. In addition, we use the scaled version of the Hidden Markov Model, which overcomes numerical underflow problems during the training process. Information about the scaled version of HMM is provided in (Rabiner, 1989).
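Steps 1 and 3 above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the Baum-Welch update of step 2 is omitted (see (Rabiner, 1989)), and the sizes N = 4 states and M = 10 symbols are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_stochastic(rows, cols):
    """Random positive matrix whose rows each sum to one (step 1)."""
    m = rng.uniform(size=(rows, cols))
    return m / m.sum(axis=1, keepdims=True)

N, M = 4, 10                       # hypothetical: 4 hidden states, 10 symbols
pi = random_stochastic(1, N)[0]    # (π) initial state distribution
A = random_stochastic(N, N)        # (A) state transition distribution
B = random_stochastic(N, M)        # (B) symbol emission distribution

# ... step 2, Baum-Welch re-estimation of pi, A, B, would run here ...

def smooth(m, eps=1e-6):
    """Step 3: floor every entry and re-normalise, so no parameter is
    exactly zero and each row remains a valid distribution."""
    m = m + eps
    return m / m.sum(axis=-1, keepdims=True)

pi, A, B = smooth(pi), smooth(A), smooth(B)
```

The flooring in `smooth` matters at test time: a single zero in B would assign zero likelihood to any sequence containing that symbol, regardless of the rest of the sequence.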
The training process aims to find the model pa-
rameters that maximize the likelihood of the se-
quences that represent the user’s normal behavior and
ICISSP 2019 - 5th International Conference on Information Systems Security and Privacy