defining the attack scenarios. Examples of this group are the number of file creations, the number of operations on access-control files, the number of root accesses, etc.
Group 2 includes features describing the connection specifications, i.e., the technical aspects of the connection. Examples of this group include: protocol type, flags, duration, service type, the number of data bytes from source to destination, etc.
Group 3 includes features describing the connections to the same host in the last two seconds. Examples of this group are: the number of connections having the same destination host and using the same service, the percentage of connections to the current host that have a rejection error, the percentage of different services on the current host, etc.
During inspection of the data it turned out that the values of six features (land, urgent, num_failed_logins, num_shells, is_host_login, num_outbound_cmds) were constantly zero over all data records (see (Mukkamala, 2002) for descriptions). Clearly, these features could not have any effect on classification and only made it more complicated and time consuming, so they were excluded from the data vector. Hence, the data vector was a 35-dimensional vector.
The different possible values of the selected features were extracted, and a numerical value was assigned to each of them. For example, for the protocol type the possible numerical values were: tcp = 0, udp = 1, icmp = 2. This numerical representation was necessary because the feature vector fed to the input of the neural network has to be numerical.
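As a minimal sketch of this encoding step (the variable names are ours; only the tcp/udp/icmp values are stated in the text), the mapping for the protocol-type feature could be written as:

    % Map categorical protocol values to the numbers given in the text.
    protocolMap = containers.Map({'tcp', 'udp', 'icmp'}, {0, 1, 2});

    rawProtocol = 'udp';                        % protocol field of one raw record
    encodedProtocol = protocolMap(rawProtocol); % yields 1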
The ranges of the features differed widely, which made them incomparable: some features were binary, whereas others had a continuous numerical range (such as the duration of the connection). Therefore, the features were normalized by mapping all the values of each feature to the [0, 1] range.
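A sketch of this min-max normalization, assuming X is an N-by-35 matrix of raw records (one record per row; variable names are ours), could look as follows:

    % Min-max normalization of each feature (column) onto [0, 1].
    minX = min(X, [], 1);               % per-feature minimum (1-by-35)
    maxX = max(X, [], 1);               % per-feature maximum (1-by-35)
    span = maxX - minX;
    span(span == 0) = 1;                % guard: constant features map to 0
    Xn = bsxfun(@rdivide, bsxfun(@minus, X, minX), span);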
2.2 Implementation: Training and
Validation Method
The present study aimed to solve a multi-class problem. Here, a three-class case is described, which can be extended to cases with more attack types. An output layer with three neurons (output states) was used: [1 0 0] for normal conditions, [0 1 0] for the Neptune attack, and [0 0 1] for the Satan attack.
The desired output vectors used in the training, validation, and testing phases were exactly these patterns. In practice, the output of the neural network sometimes showed other patterns, such as [1 1 0], which were considered irrelevant. Since each of the three binarized outputs can be 0 or 1, there are 2^3 = 8 possible output patterns, of which only the three above are valid; the remaining five are irrelevant.
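A minimal sketch of this decision rule (the 0.5 threshold is our assumption; the paper does not state how outputs were binarized):

    % Threshold the 3-by-1 network output y and accept only one-hot patterns.
    b = double(y > 0.5);            % e.g. [1; 1; 0]
    if sum(b) == 1
        [~, class] = max(b);        % 1 = normal, 2 = Neptune, 3 = Satan
    else
        class = NaN;                % one of the irrelevant patterns
    end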
In this paper, a three-layer neural network means a neural network with two hidden layers (the input layer is not counted because it acts merely as a buffer and no processing takes place in it; the output layer, however, is counted). The universal approximation theorem states that an MLP with one or more hidden layers can approximate any function with arbitrary precision; the price, of course, is an increase in the number of neurons in the hidden layer (Theodorios, 1999). The question is whether anything is gained by using more than one hidden layer. One answer is that using more than one layer may lead to a more efficient approximation, or to achieving the same accuracy with fewer neurons in the neural network.
The performance of a two-layer neural network is seldom reported in the previous studies described in Section II. One of the objectives of the present study is to evaluate the possibility of achieving the same results with this less complicated neural network structure, which is computationally more efficient and requires less training time.
The MATLAB™ Neural Network Toolbox was used for the implementation of the MLP networks. Using this tool, one can define specifications such as the number of layers, the number of neurons in each layer, the activation functions of the neurons in the different layers, and the number of training epochs. The training feature vectors and the corresponding desired outputs can then be fed to the neural network to begin training.
All the implemented neural networks had 35 input neurons (equal to the dimension of the feature vector) and three output neurons (equal to the number of classes). The number of hidden layers and the number of neurons in each layer were the parameters used to optimize the architecture of the neural network. The error back-propagation algorithm was used for training.
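A minimal sketch of such a setup with the Neural Network Toolbox (the hidden-layer sizes and epoch count below are placeholders, not the values optimized in this study):

    % Two hidden layers; 'traingd' is plain gradient-descent back-propagation.
    net = feedforwardnet([30 15], 'traingd');
    net.trainParam.epochs = 200;        % placeholder epoch count
    net = train(net, X', T');           % X: N-by-35 inputs, T: N-by-3 targets
    Y = net(X');                        % simulate the trained network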
One problem that can occur during neural network training is over-fitting. In an over-fitted ANN, the error (the number of incorrectly classified patterns) on the training set is driven to a very small value, but when new data is presented to the network the error is large. In such cases, the ANN has memorized the training examples but has not learned to generalize the solution to new situations.
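For illustration, over-fitting can be detected by comparing the error rates on the training set and on held-out data (the variable names and the factor of two are our assumptions, not values from the paper):

    % Misclassification rates on training and validation data.
    errTrain = mean(any(round(net(Xtrain')) ~= Ttrain', 1));
    errVal   = mean(any(round(net(Xval'))   ~= Tval',   1));
    if errVal > 2 * errTrain            % illustrative rule of thumb
        disp('Possible over-fitting: validation error far exceeds training error.');
    end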
One possible solution to the over-fitting problem is to find a suitable number of training epochs by trial and error. In this study, the training