defining the attack scenarios. Examples of this group are the number of file creations, the number of operations on access-control files, the number of root accesses, etc.
Group 2 includes features describing the connection specifications, i.e., the technical aspects of the connection. Examples of this group include: protocol type, flags, duration, service type, the number of data bytes from source to destination, etc.
Group 3 includes features describing the connections to the same host in the last two seconds. Examples of this group are: the number of connections having the same destination host and using the same service, the percentage of connections to the current host that have a rejection error, the percentage of different services on the current host, etc.
During inspection of the data it turned out that the values of six features (land, urgent, num_failed_logins, num_shells, is_host_login, num_outbound_cmds) were constantly zero over all data records (see (Mukkamala, 2002) for descriptions). Clearly, these features could not have any effect on classification and only made it more complicated and time consuming, so they were excluded from the data vector. Hence, the data vector was a 35-dimensional vector.
The different possible values of the selected features were extracted, and a numerical value was assigned to each of them. For example, for the protocol type the possible numerical values were: tcp = 0, udp = 1, icmp = 2. This numerical representation was necessary because the feature vector fed to the input of the neural network has to be numerical.
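As a minimal sketch of this encoding step (the variable names are ours; only the tcp/udp/icmp values are stated in the text), the mapping for the protocol-type feature could be written as:

    % Map categorical protocol values to the numbers given in the text.
    protocolMap = containers.Map({'tcp', 'udp', 'icmp'}, {0, 1, 2});

    rawProtocol = 'udp';                        % protocol field of one raw record
    encodedProtocol = protocolMap(rawProtocol); % yields 1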
The ranges of the features differed widely, which made them incomparable: some features were binary, whereas others had a continuous numerical range (such as the duration of the connection). Therefore, the features were normalized by mapping all the values of each feature to the [0, 1] range.
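A sketch of this min-max normalization, assuming X is an N-by-35 matrix of raw records (one record per row; variable names are ours), could look as follows:

    % Min-max normalization of each feature (column) onto [0, 1].
    minX = min(X, [], 1);               % per-feature minimum (1-by-35)
    maxX = max(X, [], 1);               % per-feature maximum (1-by-35)
    span = maxX - minX;
    span(span == 0) = 1;                % guard: constant features map to 0
    Xn = bsxfun(@rdivide, bsxfun(@minus, X, minX), span);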
2.2 Implementation: Training and
Validation Method
The present study aimed to solve a multi-class problem. Here, a three-class case is described, which can be extended to cases with more attack types. An output layer with three neurons (output states) was used: [1 0 0] for normal conditions, [0 1 0] for the Neptune attack, and [0 0 1] for the Satan attack.
The desired output vectors used in the training, validation, and testing phases were exactly these patterns. In practice, the output of the neural network sometimes showed other patterns, such as [1 1 0], which were considered irrelevant. Since each of the three binarized outputs can be 0 or 1, there are 2^3 = 8 possible output patterns, of which only the three above are valid; the remaining five are irrelevant.
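A minimal sketch of this decision rule (the 0.5 threshold is our assumption; the paper does not state how outputs were binarized):

    % Threshold the 3-by-1 network output y and accept only one-hot patterns.
    b = double(y > 0.5);            % e.g. [1; 1; 0]
    if sum(b) == 1
        [~, class] = max(b);        % 1 = normal, 2 = Neptune, 3 = Satan
    else
        class = NaN;                % one of the irrelevant patterns
    end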
In this paper, a three-layer neural network means a neural network with two hidden layers (the input layer is not counted because it acts merely as a buffer and no processing takes place in it; the output layer, however, is counted). The universal approximation theorem states that an MLP with one or more hidden layers can approximate any function with arbitrary precision; the price, of course, is an increase in the number of neurons in the hidden layer (Theodorios, 1999). The question is whether anything is gained by using more than one hidden layer. One answer is that using more than one layer may lead to a more efficient approximation, or to achieving the same accuracy with fewer neurons in the neural network.
The performance of a two-layer neural network is seldom reported in the previous studies described in Section II. One of the objectives of the present study is to evaluate the possibility of achieving the same results with this less complicated neural network structure, which is computationally more efficient and requires less training time.
The MATLAB™ Neural Network Toolbox was used for the implementation of the MLP networks. Using this tool, one can define specifications such as the number of layers, the number of neurons in each layer, the activation functions of the neurons in the different layers, and the number of training epochs. The training feature vectors and the corresponding desired outputs can then be fed to the neural network to begin training.
All the implemented neural networks had 35 input neurons (equal to the dimension of the feature vector) and three output neurons (equal to the number of classes). The number of hidden layers and the number of neurons in each layer were the parameters used to optimize the architecture of the neural network. The error back-propagation algorithm was used for training.
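A minimal sketch of such a setup with the Neural Network Toolbox (the hidden-layer sizes and epoch count below are placeholders, not the values optimized in this study):

    % Two hidden layers; 'traingd' is plain gradient-descent back-propagation.
    net = feedforwardnet([30 15], 'traingd');
    net.trainParam.epochs = 200;        % placeholder epoch count
    net = train(net, X', T');           % X: N-by-35 inputs, T: N-by-3 targets
    Y = net(X');                        % simulate the trained network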
One problem that can occur during neural network training is over-fitting. In an over-fitted ANN, the error (the number of incorrectly classified patterns) on the training set is driven to a very small value, but when new data is presented to the network the error is large. In such cases, the ANN has memorized the training examples but has not learned to generalize the solution to new situations.
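For illustration, over-fitting can be detected by comparing the error rates on the training set and on held-out data (the variable names and the factor of two are our assumptions, not values from the paper):

    % Misclassification rates on training and validation data.
    errTrain = mean(any(round(net(Xtrain')) ~= Ttrain', 1));
    errVal   = mean(any(round(net(Xval'))   ~= Tval',   1));
    if errVal > 2 * errTrain            % illustrative rule of thumb
        disp('Possible over-fitting: validation error far exceeds training error.');
    end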
One possible solution to the over-fitting problem is to find a suitable number of training epochs by trial and error. In this study, the training