Can a Fuzzy Rule Extraction Find an Extremely Tiny
non-Self Region?
Akira Imada
Brest State Technical University
Moskowskaja 267, 224017 Brest, Republic of Belarus
Abstract. This paper reports one snapshot of our on-going experiments in which
a common target we call a-tiny-island-in-a-huge-lake is explored with different
methods, ranging from a data-mining technique to an artificial immune system.
Our underlying interest is network intrusion detection, where we usually do not
know what an illegal transaction pattern looks like until it has completed its intrusion,
by which time it is too late. Hence our first interest is (i) whether it is possible to train the
intrusion detection system using only legal patterns. In this context we assume that
data floating in the lake are normal while data found on the island are abnormal.
Our second concern is then (ii) to study the limit of the size of the detectable area,
that is, down to what size the detector can still detect the island as we shrink it
toward zero, a situation sometimes called a-needle-in-a-haystack. In
this paper, a fuzzy rule extraction implemented by a neural network architecture
is employed for this purpose.
1 Introduction
This paper is a snapshot of our on-going investigation into how already-proposed
methods, each of which claims success, work in a special situation we call
a-tiny-island-in-a-huge-lake. To be more specific, we are now exploring a couple of
previously proposed approaches for the purpose of searching for an extremely tiny unknown
region of non-self data surrounded by an overwhelming amount of known self data, partly
with an interest from the viewpoint of network intrusion detection, where self data imply
normal transactions while non-self data imply anomalies.
Assuming our whole universe is the n-dimensional Euclidean space all of whose
coordinates are in [-1, 1], a region of non-self data is our target to be searched for, that is,
a tiny hyper-rectangle all of whose inside coordinates are in [-a, a] (a << 1), or a tiny
hyper-sphere whose center is located at the origin and whose radius is a (a << 1). In network
intrusion detection, the number of abnormal patterns, which we take as non-self, is
extremely small compared with the number of normal patterns, which we take as self data.
This is the reason why our target should be extremely tiny. In other words, a ≈ 0 is our
condition.
We have so far exploited (i) an artificial immune system approach, especially a negative
selection algorithm in which constant- or variable-sized hyper-sphere detectors detect
non-self cells; (ii) an immuno-fuzzy approach where a set of random fuzzy rules eventually
evolves to cover the non-self region; (iii) an evolutionary computation approach where,
likewise, an evolving set of random detectors finally detects non-self; and so on.
In this paper, we study a fuzzy rule extraction using a neural network proposed by
Castellano et al. [1]. The system they proposed is very clearly described and it seems
to be very sound and efficient, except for the way in which data are applied to the
system. They employed the Iris database from a popular public domain. The database contains
three different classes of the iris family; one class is assumed to be self whilst the other
two are assumed to be non-self. Training samples are chosen at random from these
two classes and used to train the system. The system is then tested using the rest of the data in
the database. The result was successful. We, however, doubt the real applicability of
the idea of using an artificial data set in such a way in the context of intrusion detection,
principally for the two following reasons: usually, in the context of intrusion
detection, (i) the number of non-self (anomaly) data is extremely small compared with the number
of self (normal) data; and (ii) we do not know what a non-self datum looks like until
it completes its intrusion successfully, by which time it is too late.
Hence our current interest is also two-fold: first, the non-self region should be tiny,
and secondly, training should be done using only self data. We explore these two points
using the above-mentioned fuzzy rule extraction by a neural network proposed by Castellano
et al. [1].
2 Method
The goal is to classify each datum from an n-dimensional data set into one of m
classes. For this purpose, Castellano et al. [1] used the inference mechanism of a zero-
order Takagi-Sugeno fuzzy model, and then realized the idea with a fuzzy neural network
model. To train the fuzzy neural network, they employed a combination of competitive
learning, to determine the architecture of the fuzzy neural network, and gradient
descent learning, to optimize the synaptic weights. We, on the other hand, employ an
evolutionary computation technique to train the network, since we already know the
network structure under our current interest; that is, all we need to detect the island is just one
rule. As such, our concern is just to obtain the solution of the weight configuration of
the network, and an evolutionary computation is expected to find it more simply than
the proposed approach.
In the following three subsections we describe (i) the Takagi-Sugeno fuzzy model, (ii) a realization of
the model by a fuzzy neural network, and (iii) how we optimize the weights of the fuzzy
neural network by an evolutionary computation.
2.1 Takagi-Sugeno Model
Though Castellano et al. [1] stated the method very clearly in their paper, let us briefly
describe it with the intention of making this paper self-contained. The Takagi-Sugeno fuzzy
inference model is made up of a set of H rules, such as
$$R_k:\ \text{IF } (x_1 \text{ is } A_1^k) \text{ and } \cdots \text{ and } (x_n \text{ is } A_n^k)\ \text{THEN } (y_1 \text{ is } \nu_{k1}) \text{ and } \cdots \text{ and } (y_m \text{ is } \nu_{km})$$
where R_k is the k-th rule (k = 1, ..., H), x_i is the i-th variable of the input data
(i = 1, ..., n), y_j is the j-th output variable, A_i^k are fuzzy sets which are usually
expressed by linguistic terms such as "medium large" but are here expressed by the shape of a
membership function defined one by one on the corresponding input variable, and ν_kj
are fuzzy singletons defined on the output variables indicating the likeliness of the j-th
class inferred by the k-th rule.
A_i^k is defined by the Gaussian membership function
$$\mu_{ik}(x_i) = \exp\left\{-\frac{(x_i - w_{ik})^2}{2\sigma_{ik}^2}\right\}.$$
Then defuzzification for an input x^0 = (x_1^0, ..., x_n^0) is according to
$$y_j^0 = \sum_{k=1}^{H} \mu_k(x^0)\,\nu_{kj} \Big/ \sum_{k=1}^{H} \mu_k(x^0),$$
where
$$\mu_k(x^0) = \prod_{i=1}^{n} \mu_{ik}(x_i^0)$$
is the result of applying the Larsen product operator.
In other words, the procedure of inference is as follows. When an input x = (x_1, ..., x_n)
is given, each of the H rules evaluates x and outputs, class by class, the likeliness
that x belongs to each class. The evaluation of x_i by the k-th rule is performed by the
corresponding membership function µ_ik(x_i), which is specified by the two parameters
w_ik and σ_ik and returns a value ranging from 0 to 1. See, for example, Fig. 1, where the
i-th coordinate of the input x is evaluated by A_i^k, the i-th antecedent of the IF part of
R_k, which is represented by a membership function rather than a usual linguistic term like
"Large". The returned membership value in this example is 0.71, suggesting, say, that the
likeliness that the variable is "Medium Large" is 0.71.
Each of the H rules calculates µ_k(x) from those n values µ_ik(x_i). Finally, those H
values are combined to calculate the m values y_j, the result of defuzzification for each
of the m classes.
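To make the procedure concrete, the following is a minimal sketch of this zero-order
Takagi-Sugeno inference in Python with numpy. This is our own illustrative code, not the
implementation of [1]; the function name and the array shapes are our assumptions.

```python
import numpy as np

def ts_inference(x, w, sigma, nu):
    """Zero-order Takagi-Sugeno inference with Gaussian antecedents.
    x: input, shape (n,); w, sigma: Gaussian centers/widths, shape (H, n);
    nu: output singletons, shape (H, m). Returns y, shape (m,)."""
    # Antecedent memberships mu_ik(x_i) = exp(-(x_i - w_ik)^2 / (2 sigma_ik^2))
    mu_ik = np.exp(-((x - w) ** 2) / (2.0 * sigma ** 2))  # shape (H, n)
    # Larsen product over the n antecedents gives mu_k(x)
    mu_k = np.prod(mu_ik, axis=1)                         # shape (H,)
    # Defuzzification: membership-weighted average of the singletons
    return mu_k @ nu / np.sum(mu_k)                       # shape (m,)

# Example: one rule centered at the origin, 20 inputs, 2 outputs.
y = ts_inference(np.zeros(20), w=np.zeros((1, 20)),
                 sigma=0.3 * np.ones((1, 20)), nu=np.array([[1.0, 0.0]]))
```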
Fig. 1. A fictitious sketch of an evaluation of x_i, the i-th entry of the input x, by A_i^k,
the i-th antecedent part of the k-th rule. With w_ik = 0.70 and σ_ik = 0.60, the input
x_i = 0.348 returns the membership value 0.71.

This procedure is realized when we assume a neural network such as the one depicted in Fig. 2.
The first layer is made up of n input neurons. The second layer is made up of H groups,
each containing n neurons, where the i-th neuron of the k-th group is connected to the i-th
neuron in the first layer by a synapse carrying the pair of weights (w_ik, σ_ik). The k-th
group in the second layer then calculates the value µ_k(x) from the values its n neurons
receive. The third layer is made up of m neurons, each of which collects the H values output
by the second layer; that is, the j-th neuron of the third layer receives the value from the
k-th output of the second layer through a synapse with weight ν_kj.
2.2 How It Learns
Castellano et al. [1] used (1) competitive learning to determine how many rules are
needed and to set the initial weights. Then, in order to optimize these initial weights, they used (2)
a gradient method performing steepest descent on a surface in the weight space, employing
the same training data, that is, supervised learning.
Here, on the other hand, we use a simple genetic algorithm. Since our target space is
specific enough that we already know the network structure, our concern is just to obtain the
solution of the weight configuration of the network. That is to say, all we want to know is a
set of parameters w_ik, σ_ik and ν_kj (i = 1, ..., n, k = 1, ..., H, j = 1, ..., m), where n is
the dimension of the data, H is the number of rules, and m is the number of outputs. Hence
our chromosome has those H(2n + m) genes. Starting with a population of chromosomes
whose genes are randomly created, they evolve under simple truncation selection,
where higher-fitness chromosomes are chosen, with uniform crossover and occasional
mutation replacing a few genes with other randomly created parameters,
expecting that higher-fitness chromosomes will emerge. These settings were determined
experimentally by trial and error. A sketch of this procedure is given below.
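The following is a minimal sketch of such a genetic algorithm in Python. It is our own
illustrative code, not the implementation used in the experiments; the population size,
parent count, mutation rate, and generation count are assumptions, as is the decoding of
a chromosome into (w, σ, ν) (the σ genes would, for instance, be mapped to positive
values, e.g. by taking absolute values).

```python
import numpy as np

rng = np.random.default_rng(0)

def evolve(fitness, n_genes, pop_size=100, n_parents=20,
           p_mut=0.02, n_generations=500):
    """Minimal GA: truncation selection, uniform crossover, and
    mutation that replaces a few genes with fresh random values.
    `fitness` maps a chromosome (1-D array of n_genes) to a score."""
    pop = rng.uniform(-1.0, 1.0, size=(pop_size, n_genes))
    for _ in range(n_generations):
        scores = np.array([fitness(c) for c in pop])
        # Truncation selection: keep only the highest-fitness chromosomes
        parents = pop[np.argsort(scores)[-n_parents:]]
        # Uniform crossover between randomly paired parents
        pa = parents[rng.integers(n_parents, size=pop_size)]
        pb = parents[rng.integers(n_parents, size=pop_size)]
        pop = np.where(rng.random((pop_size, n_genes)) < 0.5, pa, pb)
        # Occasional mutation: overwrite genes with new random parameters
        mask = rng.random((pop_size, n_genes)) < p_mut
        pop[mask] = rng.uniform(-1.0, 1.0, size=int(mask.sum()))
    return pop[np.argmax([fitness(c) for c in pop])]

# For one rule (H = 1), n = 20 inputs and m = 2 outputs, a chromosome
# has H(2n + m) = 42 genes: 20 centers w, 20 widths sigma, 2 singletons nu.
```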
To evaluate the fitness of each chromosome, we use a measure originally proposed by Lopes
et al. [2] and widely used nowadays, in which four quantities are employed: (i) true positive,
(ii) true negative, (iii) false positive, and (iv) false negative. Here we assume that positive
samples are non-self and negative samples are self, since the detectors are designed to detect
non-self cells. Hence these four terms are defined in the sense that (i) t_p (true positive) is a
true declaration of a positive sample, i.e., non-self declared as non-self; (ii) f_p (false
positive) is a false declaration of a positive sample, i.e., self declared as non-self; (iii) t_n
(true negative) is a true declaration of a negative sample, i.e., self declared as self; and
(iv) f_n (false negative) is a false declaration of a negative sample, i.e., non-self declared
as self. Under these definitions, d_r = t_p/(t_p + f_n) is the detection rate and
f_a = f_p/(t_n + f_p) is the false alarm rate. We then plot d_r versus f_a; the resultant
graph is called the Receiver Operating Characteristic (ROC) [3], which reflects the tradeoff
between false alarm rate and detection rate.

Fig. 2. Architecture of the proposed fuzzy neural network, which infers how likely an input
x = (x_1, ..., x_n) is to belong to the j-th class by generating outputs y_j, each of which
reflects the degree of likeliness. In this figure, a 20-dimensional input is classified into one
of three classes using two rules.
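To make the evaluation concrete, a sketch of this computation follows; the code and its
names are ours, not from [2] or [3]. A detector here is any predicate declaring an input
non-self, e.g. a threshold on the network output y_1; sweeping that threshold traces out
the ROC curve.

```python
def rates(declared_nonself, is_nonself):
    """Detection rate d_r and false alarm rate f_a from two parallel
    boolean sequences: the detector's declarations and the true labels
    (True = non-self/positive, False = self/negative)."""
    tp = fn = fp = tn = 0
    for declared, positive in zip(declared_nonself, is_nonself):
        if positive and declared:        tp += 1  # non-self declared non-self
        elif positive and not declared:  fn += 1  # non-self declared self
        elif declared:                   fp += 1  # self declared non-self
        else:                            tn += 1  # self declared self
    d_r = tp / (tp + fn) if (tp + fn) else 0.0  # detection rate
    f_a = fp / (tn + fp) if (tn + fp) else 0.0  # false alarm rate
    return d_r, f_a
```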
3 Experiment
Castellano et al. [1] used the IRIS data, found in the public domain, in a very clever way,
writing as follows.
The validity of our approach to fuzzy inference and rule extraction has been
tested on the well-known benchmark IRIS data problem. The classification
problem of the IRIS data consists of classifying three species of iris flowers
(setosa, versicolor and virginica). There are 150 samples for this problem, 50
of each class. A sample is a four-dimensional pattern vector representing four
attributes of the iris flower (sepal length, sepal width, petal length, and petal
width).
However, we doubt this way of using the IRIS data as an artificial data set, at least in the
context of network intrusion detection. One reason for our doubt is that illegal patterns are
usually unknown and cannot be represented by certain specific patterns, if at all. The
other reason is the sparseness of this data set: what if the system meets a pattern that
belongs to neither of the two classes? See another report of ours [4] regarding this
issue.
Our target problem is what we call a-tiny-flat-island-in-a-huge-lake, which we came
across when we explored a fitness landscape defined over all possible synaptic
weight values of a fully connected network of spiking neurons realizing an associative
memory [5]. To simplify, we formalized the problem in a more general form as
follows.
Test-function 1 (A tiny island in a huge lake - 1) Find an algorithm to locate a point
in the region A all of whose coordinates are in [-a, a] (a ≈ 0), in a universe of the
n-dimensional hypercube all of whose coordinates x_i lie in [-1, 1] (i = 1, ..., n).
Or
Test-function 2 (A tiny island in a huge lake - 2) Find an algorithm to locate a point
in the region A, the hyper-sphere of radius a (a ≈ 0) whose center is located at the origin,
in a universe of the n-dimensional hypercube all of whose coordinates x_i lie in [-1, 1]
(i = 1, ..., n).
An experiment was carried out in the 20-dimensional space. Our assumption is that normal
data exist in the lake region while abnormal data exist in the island region. We control the
size of the island by changing the value of a. Since the island is a single region around the
origin, it is easy to guess that only one inference rule is enough to classify an input into
either of the two classes. The architecture of the fuzzy network is, therefore, twenty
inputs, one rule, and two outputs. A sketch of this data setup is given below.
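The following is a minimal sketch of this setup under Test-function 1; the sample sizes
and names are our own assumptions, not the paper's exact values. Note that training uses
only self (lake) data, in line with our first concern above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, a = 20, 0.1  # dimensionality and island half-width

def sample_lake(n_samples):
    """Self (normal) data: uniform points of the hypercube [-1, 1]^n
    that fall outside the island (for a = 0.1, n = 20 the island is
    hit with probability a^n, i.e. practically never)."""
    pts = rng.uniform(-1.0, 1.0, size=(n_samples, n))
    return pts[~np.all(np.abs(pts) <= a, axis=1)]

def sample_island(n_samples):
    """Non-self (abnormal) data: every coordinate within [-a, a]."""
    return rng.uniform(-a, a, size=(n_samples, n))

train_self = sample_lake(5000)      # training uses only self data
test_self = sample_lake(1000)       # test set: self ...
test_nonself = sample_island(1000)  # ... and non-self
```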
4 Results and Discussion
Though our experiments have sometimes reversed our expectations, depending on the
parameters chosen, we are obtaining a series of successful results, such as those shown in
Fig. 3, where an example of an obtained membership function corresponding to one
antecedent of the rule is shown (left), along with one of the two output singletons of the
same experiment (right). Training samples are drawn from the assumed legal data in the lake,
in order to identify the illegal data in the tiny island defined by a = 0.1.
Although only one of the 20 membership functions is shown in the figure, the other 19 are
more or less similar to it. This suggests the rule
$$R_1:\ \text{IF all of the } x_i \text{ are near the origin THEN } (y_1 \text{ is HIGH}) \text{ and } (y_2 \text{ is LOW}),$$
that is, the input belongs to the abnormal class.
Fig. 3. An example of an experimentally obtained membership function for one antecedent of a
rule (left), and one of the two output singletons of the same experiment (right). Training
samples are drawn from the assumed legal data in the lake, while the illegal data are assumed
to lie in the tiny island defined by a = 0.1.
5 Summary
In this paper, we have reported our on-going investigations, that is, how already-proposed
methods work in a special situation we call a-tiny-island-in-a-huge-lake.
When we increase the difficulty of the problem by making the size of the island shrink
toward zero, it becomes what is called a-needle-in-a-haystack. As far as we know, this issue
has resisted a full solution and still remains open. Though our results so far have not yet
matured, we hope that the many experiments which might yield positive observations on how to
design a network intrusion detection system await our exploration.
References
1. G. Castellano and A. M. Fanelli (2000) "Fuzzy Inference and Rule Extraction using a Neural
Network." Neural Network World Journal, Vol. 3, pp. 361–371.
2. H. S. Lopes, M. S. Coutinho, and W. C. Lima (1997) "An Evolutionary Approach to Simulate
Cognitive Feedback Learning in Medical Domain." In: Genetic Algorithms and Fuzzy Logic
Systems. World Scientific, pp. 193–207.
3. F. Provost, T. Fawcett, and R. Kohavi (1998) "The Case Against Accuracy Estimation for
Comparing Induction Algorithms." Proceedings of the International Conference on Machine
Learning, pp. 445–453.
4. A. Imada (2005) "When One Iris Flower is Normal Then are Others Abnormal?" Proceedings
of the International Conference on Pattern Recognition and Information Processing.
5. A. Imada (2004) "How a Peak on a Completely-flatland-elsewhere can be Searched for? A
Fitness Landscape of Associative Memory by Spiking Neurons." Proceedings of Advanced
Computer Systems (ACS) and Computer Information Systems and Industrial Management
Applications (CISIM), Vol. 2, pp. 171–150.