A LOW COST WORM DETECTION TECHNIQUE BASED ON FLOW PAYLOAD SIMILARITY

Youhei Suzuki, Yuji Waizumi, Hiroshi Tsunoda and Yoshiaki Nemoto

Graduate School of Information Sciences, Tohoku University

6-6-05, Aramaki-Aza-Aoba, Aobaku, Sendai-shi, Miyagi, 980-8579, Japan

Keywords:

Worm, Similarity of Flow Payloads, Clustering, Intrusion Detection.

Abstract:

Damage to information systems caused by worms has recently been reported on a global scale. Signature-based Intrusion Detection Systems (IDSs) are widely used to prevent this damage. To handle newly created worms, automatic signature generation techniques based on common strings in the payloads of multiple worm flows of the same kind have been proposed. Because these techniques need multiple strings as the signature for each kind of worm to achieve high detection accuracy, the calculation cost of detecting worms is a serious issue. In this paper, we propose a novel scheme that does not use common character strings. The proposed scheme uses a 256-dimensional vector based on the appearance frequencies of the 256 character codes. This vector is generated automatically and used as a means of detecting worms at low cost. In addition, we construct a low-cost worm detection system by using the proposed method as the first-stage analysis of a conventional IDS. We evaluate the proposed scheme through experiments and present its performance.

1 INTRODUCTION

Internet worms are one of the most serious threats on the Internet. As networks and computers become faster, worms also spread faster. Worms are responsible for a large share of the damage caused to information systems (Yaneza et al., 2005). In order to control the damage caused by these worms, highly accurate Intrusion Detection Systems (IDSs) need to be deployed. Most IDSs, such as Snort (Snort, 1998), adopt signature matching techniques to detect worms. Although this approach can detect known worms with high accuracy, the signature matching process is computationally expensive. Since a new signature has to be added to the signature database of the IDS in order to detect each new kind of worm and its subspecies, the computational cost of searching signatures increases constantly. This can be a serious problem, as new kinds of worms are created every day.

Worms of the same kind carry similar flow payloads (a flow payload being the set of payloads of all the packets contained in a flow), as reported in (Akritidis et al., 2005). Because a worm spreads by sending copies of itself to other hosts, the payloads of flows transmitted by the same kind of worm can be highly similar. Based on this fact, systems that generate signatures automatically from common strings in the payloads of multiple worm flows have been proposed in (Kim and Karp, 2004), (Simkhada et al., 2005), (Newsome et al., 2005), (Singh et al., 2004) and (Wang et al., 2005). Although these systems can shorten the time needed to generate signatures, they cannot reduce the detection cost, because they use multiple strings as the signature for each kind of worm. The calculation cost of detecting worms remains a serious issue for network security.

2 BACKGROUND

Most signature-based IDSs detect worms by matching signature strings against worm payloads. Since most worms can be detected by using only one string, the computational cost of these IDSs to detect a worm flow is O(LN), where L and N denote the length of the payload of a worm flow and the number of strings in a signature, respectively. To reduce the signature generation time, automatic signature generation techniques based on similarities of worm flows are proposed in (Akritidis et al., 2005) and (Singh et al., 2004). Because these approaches need two or more common signature strings to achieve a high detection accuracy, the computational cost becomes significantly high.

Suzuki Y., Waizumi Y., Tsunoda H. and Nemoto Y. (2007). A LOW COST WORM DETECTION TECHNIQUE BASED ON FLOW PAYLOAD SIMILARITY. In Proceedings of the Third International Conference on Web Information Systems and Technologies - Internet Technology, pages 414-417. DOI: 10.5220/0001279704140417. SciTePress.

(Singh et al., 2004), (Wang et al., 2005), (Kruegel et al., 2002) and (Tsuji et al., 2005) have shown that it is possible to evaluate similarities between flow payloads in terms of a 256-dimensional vector based on histograms of the appearance probabilities of the 256 byte codes. We call this vector the h vector and express it as

$$\vec{h} = (h_0, \cdots, h_{255}) \qquad (1)$$

where $h_i$ is the appearance probability of code i. The h vector captures the features of the whole payload of a flow and has a fixed length of 256. Consequently, by using the h vector as the signature, we can detect worms at a lower calculation cost. The computation cost of extracting the h vector from a flow payload is proportional to the length of the flow, i.e. O(L). The cost of evaluating the similarity between flows by using h vectors is constant and equal to the dimension of the h vector. Thus, the total calculation cost of detecting worms by methods using h vectors is $O(L + 256C_i)$, where $C_i$ is the average number of signatures.
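The extraction of the h vector can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `h_vector` and the choice to return an all-zero vector for an empty payload are hypothetical and do not come from the paper.

```python
from collections import Counter


def h_vector(payload: bytes) -> list[float]:
    """Return the 256-dimensional byte-frequency vector of a flow payload.

    Element i is the appearance probability of byte code i in the payload.
    """
    if not payload:
        # Assumption: an empty payload maps to the all-zero vector.
        return [0.0] * 256
    counts = Counter(payload)  # single pass over the payload: O(L)
    total = len(payload)
    return [counts.get(code, 0) / total for code in range(256)]


# Example: the h vector of a short, made-up payload.
h = h_vector(b"GET /index.html HTTP/1.0\r\n")
print(len(h))          # 256
print(h[ord("/")])     # fraction of bytes equal to '/'
```

The vector length is fixed at 256 regardless of the payload length, which is what keeps the later comparison step at constant cost per signature.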

In this paper, we use h vectors to reduce the calculation cost of detection, and build a low-cost worm detection system which consists of two detection stages. The first stage uses h vectors to detect worms and to reduce the number of flows that must be analyzed during the second stage. The second stage adopts common string signatures to detect worms whose signature h vectors do not exist in the signature database.

3 WORM DETECTION USING h VECTORS

In this section we use the h vector for worm detection and investigate its performance. We adopt a clustering technique for this purpose: similar worm flows are clustered in the 256-dimensional space, and the average position of the flows in each cluster is used as the h vector of the corresponding cluster. (Waizumi et al., 2005) reports that the h vectors of one kind of worm can be present in multiple clusters. Signature h vectors are then calculated from the clusters, and worms are detected by using these signature vectors.

3.1 Signature Vector Generation

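Let $\vec{h}_{i,j}$ denote the h vector of flow j of a kind of worm i, and let $\vec{m}_{i,c}$ represent the h vector of worm cluster c. The clustering algorithm is as follows:

Begin
    $\vec{m}_{i,1} \leftarrow \vec{h}_{i,1}$; $c \leftarrow 1$; $j \leftarrow 1$
    repeat
        $j \leftarrow j + 1$
        $w \leftarrow \arg\min_{c'} D(\vec{h}_{i,j}, \vec{m}_{i,c'})$
        if $D(\vec{h}_{i,j}, \vec{m}_{i,w}) < \theta$
        then $\vec{m}_{i,w} \leftarrow \vec{m}_{i,w} \cdot (n_{i,w} - 1)/|w| + \vec{h}_{i,j}/|w|$
        else $c \leftarrow c + 1$; $\vec{m}_{i,c} \leftarrow \vec{h}_{i,j}$
    until $\vec{h}_{i,j}$ == NULL
end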

where |w| is the number of elements included in cluster w, and c is the number of clusters. Moreover, the distance $D(\vec{h}_{i,j}, \vec{m}_{i,c})$ between $\vec{h}_{i,j}$ and $\vec{m}_{i,c}$ is calculated as

$$D(\vec{h}_{i,j}, \vec{m}_{i,c}) = \sum_{k=0}^{255} (h_{i,j,k} - m_{i,c,k})^2 \qquad (2)$$

where $h_{i,j,k}$ and $m_{i,c,k}$ are the elements of the k-th dimension of $\vec{h}_{i,j}$ and $\vec{m}_{i,c}$, respectively.

Clusters of radius θ are generated by this clustering algorithm. Flows whose h vectors are far from each other are assigned to different clusters. If the same kind of worm has multiple h vectors, multiple signature vectors are generated for the worm.
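The clustering procedure of this section can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the distance is taken to be the squared Euclidean distance, the centroid update is written as a running mean over the cluster size after the new flow is added, and the names `distance` and `generate_signature_vectors` are hypothetical.

```python
def distance(h, m):
    """Distance D between two vectors (assumed: squared Euclidean distance)."""
    return sum((hk - mk) ** 2 for hk, mk in zip(h, m))


def generate_signature_vectors(h_vectors, theta):
    """Incrementally cluster the h vectors of one kind of worm.

    Returns the cluster centroids, which serve as signature vectors.
    theta is the clustering threshold (cluster radius).
    """
    centroids = []  # one centroid per cluster
    sizes = []      # number of flows in each cluster, i.e. |w|
    for h in h_vectors:
        if centroids:
            # Nearest existing cluster w.
            w = min(range(len(centroids)),
                    key=lambda c: distance(h, centroids[c]))
            if distance(h, centroids[w]) < theta:
                # Running mean: fold the new flow into centroid w.
                sizes[w] += 1
                n = sizes[w]
                centroids[w] = [m * (n - 1) / n + x / n
                                for m, x in zip(centroids[w], h)]
                continue
        # No cluster is close enough (or none exists yet): open a new one.
        centroids.append(list(h))
        sizes.append(1)
    return centroids


# Toy example with 2-dimensional vectors instead of 256-dimensional ones.
sigs = generate_signature_vectors([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], theta=0.1)
print(len(sigs))  # 2
```

In the toy example the first two vectors fall within the threshold of each other and share one cluster, while the third opens a second cluster, which mirrors how one kind of worm can end up with multiple signature vectors.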

3.2 Detection By Signature Vectors

Observed flows whose h vectors are sufficiently near a signature vector are detected as worms. The detection criterion is defined by a threshold distance θ: if the h vector of a newly observed flow is at a distance of less than θ from a signature vector, the flow is detected as a worm flow.
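The detection rule above amounts to a nearest-signature test, sketched below in Python. The function names `distance` and `is_worm` are hypothetical, and the squared Euclidean distance is an assumed form of D.

```python
def distance(h, m):
    """Distance D between two vectors (assumed: squared Euclidean distance)."""
    return sum((hk - mk) ** 2 for hk, mk in zip(h, m))


def is_worm(h, signature_vectors, theta):
    """Flag a flow as a worm if its h vector lies within theta of any signature vector."""
    return any(distance(h, m) < theta for m in signature_vectors)


# Toy example with 2-dimensional vectors instead of 256-dimensional ones.
signatures = [[0.95, 0.05]]
print(is_worm([1.0, 0.0], signatures, theta=0.01))  # True
print(is_worm([0.0, 1.0], signatures, theta=0.01))  # False
```

Each comparison touches a fixed number of dimensions, so the cost of this step grows with the number of signature vectors and not with the payload length.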

3.3 Performance Evaluation

3.3.1 Experimental Environment

In this experiment, we use off-line traffic captured from a real network and containing worm flows. By using signatures provided by Bleeding Edge Threats (Bleeding Edge Threats, 2004), Bagle, MyDoom and Netsky.P worm flows are extracted from the traffic and used as test flows. In the same way, about 13,000 normal flows are extracted from one day of the traffic and used for evaluating false alarms.


[Figure 1 plot: detection time (sec) against number of flows (0 to 300) for the conventional method with N = 1 and N = 10 and the proposed method with $C_i$ = 1 and $C_i$ = 10.]

Figure 1: Comparison of the detection time of the proposed method and the method of (Simkhada et al., 2005).

3.3.2 Evaluation of Detection Time

We compare the detection time of the proposed method with that of a conventional method which uses common strings as signatures (Simkhada et al., 2005), using Netsky.P worm flows. Signature vectors and signature strings are generated from 465 flows, and 314 flows are used to investigate the efficiency of both methods. The number of signature vectors $C_i$ of the proposed method is set to one and to ten by adjusting the threshold θ.

Figure 1 shows that the proposed method detects worms at a lower calculation cost than the existing method. The detection time of the proposed method also grows more slowly with the number of flows than that of the conventional method.

3.3.3 Evaluation of Detection Accuracy

We evaluate the accuracy of the proposed technique in detecting worms. In this evaluation, we use two months of network traffic. The signature vectors are generated from the traffic of the first month and from worm flows (Bagle 91 flows, MyDoom 73 flows and Netsky.P 465 flows). Worm flows from a separate database (Bagle 52 flows, MyDoom 62 flows and Netsky.P 314 flows) are used to evaluate the detection accuracy. At the same time, the number of false alarms is evaluated using about 13,000 normal flows.

Figure 2 depicts the detection rate and the false positive rate when the threshold θ used to generate signature vectors and the threshold θ used for detection are set to the same value, θ. Figure 2 shows that it is possible to achieve a 100% detection rate with a low false positive rate. Table 1 shows the highest detection accuracy for each worm when the generation threshold and the detection threshold vary independently. Expression (3) gives the system sensitivity S used in (Simkhada et al., 2005). The closer the value of S is to 1, the better the system sensitivity. From these results we can say that the proposed method can achieve high detection accuracy by selecting an appropriate threshold.

$$S = (\text{detection rate}) \times (100 - \text{false positive rate}) / 10000 \qquad (3)$$

[Figure 2 plot: detection rate (%) and false positive rate (%) against threshold θ (0 to 0.003) for MyDoom, Bagle and Netsky.P.]

Figure 2: Relation between detection rate and false positive rate as the threshold θ changes.

Table 1: Detection accuracy with respect to the generation threshold θ and the detection threshold θ.

| Worm     | θ (generation) | θ (detection) | Det   | FP    | S     | $C_i$ |
|----------|----------------|---------------|-------|-------|-------|-------|
| Bagle    | 0.002          | 0.002         | 98.1% | 0.04% | 0.977 | 2     |
| MyDoom   | 0.001          | 0.0004        | 100%  | 0.02% | 0.999 | 2     |
| Netsky.P | 0.001          | 0.0005        | 100%  | 0.00% | 1.000 | 3     |
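The sensitivity measure of expression (3) is simple arithmetic on two percentages, as the short Python sketch below illustrates. The function name `sensitivity` is hypothetical; both arguments are percentages in the range 0 to 100.

```python
def sensitivity(detection_rate_pct: float, false_positive_rate_pct: float) -> float:
    """System sensitivity S of expression (3); both inputs are percentages."""
    return detection_rate_pct * (100.0 - false_positive_rate_pct) / 10000.0


# A perfect detector (100% detection, 0% false positives) scores S = 1.
print(sensitivity(100.0, 0.0))   # 1.0
# A detector that is right half the time on both counts scores much lower.
print(sensitivity(50.0, 50.0))   # 0.25
```

Because S is a product, a high detection rate cannot compensate for a high false positive rate, and vice versa.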

Table 1 shows that the proposed scheme is capable of achieving a high detection accuracy. Most of the false positive flows were e-mail flows with binary files attached. Because many elements of the h vectors extracted from both worm flows and e-mail flows with attached binary files tend to be zero, these flows appeared similar and were detected as worm flows.

4 A TWO-STAGE DETECTION SYSTEM TO REDUCE THE CALCULATION COST OF EXISTING IDS

In the previous section, we demonstrated that the detection technique using h vectors can discriminate worm flows from non-worm flows at a lower calculation cost than conventional methods. However, some sample flows of worms are necessary in order to calculate the h vectors of worm flows and generate signature vectors. In this section, we propose a two-stage worm detection system which consists of the method proposed in Section 3 as the first stage and a signature-string-based detection method as the second stage. The second stage sends sample flows to the first stage in order to generate signature vectors (Figure 3).

Figure 3 depicts the components of the proposed two-stage worm detection system. In the proposed system, the first stage detects worms by using signature vectors at a low calculation cost. The remaining flows are sent to the second stage as suspicious flows. In the second stage, worms are detected from the suspicious flows. Because the number of flows analyzed during the second stage is significantly reduced by the first-stage analysis, the total calculation cost of the proposed system is lower than that of the conventional detection system. At least one sample worm flow is required for the proposed technique to generate a signature vector. Worms detected in the second stage are used as sample flows. To reduce the cost of generating signature vectors, the system runs the generation process only when the number of sample flows of the same kind exceeds a constant number or when a fixed time has passed since the last run.

Figure 3: A two-stage worm detection system.
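The interaction between the two stages can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the system evaluated in the paper: the class name `TwoStageDetector` and its parameters are hypothetical, the second stage is passed in as an arbitrary callable, a batch of samples is reduced to a single mean vector in place of the clustering of Section 3.1, and only the sample-count trigger is modelled (the fixed-time trigger is omitted).

```python
def distance(h, m):
    """Distance D between two vectors (assumed: squared Euclidean distance)."""
    return sum((a - b) ** 2 for a, b in zip(h, m))


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


class TwoStageDetector:
    def __init__(self, second_stage, theta, min_samples):
        self.second_stage = second_stage  # callable: payload -> worm name or None
        self.theta = theta                # detection threshold of the first stage
        self.min_samples = min_samples    # samples needed before generating a signature
        self.signatures = []              # signature vectors used by the first stage
        self.samples = {}                 # worm name -> h vectors awaiting generation

    def classify(self, payload, h):
        # First stage: cheap comparison against the signature vectors.
        if any(distance(h, m) < self.theta for m in self.signatures):
            return "worm-stage1"
        # Second stage: string-signature analysis of the remaining flows.
        name = self.second_stage(payload)
        if name is None:
            return "normal"
        # Flows caught by the second stage become samples for the first stage.
        pending = self.samples.setdefault(name, [])
        pending.append(h)
        if len(pending) >= self.min_samples:
            self.signatures.append(mean_vector(pending))
            pending.clear()
        return "worm-stage2"


# Toy run with 2-dimensional vectors and a made-up string signature.
detector = TwoStageDetector(lambda p: "X" if b"EVIL" in p else None,
                            theta=0.01, min_samples=2)
first = detector.classify(b"EVIL-1", [1.0, 0.0])
second = detector.classify(b"EVIL-2", [0.9, 0.1])
third = detector.classify(b"unseen variant", [0.95, 0.05])
fourth = detector.classify(b"hello", [0.0, 1.0])
print(first, second, third, fourth)
```

In the toy run the first two flows are caught by the second stage and supply the samples; once a signature vector exists, the third flow is caught by the first stage without any string matching.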

In the same environment as Section 3.3, we evaluated the detection performance of the two-stage system using the Netsky.P worm. The number of signature vectors finally generated was three. Moreover, when the number of sample flows was 40 or more, all 314 flows could be detected in the first stage. Consequently, the second stage did not need to analyze the 314 Netsky.P flows, and the calculation cost was reduced by the proposed system.

5 CONCLUSIONS AND FUTURE WORK

In this paper, we proposed a time-efficient and low-cost worm detection system. The proposed worm detection method evaluates flow similarity by a vector based on the appearance probabilities of the byte codes in flow payloads. The evaluation experiment showed that the method achieves a high detection accuracy while significantly reducing the calculation cost during detection. We also proposed a worm detection system which uses this method as the first stage and an existing IDS as the second stage. Through an evaluation experiment, we showed that the proposed system is a highly accurate and low-cost worm detection system.

Future work is to achieve low-cost worm detection by using the proposed method alone while maintaining a high accuracy. A distributed scheme in which signatures are shared among networks, as introduced by (Staniford et al., 2002), can further enhance the effectiveness of the proposed scheme.

REFERENCES

Akritidis, P., Anagnostakis, K., and Markatos, E. P. (2005). Efficient content-based detection of zero-day worms. In Proceedings of the International Conference on Communications (ICC 2005).

Bleeding Edge Threats (2004). http://www.bleedingsnort.com

Kim, H. and Karp, B. (2004). Autograph: toward automated, distributed worm signature detection. In Proceedings of the 13th USENIX Security Symposium.

Kruegel, C., Toth, T., and Kirda, E. (2002). Service specific anomaly detection for network intrusion detection. In Symposium on Applied Computing (SAC).

Newsome, J., James, B., Karp, B., and Song, D. (2005). Polygraph: Automatically generating signatures for polymorphic worms. In Proceedings of the 2005 IEEE Symposium on Security and Privacy. IEEE Computer Society.

Simkhada, K., Tsunoda, H., Waizumi, Y., and Nemoto, Y. (2005). Differencing worm flows and normal flows for automatic generation of worm signatures. In Proceedings of the Seventh IEEE International Symposium on Multimedia (ISM).

Singh, S., Estan, C., Varghese, G., and Savage, S. (2004). Automated worm fingerprinting. In Proceedings of the 6th ACM/USENIX Symposium on Operating System Design and Implementation (OSDI).

Snort (1998). http://www.snort.org

Staniford, S., Paxson, V., and Weaver, N. (2002). How to 0wn the Internet in your spare time. In Proceedings of the 11th USENIX Security Symposium.

Tsuji, M., Waizumi, Y., Tsunoda, H., and Nemoto, Y. (2005). Detecting worms based on similarity of flow payloads. In IEICE Tech. Rep. NS2005-112, pages 9–12.

Waizumi, Y., Tsuji, M., and Nemoto, Y. (2005). A detection technique of epidemic worms using clustering of packet payload. In IEICE Tech. Rep. CS2005-19, pages 19–24.

Wang, K., Cretu, G., and Stolfo, S. (2005). Anomalous payload-based worm detection and signature generation. In Proceedings of the Eighth International Symposium on Recent Advances in Intrusion Detection.

Yaneza, J. L. A., Mantes, C., and Avena, E. (2005). The Trend of Malware Today: Annual Virus Round-up and 2005 Forecast. Trend Micro.
