4.3 Discussion and future work
We consider clustering to be an appropriate technique
for our purposes as it does not require the use of a la-
beled data set for training. As such, it is more appro-
priate for the analysis of IDS alerts (or those produced
in abundance from the logs of network devices) where
the outcome of the suspicious activity cannot be
classified as malicious (true-positive) or non-malicious
(false-positive) a priori. A known limitation of the
k-means technique is that it is sensitive to extreme
values or outliers (Han and Kamber, 2000). In our case,
we consider this sensitivity beneficial, and our aim is to
exploit this property to cluster anomalous behaviour
that is infrequent within the network traffic.
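To illustrate why this sensitivity can be useful, the following minimal sketch runs a plain k-means (Lloyd's algorithm) on a handful of made-up points. The two features (alerts per connection, distinct signatures), the data values and the choice of initial centres are assumptions made for illustration only; they are not taken from the experiments reported here.

```python
import math

def kmeans(points, initial_centroids, max_iterations=20):
    """Plain k-means (Lloyd's algorithm) with caller-supplied initial centres."""
    centroids = list(initial_centroids)
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return centroids, clusters

# Hypothetical feature vectors: (alerts per connection, distinct signatures).
points = [(2, 1), (3, 1), (2, 2), (3, 2), (2, 1), (40, 9)]

# Both initial centres deliberately start inside the dense, frequent region.
centroids, clusters = kmeans(points, initial_centroids=[(2, 1), (3, 1)])

for centre, members in zip(centroids, clusters):
    print(centre, members)
```

With these values the extreme point drags one centroid towards itself and ends up alone in its own cluster, while the five frequent points share the other. This is the behaviour we hope to exploit, although real alert data will rarely separate this cleanly.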
Our next steps are to perform further experiments
to confirm whether the assumption presented in Sec-
tion 4.2 is valid and under what conditions. To this
end, we propose to use results from honeypot
experiments to classify combinations of alerts known to
be associated with malicious behaviour and compare
these to the results obtained from clustering. Furthermore,
future work will incorporate alerts from a wide
variety of network devices (firewalls, routers, application
servers, etc.); we expect that this richer
information base will yield more accurate correlation
rules (as discussed in Section 2.3).
The time parameter used for the “connection” de-
finition, i.e. the allowed time interval between two
subsequent alerts (Section 4.1), also needs to be ex-
plored. Values other than the 10 minutes employed
in the first experiment need to be tested to determine
how this influences the clusters obtained. It is logical
that the smaller this value is, the more difficult it will
be to catch attacks perpetrated at a slower rate. On the
other hand, if this parameter is too large, it may induce
the creation of “connections” of unrelated alerts
that happen to fall within this window.
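The trade-off can be sketched in a few lines. The example below assumes, for simplicity, that a “connection” is just a run of alerts in which no two subsequent alerts are further apart than the window; the other attributes used in Section 4.1 are ignored, and the function name `build_connections` and the timestamps are hypothetical.

```python
def build_connections(timestamps, max_gap):
    """Group alert timestamps (in seconds) into "connections": a new
    connection starts whenever the gap to the previous alert exceeds max_gap."""
    connections = []
    for t in sorted(timestamps):
        if connections and t - connections[-1][-1] <= max_gap:
            connections[-1].append(t)   # still within the allowed interval
        else:
            connections.append([t])     # gap too large: open a new connection
    return connections

# Hypothetical alert times: a slow scan (one alert every 15 minutes)
# followed, much later, by an unrelated alert.
alerts = [0, 900, 1800, 2700, 9000]

for minutes in (10, 20, 120):
    groups = build_connections(alerts, max_gap=minutes * 60)
    print(f"{minutes:>3} min window -> {groups}")
```

In this toy case the 10-minute window splits the slow scan into four single-alert connections, the 20-minute window recovers it as one connection, and the 120-minute window also absorbs the unrelated alert.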
Finally, we plan to test whether repeatedly applying
the clustering algorithm to the smaller clusters
gives a more detailed picture
of the inter-relationships between the alerts. The
alternative approach would be to choose a larger value
for the parameter k (the number of clusters).
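One possible reading of the repeated-clustering idea is sketched below. It is a simplification under stated assumptions: a single numeric feature per connection, k = 2 at every level, centres seeded with the minimum and maximum value, and a hypothetical `min_size` threshold that decides when a small cluster is no longer worth splitting. None of these choices is prescribed by the experiments described above.

```python
def two_means(values, max_iterations=50):
    """1-D k-means with k = 2, seeded with the minimum and maximum value."""
    lo, hi = min(values), max(values)
    low, high = list(values), []
    for _ in range(max_iterations):
        # Assign each value to the nearer centre (ties go to the lower one).
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # centres are stable
        lo, hi = new_lo, new_hi
    return low, high

def refine(values, min_size=3):
    """Cluster once, then re-cluster the smaller cluster for as long as it
    still holds at least `min_size` values. Returns the resulting clusters."""
    a, b = two_means(values)
    if not b:  # all values identical: nothing to split
        return [a]
    small, large = sorted((a, b), key=len)
    if len(small) < min_size:
        return [large, small]
    return [large] + refine(small, min_size)

# Hypothetical per-connection alert counts: many frequent, a few rare.
counts = [2, 3, 2, 3, 2, 3, 2, 2, 40, 42, 41, 60, 61]
clusters = refine(counts)
print(clusters)
```

Here the first pass separates the eight frequent values from the five rare ones, and the second pass splits the rare group further. Whether this yields more informative groups than a single run with a larger k is exactly what remains to be tested.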
5 CONCLUSION
We have discussed the limitations of IDSs and how
the correlation of events from various network ele-
ments can be used to improve the intrusion detection
rate and reduce the number of false-positives. As the
generation of correlation rules is a time-consuming
task that requires expert knowledge, semi-automatic
techniques that can assist this process are clearly
desirable. To this end, we propose the use of data
mining to separate frequent behaviour (false-positives) from
non-frequent behaviour. Subsequently, logical expressions
for correlation rules can be written focusing solely on
the latter. Our initial results suggest that this tech-
nique is promising, though more tests are needed to
formulate a precise conclusion.
REFERENCES
Axelsson, S. (1999). The base-rate fallacy and its implica-
tions for the difficulty of intrusion detection. In Pro-
ceedings of the 6th ACM Conference on Computer and
Communications Security, pages 1–7.
Brugger, S. T. (2004). Data mining methods
for network intrusion detection.
http://www.bruggerink.com/~zow/papers/brugger_dmnid_survey.pdf.
Burns, L., Hellerstein, J. L., Ma, S., Peng, C. S., Raben-
horst, D. A., and Taylor, D. (2000). A systematic ap-
proach to discovering correlation rules for event man-
agement. IBM Research Report RC 21847, IBM.
Debar, H., Curry, D., and Feinstein, B. (2005).
The intrusion detection message exchange format.
http://www.ietf.org/internet-drafts/draft-ietf-idwg-idmef-xml-14.txt.
Drew, S. (2003). Intrusion detection FAQ – what is the role
of security event correlation in intrusion detection?
http://www.sans.org/resources/idfaq/role.php.
Han, J. and Kamber, M. (2000). Data Mining: Concepts
and Techniques. Morgan Kaufmann.
Jiang, G. and Cybenko, G. (2004). Temporal and spatial
distributed event correlation for network security. In
Proc. of American Control Conf., pages 996–1001.
Kreibich, C. and Crowcroft, J. (2004). Honeycomb - cre-
ating intrusion detection signatures using honeypots.
SIGCOMM Comput. Commun. Rev., 34(1):51–56.
Manganaris, S., Christensen, M., Zerkle, D., and Hermiz,
K. (2000). A data mining analysis of RTID
alarms. Computer Networks: The International Jour-
nal of Computer and Telecommunications Network-
ing, 34(4):571–577.
Morin, B. and Debar, H. (2003). Correlation of intrusion
symptoms: an application of chronicles. In RAID
2003, volume 2820 of LNCS, pages 94–112. Springer.
Ning, P., Cui, Y., and Reeves, D. S. (2002). Analyzing in-
tensive intrusion alerts via correlation. In RAID 2002,
volume 2516 of LNCS, pages 74–94. Springer.
Yemini, S. A., Kliger, S., Mozes, E., Yemini, Y., and Ohsie,
D. (1996). High speed and robust event correlation.
IEEE Communications Magazine, 34(5):82–90.
Yin, X., Lakkaraju, K., Li, Y., and Yurcik, W. (2003). Se-
lecting log data sources to correlate attack traces for
computer network security: Preliminary results. In
Proc. of the 11th Intl. Conference on Telecommunica-
tion Systems, Modeling and Analysis (ICTSM11).
ICETE 2005 - SECURITY AND RELIABILITY IN INFORMATION SYSTEMS AND NETWORKS