acceptable identification rate on the normal packets.
Figure 5: Variation of FP and FN with the number of
attributes for Random Forest, Probe module.
Table 1: TP, FN for different voting strategies.
        DoS            Probe          R2L          U2R         Normal
        TP      FN     TP     FN      TP    FN     TP   FN     TP      FN
Maj     22587   4969   3814   3179    41    556    1    30     40401   4
Avg     22709   4847   3793   3200    43    554    1    30     40402   3
Max     27535   21     6761   232     462   135    5    26     40393   12
Med     6       0      22     0       25    0      11   0      0       40354
Prod    26869   20     3720   216     4     134    0    26     40393   11
Cas     27556   0      6981   12      593   4      28   3      39954   451
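As a reading aid for Table 1, the per-class detection rate can be derived as TP / (TP + FN). The short sketch below does this for three of the rows. The counts are copied from the table, while the names `CLASSES`, `TABLE1` and `tp_rate` are illustrative and not part of the original system.

```python
# Counts copied from Table 1 as (TP, FN) pairs; only three strategies are shown.
CLASSES = ["DoS", "Probe", "R2L", "U2R", "Normal"]
TABLE1 = {
    "Maj": [(22587, 4969), (3814, 3179), (41, 556), (1, 30), (40401, 4)],
    "Max": [(27535, 21), (6761, 232), (462, 135), (5, 26), (40393, 12)],
    "Cas": [(27556, 0), (6981, 12), (593, 4), (28, 3), (39954, 451)],
}


def tp_rate(tp: int, fn: int) -> float:
    """Fraction of the instances of a class that were correctly identified."""
    return tp / (tp + fn)


# Detection rate per strategy and per class.
rates = {
    strategy: {cls: tp_rate(tp, fn) for cls, (tp, fn) in zip(CLASSES, counts)}
    for strategy, counts in TABLE1.items()
}

for strategy, per_class in rates.items():
    print(strategy, {cls: round(rate, 3) for cls, rate in per_class.items()})
```

On these counts the cascade (Cas) row has the highest rate for the minority classes R2L and U2R, at the price of more misclassified normal packets than Maj or Max.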
Although in our results we have not yet included
the Level 2 classifier, our preliminary experiments
indicate a Local Outlier Factor (LOF) (Breunig,
2000) approach to be the most promising. This
method is appropriate because normal points tend to
group into clusters of homogeneous density, whereas
attacks appear as outliers.
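To illustrate why a density-based score separates outliers from clustered points, the following is a minimal brute-force sketch of the LOF score following the definitions in Breunig et al. (2000). It is not the Level 2 classifier of our system: the function name `lof_scores`, the value of `k`, and the toy points are assumptions chosen for illustration, and the sketch assumes no more than `k` identical points.

```python
import math


def lof_scores(points, k=3):
    """Return the Local Outlier Factor of every point (brute force, O(n^2))."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist = []      # distance from each point to its k-th nearest neighbor
    neighbors = []   # k-distance neighborhood of each point (ties included)
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]
        k_dist.append(kd)
        neighbors.append([j for j in order if dist[i][j] <= kd])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach_dist(i, j) = max(k_dist[j], dist[i][j]).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbors divided by the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


# Toy data: a small dense cluster plus one distant point (hypothetical values).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
scores = lof_scores(points, k=3)
for p, s in zip(points, scores):
    print(p, round(s, 2))
```

Points inside the cluster obtain scores close to 1, while the isolated point obtains a much larger score, which is the behaviour the Level 2 classifier would rely on.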
Evaluating the Overall System: All the
configurations identified previously are used to
build the current version of our system. The results
obtained by evaluating the fully configured system
on the test dataset are shown in the first column of
Table 2, next to the results reported for other systems
evaluated on the KDD’99 dataset. Our system yields
significant improvements in the detection of minority
classes compared to the other systems (Gogoi, 2010;
Elkan, 2000): 90% correctly labeled instances for
the R2L class and 85% for the U2R class.
Table 2: Recognition rates/classes.
         Our System   KDD Winner   Catsub   FCM    SVM+DGSOT
DoS      97%          97%          100%     99%    97%
Probe    100%         83%          37%      93%    91%
R2L      90%          8%           82%      83%    43%
U2R      85%          13%          0%       0%     23%
Normal   89%          99%          82%      96%    95%
4 CONCLUSIONS
To tackle imbalanced classification problems, we
propose a two-step hybrid classification model
combined with a pre-processing stage. In the first
step, a multiple classifier combines the predictions of
several binary base models. In the second step, an
additional classifier is employed, specialized in
classifying difficult instances. We also propose a
cascaded prediction combination approach, in which
the binary predictors are ranked and output their
predictions in turn, up to the point where a positive
identification is made.
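The cascaded combination described above can be sketched as follows. This is a minimal illustration under stated assumptions, not our implementation: the ranking order, the toy threshold rules, the feature names and the `default` return value for instances that no predictor claims are all hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Instance = Dict[str, float]
BinaryPredictor = Callable[[Instance], bool]


def cascade_predict(
    ranked: Sequence[Tuple[str, BinaryPredictor]],
    instance: Instance,
    default: Optional[str] = None,
) -> Optional[str]:
    """Query the binary predictors in rank order; stop at the first positive."""
    for label, is_positive in ranked:
        if is_positive(instance):
            return label
    # No predictor claimed the instance (assumption: the caller decides what to do).
    return default


# Hypothetical ranking and toy rules standing in for trained binary models.
ranked_predictors = [
    ("U2R", lambda x: x["root_shell"] > 0),
    ("R2L", lambda x: x["failed_logins"] >= 3),
    ("DoS", lambda x: x["conn_per_sec"] > 100),
]

packet = {"root_shell": 0, "failed_logins": 0, "conn_per_sec": 500}
print(cascade_predict(ranked_predictors, packet))
```

Because evaluation stops at the first positive answer, the position of each binary predictor in the ranking determines which class wins when several predictors would fire on the same instance.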
We have applied the proposed system to a
network intrusion detection problem and compared
its results with previously published results on the
same problem. Our system achieves significantly
higher identification rates for the least represented
classes than the other systems, without considerably
degrading the identification of the majority classes.
REFERENCES
Breunig, M., Kriegel, H.-P., Ng, R., Sander, J., 2000.
LOF: identifying density-based local outliers.
Proceedings of the 2000 ACM SIGMOD international
conference on Management of data, vol. 29, no. 2, pp.
93-104.
Chawla, N. V., Bowyer, K. W., Hall, L. O., Kegelmeyer,
W. P., 2002. SMOTE: Synthetic Minority Over-
Sampling Technique. Journal of Artificial Intelligence
Research, vol. 16, pp. 321-357.
Elkan, C., 2000. Results of the KDD’99 Classifier Learning.
SIGKDD Explorations, vol. 1, no. 2, pp. 63-64.
Galar, M., Fernandez, A., Barrenechea, E., Bustince, H.,
Herrera, F., 2011. A Review on Ensembles for the
Class Imbalance Problem: Bagging-, Boosting-, and
Hybrid-Based Approaches. IEEE Transactions on Systems,
Man, and Cybernetics, Part C: Applications and
Reviews, vol. 42, no. 4, pp. 463-484.
Gogoi, P., Borah, B., Bhattacharyya, D. K., 2010.
Anomaly Detection Analysis of Intrusion Data using
Supervised & Unsupervised Approach. Journal of
Convergence Information Technology, vol. 5, no. 1,
pp. 95-110.
He, H., Garcia, E. A., 2009. Learning from Imbalanced
Data. IEEE Transactions on Knowledge And Data
Engineering, vol. 21, no. 9, pp. 1263-1284.
Kristopher, K., 1999. A Database of Computer Attacks for
the Evaluation of Intrusion Detection Systems. Master
of Engineering on Electrical Engineering and
Computer Science, MIT.
Seni, G., Elder, J., Grossman, R., 2010. Ensemble Methods
in Data Mining: Improving Accuracy Through