
 
acceptable identification rate on the normal packets. 
 
 
Figure 5: Variation of FP and FN with the number of 
attributes for Random Forest, Probe module. 
Table 1: TP, FN for different voting strategies.

          DoS           Probe          R2L           U2R          Normal
          TP     FN     TP     FN     TP     FN     TP     FN     TP     FN
Maj    22587   4969   3814   3179     41    556      1     30  40401      4
Avg    22709   4847   3793   3200     43    554      1     30  40402      3
Max    27535     21   6761    232    462    135      5     26  40393     12
Med        6      0     22      0     25      0     11      0      0  40354
Prod   26869     20   3720    216      4    134      0     26  40393     11
Cas    27556      0   6981     12    593      4     28      3  39954    451
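The counts in Table 1 can be turned into per-class detection rates, TP / (TP + FN). The following is a minimal sketch of that arithmetic for two of the rows; the counts are copied from the table, while the names `TABLE1`, `detection_rate` and `rates` are introduced here for illustration only.

```python
# (TP, FN) pairs copied from Table 1 for the majority-vote and cascade rows.
CLASSES = ["DoS", "Probe", "R2L", "U2R", "Normal"]
TABLE1 = {
    "Maj": [(22587, 4969), (3814, 3179), (41, 556), (1, 30), (40401, 4)],
    "Cas": [(27556, 0), (6981, 12), (593, 4), (28, 3), (39954, 451)],
}


def detection_rate(tp, fn):
    """Fraction of a class's instances that were correctly identified."""
    total = tp + fn
    return tp / total if total else float("nan")


# rates[strategy][class] -> detection rate in [0, 1]
rates = {
    strategy: {c: detection_rate(tp, fn) for c, (tp, fn) in zip(CLASSES, row)}
    for strategy, row in TABLE1.items()
}

for strategy, per_class in rates.items():
    print(strategy, {c: f"{r:.1%}" for c, r in per_class.items()})
```

On these counts, the cascade row identifies about 99.3% of the R2L instances against about 6.9% for majority voting, while the rate on Normal drops from nearly 100% to about 98.9%.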
 
Although our results do not yet include the Level 2
classifier, our preliminary experiments indicate that
a Local Outlier Factor (LOF) approach (Breunig,
2000) is the most promising. This method is
appropriate because normal points tend to group into
clusters of homogeneous density, whereas attacks
appear as outliers.
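To make the density argument concrete, the following is a minimal pure-Python sketch of the LOF score as defined by Breunig et al. (2000). It is not the Level 2 classifier of our system: the function name `lof_scores`, the neighborhood size `k=3` and the toy two-dimensional points are assumptions chosen for illustration, and the sketch assumes there are no duplicate points.

```python
import math


def lof_scores(points, k=3):
    """Return the Local Outlier Factor of every point in `points`."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist, neighbors = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]  # distance to the k-th nearest neighbor
        k_dist.append(kd)
        # The k-neighborhood keeps every point tied at the k-distance.
        neighbors.append([j for d, j in others if d <= kd])

    def reach_dist(i, j):
        # Reachability distance of point i with respect to point j.
        return max(k_dist[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighbors[i]) / len(neighbors[i])
        lrd.append(1.0 / mean_reach)

    # LOF: mean density of the neighbors relative to the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


# Toy data (assumed): a tight cluster of "normal" points and one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
scores = lof_scores(points, k=3)
for p, s in zip(points, scores):
    print(p, round(s, 2))
```

Points inside the cluster receive scores close to 1, whereas the isolated point receives a score well above 1, which is the behavior that motivates LOF as an outlier detector.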
Evaluating the Overall System: All the
configurations previously identified are employed to
build the current version of our system. The results
obtained by evaluating the fully configured system
on the test dataset are shown in the first column of
Table 2, alongside those of other systems evaluated
on the KDD’99 dataset. Our system yields significant
improvements in the detection of minority classes
compared to the other systems (Gogoi, 2010;
Elkan, 2000): 90% correctly labeled instances for
the R2L class and 85% for the U2R class.
Table 2: Recognition rates/classes.

         Our System   KDD Winner   Catsub    FCM    SVM+DGSOT
DoS          97%          97%       100%     99%       97%
Probe       100%          83%        37%     93%       91%
R2L          90%           8%        82%     83%       43%
U2R          85%          13%         0%      0%       23%
Normal       89%          99%        82%     96%       95%

4 CONCLUSIONS 
To tackle imbalanced problems, we propose a two-step
hybrid classification model combined with a
pre-processing stage. In the first step, a multiple
classifier which combines the predictions of several
binary base models is used. In the second step, an
additional classifier is employed, specialized in
classifying difficult instances. We also propose a
cascaded prediction combination approach, in which
the binary predictors are ranked and output their
predictions in turn, up to the point where a positive
identification is made.
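The cascaded combination can be sketched as a loop over the ranked binary predictors that stops at the first positive answer. The sketch below is illustrative only: the name `cascade_predict`, the hand-written rules standing in for trained binary models, their ranking, and the `fallback` returned when no predictor fires are all assumptions, not the configuration used in our experiments.

```python
from typing import Callable, Optional, Sequence, Tuple

# A binary predictor answers "does this instance belong to my class?".
BinaryPredictor = Callable[[dict], bool]


def cascade_predict(
    instance: dict,
    ranked: Sequence[Tuple[str, BinaryPredictor]],
    fallback: Optional[str] = None,
) -> Optional[str]:
    """Query predictors in rank order; return the first positive label."""
    for label, predictor in ranked:
        if predictor(instance):
            return label  # positive identification: stop the cascade
    # Assumption: with no positive vote the instance stays unlabeled.
    return fallback


# Hypothetical stand-ins for trained binary models, in an arbitrary ranking.
ranked_predictors = [
    ("U2R", lambda x: x["root_shell"] == 1),
    ("R2L", lambda x: x["num_failed_logins"] >= 3),
    ("DoS", lambda x: x["serror_rate"] > 0.9),
]

sample = {"root_shell": 0, "num_failed_logins": 4, "serror_rate": 0.0}
print(cascade_predict(sample, ranked_predictors))
```

Because the cascade stops at the first positive prediction, the ranking decides which class wins when several binary models would fire on the same instance.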
We have applied the proposed system to a
network intrusion detection problem and compared
its results with previous results on the same
problem. Our system achieves significantly higher
identification rates for the least represented classes
than the other systems, without considerably
degrading the identification of the majority classes.
REFERENCES 
Breunig, M., Kriegel, H. P., Ng, R., Sander, J., 2000.
LOF: Identifying Density-Based Local Outliers.
Proceedings of the 2000 ACM SIGMOD International
Conference on Management of Data, vol. 29, no. 2, pp.
93-104.
Chawla, N. V., Bowyer, K. W., Hall, L. O., Kegelmeyer, 
W. P., 2002. SMOTE: Synthetic Minority Over-
Sampling Technique. Journal of Artificial Intelligence 
Research, vol. 16, pp. 321-357. 
Elkan, C., 2000. Results of the KDD’99 Classifier Learning.
SIGKDD Explorations, vol. 1, no. 2, pp. 63-64.
Galar, M., Fernandez, A., Barrenechea, E., Bustince, H.,
Herrera, F., 2011. A Review on Ensembles for the
Class Imbalance Problem: Bagging-, Boosting-, and
Hybrid-Based Approaches. IEEE Transactions on Systems,
Man, and Cybernetics, Part C: Applications and
Reviews, vol. 42, no. 4, pp. 463-484.
Gogoi, P., Borah, B., Bhattacharyya, D. K., 2010.
Anomaly Detection Analysis of Intrusion Data using
Supervised & Unsupervised Approach. Journal of
Convergence Information Technology, vol. 5, no. 1,
pp. 95-110.
He, H., Garcia, E. A., 2009. Learning from Imbalanced 
Data.  IEEE Transactions on Knowledge And Data 
Engineering, vol. 21, no. 9, pp. 1263-1284. 
Kendall, K., 1999. A Database of Computer Attacks for
the Evaluation of Intrusion Detection Systems. Master
of Engineering thesis, Electrical Engineering and
Computer Science, MIT.
Seni, G., Elder, J., Grossman, R., 2010. Ensemble Methods 
in Data Mining: Improving Accuracy Through 
A Hybrid Solution for Imbalanced Classification Problems - Case Study on Network Intrusion Detection