A Hybrid Solution for Imbalanced Classification Problems - Case Study on Network Intrusion Detection

Camelia Lemnaru, Andreea Tudose-Vintila, Andrei Coclici, Rodica Potolea

Abstract

Imbalanced classification problems remain a challenge when applying data mining techniques to real-world problems, because learning algorithms are biased towards the majority class(es). This paper proposes a compound classification architecture for imbalanced multi-class problems. It comprises a two-level classification system: a multiple classification model on the first level, which combines the predictions of several binary classifiers, and a supplementary classification model, specialized in identifying “difficult” cases, which is currently under development. Particular attention is paid to the pre-processing step, which includes specific data manipulation operations. A new prediction combination strategy is also proposed, which applies a hierarchical decision process to generate the output prediction. We evaluated an instantiation of the proposed model in the field of network intrusion detection. The evaluations, performed on a dataset derived from the KDD99 data, indicate that our method yields better performance on the minority classes than similar systems from the literature, without degrading overall performance.
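
The abstract does not spell out the combination rule, so the following is only a minimal sketch of what a hierarchical decision over several binary classifiers might look like, not the authors' actual strategy. It assumes each binary classifier emits a confidence score for its own class, that classes are consulted in a fixed priority order (here, rarest first), and that the majority class is the fallback. The function name, the threshold of 0.5, the priority order, and the example scores are all hypothetical; the class labels follow the usual KDD99 attack categories.

```python
from typing import Dict, Sequence


def hierarchical_combine(
    scores: Dict[str, float],
    priority: Sequence[str],
    threshold: float = 0.5,
    default: str = "normal",
) -> str:
    """Return the first class in `priority` whose binary score reaches `threshold`.

    `scores` maps a class label to the confidence of that class's binary
    classifier. If no classifier is confident enough, `default` is returned.
    """
    for label in priority:
        # A missing score is treated as "classifier did not fire".
        if scores.get(label, 0.0) >= threshold:
            return label
    return default


# Hypothetical priority: the rarest attack classes are consulted first,
# so a confident minority-class detector is not overruled by a frequent class.
PRIORITY = ["u2r", "r2l", "probe", "dos"]

# Hypothetical scores from four binary classifiers for one connection record.
example_scores = {"dos": 0.91, "probe": 0.12, "r2l": 0.64, "u2r": 0.08}

print(hierarchical_combine(example_scores, PRIORITY))  # prints "r2l"
```

In this sketch the record is labelled "r2l" even though the "dos" classifier reports a higher score, because the rarer class is checked first. That ordering is one way to favour minority classes; a different ordering or per-class thresholds would change the outcome.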



Paper Citation


in Harvard Style

Lemnaru C., Tudose-Vintila A., Coclici A. and Potolea R. (2012). A Hybrid Solution for Imbalanced Classification Problems - Case Study on Network Intrusion Detection. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012) ISBN 978-989-8565-29-7, pages 348-352. DOI: 10.5220/0004142803480352


in Bibtex Style

@conference{kdir12,
author={Camelia Lemnaru and Andreea Tudose-Vintila and Andrei Coclici and Rodica Potolea},
title={A Hybrid Solution for Imbalanced Classification Problems - Case Study on Network Intrusion Detection},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012)},
year={2012},
pages={348-352},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004142803480352},
isbn={978-989-8565-29-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2012)
TI - A Hybrid Solution for Imbalanced Classification Problems - Case Study on Network Intrusion Detection
SN - 978-989-8565-29-7
AU - Lemnaru C.
AU - Tudose-Vintila A.
AU - Coclici A.
AU - Potolea R.
PY - 2012
SP - 348
EP - 352
DO - 10.5220/0004142803480352