Impact on Bayesian Networks Classifiers When Learning from Imbalanced Datasets

M. Julia Flores; José A. Gámez

doi:10.5220/0005201103820389

2015

Abstract

In this paper we study the behaviour of some representative Bayesian network classifiers (BNCs) when the dataset they are learned from is imbalanced, that is, when far fewer cases are labelled with one class value than with the other (assuming binary classification problems). Class imbalance is a common source of trouble in real datasets, and developing more robust techniques for it is currently very important. We selected a benchmark of 129 imbalanced datasets and carried out an analysis focused on BNCs. Our results show that these classifiers perform well and outperform decision trees (C4.5). We also give an algorithm to improve the performance of any BNC. In our experiments we oversample the minority class to reach a desired value of the imbalance ratio (IR), defined as the number of cases in the majority class divided by the number of cases in the minority class. From this work we conclude that BNCs perform very well on imbalanced datasets, and that our proposal improves their results on the datasets where they originally performed poorly.
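
The imbalance ratio and the oversampling step described in the abstract can be illustrated with a short sketch. The abstract does not give the details of the paper's algorithm, so the code below assumes plain random oversampling with replacement; the function names `imbalance_ratio` and `oversample_to_ir` and the `target_ir` and `seed` parameters are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def imbalance_ratio(labels):
    """IR = (number of majority-class cases) / (number of minority-class cases)."""
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected a binary classification problem")
    return max(counts.values()) / min(counts.values())


def oversample_to_ir(rows, labels, target_ir, seed=0):
    """Duplicate randomly chosen minority cases until IR <= target_ir.

    Assumption: simple random oversampling with replacement. The procedure
    actually used in the paper may differ.
    """
    if target_ir < 1:
        raise ValueError("target_ir must be at least 1")
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected a binary classification problem")

    minority = min(counts, key=counts.get)
    n_major = max(counts.values())
    n_minor = counts[minority]

    # Smallest minority count n such that n_major / n <= target_ir.
    needed = math.ceil(n_major / target_ir)
    extra = max(0, needed - n_minor)

    # Indices of the existing minority cases, sampled with replacement.
    pool = [i for i, y in enumerate(labels) if y == minority]
    rng = random.Random(seed)
    picks = [rng.choice(pool) for _ in range(extra)]

    new_rows = list(rows) + [rows[i] for i in picks]
    new_labels = list(labels) + [minority] * extra
    return new_rows, new_labels
```

For example, a dataset with 90 majority and 10 minority cases has IR = 9; calling `oversample_to_ir(rows, labels, 3.0)` adds 20 duplicated minority cases, giving 90 / 30 = 3. Oversampling should be applied to the training split only, so that duplicated cases do not leak into the evaluation data.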

Paper Citation

in Harvard Style

Flores M. and Gámez J. (2015). Impact on Bayesian Networks Classifiers When Learning from Imbalanced Datasets. In Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-074-1, pages 382-389. DOI: 10.5220/0005201103820389

in Bibtex Style

@conference{icaart15,
author={M. Julia Flores and José A. Gámez},
title={Impact on Bayesian Networks Classifiers When Learning from Imbalanced Datasets},
booktitle={Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2015},
pages={382-389},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005201103820389},
isbn={978-989-758-074-1},
}

in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Impact on Bayesian Networks Classifiers When Learning from Imbalanced Datasets
SN - 978-989-758-074-1
AU - Flores M.
AU - Gámez J.
PY - 2015
SP - 382
EP - 389
DO - 10.5220/0005201103820389