DEALING WITH IMBALANCED PROBLEMS - Issues and Best Practices

Rodica Potolea, Camelia Lemnaru

2010

Abstract

An imbalanced problem is one in which one class is represented in the available data by fewer instances than the other classes. The drawbacks induced by the imbalance are analyzed, and possible solutions for overcoming them are presented. When dealing with imbalanced problems, one should consider a wider context, taking into account the imbalance rate together with other data-related particularities, as well as the classification algorithms and their associated parameters.
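As a small illustration of the imbalance rate mentioned above, the sketch below computes it as the ratio between the number of instances in the most frequent class and in the least frequent class. This majority/minority definition is an assumption (a common convention), not necessarily the exact measure used by the authors, and the helper name `imbalance_rate` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable


def imbalance_rate(labels: Iterable[Hashable]) -> float:
    """Hypothetical helper: majority-to-minority class count ratio.

    Assumed definition: IR = count(most frequent class) / count(least frequent class).
    A value of 1.0 means the classes are perfectly balanced; larger values
    indicate a stronger imbalance.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to measure imbalance")
    majority = max(counts.values())
    minority = min(counts.values())
    return majority / minority


if __name__ == "__main__":
    # Made-up toy labels: 95 negative instances and 5 positive ones.
    y = ["neg"] * 95 + ["pos"] * 5
    print(imbalance_rate(y))  # 19.0
```

On such data, a classifier that always predicts the majority class would already reach 95% accuracy, which is one reason the imbalance rate should be read together with the other data and algorithm factors listed in the abstract.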


Paper Citation


in Harvard Style

Potolea R. and Lemnaru C. (2010). DEALING WITH IMBALANCED PROBLEMS - Issues and Best Practices. In Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-8425-05-8, pages 443-446. DOI: 10.5220/0003019604430446


in BibTeX Style

@conference{iceis10,
author={Rodica Potolea and Camelia Lemnaru},
title={DEALING WITH IMBALANCED PROBLEMS - Issues and Best Practices},
booktitle={Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2010},
pages={443-446},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003019604430446},
isbn={978-989-8425-05-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - DEALING WITH IMBALANCED PROBLEMS - Issues and Best Practices
SN - 978-989-8425-05-8
AU - Potolea R.
AU - Lemnaru C.
PY - 2010
SP - 443
EP - 446
DO - 10.5220/0003019604430446