A Simple Classification Method for Class Imbalanced Data using the Kernel Mean

Yusuke Sato, Kazuyuki Narisawa, Ayumi Shinohara

2014

Abstract

Support vector machines (SVMs) are among the most popular classification algorithms. However, although SVMs perform well on class-balanced datasets, their performance declines on class-imbalanced datasets. The fuzzy SVM for class imbalance learning (FSVM-CIL) is an SVM-type algorithm designed to accommodate class-imbalanced datasets. To account for the class imbalance, FSVM-CIL associates a fuzzy membership with each example, which represents the importance of that example for classification. Building on FSVM-CIL, we present a simple but effective method for calculating fuzzy memberships using the kernel mean. The kernel mean is a useful statistic for reasoning about the probability distribution over the feature space. Our proposed method is simpler than preceding methods because it requires fewer parameters to be adjusted and has a lower computational cost. Experimental results show that our proposed method is promising.
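
The abstract does not give the membership formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes an RBF kernel, scores each example by the empirical kernel mean of its own class evaluated at that example, and scales the score by a class weight in the FSVM-CIL spirit (minority class weight 1, majority class weight equal to the minority-to-majority ratio). The function names, the normalization, and the toy data are all hypothetical.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def kernel_mean_scores(points, gamma=1.0):
    """Evaluate the empirical kernel mean of `points` at each of those points.

    The score of x is (1/n) * sum_j k(x, x_j); it is large for points in
    dense regions of the class and small for isolated points.
    """
    n = len(points)
    return [sum(rbf_kernel(x, z, gamma) for z in points) / n for x in points]

def fuzzy_memberships(X, y, gamma=1.0):
    """Assumed membership: class weight times normalized kernel-mean score."""
    counts = {c: y.count(c) for c in set(y)}
    n_min = min(counts.values())
    memberships = [0.0] * len(X)
    for c, n_c in counts.items():
        idx = [i for i, label in enumerate(y) if label == c]
        scores = kernel_mean_scores([X[i] for i in idx], gamma)
        top = max(scores)
        class_weight = n_min / n_c  # 1 for the smallest class, < 1 otherwise
        for i, s in zip(idx, scores):
            memberships[i] = class_weight * s / top
    return memberships

# Toy data: four majority examples (one of them an outlier), two minority examples.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [2.0, 2.0], [2.1, 2.0]]
y = [-1, -1, -1, -1, 1, 1]
memberships = fuzzy_memberships(X, y)
```

In this toy setup the isolated majority example receives the smallest membership in its class, so it would contribute least to the training objective. Weights of this kind could then be supplied as per-example weights to an SVM trainer, for instance through the `sample_weight` argument of scikit-learn's `SVC.fit`.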

Paper Citation


in Harvard Style

Sato Y., Narisawa K. and Shinohara A. (2014). A Simple Classification Method for Class Imbalanced Data using the Kernel Mean. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014) ISBN 978-989-758-048-2, pages 327-334. DOI: 10.5220/0005130103270334


in Bibtex Style

@conference{kdir14,
author={Yusuke Sato and Kazuyuki Narisawa and Ayumi Shinohara},
title={A Simple Classification Method for Class Imbalanced Data using the Kernel Mean},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014)},
year={2014},
pages={327-334},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005130103270334},
isbn={978-989-758-048-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2014)
TI - A Simple Classification Method for Class Imbalanced Data using the Kernel Mean
SN - 978-989-758-048-2
AU - Sato Y.
AU - Narisawa K.
AU - Shinohara A.
PY - 2014
SP - 327
EP - 334
DO - 10.5220/0005130103270334