A CLASS SPECIFIC DIMENSIONALITY REDUCTION FRAMEWORK FOR CLASS IMBALANCE PROBLEM: CPC SMOTE

T. Maruthi Padmaja, Bapi S. Raju, P. Radha Krishna

Abstract

The performance of conventional classification algorithms deteriorates under the class imbalance problem, which occurs when one class of data severely outnumbers the other. Data dimensionality also plays a crucial role in the performance deterioration of classification algorithms. Principal Component Analysis (PCA) is a widely used technique for dimensionality reduction, but because PCA is unsupervised, it is not adequate for retaining the class-discriminative information that classification problems require. In unbalanced datasets, minority class samples are rare or costly to obtain. Moreover, the misclassification cost associated with minority class samples is higher than that of majority class samples. Capturing and validating labeled samples, particularly minority class samples, in the PCA subspace is therefore an important issue. To address it, we propose a class-specific dimensionality reduction and oversampling framework named CPC SMOTE. The framework combines class-specific PCA subspaces to retain informative features from both the minority and the majority class, and then oversamples the combined class-specific PCA subspace to compensate for the lack of data. We evaluated the proposed approach on one simulated dataset and five UCI repository datasets. The evaluation shows that the framework is effective compared with the PCA and SMOTE preprocessing methods.
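The abstract describes two stages: build PCA subspaces separately for each class and combine them, then apply SMOTE in the combined subspace. It does not say exactly how the subspaces are combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a binary problem, combines the subspaces by concatenating the top `n_comp` principal directions of each class, centres all data on the pooled mean before projecting, and uses a simplified SMOTE (random seed sample, interpolation toward one of its `k_nn` nearest minority neighbours) to balance the classes. All function names (`cpc_smote`, `top_components`, `smote`, and so on) and parameters are hypothetical. PCA is computed with power iteration and deflation so the example needs only the Python standard library.

```python
import random

def mean_vec(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance(rows):
    """Sample covariance matrix (d x d) of a list of equal-length rows."""
    mu = mean_vec(rows)
    d, n = len(mu), len(rows)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        c = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j]
    return [[v / (n - 1) for v in row] for row in cov]

def top_components(cov, k, iters=200, seed=0):
    """Top-k eigenvectors of a symmetric PSD matrix via power iteration + deflation."""
    rng = random.Random(seed)
    d = len(cov)
    a = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break  # no variance left; keep the current unit vector
            v = [x / norm for x in w]
        lam = sum(v[i] * a[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Deflate: subtract lam * v v^T so the next pass finds the next direction.
        for i in range(d):
            for j in range(d):
                a[i][j] -= lam * v[i] * v[j]
    return comps

def project(x, mu, basis):
    """Coordinates of (x - mu) along each basis vector."""
    return [sum((xi - mi) * bi for xi, mi, bi in zip(x, mu, b)) for b in basis]

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: interpolate a random minority point toward a near neighbour."""
    rng = random.Random(seed)
    synth = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: dist2(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synth.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synth

def cpc_smote(X, y, n_comp=2, k_nn=5, seed=0):
    """Hypothetical CPC SMOTE sketch for a binary problem."""
    minority_label = min(set(y), key=list(y).count)
    mino = [x for x, lab in zip(X, y) if lab == minority_label]
    majo = [x for x, lab in zip(X, y) if lab != minority_label]
    # Assumption: combine subspaces by concatenating each class's top components.
    basis = (top_components(covariance(mino), n_comp)
             + top_components(covariance(majo), n_comp))
    mu = mean_vec(X)  # assumption: centre on the pooled mean
    Z = [project(x, mu, basis) for x in X]
    z_min = [z for z, lab in zip(Z, y) if lab == minority_label]
    # Oversample the minority class in the combined subspace until balanced.
    synth = smote(z_min, len(majo) - len(mino), k_nn, seed)
    return Z + synth, list(y) + [minority_label] * len(synth)

if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(100)]
    X += [[rng.gauss(3, 1) for _ in range(5)] for _ in range(10)]
    y = [0] * 100 + [1] * 10
    Z, y_bal = cpc_smote(X, y)
    print(len(Z), y_bal.count(0), y_bal.count(1), len(Z[0]))  # prints: 200 100 100 4
```

Concatenating the two class bases keeps directions that matter for the minority class even when they carry little pooled variance, which is the motivation the abstract gives. The combined basis is not guaranteed to be orthogonal, so overlapping directions from the two classes can yield correlated coordinates; an orthogonalisation step would be a reasonable variation.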



Paper Citation


in Harvard Style

Maruthi Padmaja T., S. Raju B. and Radha Krishna P. (2010). A CLASS SPECIFIC DIMENSIONALITY REDUCTION FRAMEWORK FOR CLASS IMBALANCE PROBLEM: CPC SMOTE. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010), ISBN 978-989-8425-28-7, pages 237-242. DOI: 10.5220/0003092502370242


in BibTeX Style

@conference{kdir10,
author={T. Maruthi Padmaja and Bapi S. Raju and P. Radha Krishna},
title={A CLASS SPECIFIC DIMENSIONALITY REDUCTION FRAMEWORK FOR CLASS IMBALANCE PROBLEM: CPC SMOTE},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)},
year={2010},
pages={237-242},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003092502370242},
isbn={978-989-8425-28-7},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)
TI  - A CLASS SPECIFIC DIMENSIONALITY REDUCTION FRAMEWORK FOR CLASS IMBALANCE PROBLEM: CPC SMOTE
SN  - 978-989-8425-28-7
AU  - Maruthi Padmaja T.
AU  - S. Raju B.
AU  - Radha Krishna P.
PY  - 2010
SP  - 237
EP  - 242
DO  - 10.5220/0003092502370242
ER  - 