Authors:
T. Maruthi Padmaja
1
;
Bapi S. Raju
2
and
P. Radha Krishna
3
Affiliations:
1
IDRBT Masab Tank and University of Hyderabad, India
;
2
University of Hyderabad, India
;
3
Infosys Technologies Ltd, India
Keyword(s):
Class imbalance problem, Principle component analysis, SMOTE, Decision tree.
Related
Ontology
Subjects/Areas/Topics:
Artificial Intelligence
;
Computational Intelligence
;
Evolutionary Computing
;
Knowledge Discovery and Information Retrieval
;
Knowledge-Based Systems
;
Machine Learning
;
Soft Computing
;
Symbolic Systems
Abstract:
The performance of the conventional classification algorithms deteriorates due to the class imbalance problem, which occurs when one class of data severely outnumbers the other class. On the other hand the data dimensionality also plays a crucial role in performance deterioration of classification algorithms. Principal Component Analysis (PCA) is a widely used technique for dimensionality reduction. Due to unsupervised nature of PCA, it is not adequate enough to hold class discriminative information for classification problems. In case of unbalanced datasets the occurrence of minority class samples are rare or obtaining them are costly. Moreover, the misclassification cost associated with minority class samples is higher than non-minority class samples. Capturing and validating labeled samples, particularly minority class samples, in PCA subspace is an important issue. We propose a class specific dimensionality reduction and oversampling framework named CPC SMOTE to address this issu
e. The framework is based on combining class specific PCA subspaces to hold informative features from minority as well as majority class and oversample the combined class specific PCA subspace to compensate lack of data problem. We evaluated the proposed approach using 1 simulated and 5 UCI repository datasets. The evaluation show that the framework is effective when compared to PCA and SMOTE preprocessing methods.
(More)