PROTOTYPE SELECTION IN IMBALANCED DATA FOR DISSIMILARITY REPRESENTATION - A Preliminary Study

Mónica Millán Giraldo; Vicente García; J. Salvador Sánchez

doi:10.5220/0003795502420247

2012

Abstract

In classification problems, the dissimilarity representation has been shown to be more robust than the feature-space representation. To build the dissimilarity space, a representation set of r objects is used. Several methods have been proposed for selecting a suitable representation set that maximizes classification performance. A recurring and crucial challenge in pattern recognition and machine learning is the class imbalance problem, which has been reported to hinder the performance of learning algorithms. In this paper, we carry out a preliminary study that investigates the effects of several prototype selection schemes when data sets are imbalanced, and that also explores the benefits of selecting the representation set when the class imbalance is handled by resampling the data set. Statistical analysis of the experimental results using the Friedman test demonstrates that applying resampling significantly improves classification performance.
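
To make the construction mentioned in the abstract concrete, the following minimal Python sketch maps a set of objects into a dissimilarity space defined by a representation set of r prototypes. The random prototype selection, the Euclidean dissimilarity measure, the toy data, and all function names are assumptions chosen for illustration; they are not necessarily the schemes or measures evaluated in the paper.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_prototypes(objects, r, seed=0):
    """Pick r objects at random as the representation set (illustrative only)."""
    rng = random.Random(seed)
    return rng.sample(objects, r)


def dissimilarity_space(objects, prototypes, dist=euclidean):
    """Map each object to the vector of its dissimilarities to the prototypes."""
    return [[dist(x, p) for p in prototypes] for x in objects]


# Toy imbalanced data (assumed): six majority-class points, two minority-class points.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.1, 0.4), (0.4, 0.1),
     (3.0, 3.0), (3.2, 2.9)]

R = select_prototypes(X, r=3)   # representation set of r = 3 objects
D = dissimilarity_space(X, R)   # 8 x 3 matrix: one row per object
```

Each row of D is the new r-dimensional representation of one object, so any classifier that works on feature vectors can be trained on D instead of X. With random selection on imbalanced data, the representation set may contain few or no minority-class prototypes, which is one reason the choice of selection scheme matters in this setting.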

Paper Citation

in Harvard Style

Millán Giraldo M., García V. and Sánchez J. S. (2012). PROTOTYPE SELECTION IN IMBALANCED DATA FOR DISSIMILARITY REPRESENTATION - A Preliminary Study. In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-8425-98-0, pages 242-247. DOI: 10.5220/0003795502420247

in Bibtex Style

@conference{icpram12,
author={Mónica Millán Giraldo and Vicente García and J. Salvador Sánchez},
title={PROTOTYPE SELECTION IN IMBALANCED DATA FOR DISSIMILARITY REPRESENTATION - A Preliminary Study},
booktitle={Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2012},
pages={242-247},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003795502420247},
isbn={978-989-8425-98-0},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - PROTOTYPE SELECTION IN IMBALANCED DATA FOR DISSIMILARITY REPRESENTATION - A Preliminary Study
SN - 978-989-8425-98-0
AU - Millán Giraldo M.
AU - García V.
AU - Sánchez J. S.
PY - 2012
SP - 242
EP - 247
DO - 10.5220/0003795502420247