SMOTE, SMMO and SVM-LWLR. These methods allow us to increase the performance of the classifiers; that is, they help to correctly classify more minority-class examples. Future work includes several tasks, such as characterizing the potential benefits of over-sampling methods and developing heuristics to determine, given a data set, the amount of over-sampling that is likely to produce the best results; and testing the method in other real-world applications, for example biological structures and morphological galaxy classification, where the imbalanced class problem is very common.
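To make the over-sampling idea concrete, the following is a minimal sketch of SMOTE-style synthetic minority over-sampling, not the authors' implementation: for each new point, a minority example is paired with one of its k nearest minority neighbors and a synthetic example is interpolated between them.

```python
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic interpolated minority-class examples.

    minority: list of feature tuples belonging to the minority class.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest minority neighbors of x by squared Euclidean distance
        neighbors = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        # synthetic point lies on the segment between x and its neighbor
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Toy minority class with four 2-D examples (illustrative values only)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=8)
print(len(new_points))  # 8 synthetic minority examples
```

Because each synthetic point is an interpolation, it stays inside the region spanned by the minority examples, which is what lets the classifier see a denser minority class without simply duplicating points.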
ACKNOWLEDGEMENTS
The first author wants to thank PROMEP for supporting this research work under grant UPPUE-PTC-023.
REFERENCES
Akbani, R., Kwek, S., and Japkowicz, N. (2004). Applying
support vector machines to imbalanced datasets. In
Proceedings of the European Conference on Machine
Learning (ECML), pages 39–50.
Burges, C. (1998). A tutorial on support vector machines
for pattern recognition. Data Mining and Knowledge
Discovery, 2(2):121–167.
Chawla, N., Bowyer, K., Hall, L., and Kegelmeyer, P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357.
Chawla, N., Lazarevic, A., Hall, L., and Bowyer, K. (2003). SMOTEBoost: Improving prediction of the minority class in boosting. In Proceedings of the Seventh European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), pages 107–119.
de-la Calleja, J. and Fuentes, O. (2007). Automated
star/galaxy discrimination in multispectral wide-field
images. In Proceedings of the Second International
Conference on Computer Vision and Applications
(VISAPP), pages 155–160, Barcelona, Spain.
Domingos, P. (1999). MetaCost: A general method for making classifiers cost-sensitive. In Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining (KDD), pages 155–164.
Fawcett, T. and Provost, F. (1996). Combining data mining and machine learning for effective user profiling. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD), pages 8–13.
Han, H., Wang, W., and Mao, B. (2005). Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In Proceedings of the International Conference on Intelligent Computing (ICIC), pages 878–887.
Japkowicz, N. (2000). The class imbalance problem: Sig-
nificance and strategies. In Proceedings of the 2000
International Conference on Artificial Intelligence
(IC-AI’2000): Special Track on Inductive Learning,
pages 111–117.
Japkowicz, N., Myers, C., and Gluck, M. (1995). A novelty detection approach to classification. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI), pages 518–523.
Kubat, M., Holte, R., and Matwin, S. (1998). Machine
learning for the detection of oil spills in satellite radar
images. Machine Learning, 30:195–215.
Kubat, M. and Matwin, S. (1997). Addressing the curse of imbalanced training sets: One-sided selection. In Proceedings of the Fourteenth International Conference on Machine Learning (ICML), pages 179–186.
Liu, Y., An, A., and Huang, X. (2006). Boosting prediction accuracy on imbalanced datasets with SVM ensembles. In Proceedings of PAKDD, LNAI 3918, pages 107–118.
Mitchell, T. (1997). Machine Learning. McGraw-Hill.
Pazzani, M., Merz, C., Murphy, P., Ali, K., Hume, T., and
Brunk, C. (1994). Reducing misclassification costs. In
Proceedings of the Eleventh International Conference
on Machine Learning (ICML), pages 217–225.
Riddle, P., Segal, R., and Etzioni, O. (1994). Representation design and brute-force induction in a Boeing manufacturing domain. Applied Artificial Intelligence, 8:125–147.
Zheng, Z., Wu, X., and Srihari, R. (2004). Feature selection for text categorization on imbalanced data. SIGKDD Explorations, 6(1):80–89.
KDIR 2009 - International Conference on Knowledge Discovery and Information Retrieval