Examination of the sum of ranks reveals that the weighting variation was still preferred in combination with bagging and Naive Bayes. This means that there is no universally best variation, which is to be expected in the field of classification, where no universally best classifier can exist.
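To illustrate how such a rank-sum comparison can be read, the following minimal sketch ranks method variations per dataset and sums the ranks; the variation names and the score table are hypothetical placeholders, not results from this paper.

```python
# Minimal sketch of a rank-sum comparison across method variations.
# Hypothetical scores: rows are datasets, columns are variations;
# higher is better (e.g., classification accuracy).
import numpy as np
from scipy.stats import rankdata

variations = ["sampling", "weighting", "combined"]
scores = np.array([
    [0.81, 0.84, 0.83],   # dataset 1
    [0.92, 0.90, 0.91],   # dataset 2
    [0.75, 0.78, 0.77],   # dataset 3
])

# Rank variations within each dataset: rank 1 = best (highest score);
# ties receive the average of the ranks they span.
ranks = np.vstack([rankdata(-row, method="average") for row in scores])

# A lower rank sum means the variation was preferred on more datasets.
rank_sums = ranks.sum(axis=0)
for name, total in zip(variations, rank_sums):
    print(f"{name}: rank sum = {total}")
```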
Further work will include applying our methods to more classification algorithms, to determine which kinds of algorithms work better with sampling or weighting and how to choose an appropriate variation. Future work could also include using other evaluation metrics in our fitness function.
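As a rough illustration of what swapping evaluation metrics in the fitness function could look like, the sketch below scores a candidate vector of per-instance weights with a pluggable metric. The Gaussian Naive Bayes classifier, the weight encoding, the train/validation split, and the synthetic data are assumptions made for illustration and not necessarily the exact setup used in this paper.

```python
# Hedged sketch: a GA fitness function with a pluggable evaluation metric.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, f1_score, balanced_accuracy_score

METRICS = {
    "accuracy": accuracy_score,
    "f1": lambda y, p: f1_score(y, p, average="macro"),
    "balanced_accuracy": balanced_accuracy_score,
}

def fitness(weights, X_train, y_train, X_val, y_val, metric="balanced_accuracy"):
    """Score one GA individual, encoded as a vector of per-instance weights."""
    clf = GaussianNB()
    clf.fit(X_train, y_train, sample_weight=np.asarray(weights))
    return METRICS[metric](y_val, clf.predict(X_val))

# Example usage on synthetic data (assumed, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)
w = rng.uniform(0.0, 1.0, size=150)          # one candidate individual
print(fitness(w, X[:150], y[:150], X[150:], y[150:], metric="f1"))
```

Switching the `metric` argument is all that is needed to optimize, for example, macro F1 instead of balanced accuracy, which is one way the fitness function could be adapted for imbalanced data.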