“Glass” and “Pima”, changing α induces some jitter in the AUC. However, there is neither an increasing nor a decreasing trend; all results have the same order of magnitude. We therefore conclude that small α values suffice and that our method is not overly sensitive to the specific choice of α. Hence, the answer to question Q1 is in the affirmative, and we fix α = 3% for the remaining experiments, i.e., no optimization over α is performed.
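As an illustration of such a sensitivity check, the following sketch sweeps a handful of candidate α values and compares cross-validated AUC scores. Since our ensemble itself is not reproduced here, a class-weighted random forest serves as a hypothetical stand-in classifier, with α merely scaling the minority-class weight; the point of the sketch is the sweep-and-compare protocol, not the model.

```python
# Minimal sensitivity sweep over candidate alpha values (stand-in model).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Imbalanced toy data: roughly 10% minority class.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

for alpha in [0.01, 0.03, 0.05, 0.10]:
    # In the proposed method, alpha would control the scenario generation;
    # here it only scales the minority-class weight of the stand-in model.
    clf = RandomForestClassifier(class_weight={0: 1.0, 1: 1.0 / alpha},
                                 random_state=0)
    auc = cross_val_score(clf, X, y, scoring="roc_auc", cv=5).mean()
    print(f"alpha={alpha:.2f}: mean AUC = {auc:.3f}")
```

If the printed AUC values agree up to small jitter, as they do in our experiments, no tuning of α is warranted.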
4.3.2 Classification Performance
The results of our large comparison of classification methods are shown in Table 2. In most real-world applications, the F1-score of the minority class (“Min.”) is of exceptional importance; its definition is recalled at the end of this subsection. Not surprisingly, in
almost all cases, the baseline methods DT, RF and
GBM are outperformed by methods for imbalanced
data. There is no clear winner among the state-of-the-
art undersampling, oversampling, or SMOTE tech-
niques. It depends on the particular data set which
strategy delivers the best F1-score for the minor-
ity class. It is thus especially remarkable that, depending on the choice of quality measures Q, our method achieves both the best and the second-best minority-class F1-score on five of six data sets. Only decision-tree UnderBagging (DT+UBA) achieves a better minority F1-score on the “Pima” data. Nevertheless, note that our proposed method exhibits a lower standard error; a “typical” run of our method might therefore outperform DT+UBA. In terms of majority-class (“Maj.”) F1-score, our method is outperformed in five of six cases.
However, in two of these cases, the best result is delivered by a plain random forest without any technique for handling class imbalance; the corresponding minority F1-score is thus far from optimal. The other three cases are led by RF + oversampling, RF + SMOTE, and RF + AdaBoost, all of which are variants of the random forest. Moreover, the decline in majority-class F1-score of our proposed method is below 3 percent in four of the five cases. Assuming that the F1-score on the minority class is the most important measure, we answer Q2 in the affirmative.
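For reference, the F1-score reported in Table 2 is computed per class as the harmonic mean of precision and recall; for the minority-class score, counts are taken with the minority class as the positive class:

\[
F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}},
\qquad
\mathrm{precision} = \frac{TP}{TP + FP},
\qquad
\mathrm{recall} = \frac{TP}{TP + FN}.
\]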
5 CONCLUSION
In this paper, we presented a new ensemble method
for classification in the presence of imbalanced
classes. We started by reviewing the state of the art in this area.
Real data sets are always finite, which leads to a corruption of the empirical class frequencies. Moreover, we proved that these corruptions are unlikely to be large in the case of class imbalance and proposed a method to correct for small corruptions. The correction is
performed by generating specialized validation sets
which correspond to different scenarios. Each vali-
dation set may then be used to induce an ensemble
of classifiers. We discussed how the classifiers for the different scenarios can be combined into an ensemble and proposed different choices for the ensemble weights; a schematic form of this combination is sketched at the end of this section. In an experimental demonstration, we validated our theoretical findings and showed that our method outperforms several state-of-the-art methods in terms of F1-score. Since our insights about class
imbalance and erroneous empirical class frequencies are new, our work may serve as the basis for several further research directions.
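As a schematic of the combination step mentioned above (the notation is illustrative and not taken verbatim from the method section): given scenarios s ∈ S, a classifier f_s induced from the validation set of scenario s, and ensemble weights w_s, the ensemble prediction can be written as

\[
\hat{f}(x) = \sum_{s \in \mathcal{S}} w_s \, f_s(x),
\qquad w_s \ge 0,
\qquad \sum_{s \in \mathcal{S}} w_s = 1,
\]

where the different choices of the weights w_s correspond to the variants proposed in this paper.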
REFERENCES
Barandela, R., Valdovinos, R., and Sánchez, J. (2003). New applications of ensembles of classifiers. Pattern Analysis & Applications, 6(3):245–256.
Batista, G. E., Carvalho, A. C., and Monard, M. C. (2000).
Applying one-sided selection to unbalanced datasets.
In Proc. of MICAI, pages 315–325. Springer.
Breiman, L. (2001). Random forests. Machine Learning,
45(1):5–32.
Breiman, L., Friedman, J., Stone, C. J., and Olshen, R. A.
(1984). Classification and regression trees. CRC
Press.
Cateni, S., Colla, V., and Vannucci, M. (2014). A method
for resampling imbalanced datasets in binary classifi-
cation tasks for real-world problems. Neurocomput-
ing, 135:32–41.
Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer,
W. P. (2002). SMOTE: synthetic minority over-
sampling technique. Journal of Artificial Intelligence
Research, 16:321–357.
Chawla, N. V., Lazarevic, A., Hall, L. O., and Bowyer,
K. W. (2003). SMOTEBoost: Improving prediction of
the minority class in boosting. In Proc. of the 7th
PKDD, pages 107–119.
Foucart, S. and Rauhut, H. (2013). A Mathematical Intro-
duction to Compressive Sensing. Springer New York.
Freund, Y. and Schapire, R. E. (1997). A decision-theoretic
generalization of on-line learning and an application
to boosting. Journal of Computer and System Sci-
ences, 55(1):119–139.
Freund, Y., Schapire, R. E., et al. (1996). Experiments with
a new boosting algorithm. In Proc. of ICML 1996,
volume 96, pages 148–156.
Friedman, J. H. (2001). Greedy function approximation: A
gradient boosting machine. The Annals of Statistics,
29(5):1189–1232.
Galar, M., Fernandez, A., Barrenechea, E., Bustince, H.,
and Herrera, F. (2012). A review on ensembles for
the class imbalance problem: bagging-, boosting-, and
hybrid-based approaches. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42(4):463–484.