presents the minority class.
Our experiments consist in balancing the two
classes of each data set using the four under-sampling
methods studied, i.e. RR, RS, RF and RC.
We then evaluate the performance of the three
classifiers on the balanced data sets.
Our results show that the performance obtained on
DSMR is better than that obtained on DSPo. This
indicates that the more unbalanced a data set is, the
worse the results are.
Comparing the under-sampling methods, we can
say that, in general, the four methods give similar
results, although in most cases RR yields the best
results. RF is not recommended for NB; it is better
suited to SVM. For kNN, we do not recommend
using RS.
As future work, we plan to perform the
same experiments on unbalanced data sets that are
more homogeneous, so as to validate our hypothesis
about the impact of heterogeneity on the
performance of the proposed techniques. We will
also study the effectiveness of the four under-sampling
methods when the majority class size is decreased
progressively. On the one hand, we aim to see
whether a 50%-50% balance is necessary to obtain
the best results. On the other hand, we aim to
observe how our classifiers behave, under the
different under-sampling methods, at each step of
the majority class reduction.
Finally, we also intend to study feature selection
techniques on unbalanced SA data sets.
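
The progressive reduction of the majority class described above can be sketched as follows. This is only an illustrative sketch: it uses plain random under-sampling as a stand-in and is not the RR, RS, RF or RC procedure of this paper. The function names, the ratio schedule and the seed are assumptions made for the example.

```python
import random


def undersample_majority(majority, minority, ratio=1.0, seed=0):
    """Hypothetical helper: randomly keep about ratio * len(minority)
    majority examples. ratio=1.0 corresponds to a 50%-50% balance."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    # Never ask for more majority examples than are available.
    target = min(len(majority), round(ratio * len(minority)))
    rng = random.Random(seed)
    kept = rng.sample(majority, target)  # sampling without replacement
    return kept, list(minority)


def progressive_steps(majority, minority, ratios=(4.0, 3.0, 2.0, 1.0), seed=0):
    """Yield (ratio, kept_majority, minority) for each step of the
    assumed schedule, from most unbalanced to fully balanced."""
    for ratio in ratios:
        kept, mino = undersample_majority(majority, minority, ratio, seed)
        yield ratio, kept, mino
```

Each yielded pair of classes could then be passed to a classifier, so that its performance can be compared across steps and the need for a full 50%-50% balance can be checked.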
