8 CONCLUSION AND PERSPECTIVES
In conclusion, oversampling techniques provide a valuable approach to addressing class imbalance in machine learning. Their effectiveness, however, can be hindered by the quality of the synthetic data generated during oversampling. To overcome this limitation, the proposed filtering oversampling method selectively removes unrealistic synthetic instances. As a result, the model relies less on fitting synthetic instances, generalizes better beyond the synthetic data distribution, and achieves improved performance on real-world imbalanced datasets.
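To make the idea concrete, the following is a minimal sketch of filter-based oversampling, assuming a SMOTE-style interpolation step and a nearest-neighbor realism check; the specific filtering criterion used here (majority vote among a synthetic point's nearest real neighbors) is an illustrative stand-in, not the paper's exact method.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, rng=None):
    # Interpolate between each minority sample and one of its k nearest
    # minority neighbours (the core idea behind SMOTE).
    rng = np.random.default_rng(rng)
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-distances
    nn = np.argsort(d, axis=1)[:, :k]           # k nearest minority neighbours
    base = rng.integers(0, len(X_min), n_new)   # random base points
    neigh = nn[base, rng.integers(0, k, n_new)] # random neighbour per base
    lam = rng.random((n_new, 1))                # interpolation factors in [0, 1)
    return X_min[base] + lam * (X_min[neigh] - X_min[base])

def filter_unrealistic(X_syn, X_maj, X_min, k=5):
    # Illustrative filter: keep only synthetic points whose k nearest real
    # neighbours are mostly minority-class, a simple proxy for "realistic".
    X_real = np.vstack([X_maj, X_min])
    y_real = np.r_[np.zeros(len(X_maj)), np.ones(len(X_min))]
    d = np.linalg.norm(X_syn[:, None] - X_real[None, :], axis=-1)
    nn = np.argsort(d, axis=1)[:, :k]
    keep = y_real[nn].mean(axis=1) > 0.5        # majority of neighbours minority
    return X_syn[keep]
```

The surviving synthetic samples are then appended to the training set before fitting the classifier, exactly as in standard oversampling pipelines.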
Promising directions for future research include incorporating explainability and interpretability into the filtering oversampling method. Techniques for understanding how the filtered synthetic data influence the model's decision-making process would improve insight and prediction trustworthiness. Additionally, extending the work from the initial binary classification tasks to multi-class classification problems will assess the method's effectiveness across a broader range of scenarios.
By pursuing these directions, we aim to advance the understanding and handling of imbalanced datasets, ultimately enhancing the performance of machine learning models in real-world applications.
ACKNOWLEDGEMENTS
This work was partially funded by the French National Research Agency (ANR) through the ABiMed Project [grant number ANR-20-CE19-0017-02].
ICAART 2024 - 16th International Conference on Agents and Artificial Intelligence