In conclusion, oversampling techniques provide a
valuable approach to addressing class imbalance in
machine learning. Nevertheless, their effectiveness can
be hindered by the quality of the synthetic data
generated during the oversampling process. To
overcome this limitation, the proposed filtering
oversampling method selectively filters out unrealistic
synthetic data, thereby enhancing the performance of
machine learning models on imbalanced datasets.
Performance on real-world data improves because the
model becomes less reliant on predicting synthetic
instances and generalizes better beyond the synthetic
data distribution.
For future research, promising directions include
incorporating explainability and interpretability
aspects into the filtering oversampling method.
Developing techniques to understand the impact of filtered
synthetic data on the model’s decision-making
process could yield further insight and increase trust in
the predictions. Additionally, extending the research to
multi-class classification problems, beyond the initial
binary classification tasks, would allow the method’s
effectiveness to be assessed across a broader range of
scenarios.
By pursuing these future research directions, we aim
to advance the understanding and capabilities of
handling imbalanced datasets, ultimately enhancing
the performance of machine learning models in
real-world applications.
This work was partially funded by the French
National Research Agency (ANR) through the ABiMed
Project [grant number ANR-20-CE19-0017-02].
