SMOTE: Are We Learning to Classify or to Detect Synthetic Data?
Nada Boudegzdame, Karima Sedki, Rosy Tspora, Jean-Baptiste Lamy
2024
Abstract
Oversampling algorithms are used as a preprocessing step in machine learning when data are highly imbalanced. They add minority-class samples to balance the number of samples per class, with the aim of improving the quality of the learned models. While oversampling can improve the performance of classification models on minority classes, it can also introduce several problems. Our work revealed that the models learn to detect the noise added by the oversampling algorithms instead of the underlying patterns in the data. In this article, we define oversampling and present the most common techniques, before proposing a method for evaluating oversampling algorithms.
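To make the abstract's concern concrete, here is a minimal sketch of the classic SMOTE interpolation step. It is not the authors' code or evaluation method. Each synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority neighbours. The function name `smote` and its parameters (`n_synthetic`, `k`, `rng`) are hypothetical choices for this illustration. The sketch uses plain NumPy with brute-force Euclidean distances. In practice, the `SMOTE` class from the imbalanced-learn package (`imblearn.over_sampling`) is the usual choice.

```python
import numpy as np


def smote(X_min, n_synthetic, k=5, rng=None):
    """Hypothetical sketch: create synthetic minority samples by SMOTE-style interpolation.

    X_min: (n, d) array of minority-class samples (n >= 2).
    n_synthetic: number of new samples to create.
    k: number of nearest minority neighbours to interpolate towards.
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)

    # Pairwise squared Euclidean distances between minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    # Indices of the k nearest neighbours of each minority sample.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_synthetic)  # seed sample for each new point
    pick = rng.integers(0, k, size=n_synthetic)  # which of its k neighbours to use
    nn = neighbours[base, pick]
    gap = rng.random((n_synthetic, 1))           # interpolation factor u in [0, 1)
    # Each new point lies on the segment between its seed and the chosen neighbour.
    return X_min[base] + gap * (X_min[nn] - X_min[base])


# Usage: four minority points at the corners of the unit square.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote(X_min, n_synthetic=6, k=2, rng=0)
print(X_new.shape)  # (6, 2)
```

Every synthetic point is a convex combination of two real minority samples. This regularity is not shared by the original data, and a sufficiently flexible classifier could exploit it to recognise synthetic points instead of learning the class boundary. That is the risk the abstract raises.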
Paper Citation
in Harvard Style
Boudegzdame N., Sedki K., Tspora R. and Lamy J. (2024). SMOTE: Are We Learning to Classify or to Detect Synthetic Data? In Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART; ISBN 978-989-758-680-4, SciTePress, pages 283-290. DOI: 10.5220/0012325300003636
in Bibtex Style
@conference{icaart24,
author={Nada Boudegzdame and Karima Sedki and Rosy Tspora and Jean-Baptiste Lamy},
title={SMOTE: Are We Learning to Classify or to Detect Synthetic Data?},
booktitle={Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2024},
pages={283-290},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012325300003636},
isbn={978-989-758-680-4},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 16th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - SMOTE: Are We Learning to Classify or to Detect Synthetic Data?
SN - 978-989-758-680-4
AU - Boudegzdame N.
AU - Sedki K.
AU - Tspora R.
AU - Lamy J.
PY - 2024
SP - 283
EP - 290
DO - 10.5220/0012325300003636
PB - SciTePress