Dataset Balancing in Disease Prediction
Vincenza Carchiolo, Michele Malgeri
2024
Abstract
Machine learning is increasingly important in the prevention of serious diseases such as cancer and heart disease. Various studies have shown that better forecasting performance can significantly extend patients’ life expectancy. Sufficient datasets are essential for applying techniques that classify the clinical situation of patients and support predictions about disease onset. However, available datasets are often imbalanced, with more records featuring positive metrics than negative ones, so data preprocessing plays a pivotal role. In this paper, we assess the impact of machine learning and SMOTE (Synthetic Minority Over-sampling Technique) methods on prediction performance using a given set of examples. We also show that selecting an appropriate SMOTE process significantly improves performance, as evidenced by several metrics. In some cases, however, the effect of SMOTE is barely noticeable, depending on the dataset and the machine learning methods employed.
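For readers unfamiliar with the oversampling idea named in the abstract, the following is a minimal sketch of the basic SMOTE interpolation step, written in plain Python. It is not the authors' implementation and does not correspond to any specific SMOTE variant evaluated in the paper. The function name `smote_sketch`, its parameters, and the toy data are assumptions made purely for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points from a list of minority-class samples.

    Illustrative only: each synthetic point lies on the line segment between
    a randomly chosen minority sample and one of its k nearest minority
    neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # All other minority samples, sorted by distance to the base sample.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 4 minority records, to be balanced against 10 majority records.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
majority_count = 10
new_points = smote_sketch(minority, majority_count - len(minority), k=2)
balanced_minority = minority + new_points
```

In practice, oversampling of this kind is normally applied to the training split only, so that synthetic records do not leak into the data used to compute evaluation metrics.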
Paper Citation
in Harvard Style
Carchiolo V. and Malgeri M. (2024). Dataset Balancing in Disease Prediction. In Proceedings of the 13th International Conference on Data Science, Technology and Applications - Volume 1: DATA; ISBN 978-989-758-707-8, SciTePress, pages 293-300. DOI: 10.5220/0012755700003756
in Bibtex Style
@conference{data24,
author={Vincenza Carchiolo and Michele Malgeri},
title={Dataset Balancing in Disease Prediction},
booktitle={Proceedings of the 13th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2024},
pages={293-300},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012755700003756},
isbn={978-989-758-707-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 13th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI - Dataset Balancing in Disease Prediction
SN - 978-989-758-707-8
AU - Carchiolo V.
AU - Malgeri M.
PY - 2024
SP - 293
EP - 300
DO - 10.5220/0012755700003756
PB - SciTePress