Dataset Balancing in Disease Prediction

Vincenza Carchiolo, Michele Malgeri

2024

Abstract

The use of machine learning to prevent serious diseases such as cancer or heart disease is increasingly important. Various studies have shown that better forecasting performance can significantly extend patients’ life expectancy. Naturally, sufficient data is vital for techniques that classify patients' clinical situation and predict disease onset. However, available datasets are often imbalanced, containing more records with positive metrics than negative ones. Data preprocessing therefore plays a pivotal role. In this paper, we assess the impact of machine learning and SMOTE (Synthetic Minority Over-sampling Technique) methods on prediction performance using a given set of examples. Furthermore, we show how selecting an appropriate SMOTE process significantly improves performance, as measured by several metrics. Nonetheless, in some cases the effect of SMOTE is barely noticeable, depending on the dataset and the machine learning methods employed.
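To make the balancing step concrete, here is a minimal, from-scratch sketch of the core SMOTE idea as originally described by Chawla et al.: each synthetic minority record is a random point on the segment between a minority sample and one of its k nearest minority neighbours. This is an illustration under stated assumptions, not the authors' pipeline and not any library's API; the names `k_nearest`, `smote` and `balance`, the Euclidean distance, the default `k=5` and the "equal class sizes" target are all hypothetical choices made for the example. In practice a maintained implementation such as `SMOTE` from the imbalanced-learn package (with its `fit_resample` method) would usually be preferred.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def k_nearest(points: Sequence[Vector], idx: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[idx] (Euclidean), excluding itself."""
    dists = [(math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx]
    dists.sort()
    return [j for _, j in dists[:k]]


def smote(minority: Sequence[Vector], n_synthetic: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Hypothetical helper: create n_synthetic points by interpolating minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # random minority sample
        j = rng.choice(neighbours[i])      # one of its k nearest minority neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        base, nb = minority[i], minority[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def balance(majority: Sequence[Vector], minority: Sequence[Vector], k: int = 5, seed: int = 0) -> List[Vector]:
    """Hypothetical helper: oversample the minority class up to the majority class size."""
    deficit = max(0, len(majority) - len(minority))
    return list(minority) + smote(minority, deficit, k, seed)


# Toy data (assumed, not from the paper): 20 majority vs 4 minority records.
majority = [[float(x), float(x % 3)] for x in range(20)]
minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [3.0, 3.0]]
balanced_minority = balance(majority, minority, k=3)
print(len(majority), len(balanced_minority))  # 20 20
```

Because every synthetic record is a convex combination of two real minority records, it stays inside the region already occupied by the minority class rather than duplicating rows verbatim. As a general precaution, oversampling of this kind is normally applied only to the training split, so that synthetic records do not leak into the data used to compute evaluation metrics.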



Paper Citation


in Harvard Style

Carchiolo V. and Malgeri M. (2024). Dataset Balancing in Disease Prediction. In Proceedings of the 13th International Conference on Data Science, Technology and Applications - Volume 1: DATA; ISBN 978-989-758-707-8, SciTePress, pages 293-300. DOI: 10.5220/0012755700003756


in BibTeX Style

@conference{data24,
author={Vincenza Carchiolo and Michele Malgeri},
title={Dataset Balancing in Disease Prediction},
booktitle={Proceedings of the 13th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2024},
pages={293-300},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012755700003756},
isbn={978-989-758-707-8},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 13th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI  - Dataset Balancing in Disease Prediction
SN  - 978-989-758-707-8
AU  - Carchiolo V.
AU  - Malgeri M.
PY  - 2024
SP  - 293
EP  - 300
DO  - 10.5220/0012755700003756
PB  - SciTePress
ER  - 