Authors:
Vincenza Carchiolo and Michele Malgeri
Affiliation:
Dip. Ingegneria Elettrica Elettronica Informatica (DIEEI), Università di Catania, Via Santa Sofia 64, Catania, Italy
Keyword(s):
Machine Learning, Data Analysis, Health Informatics.
Abstract:
Machine learning is increasingly important in the prevention of serious diseases such as cancer or heart disease. Various studies have shown that better forecasting performance can significantly extend patients’ life expectancy. Classifying the clinical situation of patients, and thereby predicting disease onset, requires sufficiently large datasets. However, the available datasets are often imbalanced, with more records featuring positive metrics than negative ones, so data preprocessing plays a pivotal role. In this paper, we assess the impact of machine learning and SMOTE (Synthetic Minority Over-sampling Technique) methods on prediction performance using a given set of examples. We also show that selecting an appropriate SMOTE process significantly improves performance, as evidenced by several metrics. Nonetheless, in certain cases the effect of SMOTE is scarcely noticeable, depending on the dataset and the machine learning methods employed.
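For readers unfamiliar with SMOTE, the following is a minimal sketch of the interpolation idea behind it, not the authors' implementation. It assumes numeric features and Euclidean distance; the function name `smote`, the parameter `k`, and the toy data are hypothetical and chosen only for illustration. Each synthetic record is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples
    and their k nearest minority neighbours (illustrative sketch only)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Pick a random point on the segment from base towards the neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: 4 minority records against 10 majority records.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority_count = 10

new_points = smote(minority, n_new=majority_count - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority))  # 10, matching the majority class size
```

Because every synthetic point lies between two existing minority records, the oversampled class stays inside the region already covered by the minority data. In practice one would more likely rely on an established library implementation, and the choice of variant and of `k` is one of the factors the abstract suggests can influence the results.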