Authors:
Nathan F. Garcia; Rômulo A. Strzoda; Giancarlo Lucca and Eduardo N. Borges
Affiliation:
Centro de Ciências Computacionais, Universidade Federal do Rio Grande, Rio Grande, Brazil
Keyword(s):
Machine Learning, Data Imbalance, Supervised Classification.
Abstract:
Machine learning offers many classification algorithms, and each performs better in certain scenarios that are very difficult to characterize in advance. Multiple classifiers can also be combined into ensembles, which aim to increase the model's generalization capacity. Comparing multiple models is costly because, in some cases, training classifiers can take a long time. Many aspects of the data have already been studied in the literature to support classifier selection, such as measures of diversity among the classifiers that form an ensemble and data complexity measures. In this context, the main objective of this work is to analyze class imbalance and how it can be used to guide the selection of classifiers. We also compare the models' performance when class balancing techniques such as oversampling and undersampling are applied.
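
As a rough illustration of the quantities mentioned in the abstract, the sketch below computes one common imbalance measure and applies random oversampling and undersampling. It is not the authors' method: the abstract does not say which imbalance measure or which balancing algorithms were used, so the majority-to-minority imbalance ratio and the purely random resampling shown here are assumptions, and the names `imbalance_ratio` and `random_resample` are hypothetical.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed measure)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_resample(samples, labels, mode="over", seed=0):
    """Balance classes by random oversampling ("over") or undersampling ("under").

    Oversampling duplicates randomly chosen examples of the smaller classes
    until every class matches the largest one; undersampling randomly keeps
    only as many examples per class as the smallest class has.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    sizes = [len(xs) for xs in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)

    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) >= target:
            # Draw without replacement (keeps every example when sizes match).
            chosen = rng.sample(xs, target)
        else:
            # Keep all originals and add random duplicates up to the target.
            chosen = xs + rng.choices(xs, k=target - len(xs))
        out_x.extend(chosen)
        out_y.extend([y] * target)
    return out_x, out_y


# Toy data set: 90 examples of class 0 and 10 of class 1.
labels = [0] * 90 + [1] * 10
samples = list(range(100))

ratio_before = imbalance_ratio(labels)
_, over_labels = random_resample(samples, labels, mode="over")
_, under_labels = random_resample(samples, labels, mode="under")
```

In practice, resampling would be applied only to the training split, so that the evaluation data keeps the original class distribution.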