Across different types of data, our method shows that, in most
cases, the discretized and reduced version of the data yields
better classification performance than the original data.
The proposed technique allocates a variable number of bits
per feature, showing that many features reach their maximum
possible mutual information with the class label vector using
only a few bits. Thus, our method is also suitable for
explainability purposes, as the importance of a feature can be
assessed by the number of bits allocated to it. Some features
only require a binary representation (presence or absence
information), while other features demand more bits for an
accurate representation that maximizes the mutual information
with the class label.
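
The idea of allocating bits to a feature until its mutual information with the class label stops growing can be illustrated with a minimal sketch. The uniform quantizer, the stopping tolerance `tol`, the cap `max_bits`, and the function names below are all assumptions made for illustration; they are not the quantizer or stopping rule of the proposed technique.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (in bits) between two discrete vectors."""
    _, xi = np.unique(np.asarray(x).ravel(), return_inverse=True)
    _, yi = np.unique(np.asarray(y).ravel(), return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)          # joint histogram of (x, y)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of x, shape (n, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal of y, shape (1, m)
    nz = joint > 0                         # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

def quantize(feature, bits):
    """Uniform quantization into 2**bits levels (an assumed quantizer)."""
    feature = np.asarray(feature, dtype=float)
    lo, hi = feature.min(), feature.max()
    if hi == lo:                           # constant feature: a single level
        return np.zeros(feature.shape, dtype=int)
    levels = 2 ** bits
    q = np.floor((feature - lo) / (hi - lo) * levels).astype(int)
    return np.clip(q, 0, levels - 1)

def allocate_bits(feature, labels, max_bits=8, tol=1e-3):
    """Return (bits, mi): the bit count after which MI gains fall to tol or less."""
    best_bits = 1
    best_mi = mutual_information(quantize(feature, 1), labels)
    for bits in range(2, max_bits + 1):
        mi = mutual_information(quantize(feature, bits), labels)
        if mi - best_mi <= tol:            # no meaningful gain: stop adding bits
            break
        best_bits, best_mi = bits, mi
    return best_bits, best_mi
```

Under these assumptions, a presence/absence feature saturates at one bit, whereas a feature that separates four balanced classes needs two bits; the returned bit count can then be read as a rough indication of how much label-relevant detail the feature carries.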
As future work, we aim to fine-tune our method to specific
types of data. We also plan to explore the Rényi and Tsallis
definitions of entropy and mutual information and to fine-tune
their free parameters.
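
For reference, the standard Rényi and Tsallis entropies can be computed as in the sketch below, where the order `alpha` and the index `q` are the free parameters mentioned above. The function names are hypothetical, and the sketch covers only the entropies; the corresponding generalized mutual information admits several non-equivalent definitions and is not shown here.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy in bits of a probability vector p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1), in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.log2(np.sum(p ** alpha)) / (1.0 - alpha))

def tsallis_entropy(p, q):
    """Tsallis entropy of index q (q != 1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))
```

Both definitions approach the Shannon entropy in the limit where the free parameter tends to 1 (up to the logarithm base in the Tsallis case), so tuning `alpha` or `q` can be seen as moving away from the Shannon setting.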