Across different types of data, our method shows, in most cases, that the discretized and reduced version of the data yields better classification performance. The proposed technique allocates a variable number of bits per feature, and many features reach their maximum possible mutual information with the class label vector using only a few bits. Our method is therefore also suitable for explainability purposes: the number of bits allocated to a feature provides an assessment of its importance. Some features require only a binary representation (presence or absence information), while other features demand more bits to be represented accurately enough to maximize the mutual information with the class label.
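To make the allocation idea concrete, the following is a minimal sketch of per-feature bit allocation driven by mutual-information saturation. The function name allocate_bits, the equal-width binning, and the stopping tolerance tol are illustrative assumptions, not the exact procedure of our method.

import numpy as np
from sklearn.metrics import mutual_info_score

def allocate_bits(feature, labels, max_bits=8, tol=1e-3):
    """Grow the number of bits for one feature until the mutual
    information with the class labels stops improving.
    Equal-width binning and the tolerance tol are assumptions."""
    best_bits, best_mi = 1, -np.inf
    for bits in range(1, max_bits + 1):
        # Discretize into 2**bits equal-width bins (illustrative choice).
        edges = np.histogram_bin_edges(feature, bins=2 ** bits)
        binned = np.digitize(feature, edges[1:-1])
        mi = mutual_info_score(labels, binned)
        if mi - best_mi <= tol:  # MI has saturated; stop adding bits.
            break
        best_bits, best_mi = bits, mi
    return best_bits, best_mi

Under this sketch, a feature whose mutual information saturates at one bit is effectively binary (presence or absence), whereas a feature that keeps gaining information justifies a finer representation.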
As future work, we aim to fine-tune our method for specific types of data. We also plan to explore the Rényi and Tsallis definitions of entropy and mutual information and to tune their free parameters.
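For reference, the Rényi and Tsallis families generalize the Shannon entropy through a free parameter; their standard definitions for a discrete distribution $p$ are
$$H_\alpha(X) = \frac{1}{1-\alpha}\,\log\Big(\sum_i p_i^\alpha\Big), \qquad \alpha > 0,\ \alpha \neq 1,$$
$$S_q(X) = \frac{1}{q-1}\Big(1 - \sum_i p_i^q\Big), \qquad q > 0,\ q \neq 1.$$
Both recover the Shannon entropy in the limit $\alpha \to 1$ (respectively, $q \to 1$), so tuning the free parameter amounts to exploring a neighborhood of the Shannon case.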