Relevance and Mutual Information-based Feature Discretization

Artur Ferreira, Mario Figueiredo


In many learning problems, feature discretization (FD) techniques yield compact data representations, which often lead to shorter training time and higher classification accuracy. In this paper, we propose two new FD techniques. The first method is based on the classical Linde-Buzo-Gray quantization algorithm, guided by a relevance criterion, and is able to work in unsupervised, supervised, or semi-supervised scenarios, depending on the adopted measure of relevance. The second method is a supervised technique based on the maximization of the mutual information between each discrete feature and the class label. Our experiments on standard benchmark datasets show that both methods scale to high-dimensional data and, in many cases, attain better accuracy than other FD approaches while using fewer discretization intervals.
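As a rough illustration of the second idea (choosing a discretization by the mutual information between the discrete feature and the class label), the following minimal Python sketch may help. It is not the authors' algorithm: the uniform quantizer, the bit-by-bit search, and the names `mutual_information`, `uniform_bins`, `discretize_by_mi`, `max_bits`, and `min_gain` are all assumptions made for this example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y), in bits, of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def uniform_bins(values, n_bins):
    """Map each value to one of n_bins equal-width intervals (assumed quantizer)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def discretize_by_mi(values, labels, max_bits=4, min_gain=1e-3):
    """Add one bit at a time; stop when the MI with the labels stops improving.

    Hypothetical stopping rule: a new bit is kept only if it raises the
    empirical mutual information by at least min_gain.
    """
    best_bits, best_codes, best_mi = 0, [0] * len(values), 0.0
    for bits in range(1, max_bits + 1):
        codes = uniform_bins(values, 2 ** bits)
        mi = mutual_information(codes, labels)
        if mi - best_mi < min_gain:
            break
        best_bits, best_codes, best_mi = bits, codes, mi
    return best_bits, best_codes, best_mi


if __name__ == "__main__":
    feature = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
    label = [0, 0, 0, 0, 1, 1, 1, 1]
    bits, codes, mi = discretize_by_mi(feature, label)
    print(bits, codes, round(mi, 3))
```

On this toy feature a single bit already separates the two classes, so further bits bring no gain and the search stops early, which mirrors the abstract's point about using few discretization intervals.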



Paper Citation

in Harvard Style

Ferreira A. and Figueiredo M. (2013). Relevance and Mutual Information-based Feature Discretization. In Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-8565-41-9, pages 68-77. DOI: 10.5220/0004268000680077

in Bibtex Style

author={Artur Ferreira and Mario Figueiredo},
title={Relevance and Mutual Information-based Feature Discretization},
booktitle={Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},

in EndNote Style

JO - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Relevance and Mutual Information-based Feature Discretization
SN - 978-989-8565-41-9
AU - Ferreira A.
AU - Figueiredo M.
PY - 2013
SP - 68
EP - 77
DO - 10.5220/0004268000680077