lection algorithms dealing with mixed data is to choose an appropriate relationship between categorical and continuous features, one that leads to a sound similarity measure or neighborhood definition between observations. The proposed method simply alleviates this problem by ignoring this unknown relationship. The hope is that the loss of information induced by this assumption is compensated by a more accurate ranking of the features of each type and by the use of a classification model. Even though the approach requires explicitly building prediction models, the number of models to build is small compared to a pure wrapper approach. Moreover, experimental results on four data sets show the benefit of the approach in terms of the accuracy of two classifiers.
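To make this scheme concrete, here is a minimal sketch of one way such a hybrid selection could be organized: the features of each type are first ranked by a per-type relevance criterion, and the two rankings are then merged greedily, fitting only one model per candidate at each step. This is an illustration under stated assumptions, not the exact procedure of the paper; the helper signature, the cross-validated scoring, and the stopping rule are all hypothetical choices.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import cross_val_score

def hybrid_forward_selection(model, X_cont, X_cat, y,
                             cont_ranking, cat_ranking, cv=5):
    """Greedily merge two per-type feature rankings (hypothetical sketch).

    cont_ranking / cat_ranking: column indices already sorted by a
    per-type relevance score (e.g. mutual information). X_cat is
    assumed integer-encoded so it can be stacked with X_cont.
    """
    selected = []            # (type, column) pairs, in selection order
    best_score = -np.inf
    cont, cat = list(cont_ranking), list(cat_ranking)
    while cont or cat:
        # at most two candidates per step: the best not-yet-selected
        # feature of each type
        candidates = []
        if cont:
            candidates.append(("cont", cont[0]))
        if cat:
            candidates.append(("cat", cat[0]))
        # score each candidate subset with the classification model
        scored = []
        for kind, col in candidates:
            cols = selected + [(kind, col)]
            Xs = np.column_stack([
                X_cont[:, c] if k == "cont" else X_cat[:, c]
                for k, c in cols])
            score = cross_val_score(clone(model), Xs, y, cv=cv).mean()
            scored.append((score, kind, col))
        score, kind, col = max(scored)
        if score <= best_score:   # stop when no candidate improves
            break
        best_score = score
        selected.append((kind, col))
        (cont if kind == "cont" else cat).pop(0)
    return selected, best_score
```

With d features in total, the loop fits at most roughly 2d cross-validated models, which illustrates why the cost stays far below the combinatorial number of models an exhaustive wrapper search would require.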
All the developments presented in this paper could
be applied to regression problems (problems with a
continuous output); the only modifications needed
would be to use the MI estimator (5) instead of (6)
for the continuous features and the mRmR approach
for the categorical ones. It would thus be interesting
to test the proposed approach on such problems.
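As a starting point for such a regression experiment, the snippet below estimates feature-output MI with scikit-learn's mutual_info_regression, a kNN-based estimator in the Kraskov et al. (2004) family. Whether this coincides exactly with estimator (5) of the paper is an assumption on our part, and the toy data and variable names are purely illustrative.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 500
x_cont = rng.normal(size=(n, 1))          # continuous feature
x_cat = rng.integers(0, 3, size=(n, 1))   # integer-coded categorical feature
y = np.sin(x_cont[:, 0]) + 0.5 * x_cat[:, 0] + 0.1 * rng.normal(size=n)

# Continuous feature vs. continuous output: kNN-based MI estimator
# (Kraskov-style), in the spirit of estimator (5).
mi_cont = mutual_info_regression(x_cont, y, n_neighbors=3)

# Categorical feature vs. continuous output: the same routine accepts
# discrete inputs; redundancy between categorical features could then
# be penalized as in mRmR (Peng et al., 2005).
mi_cat = mutual_info_regression(x_cat, y, discrete_features=True)

print(mi_cont, mi_cat)
```

The per-type scores obtained this way would feed the same ranking-and-merging step as in the classification case, so the rest of the pipeline would remain unchanged.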
REFERENCES
Asuncion, A. and Newman, D. (2007). UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. Available at http://www.ics.uci.edu/~mlearn/MLRepository.html.

Battiti, R. (1994). Using mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks, 5:537–550.

Bellman, R. (1961). Adaptive Control Processes: A Guided Tour. Princeton University Press.

Boriah, S., Chandola, V., and Kumar, V. (2008). Similarity measures for categorical data: A comparative evaluation. In SDM'08, pages 243–254.

Fleuret, F. (2004). Fast binary feature selection with conditional mutual information. J. Mach. Learn. Res., 5:1531–1555.

Gómez-Verdejo, V., Verleysen, M., and Fleury, J. (2009). Information-theoretic feature selection for functional data classification. Neurocomputing, 72:3580–3589.

Guyon, I. and Elisseeff, A. (2003). An introduction to variable and feature selection. J. Mach. Learn. Res., 3:1157–1182.

Hall, M. A. (2000). Correlation-based feature selection for discrete and numeric class machine learning. In Proceedings of ICML 2000, pages 359–366. Morgan Kaufmann Publishers Inc.

Hu, Q., Liu, J., and Yu, D. (2008). Mixed feature selection based on granulation and approximation. Know.-Based Syst., 21:294–304.

Kozachenko, L. F. and Leonenko, N. (1987). Sample estimate of the entropy of a random vector. Problems Inform. Transmission, 23:95–101.

Kraskov, A., Stögbauer, H., and Grassberger, P. (2004). Estimating mutual information. Physical Review E, 69(6):066138.

Kwak, N. and Choi, C.-H. (2002). Input feature selection by mutual information based on Parzen window. IEEE Trans. Pattern Anal. Mach. Intell., 24:1667–1671.

Parzen, E. (1962). On the estimation of a probability density function and mode. Annals of Mathematical Statistics, 33:1065–1076.

Peng, H., Long, F., and Ding, C. (2005). Feature selection based on mutual information: Criteria of max-dependency, max-relevance and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell., 27:1226–1238.

Rossi, F., Lendasse, A., François, D., Wertz, V., and Verleysen, M. (2006). Mutual information for the selection of relevant variables in spectrometric nonlinear modelling. Chemometrics and Intelligent Laboratory Systems, 80(2):215–226.

Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27:379–423.

Tang, W. and Mao, K. Z. (2007). Feature selection algorithm for mixed data with both nominal and continuous features. Pattern Recogn. Lett., 28:563–571.

Verleysen, M. (2003). Learning high-dimensional data. Limitations and Future Trends in Neural Computation, 186:141–162.

Wilson, D. R. and Martinez, T. R. (1997). Improved heterogeneous distance functions. J. Artif. Int. Res., 6:1–34.