HANDLING IMPRECISE LABELS IN FEATURE SELECTION WITH GRAPH LAPLACIAN

Gauthier Doquire, Michel Verleysen

Feature selection is a preprocessing step of great importance for many pattern recognition and machine learning applications, including classification. Although feature selection has been extensively studied for classical problems, little work has addressed possible imprecision or uncertainty in the assignment of class labels. Such situations arise frequently in practice, however, especially when the labels are given by a human expert who has doubts about the exact class value. This paper considers the problem where each possible class for a given sample is associated with a probability. A feature selection criterion based on graph Laplacian theory is proposed, and its interest is demonstrated experimentally against basic approaches for handling such imprecise labels.
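As an illustrative sketch of how a graph-Laplacian criterion can accommodate probabilistic labels, the Python code below computes a Laplacian-score-style feature ranking (in the spirit of He et al., 2006, reference 12) in which the similarity graph is modulated by the agreement of the samples' class-probability vectors. The label-agreement weighting `P @ P.T`, the RBF kernel, and the function name are assumptions chosen for illustration; they are not necessarily the exact criterion proposed in the paper.

```python
import numpy as np

def laplacian_score(X, P, sigma=1.0):
    """Laplacian-score-style ranking with probabilistic class labels.

    X : (n, d) data matrix.
    P : (n, c) matrix whose rows are class-probability vectors.
    Returns one score per feature; lower scores indicate features that
    better preserve the graph structure (as in He et al., 2006).
    NOTE: the similarity weighting below is an illustrative assumption.
    """
    n, d = X.shape
    # RBF similarity in input space, scaled by label agreement P @ P.T:
    # pairs of samples with similar probability vectors get larger weights.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-d2 / (2.0 * sigma ** 2)) * (P @ P.T)
    D = np.diag(S.sum(axis=1))      # degree matrix
    L = D - S                       # unnormalized graph Laplacian
    ones = np.ones(n)
    scores = np.empty(d)
    for r in range(d):
        f = X[:, r]
        # Center the feature with respect to the degree weighting.
        f = f - (f @ D @ ones) / (ones @ D @ ones) * ones
        scores[r] = (f @ L @ f) / (f @ D @ f)
    return scores
```

On a toy dataset whose first feature separates the two probable classes while the second is noise, the first feature receives the lower (better) score, matching the intuition that a relevant feature varies little between samples that the graph deems similar.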


  1. Asuncion, A. and Newman, D. (2007). UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. Available at http://www.ics.uci.edu/~mlearn/MLRepository.html.
  2. Bezdek, J. C. and Pal, S. K. (1992). Fuzzy models for pattern recognition. IEEE Press, Piscataway, NJ.
  3. Chapelle, O., Schölkopf, B., and Zien, A., editors (2006). Semi-Supervised Learning. MIT Press, Cambridge, MA.
  4. Chung, F. R. K. (1997). Spectral Graph Theory (CBMS Regional Conference Series in Mathematics, No. 92). American Mathematical Society.
  5. Côme, E., Oukhellou, L., Denoeux, T., and Aknin, P. (2009). Learning from partially supervised data using mixture models and belief functions. Pattern Recogn., 42:334-348.
  6. Dash, M. and Liu, H. (1997). Feature selection for classification. Intelligent Data Analysis, 1:131-156.
  7. Denoeux, T. and Zouhal, L. M. (2001). Handling possibilistic labels in pattern classification using evidential reasoning. Fuzzy Sets and Systems, 122(3):47-62.
  8. Ding, C. and Peng, H. (2003). Minimum redundancy feature selection from microarray gene expression data. In Proceedings of the IEEE Computer Society Conference on Bioinformatics, CSB '03, pages 523-528, Washington, DC, USA. IEEE Computer Society.
  9. Friedman, J. H. (1991). Multivariate adaptive regression splines. The Annals of Statistics, 19(1):1-67.
  10. Guyon, I. and Elisseeff, A. (2003). An introduction to variable and feature selection. J. Mach. Learn. Res., 3:1157-1182.
  11. Hall, M. (1999). Correlation-based Feature Selection for Machine Learning. PhD thesis, University of Waikato.
  12. He, X., Cai, D., and Niyogi, P. (2006). Laplacian Score for Feature Selection. In Advances in Neural Information Processing Systems 18, pages 507-514. MIT Press, Cambridge, MA.
  13. Jenhani, I., Amor, N. B., and Elouedi, Z. (2008). Decision trees as possibilistic classifiers. Int. J. Approx. Reasoning, 48:784-807.
  14. Kohavi, R. and John, G. H. (1997). Wrappers for Feature Subset Selection. Artificial Intelligence, 97:273-324.
  15. Kwak, N. and Choi, C.-H. (2002). Input feature selection for classification problems. IEEE Transactions on Neural Networks, 13:143-159.
  16. Meyer, P. E., Schretter, C., and Bontempi, G. (2008). Information-Theoretic Feature Selection in Microarray Data Using Variable Complementarity. IEEE Journal of Selected Topics in Signal Processing, 2(3):261-274.
  17. Peng, H., Long, F., and Ding, C. (2005). Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8):1226-1238.
  18. Semani, D., Frélicot, C., and Courtellemont, P. (2004). Combinaison d'étiquettes floues/possibilistes pour la sélection de variables [Combining fuzzy/possibilistic labels for variable selection]. In 14ème Congrès Francophone AFRIF-AFIA de Reconnaissance des Formes et Intelligence Artificielle, RFIA'04, pages 479-488.
  19. Smets, P., Hsia, Y., Saffiotti, A., Kennes, R., Xu, H., and Umkehren, E. (1991). The transferable belief model. Symbolic and Quantitative Approaches to Uncertainty, pages 91-96.
  20. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B, 58:267-288.
  21. Wang, B., Jia, Y., Han, Y., and Han, W. (2009). Effective feature selection on data with uncertain labels. In Proceedings of the 2009 IEEE International Conference on Data Engineering, pages 1657-1662, Washington, DC, USA.
  22. Yang, Y. and Pedersen, J. O. (1997). A comparative study on feature selection in text categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, ICML '97, pages 412-420, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
  23. Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B, 68:49-67.
  24. Zhang, D., Chen, S., and Zhou, Z.-H. (2008). Constraint score: A new filter method for feature selection with pairwise constraints. Pattern Recogn., 41:1440-1451.
  25. Zhao, J., Lu, K., and He, X. (2008). Locality sensitive semi-supervised feature selection. Neurocomputing, 71:1842-1849.
  26. Zhao, Z. and Liu, H. (2007). Semi-supervised Feature Selection via Spectral Analysis. In Proceedings of the 7th SIAM International Conference on Data Mining.

Paper Citation

in Harvard Style

Doquire G. and Verleysen M. (2012). HANDLING IMPRECISE LABELS IN FEATURE SELECTION WITH GRAPH LAPLACIAN. In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-8425-98-0, pages 162-169. DOI: 10.5220/0003712101620169

in Bibtex Style

@conference{doquire2012,
author={Gauthier Doquire and Michel Verleysen},
title={HANDLING IMPRECISE LABELS IN FEATURE SELECTION WITH GRAPH LAPLACIAN},
booktitle={Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2012},
pages={162-169},
doi={10.5220/0003712101620169},
isbn={978-989-8425-98-0},
}

in EndNote Style

TY - CONF
TI - HANDLING IMPRECISE LABELS IN FEATURE SELECTION WITH GRAPH LAPLACIAN
JO - Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
SN - 978-989-8425-98-0
AU - Doquire G.
AU - Verleysen M.
PY - 2012
SP - 162
EP - 169
DO - 10.5220/0003712101620169