mC-ReliefF - An Extension of ReliefF for Cost-based Feature Selection

Verónica Bolón-Canedo, Beatriz Remeseiro, Noelia Sánchez-Maroño, Amparo Alonso-Betanzos

2014

Abstract

The proliferation of high-dimensional data in recent years has made dimensionality reduction techniques necessary, and feature selection is arguably the most popular of them. Feature selection consists of detecting relevant features and discarding irrelevant ones. However, in some situations users are interested not only in the relevance of the selected features but also in the costs they imply, e.g. economic or computational costs. This paper proposes an extension of the well-known ReliefF feature selection method that adds a new term to the function that updates the feature weights, making it possible to reach a trade-off between the relevance of a feature and its associated cost. The behavior of the proposed method is tested on twelve heterogeneous classification datasets and on a real application, using a support vector machine (SVM) as the classifier. The results of the experimental study show that the approach is sound, since it allows the user to reduce the cost significantly without compromising the classification error.
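
The idea of adding a cost term to the weight update can be illustrated with a minimal sketch. The abstract does not give the exact update rule, so everything specific here is an assumption made for illustration: the function name `cost_penalized_relief`, the trade-off parameter `lam`, the per-feature `costs` vector, and the use of the basic Relief scheme (one nearest hit and one nearest miss per instance) in place of the full ReliefF procedure. It should not be read as the authors' formulation.

```python
import numpy as np


def cost_penalized_relief(X, y, costs, lam=0.0):
    """Simplified Relief weights with a hypothetical cost penalty.

    Assumptions (not taken from the paper): numeric features, every class
    has at least two samples, one nearest hit and one nearest miss per
    instance, and a penalty of lam * cost subtracted from each weight.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    costs = np.asarray(costs, dtype=float)
    n, d = X.shape

    # Scale each feature to [0, 1] so per-feature differences are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span

    w = np.zeros(d)
    for i in range(n):
        # Manhattan distance from instance i to every other instance.
        dist = np.abs(Xs - Xs[i]).sum(axis=1)
        dist[i] = np.inf  # never pick the instance itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class

        # Relevance part: reward separation from the miss, penalize
        # separation from the hit.
        w += (np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])) / n
        # Cost part (the assumed extra term): expensive features lose weight.
        w -= lam * costs / n
    return w


if __name__ == "__main__":
    # Two equally informative features with different (made-up) costs.
    X = [[0.0, 0.0], [0.1, 0.1], [0.9, 0.9], [1.0, 1.0]]
    y = [0, 0, 1, 1]
    costs = [0.1, 0.9]
    print(cost_penalized_relief(X, y, costs, lam=0.0))
    print(cost_penalized_relief(X, y, costs, lam=0.5))
```

With `lam=0` the sketch reduces to plain Relief and ranks the two features equally. Increasing `lam` lowers the weight of the more expensive feature, which is the kind of relevance versus cost trade-off the abstract describes.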



Paper Citation


in Harvard Style

Bolón-Canedo V., Remeseiro B., Sánchez-Maroño N. and Alonso-Betanzos A. (2014). mC-ReliefF - An Extension of ReliefF for Cost-based Feature Selection. In Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-758-015-4, pages 42-51. DOI: 10.5220/0004756800420051


in Bibtex Style

@conference{icaart14,
author={Verónica Bolón-Canedo and Beatriz Remeseiro and Noelia Sánchez-Maroño and Amparo Alonso-Betanzos},
title={mC-ReliefF - An Extension of ReliefF for Cost-based Feature Selection},
booktitle={Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2014},
pages={42-51},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004756800420051},
isbn={978-989-758-015-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - mC-ReliefF - An Extension of ReliefF for Cost-based Feature Selection
SN - 978-989-758-015-4
AU - Bolón-Canedo V.
AU - Remeseiro B.
AU - Sánchez-Maroño N.
AU - Alonso-Betanzos A.
PY - 2014
SP - 42
EP - 51
DO - 10.5220/0004756800420051