Heuristic Ensemble of Filters for Reliable Feature Selection

Ghadah Aldehim, Beatriz de la Iglesia, Wenjia Wang

2014

Abstract

Feature selection has become increasingly important in data mining in recent years because of the rapid growth in the dimensionality of data. Filters are preferable in practical applications because they are much faster than wrapper-based approaches, but their reliability and consistency vary considerably across datasets, and no rule yet exists to indicate which filter should be used for a given dataset. In this paper, we propose a heuristic ensemble approach that combines multiple filters with heuristic rules to improve overall performance. The ensemble consists of two types of filters, subset filters and ranking filters, together with a heuristic consensus algorithm. The experimental results demonstrate that our ensemble algorithm is more reliable and effective than individual filters: the features selected by the ensemble consistently achieve better accuracy for typical classifiers on various datasets.
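
As a rough illustration of how the outputs of subset filters and ranking filters might be merged, the following minimal Python sketch may help. It is not the paper's consensus algorithm, whose heuristic rules are not given in the abstract. The function name `consensus_features`, the parameters `min_votes` and `top_k`, and the rule itself (a vote threshold over subset filters, united with the best features by mean rank) are all assumptions made for this example. The sketch also assumes that every ranking lists the same features.

```python
from collections import Counter


def consensus_features(subset_votes, rankings, min_votes=2, top_k=3):
    """Merge subset-filter and ranking-filter outputs (illustrative rule only).

    subset_votes: list of feature subsets, one per subset filter.
    rankings: list of full feature rankings (best first), one per ranking filter.
    The vote threshold and top-k cut-off are hypothetical, not from the paper.
    """
    # Count how many subset filters selected each feature.
    counts = Counter(f for subset in subset_votes for f in set(subset))
    by_subset = {f for f, c in counts.items() if c >= min_votes}

    # Mean position of each feature across the ranking filters (0 = best).
    features = rankings[0]
    mean_rank = {
        f: sum(r.index(f) for r in rankings) / len(rankings) for f in features
    }

    # Keep the top_k features by mean rank; ties are broken by feature name.
    by_rank = sorted(features, key=lambda f: (mean_rank[f], f))[:top_k]

    # Consensus: union of both groups, returned in mean-rank order.
    selected = by_subset | set(by_rank)
    return sorted(selected, key=lambda f: (mean_rank[f], f))


if __name__ == "__main__":
    subsets = [["a", "b"], ["b", "c"], ["b", "d"]]
    ranks = [["a", "b", "c", "d"], ["b", "a", "d", "c"]]
    print(consensus_features(subsets, ranks, min_votes=2, top_k=2))
```

Taking the union of the two groups favours recall of relevant features; an intersection would be the stricter alternative. Which rule is appropriate depends on the heuristics defined in the full paper.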



Paper Citation


in Harvard Style

Aldehim G., de la Iglesia B. and Wang W. (2014). Heuristic Ensemble of Filters for Reliable Feature Selection. In Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-018-5, pages 175-182. DOI: 10.5220/0004812401750182


in Bibtex Style

@conference{icpram14,
author={Ghadah Aldehim and Beatriz de la Iglesia and Wenjia Wang},
title={Heuristic Ensemble of Filters for Reliable Feature Selection},
booktitle={Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2014},
pages={175-182},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004812401750182},
isbn={978-989-758-018-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Heuristic Ensemble of Filters for Reliable Feature Selection
SN - 978-989-758-018-5
AU - Aldehim G.
AU - de la Iglesia B.
AU - Wang W.
PY - 2014
SP - 175
EP - 182
DO - 10.5220/0004812401750182