Authors:
David Dernoncourt 1; Blaise Hanczar 2 and Jean-Daniel Zucker 3
Affiliations:
1 Institut National de la Santé et de la Recherche Médicale, Université Pierre et Marie-Curie - Paris 6 and Institute of Cardiometabolism and Nutrition, France; 2 Université Paris Descartes, France; 3 Institut National de la Santé et de la Recherche Médicale, Université Pierre et Marie-Curie - Paris 6, Institute of Cardiometabolism and Nutrition and Institut de Recherche pour le Développement, France
Keyword(s):
Feature Selection, Stability, Ensemble, Small Sample.
Related Ontology Subjects/Areas/Topics:
Applications; Bioinformatics and Systems Biology; Ensemble Methods; Feature Selection and Extraction; Pattern Recognition; Software Engineering; Theory and Methods
Abstract:
Feature selection is an important step when building a classifier. However, feature selection tends to be unstable on high-dimensional, small-sample data. This instability reduces the usefulness of the selected features for knowledge discovery: if the selected feature subset is not robust, domain experts have little reason to trust that the features are relevant. A growing number of studies deal with feature selection stability. Based on the idea that ensemble methods are commonly used to improve the accuracy and stability of classifiers, some works have focused on the stability of ensemble feature selection methods. So far, they have obtained mixed results, and as far as we know no study has extensively examined how the choice of the aggregation method influences the stability of ensemble feature selection. This is what we study in this preliminary work. We first present several aggregation methods, then study the stability of ensemble feature selection based on them, on both artificial and real data, as well as the resulting classification performance.
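To make the setting concrete, the following is a minimal sketch of one possible ensemble feature selection pipeline: features are ranked on bootstrap samples and the rankings are aggregated by mean rank. The scoring function (absolute t-statistic), the mean-rank aggregation, the Jaccard-based stability estimate, and all parameter values are illustrative assumptions, not the specific aggregation methods evaluated in the paper.

```python
# Minimal sketch of ensemble feature selection with mean-rank aggregation.
# All choices here (t-statistic scoring, mean-rank aggregation, Jaccard
# stability, parameter values) are illustrative assumptions.
import numpy as np

def t_scores(X, y):
    """Absolute two-sample t-statistic per feature (binary labels assumed)."""
    a, b = X[y == 0], X[y == 1]
    num = np.abs(a.mean(axis=0) - b.mean(axis=0))
    den = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return num / (den + 1e-12)

def ensemble_select(X, y, k=20, n_bootstraps=50, rng=None):
    """Rank features on bootstrap samples, aggregate by mean rank, keep top k."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    ranks = np.empty((n_bootstraps, p))
    for b in range(n_bootstraps):
        idx = rng.integers(0, n, size=n)           # bootstrap resample
        scores = t_scores(X[idx], y[idx])
        ranks[b] = np.argsort(np.argsort(-scores))  # rank 0 = most relevant
    mean_rank = ranks.mean(axis=0)                  # aggregation step
    return np.argsort(mean_rank)[:k]                # indices of selected features

def jaccard_stability(subsets):
    """Average pairwise Jaccard index between selected feature subsets."""
    sets = [set(s) for s in subsets]
    pairs = [(i, j) for i in range(len(sets)) for j in range(i + 1, len(sets))]
    return np.mean([len(sets[i] & sets[j]) / len(sets[i] | sets[j]) for i, j in pairs])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, informative = 60, 1000, 10                # small-sample, high-dimensional
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, p))
    X[:, :informative] += y[:, None] * 1.0          # shift the informative features
    # Repeat selection on perturbed data to estimate stability.
    subsets = [ensemble_select(X[idx], y[idx], k=20, rng=r)
               for r, idx in ((r, rng.choice(n, size=int(0.8 * n), replace=False))
                              for r in range(10))]
    print("average Jaccard stability:", round(jaccard_stability(subsets), 3))
```

In this sketch, raising the number of bootstraps or switching the aggregation rule (e.g., from mean rank to frequency of selection) changes how stable the final subset is, which is the kind of effect the paper investigates.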