Stability of Ensemble Feature Selection on High-Dimension and Low-Sample Size Data: Influence of the Aggregation Method

David Dernoncourt 1,2,3, Blaise Hanczar 4 and Jean-Daniel Zucker 1,2,3,5
1
Institut National de la Sant´e et de la Recherche M´edicale, U872, Nutriomique,
´
Equipe 7,
Centre de Recherches des Cordeliers, Paris 75006, France
2
Universit´e Pierre et Marie-Curie - Paris 6, Nutriomique, 15 rue de l’
´
Ecole de M´edecine, Paris 75006, France
3
Institute of Cardiometabolism and Nutrition, Assistance Publique-Hˆopitaux de Paris, CRNH-
ˆ
Ile de France,
Piti´e-Salpˆetri`ere, Boulevard de l’Hˆopital, Paris 75013, France
4
LIPADE, Universit´e Paris Descartes, 45 rue des Saint-P`eres, Paris, F-75006, France
5
Institut de Recherche pour le D´eveloppement, IRD, UMI 209, UMMISCO, France Nord, F-93143, Bondy, France
Keywords: Feature Selection, Stability, Ensemble, Small Sample.
Abstract: Feature selection is an important step when building a classifier. However, feature selection tends to be unstable on high-dimension, small-sample-size data. This instability reduces the usefulness of the selected features for knowledge discovery: if the selected feature subset is not robust, domain experts have little reason to trust that the features are relevant. A growing number of studies deal with feature selection stability. Based on the idea that ensemble methods are commonly used to improve classifier accuracy and stability, some works have focused on the stability of ensemble feature selection methods. So far they have obtained mixed results, and as far as we know no study has extensively examined how the choice of the aggregation method influences the stability of ensemble feature selection. This is what we study in this preliminary work. We first present some aggregation methods, then we study the stability of ensemble feature selection based on them, on both artificial and real data, as well as the resulting classification performance.
1 INTRODUCTION
Feature selection is a critical step of the supervised classification procedure, especially in small-sample, high-dimension settings. Such settings refer to problems where the number of features is higher than the number of examples. This kind of problem is increasingly frequent, especially in bioinformatics with the massive production of "omics" data. In this context, learning algorithms face several problems grouped under the term curse of dimensionality (Simon, 2003). First, in high dimension, finding the truly informative features becomes more difficult and the risk of overfitting strongly increases; the consequence is worse generalization performance of the classifiers (Jain and Chandrasekaran, 1982). Secondly, a very high dimension is an issue in itself, as classifiers frequently do not scale well to huge numbers of features, leading to increased computation times. Thirdly, a classifier based on a small subset of genes will be easier and less expensive to use in practice; moreover, a classifier based on a high number of features will not be easily interpretable. A fourth problem concerns the robustness of the selected features: to obtain a trustworthy classifier, the selected subset has to be stable. To deal with all of these problems, feature selection is necessary in order to reduce the dimensionality of the data.
Feature selection refers to the process of removing irrelevant or redundant features (in our context, genes) from the original set of features $F = \{f_1, f_2, \ldots, f_{|F|=D}\}$, so as to retain a subset $S \subseteq F$ containing only informative features useful for classification (Liu et al., 2010). Beyond classification performance, the other main objective of gene selection is to obtain a reliable and robust list of predictive genes: a gene which is regularly selected over several datasets dealing with the same problem – or at least over various random subsamples of the same dataset – is more likely to be truly relevant, and is of greater interest to domain experts willing to use the classification results for knowledge discovery purposes. In
practice, this is generally not the case, and many studies on feature selection stability have emphasized the difficulty of obtaining a reproducible gene signature on high-dimension, small-sample data (Ein-Dor et al., 2006; Haury et al., 2011). Stability of feature selection strongly depends on the N/D ratio and on the problem difficulty. For example, in a previous work we have shown that on a simple two-Gaussian-classes problem with 1000 features, a t-test based selection has a probability of more than 0.95 of selecting the highly informative features when N = 1000, but this probability falls to less than 0.2 when N = 50 (Dernoncourt et al., 2014).
A growing number of studies have been dealing with feature selection stability, whether to introduce stability measures (Kuncheva, 2007; Somol and Novovičová, 2010), to compare the stability of existing methods (Haury et al., 2011), or to propose innovative, stability-focused feature selection methods (Saeys et al., 2008; Han and Yu, 2012). Some
works, such as (Somol et al., 2009; Abeel et al., 2010;
Haury et al., 2011), also investigated the stability of
ensemble feature selection methods. The main idea
behind ensemble methods is to produce several indi-
vidual feature selections and combine them in order
to obtain a selection that outperforms every one of
them. This idea is based on the concept of "wisdom of the crowd", which states that, under certain controlled conditions, aggregating information from several sources results in decisions that are often superior to those that could have been made by any single individual, even experts (Surowiecki, 2004; Rokach, 2010). In practice, ensemble methods have been widely applied to classifiers, and since they can improve classifier accuracy and stability, we can suppose that they should provide similar benefits to feature selection techniques (Yang et al., 2010). So far, works on ensemble feature selection have mainly focused on classification accuracy, showing accuracy gains (or losses) to be problem-dependent (Han et al., 2013) and filter-dependent (Wald et al., 2013). Works which also studied stability have obtained mixed results too, leaving the general impression that both stability and accuracy gains (or losses) from ensemble methods are problem-dependent (Saeys et al., 2008). However, those works have often measured stability over overlapping resamplings, which strongly inflates the measured stability (Haury et al., 2011) and might also affect how stability varies; moreover, as far as we know, no study has extensively examined how the choice of the aggregation method influences the stability of ensemble feature selection methods.
In this preliminary work, we start investigating
how ensemble methods improve feature selection
(FS) stability, with a focus on the impact of the aggre-
gation method. We first briefly present the stability
measures and the three ensemble aggregation meth-
ods we used. Then we perform an empirical analysis
of feature selection stability on both artificial and real
microarray datasets.
2 ENSEMBLE FEATURE SELECTION
Creating an ensemble feature selection can be divided
into two steps. The first step is to create a set of di-
verse feature selectors. The second step is to aggre-
gate them.
2.1 Diversity Generation
The diversity of feature selectors is a crucial condition for obtaining a "wise crowd" (Surowiecki, 2004), necessary for an efficient ensemble. It can be obtained via different methods such as:
- manipulating the training sample: typically, resampling the training set so as to perform each FS of the ensemble on a different training set,
- manipulating the FS method: for instance, using different parameters for each FS, if the FS method has parameters,
- partitioning the search space: each FS is performed on a different search space; for instance, random forest learns each tree on a random, different, small subset of features,
- hybridization: using several FS methods in the ensemble,
or a combination of those (Rokach, 2010). In this paper, we focused on manipulating the training sample, which is the most commonly used method, and obtained diversity by bootstrapping the training samples B = 40 times, based on previous works showing that ensemble FS does not improve much when the number of bootstrap samples is increased beyond 40 (Abeel et al., 2010) or even 20 (Saeys et al., 2008).
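As an illustration, here is a minimal Python sketch of this diversity-generation step. The function names and the t-score scorer are illustrative assumptions rather than the exact implementation used in our experiments; any base FS method returning one relevance score per gene could be plugged in, and the stratified bootstrap (resampling with replacement within each class) is a convenience to avoid single-class bootstrap samples.

```python
import numpy as np

def bootstrap_feature_scores(X, y, score_fn, n_bootstrap=40, seed=0):
    """Run a base feature-scoring method on B bootstrap samples of the training set.

    X: (n_samples, n_features) matrix, y: (n_samples,) binary labels.
    score_fn(X, y) must return one relevance score per feature.
    Returns an (n_bootstrap, n_features) array of scores, one row per bootstrap run.
    """
    rng = np.random.default_rng(seed)
    scores = np.empty((n_bootstrap, X.shape[1]))
    for j in range(n_bootstrap):
        # stratified bootstrap: resample with replacement within each class,
        # so that no bootstrap sample ends up with a single class
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c), size=np.sum(y == c), replace=True)
            for c in np.unique(y)
        ])
        scores[j] = score_fn(X[idx], y[idx])
    return scores

def t_scores(X, y):
    """Absolute two-sample t-statistic for each gene (illustrative base FS method)."""
    a, b = X[y == 0], X[y == 1]
    diff = a.mean(axis=0) - b.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(diff / se)
```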
2.2 Aggregation
We tested the following methods of aggregation:
Average score: on each bootstrap sample, the FS method outputs a score $s_{f_i,j}$ for each gene $f_i$. We simply average the score of a gene over the bootstrap samples in order to obtain the ensemble score $W_{f_i}$:

$$W_{f_i} = \frac{\sum_{j=1}^{B} s_{f_i,j}}{B} \qquad (1)$$
ICPRAM2014-InternationalConferenceonPatternRecognitionApplicationsandMethods
326
Average rank (Abeel et al., 2010): on each bootstrap sample, the score of each gene $s_{f_i,j}$ is converted into a rank $r_{f_i,j}$. Then we average the rank of a gene over the bootstrap samples in order to obtain the ensemble score:

$$W_{f_i} = \frac{\sum_{j=1}^{B} r_{f_i,j}}{B} \qquad (2)$$
Stability selection (Meinshausen and Bühlmann, 2010): the ensemble score $W_{f_i}$ of each gene is obtained by measuring how often the gene ranks in the top d of each bootstrap sample:

$$W_{f_i} = \frac{\sum_{j=1}^{B} I(f_i, j)}{B} \qquad (3)$$

where $I(f_i, j) = 1$ if gene $f_i$ is ranked in the top d of the FS performed on the j-th bootstrap sample, and $I(f_i, j) = 0$ otherwise.
Finally, for each aggregation strategy, the d genes with the largest score $W_{f_i}$ are retained as the final ensemble selection.
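To make the three rules concrete, the following Python sketch implements one possible reading of Eqs. (1)-(3). It assumes a scores array as produced by the bootstrap step of Section 2.1 (one row per bootstrap sample, larger score meaning more relevant); the convention that average rank keeps the genes with the smallest averaged rank is our assumption.

```python
import numpy as np

def aggregate_average_score(scores):
    """Eq. (1): ensemble score = mean of the raw scores over the B bootstrap runs."""
    return scores.mean(axis=0)

def aggregate_average_rank(scores):
    """Eq. (2): convert each run's scores to ranks (1 = most relevant), then average.
    Assumption: with this convention the best genes have the *smallest* ensemble score."""
    n_runs, n_feat = scores.shape
    ranks = np.empty_like(scores)
    order = np.argsort(-scores, axis=1)                      # descending relevance
    ranks[np.arange(n_runs)[:, None], order] = np.arange(1, n_feat + 1)
    return ranks.mean(axis=0)

def aggregate_stability_selection(scores, d):
    """Eq. (3): fraction of bootstrap runs in which a gene ranks in the top d."""
    # the d-th largest score of each run acts as that run's inclusion threshold
    thresholds = np.sort(scores, axis=1)[:, -d][:, None]
    return (scores >= thresholds).mean(axis=0)

def select_top_d(W, d, smaller_is_better=False):
    """Keep the d genes with the best ensemble score W."""
    order = np.argsort(W if smaller_is_better else -W)
    return np.sort(order[:d])
```

For instance, select_top_d(aggregate_average_score(scores), d) returns the indices of the final ensemble selection for the average score strategy, and select_top_d(aggregate_average_rank(scores), d, smaller_is_better=True) does the same for average rank.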
3 EXPERIMENTAL DESIGN
We performed experiments on both artificial and real
data in order to assess the impact of ensemble meth-
ods on the stability of feature selection. We used four
base FS methods: t-score, random forest, recursive
feature elimination using support vector machines
(SVM-RFE), and mutual information. For each of
those methods, feature selection was performed with
and without the ensemble aggregation methods de-
scribed in the previous section. We then measured
the stability of the feature selection on strictly non-
overlapping sets (different generated sets on artificial
data, resamplings on real data), and the average classi-
fication error rate obtained with a linear discriminant
analysis (LDA) classifier on the test sets.
3.1 Stability Measure
Many different measures have been proposed to quantify stability. As our main stability measure, we chose the relative weighted consistency (Somol and Novovičová, 2010), $CW_{rel}$, which evaluates how frequently each feature is selected, among the features which have been selected at least once. We chose it because it ignores the stability of non-selected features (which would artificially increase the stability measure on large datasets where many features are irrelevant and easy to exclude), and because it is adjusted to take into account the proportion of overlapping features due to chance.
Let $S = \{S_1, S_2, \ldots, S_\omega\}$ be a system of $\omega$ gene subsets obtained from $\omega$ runs of the feature selection routine on different samplings, let $\mathcal{N} = \sum_{i=1}^{\omega} |S_i|$ be the total number of occurrences of any gene in $S$, and let $F_f$ be the number of occurrences of gene $f \in F$ in system $S$. The weighted consistency CW was defined as follows:

$$CW(S) = \sum_{f \in F} \frac{F_f}{\mathcal{N}} \cdot \frac{F_f - 1}{\omega - 1} \qquad (4)$$

and $CW_{rel}$ is obtained by adjusting CW on its minimal and maximal possible values $CW_{min}$ and $CW_{max}$:

$$CW_{rel}(S, F) = \frac{CW(S) - CW_{min}(\mathcal{N}, \omega, F)}{CW_{max}(\mathcal{N}, \omega) - CW_{min}(\mathcal{N}, \omega, F)} \qquad (5)$$
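As an illustration, here is a small Python sketch of $CW_{rel}$ under our reading of Eqs. (4) and (5): instead of the closed-form expressions of (Somol and Novovičová, 2010), $CW_{min}$ and $CW_{max}$ are computed by evaluating CW on the most even and the most concentrated possible distributions of the same total number of occurrences, which is an assumption made for brevity.

```python
import numpy as np

def _cw(counts, omega):
    """Weighted consistency CW (Eq. 4) from per-gene occurrence counts F_f."""
    n_occ = counts.sum()
    return float(np.sum((counts / n_occ) * (counts - 1) / (omega - 1)))

def cw_rel(selections, n_features):
    """Relative weighted consistency CW_rel (Eq. 5).

    selections: list of omega arrays of selected gene indices (one per FS run).
    n_features: total number of genes D.
    """
    omega = len(selections)
    counts = np.zeros(n_features)
    for s in selections:
        counts[np.asarray(s)] += 1.0
    n_occ = int(counts.sum())
    cw = _cw(counts, omega)

    # CW_min: the n_occ occurrences spread as evenly as possible over all genes
    base, extra = divmod(n_occ, n_features)
    counts_min = np.full(n_features, float(base))
    counts_min[:extra] += 1.0
    cw_min = _cw(counts_min, omega)

    # CW_max: the n_occ occurrences concentrated on as few genes as possible
    full, rest = divmod(n_occ, omega)
    counts_max = np.zeros(n_features)
    counts_max[:full] = omega
    if rest:
        counts_max[full] = rest
    cw_max = _cw(counts_max, omega)

    return (cw - cw_min) / (cw_max - cw_min) if cw_max > cw_min else 0.0
```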
3.2 Artificial Data
We used three different artificial data structures, all based on a two-class Gaussian model with D = 1000 genes. The two classes follow normal distributions defined respectively by $\mathcal{N}(-\mu, I)$ and $\mathcal{N}(\mu, \Sigma)$, where $\mu$ is a vector of D means and $\Sigma$ is a covariance matrix.
In the first data structure, NC100, all genes are independent ($\Sigma$ is the identity matrix) and $\mu$ consisted of d = 100 elements $\mu_i = 1$ (genes useful for classification) and D − d = 900 elements $\mu_i = 0$ (noise genes). Then $\mu$ was scaled down so that F would yield a specified Bayes error ($\varepsilon_{Bayes} = 0.10$).
In the second data structure, NC, all genes are independent and the elements $\mu_i$ of $\mu$ were drawn from a triangular distribution with lower limit and mode equal to 0 (probability density function: $f(x) = 2 - 2x$ for $x \in [0, 1]$). In order to obtain a more realistic probability density, we then raised $\mu$ to the power $\gamma = 2$, similarly to the method used in (Dernoncourt et al., 2014). Again, $\mu$ was scaled down so that F would yield $\varepsilon_{Bayes} = 0.10$.
In the third data structure, CB, we added correlations within blocks of ten genes, by using the block-diagonal covariance matrix

$$\Sigma = \begin{pmatrix} \Sigma_1 & 0 & \cdots & 0 \\ 0 & \Sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Sigma_{100} \end{pmatrix}, \qquad (6)$$

where each $\Sigma_i$ is a $10 \times 10$ square matrix with elements 1 along its diagonal and 0.5 off its diagonal, similarly to the method used in (Han and Yu, 2012). We used the same $\mu$ as in the NC dataset.
From these models, 50 training sets were gener-
ated, on which the FS methods were performed and an
LDA classifier was trained. Each classifier was then
applied to a test set consisting of 10000 instances to
estimate its error rate.
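As a rough sketch, the following Python code generates data with the NC100 structure and builds the block-diagonal covariance of Eq. (6). The sign convention for the two class means and the way the mean shift is scaled to reach a Bayes error of 0.10 are assumptions, not constants taken from the paper.

```python
import numpy as np
from scipy.stats import norm

def make_nc100(n_samples, n_features=1000, n_informative=100, bayes_error=0.10, seed=0):
    """Two independent Gaussian classes N(-mu, I) and N(mu, I); the first
    `n_informative` genes carry the mean shift, the rest are noise.
    Assumption: mu is scaled so that the Bayes error equals `bayes_error`."""
    rng = np.random.default_rng(seed)
    # with class means at -mu and +mu and identity covariance, Bayes error = Phi(-||mu||)
    shift = norm.ppf(1.0 - bayes_error) / np.sqrt(n_informative)
    mu = np.zeros(n_features)
    mu[:n_informative] = shift
    y = rng.integers(0, 2, size=n_samples)
    X = rng.standard_normal((n_samples, n_features)) + np.where(y[:, None] == 1, mu, -mu)
    return X, y

def make_cb_covariance(n_features=1000, block_size=10, rho=0.5):
    """Block-diagonal covariance of Eq. (6): 1 on the diagonal, 0.5 within each
    block of 10 genes, 0 between blocks."""
    block = np.full((block_size, block_size), rho)
    np.fill_diagonal(block, 1.0)
    sigma = np.zeros((n_features, n_features))
    for b in range(n_features // block_size):
        sl = slice(b * block_size, (b + 1) * block_size)
        sigma[sl, sl] = block
    return sigma
```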
StabilityofEnsembleFeatureSelectiononHigh-DimensionandLow-SampleSizeData-InfluenceoftheAggregation
Method
327
Table 1: Microarray datasets.

Name                    N    D     Source
Colon cancer            62   2000  Alon et al., 1999
Leukemia                72   7129  Golub et al., 1999
Breast cancer Pawitan   159  8112  Pawitan et al., 2005
Lung cancer             203  2000  Bhattacharjee et al., 2001
Breast cancer Vijver    294  2000  van de Vijver et al., 2002
3.3 Real Data
We experimented with five publicly available microarray datasets, listed in Table 1. For each dataset, 50 training sets were generated by randomly drawing half of the instances from the dataset (without replacement). On each of them, feature selection was performed and a classifier was trained (using the same methods as with the artificial data). Each classifier was then applied to a test set consisting of the samples not included in the corresponding training set. Stability of the feature selection was measured within each pair of training and test sets, so as to have no overlap. The final stability measure is the average of those 50 measures.
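The following Python sketch gives one possible reading of this protocol; select_fn stands for any of the (single or ensemble) feature selection procedures, d = 50 is an illustrative signature size rather than a value stated in the paper, and cw_rel refers to the stability sketch of Section 3.1.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def evaluate_real_dataset(X, y, select_fn, d=50, n_repeats=50, seed=0):
    """Hedged sketch of the protocol of Section 3.3.

    For each repetition: split the data into two disjoint halves, run the
    feature selection on both halves, measure CW_rel on that pair of
    selections, and estimate the error of an LDA trained on the training half.
    select_fn(X, y, d) is assumed to return d selected gene indices.
    """
    rng = np.random.default_rng(seed)
    n, n_features = X.shape
    stabilities, errors = [], []
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        train, test = perm[: n // 2], perm[n // 2:]
        s_train = select_fn(X[train], y[train], d)
        s_test = select_fn(X[test], y[test], d)
        stabilities.append(cw_rel([s_train, s_test], n_features))  # see Sec. 3.1 sketch
        clf = LinearDiscriminantAnalysis().fit(X[train][:, s_train], y[train])
        errors.append(1.0 - clf.score(X[test][:, s_train], y[test]))
    return float(np.mean(stabilities)), float(np.mean(errors))
```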
4 RESULTS
4.1 Artificial Data
Table 2 presents the stability of feature selections and
the error rate of resulting classifiers on the artificial
datasets. In general, we observed that ensemble fea-
ture selection provides results as stable as, or more stable than, non-ensemble (single) feature selection, and a similar or lower error rate.
T-score obtained the highest stability and lowest
classification error rate overall. It further benefited
from its ensemble version, but only with the average
score aggregation: the average rank and stability se-
lection aggregation slightly degraded its stability and
did not improve the classification error rate.
Compared to t-score, SVM-RFE had a slightly
lower stability on non-correlated data and half the sta-
bility on correlated data, but with a similar error rate.
It did not benefit from the ensemble version on non-
correlated data, but stability was improved on corre-
lated data, similarly with all aggregation types.
Random forest was the worst performer, with half
the stability of t-score on all datasets, and a higher er-
ror rate on the uncorrelated datasets. It was however
much more stable in its ensemble version, reaching
similar stability levels and error rate as single SVM-
RFE on non-correlated data, and a higher stability
(and somewhat lower error rate) than ensemble SVM-
RFE on correlated data. As with SVM-RFE, the ag-
gregation method did not matter here.
4.2 Real Data
Table 3 presents the stability of feature selections and
the error rate of resulting classifiers on the real mi-
croarray datasets. In general, similarly to the artifi-
cial data, we observed that ensemble feature selection
provides similarly or more stable results than single
feature selection, with the exception of t-score on the
leukemia dataset. Unlike what we observed on arti-
ficial data though, the error rate was in some cases
increased by the ensemble selection.
T-score obtained the highest stability on 2 out of
5 datasets. In 4 datasets, its ensemble version had
an improved stability, but only with the average score
aggregation, and in half cases (colon cancer and lung
cancer datasets) at the cost of a largely increased er-
ror rate. On the leukemia dataset, ensemble methods
reduced stability and increased error rate.
SVM-RFE obtained the highest stability on 3 out
of 5 datasets. In all datasets, its ensemble version had
an improved stability, the best improvement occurring
with the average score aggregation, closely followed
by stability selection. Aggregation by average rank
did not perform as well: it provided the smallest stability increase on the colon cancer and lung cancer datasets, and no stability increase or even a stability degradation on the other datasets. The error rate was generally
unchanged by ensemble methods, except for a 10%
increase in the colon cancer dataset (with any aggre-
gation method) and a 20% decrease in the lung cancer
dataset (aggregation by average rank only).
Random forest had the lowest stability on 3 out of
5 datasets, and tied with mutual information on the
Vijver dataset. However it had a competitive error
rate (best without ensemble on Vijver and leukemia
datasets, best with ensemble on the colon cancer
dataset), even though the differences in error rates
between the different feature selection methods were
generally small, except on the Pawitan dataset. Sim-
ilarly to SVM-RFE, its stability was generally in-
creased by ensemble methods. Aggregation by av-
erage score and stability selection improved stability
ICPRAM2014-InternationalConferenceonPatternRecognitionApplicationsandMethods
328
Table 2: Classification error rate and selection stability on the artificial datasets, with feature selection without ensemble and with three ensemble aggregation methods.

Artificial   Ensemble              t-score             SVM-RFE             Random forest
data type    aggregation           Error rate  CW_rel  Error rate  CW_rel  Error rate  CW_rel
NC100        Single                0.338       0.075   0.345       0.065   0.383       0.031
             Average score         0.290       0.135   0.339       0.071   0.347       0.059
             Average rank          0.340       0.072   0.340       0.072   0.347       0.061
             Stability selection   0.339       0.071   0.344       0.068   0.345       0.062
NC           Single                0.360       0.050   0.360       0.049   0.391       0.027
             Average score         0.320       0.071   0.364       0.045   0.365       0.042
             Average rank          0.361       0.045   0.364       0.044   0.363       0.045
             Stability selection   0.366       0.043   0.363       0.045   0.368       0.040
CB           Single                0.239       0.235   0.242       0.113   0.244       0.099
             Average score         0.223       0.265   0.238       0.152   0.232       0.193
             Average rank          0.235       0.226   0.238       0.154   0.230       0.195
             Stability selection   0.237       0.217   0.239       0.150   0.233       0.190
Table 3: Classification error rate and selection stability on the microarray datasets, with feature selection without ensemble and with three ensemble aggregation methods.

Data      Ensemble     t-score          SVM-RFE          Random forest    Mutual inf
          aggregation  Error   CW_rel   Error   CW_rel   Error   CW_rel   Error   CW_rel
Colon     Single       0.188   0.310    0.182   0.448    0.196   0.163    0.181   0.140
          Score        0.305   0.327    0.203   0.588    0.179   0.206    0.217   0.149
          Rank         0.215   0.277    0.203   0.494    0.199   0.163    0.209   0.149
          Stability    0.210   0.262    0.206   0.568    0.177   0.210    0.213   0.145
Leukemia  Single       0.049   0.322    0.044   0.525    0.042   0.220    0.050   0.263
          Score        0.117   0.315    0.051   0.581    0.043   0.265    0.052   0.300
          Rank         0.054   0.269    0.047   0.517    0.067   0.094    0.053   0.297
          Stability    0.058   0.246    0.047   0.565    0.046   0.269    0.049   0.294
Pawitan   Single       0.342   0.071    0.283   0.129    0.309   0.011    0.313   0.023
          Score        0.328   0.085    0.283   0.180    0.320   0.012    0.343   0.024
          Rank         0.344   0.065    0.298   0.095    0.324   0.015    0.335   0.024
          Stability    0.317   0.047    0.289   0.180    0.314   0.011    0.329   0.022
Lung      Single       0.054   0.515    0.084   0.377    0.060   0.342    0.061   0.372
          Score        0.076   0.536    0.083   0.498    0.064   0.398    0.063   0.379
          Rank         0.058   0.417    0.067   0.444    0.063   0.389    0.064   0.377
          Stability    0.058   0.445    0.082   0.487    0.063   0.394    0.064   0.376
Vijver    Single       0.382   0.254    0.377   0.159    0.359   0.078    0.360   0.077
          Score        0.364   0.345    0.373   0.221    0.368   0.107    0.357   0.091
          Rank         0.371   0.237    0.368   0.158    0.371   0.105    0.360   0.092
          Stability    0.374   0.237    0.376   0.215    0.371   0.106    0.359   0.088
similarly on all datasets, except for the Pawitan dataset. Aggregation by average rank was more inconsistent: improvements similar to the other aggregation methods on the Vijver and lung cancer datasets, a larger improvement on the Pawitan dataset, where random forest had a very low stability compared to the other methods, no improvement compared to single random forest on the colon cancer dataset, and a substantial degradation on the leukemia dataset.
Mutual information had the lowest stability on
the colon cancer dataset, yet competitive error rates.
The ensemble versions had a similar (colon cancer,
Pawitan, and lung cancer datasets) or moderately im-
proved (leukemia and Vijver datasets) stability, with
no marked difference in favor of a specific aggrega-
tion method. Ensemble increased the error rate in
the colon cancer and Pawitan dataset, again with no
marked difference between aggregation methods.
5 CONCLUSIONS AND FUTURE WORK
In this work, we studied how ensemble feature selec-
tion methods influence the stability of the gene selec-
StabilityofEnsembleFeatureSelectiononHigh-DimensionandLow-SampleSizeData-InfluenceoftheAggregation
Method
329
tion and, to a lesser extent, the error rate of the re-
sulting classifier on microarray data and artificial data
of similar dimension. We focused on the aggrega-
tion method used in the ensemble procedure, because
that is an important aspect of the ensemble construc-
tion procedure which, to our knowledge, had scarcely
been investigated in such a setting before. Similarly
to (Haury et al., 2011), we found that average rank ag-
gregation usually performed worse than the other ag-
gregation methods. We also found that average score
aggregation usually resulted, with a few exceptions, in
the best stability, while stability selection aggregation
was in-between.
We observed, in some cases, a trade-off between
stability and error rate. Previous studies such as
(Saeys et al., 2008) already suggested such a dataset-
dependent trade-off between robustness and classifi-
cation performance. Here, however, we find that the
aggregation method can also play a role in this trade-
off, since in some cases error rate and stability were
differently affected by the different ensemble aggre-
gation methods. This trade-off did not seem to apply
to our artificial data, though: on them, a better er-
ror rate was systematically paired with a better stabil-
ity. This difference could be due to structural differ-
ences with the real data (the latter probably present-
ing much more complex and numerous interactions),
or to a lack of diversity in the artificial data, since the
trade-off is not observed in all datasets. Nonetheless,
we observed that in all cases, the most stable method
without ensemble could be rendered more stable via
ensemble, with or without a trade-off on the classifi-
cation error rate.
As future work, we think it would be interesting
to study or develop more aggregation methods, such
as average weighted rank or score (giving a higher
weight to higher scores or lower ranks). Weighted
(exponential) rank reportedly performed better than
average rank (Haury et al., 2011), so maybe such an
improvement could be obtained by using the same
method on scores. Hybridization of different base fea-
ture selection methods also seems to be an interesting
area to explore, which will also require some specific
work on the aggregation strategies.
REFERENCES
Abeel, T., Helleputte, T., Van de Peer, Y., Dupont, P., and
Saeys, Y. (2010). Robust biomarker identification
for cancer diagnosis with ensemble feature selection
methods. Bioinformatics, 26(3):392–398.
Dernoncourt, D., Hanczar, B., and Zucker, J.-D. (2014).
Analysis of feature selection stability on high dimen-
sion and small sample data. Computational Statistics
and Data Analysis, 71(0):681 – 693.
Ein-Dor, L., Zuk, O., and Domany, E. (2006). Thousands
of samples are needed to generate a robust gene list
for predicting outcome in cancer. Proceedings of the
National Academy of Sciences, 103(15):5923–5928.
Han, Y., Yang, Y., and Zhou, X. (2013). Co-regularized
ensemble for feature selection. In Proceedings of the
Twenty-Third International Joint Conference on Arti-
ficial Intelligence, IJCAI’13, pages 1380–1386. AAAI
Press.
Han, Y. and Yu, L. (2012). A variance reduction framework
for stable feature selection. Statistical Analysis and
Data Mining, 5(5):428–445.
Haury, A.-C., Gestraud, P., and Vert, J.-P. (2011). The influ-
ence of feature selection methods on accuracy, stabil-
ity and interpretability of molecular signatures. PLoS
ONE, 6(12):e28210.
Jain, A. K. and Chandrasekaran, B. (1982). Dimensionality and sample size considerations in pattern recognition practice. Handbook of Statistics, 2:835–855.
Kuncheva, L. I. (2007). A stability index for feature selec-
tion. In Devedzic, V., editor, Artificial Intelligence and
Applications, pages 421–427. IASTED/ACTA Press.
Liu, H., Liu, L., and Zhang, H. (2010). Ensemble gene se-
lection by grouping for microarray data classification.
Journal of biomedical informatics, 43:81–87.
Meinshausen, N. and Bühlmann, P. (2010). Stability selec-
tion. Journal of the Royal Statistical Society: Series B
(Statistical Methodology), 72(4):417–473.
Rokach, L. (2010). Ensemble-based classifiers. Artificial
Intelligence Review, 33(1-2):1–39.
Saeys, Y., Abeel, T., and Van de Peer, Y. (2008). Robust feature
selection using ensemble feature selection techniques.
In Daelemans, W., Goethals, B., and Morik, K., edi-
tors, Machine Learning and Knowledge Discovery in
Databases, volume 5212 of Lecture Notes in Com-
puter Science, pages 313–325. Springer Berlin Hei-
delberg.
Simon, R. (2003). Supervised analysis when the number of
candidate features (p) greatly exceeds the number of
cases (n). SIGKDD Explor. Newsl., 5(2):31–36.
Somol, P., Grim, J., and Pudil, P. (2009). Criteria ensembles
in feature selection. In Benediktsson, J. A., Kittler, J.,
and Roli, F., editors, MCS, volume 5519 of Lecture
Notes in Computer Science, pages 304–313. Springer.
Somol, P. and Novovičová, J. (2010). Evaluating stability
and comparing output of feature selectors that opti-
mize feature subset cardinality. IEEE Trans. Pattern
Anal. Mach. Intell., 32(11):1921–1939.
Surowiecki, J. (2004). The Wisdom of Crowds: Why the
Many are Smarter Than the Few and how Collective
Wisdom Shapes Business, Economies, Societies, and
Nations. Doubleday.
Wald, R., Khoshgoftaar, T. M., and Dittman, D. J. (2013).
Ensemble gene selection versus single gene selec-
tion: Which is better? In Boonthum-Denecke, C.
and Youngblood, G. M., editors, FLAIRS Conference.
AAAI Press.
Yang, P., Hwa Yang, Y., Zhou, B. B., and Zomaya, A. Y. (2010). A review of ensemble methods in bioinformatics. Current Bioinformatics, 5(4):296–308.
ICPRAM2014-InternationalConferenceonPatternRecognitionApplicationsandMethods
330