Table 2: Selected features for artificial data sets. Results not presented in this work are taken from the literature (Liu and Motoda, 1998, p. 119).

Method               | Monk1      | Monk2 | Monk3          | Parity5+5
(relevant features)  | A1, A2, A5 | A1–A6 | A2, A4, A5     | 2–4, 6, 8
Branch & Bound       | A1, A2, A5 | A1–A6 | A1, A2, A4, A5 | 2–4, 6, 8
Quick Branch & Bound | A1, A2, A5 | A1–A6 | A1, A2, A4, A5 | 2–4, 6, 8
Focus                | A1, A2, A5 | A1–A6 | A1, A2, A4, A5 | 2–4, 6, 8
LVF                  | A1, A2, A5 | A1–A6 | A1, A2, A4, A5 | 2–4, 6, 8
IFBS (t = 0.05)      | A1, A2, A5 | A1–A6 | A2, A4, A5     | 2–4, 6, 8
IFBS (t = 0.00)      | A1, A2, A5 | A1–A6 | A1, A2, A4, A5 | 2–4, 6, 8
Table 3: Results for the different artificial data sets. t denotes the threshold, G the selected feature set, i the inconsistency rate of G, and r the average runtime in ms.

Data Set  | t    | G                  | i    | r
Monk1     | 0.10 | A1, A2, A5         | 0.00 | 199
          | 0.05 | A1, A2, A5         | 0.00 | 200
          | 0.01 | A1, A2, A5         | 0.00 | 203
          | 0.00 | A1, A2, A5         | 0.00 | 207
Monk2     | 0.10 | A2–A6              | 0.09 | 222
          | 0.05 | A1–A6              | 0.00 | 231
          | 0.01 | A1–A6              | 0.00 | 227
          | 0.00 | A1–A6              | 0.00 | 226
Monk3     | 0.10 | A2, A5             | 0.07 | 239
          | 0.05 | A2, A4, A5         | 0.05 | 239
          | 0.01 | A1, A2, A4, A5     | 0.00 | 245
          | 0.00 | A1, A2, A4, A5     | 0.00 | 239
Parity5+5 | 0.10 | B2, B3, B4, B6, B8 | 0.00 | 342
          | 0.05 | B2, B3, B4, B6, B8 | 0.00 | 358
          | 0.01 | B2, B3, B4, B6, B8 | 0.00 | 348
          | 0.00 | B2, B3, B4, B6, B8 | 0.00 | 372
For Monk1 and Parity5+5 the ideal feature subset is found regardless of the chosen threshold. For Monk2 and a threshold of 0.10 the known relevant feature A1 is not part of the result. For Monk3, a threshold lower than the innate noise (5% of the instances are misclassified) results in the selection of the known irrelevant feature A1. The algorithm thus seems to be prone to overfitting.
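The inconsistency rate i reported in Table 3 can be computed as in the following sketch, assuming the common definition used in consistency-based feature selection: instances that agree on all features of the subset but do not carry the majority class of their group, divided by the total number of instances. The function name and the list-of-rows data layout are our own choices for illustration.

```python
from collections import Counter, defaultdict

def inconsistency_rate(X, y, subset):
    """Inconsistency rate of a feature subset over discrete data.

    X: list of rows of discrete feature values, y: class labels,
    subset: indices of the selected features.
    """
    groups = defaultdict(Counter)
    for row, label in zip(X, y):
        # group instances by their values on the selected features
        groups[tuple(row[i] for i in subset)][label] += 1
    # every instance outside its group's majority class is inconsistent
    inconsistent = sum(sum(c.values()) - max(c.values())
                       for c in groups.values())
    return inconsistent / len(y)
```

A subset that fully determines the class yields a rate of 0.00, matching the i column for the selected subsets in Table 3.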
In Table 2 the results of IFBS are contrasted with results from other feature selection methods. Our new approach performs no worse than the other approaches.
Input: features F, threshold t
G ← SelectFeatures(F, t)              (Phase I)
G ← EliminateFeatures(G, t)           (Phase II)
inv ← InvalidSubset(F, t)             (Phase III)
G ← SearchFinalLevels(F, G, inv, t)   (Phase IV)
return G

Algorithm 1: IFBS.
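The internals of the four phases are not given in this excerpt. As a simplified illustration of the underlying idea only, not the paper's actual procedure, a greedy forward pass followed by a backward elimination pass against the inconsistency threshold t could look like this (all function names are ours):

```python
from collections import Counter, defaultdict

def incons(X, y, subset):
    # inconsistency rate of `subset`: non-majority-class instances per
    # group of identical subset values, divided by all instances
    groups = defaultdict(Counter)
    for row, label in zip(X, y):
        groups[tuple(row[i] for i in subset)][label] += 1
    return sum(sum(c.values()) - max(c.values())
               for c in groups.values()) / len(y)

def forward_then_backward(X, y, t):
    # greedily add the feature that lowers the rate most,
    # until the rate drops to the threshold t or below ...
    remaining, subset = list(range(len(X[0]))), []
    while remaining and incons(X, y, subset) > t:
        best = min(remaining, key=lambda f: incons(X, y, subset + [f]))
        subset.append(best)
        remaining.remove(best)
    # ... then drop features whose removal keeps the rate within t
    for f in list(subset):
        trial = [g for g in subset if g != f]
        if incons(X, y, trial) <= t:
            subset = trial
    return sorted(subset)
```

The backward pass matters: on XOR-like targets such as Parity5+5, the forward pass may pick up a feature that looks locally informative but becomes redundant once the truly relevant pair is in the subset.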
Table 4: Properties of real world data sets.

Data Set | Features | Instances | Class Type | Feature Type
Arcene   | 10,000   | 100       | binary     | integer
Gisette  | 5,000    | 6,000     | binary     | integer
Dexter   | 20,000   | 300       | binary     | integer
Dorothea | 100,000  | 800       | binary     | binary
Madelon  | 500      | 2,000     | binary     | integer
4.2 Real World Data Sets
After evaluating IFBS and IFBSc on artificial data sets, we also used real-world data. To obtain comparable results, we decided to use the data sets from the 2003 feature selection challenge organized by the Feature Extraction Workshop at the Neural Information Processing Systems Conference (NIPS).⁶
Evaluation on the real-world data sets is a two-step process. First, we conducted the feature selection and reduced the data sets accordingly; then we learned classifiers using a decision tree approach and applied them to the unlabeled challenge data sets. The final results were submitted to the challenge's homepage under the name IFBS-DT. After describing the data sets, we give both types of results: the selected features and the achieved classification quality.
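The two-step process above can be sketched as follows. The selection step is assumed to return column indices; a one-level majority-vote stump stands in for the paper's decision tree learner, so both pieces are illustrative placeholders rather than the IFBS-DT implementation:

```python
from collections import Counter

def reduce_columns(X, subset):
    # step 1 (after selection): keep only the selected feature columns
    return [[row[i] for i in subset] for row in X]

def train_stump(X, y):
    # step 2 placeholder learner: majority class per value of feature 0,
    # falling back to the overall majority class for unseen values
    table = {}
    for row, label in zip(X, y):
        table.setdefault(row[0], Counter())[label] += 1
    default = Counter(y).most_common(1)[0][0]
    rules = {v: c.most_common(1)[0][0] for v, c in table.items()}
    return lambda row: rules.get(row[0], default)
```

The trained classifier is then applied to the unlabeled test and validation rows reduced to the same columns.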
4.2.1 Data Sets
The five data sets provided for this challenge are obfuscated versions of real-world data sets with added random attributes. All data sets were split into training, test, and validation sets by the challenge organizers. Test and validation sets were unlabeled. All data sets are available at the challenge's homepage. Table 4 summarizes the data sets used.
To run IFBS on the data sets having integer-valued features, these features were discretized based on entropy (Fayyad and Irani, 1993). For all data sets, features having only one value were excluded.
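The preprocessing described above can be sketched as follows. `best_cut` implements a simplified single-cut version of the entropy criterion; the full method of Fayyad and Irani applies such cuts recursively with an MDL stopping rule. `drop_constant_features` removes the single-valued features. Both function names are our own.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def best_cut(values, labels):
    """Cut point of one numeric feature with maximal information gain."""
    pairs = sorted(zip(values, labels))
    n, base = len(pairs), entropy(labels)
    cut, best_gain = None, 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut between identical values
        left = [l for _, l in pairs[:i]]
        right = [l for _, l in pairs[i:]]
        gain = base - (len(left) / n) * entropy(left) \
                    - (len(right) / n) * entropy(right)
        if gain > best_gain:
            cut, best_gain = (pairs[i - 1][0] + pairs[i][0]) / 2, gain
    return cut

def drop_constant_features(X):
    """Indices of features that take more than one value."""
    cols = list(zip(*X))
    return [i for i, col in enumerate(cols) if len(set(col)) > 1]
```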
⁶ http://www.nipsfsc.ecs.soton.ac.uk
KDIR 2010 - International Conference on Knowledge Discovery and Information Retrieval