ionosphere database, the mean rate is about 75.5% for
a 2-feature subset, 86% for a 5-feature subset and
84% for a 10-feature subset; for the landSat database,
the mean rate is about 82.5% for a 4-feature subset
and 85.5% for a 28-feature subset.
We now focus on the good classification rates
obtained for some interesting subsets. Table 1 shows
the subsets obtained with the 2OMF, mRMR and
FSDD methods. Information about Pareto optimality
is also given (underlined size in the table). In
addition, we indicate whether a subset dominates the
complete set (green background in the table). The
mRMR and FSDD subsets are chosen among the
visited ones according to their mean rate value.
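The dominance relation used throughout this section can be sketched as follows. The minimal Python sketch below compares two subsets on the two stability objectives, the mean classification rate (to be maximized) and the amplitude of the rates (to be minimized); the function name and the numerical values are illustrative placeholders, not results from Table 1.

    # Minimal sketch of the dominance test between two subsets, using
    # the two stability objectives: the mean classification rate (to
    # maximize) and the amplitude of the rates (to minimize). The
    # values below are placeholders, not results from Table 1.

    def dominates(mean_a, amp_a, mean_b, amp_b):
        """True if subset A Pareto-dominates subset B."""
        no_worse = (mean_a >= mean_b) and (amp_a <= amp_b)
        strictly_better = (mean_a > mean_b) or (amp_a < amp_b)
        return no_worse and strictly_better

    # A small subset may dominate the complete set of features:
    print(dominates(mean_a=0.97, amp_a=0.03,
                    mean_b=0.95, amp_b=0.06))  # True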
All the subsets displayed for the 2OMF method are
interesting because they combine a low number of
features with better stability than the complete set.
Some of them stand out for their very low number of
features, others for their high classification rates. For
example, for the wine database, a subset with two
features (features 2 and 7) has a higher mean rate and
a lower amplitude than the complete set of 13
features. Moreover, it achieves a higher classification
rate for 4 of the 5 classifiers. Similarly, for the
imgSeg database, the 2OMF method divides the
number of features by 2.
Let us now consider the methods from the
literature. For the landSat database, none of the
visited subsets dominates the complete set, for either
mRMR or FSDD. Moreover, the stable and successful
subsets obtained with FSDD have more features than
those obtained with 2OMF. Only one stable subset
with a low number of features is obtained with
mRMR (8 features); however, it is dominated by the
seven-feature subset returned by 2OMF (last line of
the table). In every case, we found a subset among the
2OMF subsets with fewer features, a higher mean
classification rate and a lower classification
amplitude than the best subsets returned by mRMR
and FSDD.
8 CONCLUSION
This paper presents a two-step algorithm for feature
selection and studies its multi-objective aspect. The
algorithm begins with a filter step that quickly selects
a first pool of subsets in a Multi-Objectives and
Multi-Fronts way (2OMF). The subsets are evaluated
using the Dependency (D) and the Redundancy (R) of
the features. A second, wrapper-based step then
measures the performance of the subsets with several
classifiers (KNN, LDA, Mah, NB, PNN). The
interesting subsets are finally selected according to
their stability, which is evaluated with the mean and
the amplitude of the classification rates. From our
experiments, we observe that the interesting subsets
dominate the complete set with respect to both
objectives. Using stability to select the subsets leads
to robust results, which is particularly valuable for
applications such as biology, where the stability of a
subset matters more than its raw classification rate.
The wrapper step is required because some subsets of
the filter Pareto front may have a higher classification
rate than the complete set for a given classifier but
not for another. A feature selection based only on a
filter method does not ensure that the selected subset
will improve the classification rates for a large set of
classifiers.
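As an illustration of the stability criterion, the following minimal Python sketch computes the mean and the amplitude of the classification rates of one subset over the five classifiers of the wrapper step; the rate values are placeholders, not measurements from our experiments.

    def stability(rates):
        """Mean and amplitude (max - min) of the classification rates
        obtained by one subset over all classifiers."""
        mean = sum(rates) / len(rates)
        amplitude = max(rates) - min(rates)
        return mean, amplitude

    # Placeholder rates for the five classifiers of the wrapper step.
    rates = {"KNN": 0.94, "LDA": 0.96, "Mah": 0.93,
             "NB": 0.95, "PNN": 0.94}
    mean, amp = stability(list(rates.values()))
    print("mean rate = %.3f, amplitude = %.3f" % (mean, amp))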
The results are convincing for all the tested
databases. The subsets obtained with our algorithm
have fewer features and better classification
performance than the complete set of features.
Moreover, the diversity of the final pool of subsets
allows selecting a subset adapted to a specific
application (a high classification rate, or a strong
reduction of the number of features). We also
compared the proposed algorithm with two feature
selection methods (mRMR and FSDD); our method
outperforms them in almost all cases.
REFERENCES
A. Al-Ani, M. Deriche, and J. Chebil, "A new mutual
information based measure for feature selection,"
Intelligent Data Analysis, vol. 7, no. 1, pp. 43-57, 2003.
E. Cantu-Paz, "Feature Subset Selection, Class
Separability, and Genetic Algorithms," in Genetic and
Evolutionary Computation, 2004, pp. 959-970.
K. Deb, Multi-Objective Optimization Using Evolutionary
Algorithms. John Wiley and Sons, Chichester, 2001.
C. Emmanouilidis, A. Hunter, and J. MacIntyre, "A
Multiobjective Evolutionary Setting for Feature
Selection and a Commonality-Based Crossover
Operator," in Congress on Evolutionary Computation,
California, July 2000, pp. 309-316.
B.A.S. Hasan, J.Q. Gan, and Q. Zhang, "Multi-objective
evolutionary methods for channel selection in Brain-
Computer Interfaces: Some preliminary experimental
results," in IEEE Congress Evolutionary Computation,
Barcelona, Spain, July 2010, pp. 1-6.
M. Hilario and A. Kalousis, "Approaches to dimensionality
reduction in proteomic biomarker studies," Briefings in
Bioinformatics, vol. 9, no. 2, pp. 102-118, 2008.
A. Kalousis, J. Prados, and M. Hilario, "Stability of feature
selection algorithms: A study on high-dimensional