vector of real numbers, and y_n is the class membership. In this paper, we only considered the binary case, i. e. y_n ∈ {−1, +1}, but all applied algorithms have already
been introduced for the multi-class case, cf. [15] and [8], respectively. Our final goal
is not only the classification of these feature vectors; we additionally want to select the
best features to reduce the dimension of the feature space and to eliminate redundant
features.
Structure of the Paper. In section 2, we review feature subset selection
principles and methods. Then, in section 3, we outline the Adaboost and ADTboost
algorithms and show their similarities and differences w. r. t. feature selection. Furthermore, we demonstrate and discuss their behavior on a simple example. Our
feature subset selection schemes with Adaboost and ADTboost are proposed in section
4. We have tested our approach on synthetic and benchmark data. The results of these
experiments are presented and discussed in section 5. Finally, we summarize our work
and give a short outlook on our research plans.
2 Standard Feature Selection Methods
If we want to select a subset of appropriate features from the total set of features with
cardinality D, we have a choice among 2^D possibilities. If we deal with feature vectors
with more than a few dozen components, an exhaustive search takes too long. Thus,
we have to find other ways to select a subset of features.
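For concreteness, the following Python sketch (ours, not taken from any of the cited works) spells out what such an exhaustive search would look like; X is assumed to be a NumPy array of feature vectors and evaluate is a placeholder scoring function, e.g. the cross-validated accuracy of a wrapped classifier.

```python
from itertools import combinations

def exhaustive_selection(X, y, evaluate):
    """Brute-force baseline: score every non-empty subset of the D features with
    a user-supplied evaluation function and return the best one. With 2**D - 1
    candidates (D = 30 already gives more than 10**9), this is only feasible for
    very small D."""
    D = X.shape[1]
    best_subset, best_score = None, float("-inf")
    for k in range(1, D + 1):
        for subset in combinations(range(D), k):
            score = evaluate(X[:, list(subset)], y)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score
```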
One option is genetic algorithms, which select these subsets randomly. Although they
are relatively insensitive to noise and require no explicit domain knowledge, the
creation of mutated samples within the evolutionary process might lead to wrong
solutions. Furthermore, genetic algorithms combined with a wrapped classification
method are computationally inefficient, cf. [10].
Alternatively, there are deterministic approaches for selecting subsets of relevant
features. Forward selection methods start with an empty set and greedily add the best
of the remaining features to this set. Conversely, backward elimination procedures start
with the full set containing all features and greedily remove the least useful features
from this set. According to [11], feature subset selection methods can also be
characterized as filters and wrappers. Filters use evaluation methods for feature ranking
that are independent of the learning method. In [7], e. g., the squared Pearson
correlation coefficient is proposed for determining the most relevant features.
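The two strategies can be contrasted with a short sketch. The following Python code is illustrative only; the function names and the evaluate callback are our own choices and are not taken from [7] or [11].

```python
import numpy as np

def forward_selection(X, y, evaluate, max_features=None):
    """Greedy wrapper-style forward selection: start from the empty set and
    repeatedly add the feature whose inclusion maximizes the score returned by
    the user-supplied evaluate function (e.g. cross-validated accuracy of the
    wrapped classifier)."""
    D = X.shape[1]
    selected, remaining = [], list(range(D))
    best_score = float("-inf")
    limit = max_features if max_features is not None else D
    while remaining and len(selected) < limit:
        score, j_best = max((evaluate(X[:, selected + [j]], y), j) for j in remaining)
        if score <= best_score:          # no further improvement: stop early
            break
        best_score = score
        selected.append(j_best)
        remaining.remove(j_best)
    return selected, best_score

def pearson_filter(X, y, k):
    """Filter-style ranking: keep the k features with the largest squared
    Pearson correlation to the class labels (the criterion referred to in [7])."""
    r2 = np.array([np.corrcoef(X[:, j], y)[0, 1] ** 2 for j in range(X.shape[1])])
    return np.argsort(r2)[::-1][:k]
```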
In our experiments, the features have low correlation coefficients with the class
target and are highly correlated with each other. Furthermore, the training samples
do not form compact clusters in feature space. In this setting, it is hard to find a single
classifier that is able to separate the two classes. Thus, we look for a wrapper method,
where the evaluation of the features is based on the learning results. Since we obtained
poor classification results using only one classifier, we would like to combine feature
subset selection with a learning technique that uses several classifiers. This motivated
us to pursue the concept of adaptive boosting (Adaboost) where a strong (or highly
accurate) classifier is found by combining several weak (or less accurate) classifiers, cf.
[14].
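To make the boosting idea concrete, the following Python sketch implements the generic Adaboost loop with axis-parallel decision stumps as weak learners; it illustrates [14] in simplified form and is not the specific selection procedure proposed later in this paper.

```python
import numpy as np

def adaboost_stumps(X, y, T=50):
    """Minimal AdaBoost sketch with decision stumps as weak learners; y must
    hold labels in {-1, +1}. Each round re-weights the samples so that the next
    weak classifier concentrates on the examples misclassified so far (cf. [14])."""
    n, D = X.shape
    w = np.full(n, 1.0 / n)                       # sample weights
    ensemble = []                                 # (feature, threshold, sign, alpha)
    for _ in range(T):
        best = None
        for j in range(D):                        # exhaustive stump search
            for thr in np.unique(X[:, j]):
                for sign in (+1, -1):
                    pred = sign * np.where(X[:, j] > thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign, pred)
        err, j, thr, sign, pred = best
        err = max(err, 1e-12)                     # avoid division by zero
        alpha = 0.5 * np.log((1.0 - err) / err)   # weight of this weak classifier
        w *= np.exp(-alpha * y * pred)            # increase weights of errors
        w /= w.sum()
        ensemble.append((j, thr, sign, alpha))
    return ensemble

def adaboost_predict(ensemble, X):
    """Strong classifier: sign of the alpha-weighted vote of all stumps."""
    votes = sum(alpha * sign * np.where(X[:, j] > thr, 1, -1)
                for j, thr, sign, alpha in ensemble)
    return np.sign(votes)
```

Since each stump in this sketch depends on a single feature, the features chosen in successive boosting rounds already indicate their relevance; this link between boosting and feature relevance is the starting point for the selection schemes proposed in section 4.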