ferent dimensions at the same time. Moreover, as it uses an adjusted genetic algorithm guided directly by the combiner performance, selection in all dimensions is coordinated and optimised so that the composite classification system jointly achieves the best overall performance. A prototype of the 3-dimensional selection system is developed and tested on two benchmark datasets in comparative experiments.
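As an illustration only (not the exact representation or combiner evaluated later in this paper), the following Python sketch shows how a single genetic-algorithm fitness function can couple the selection dimensions: a chromosome encodes sample, feature and classifier masks, and its fitness is the validation accuracy of the fused ensemble built from that joint selection. The helper names, the flat mask encoding and the majority-vote combiner are assumptions made for the example.

import numpy as np
from sklearn.base import clone

def ensemble_fitness(chromosome, X_tr, y_tr, X_val, y_val, base_learners):
    # Decode the flat 0/1 chromosome into three masks:
    # [sample mask | feature mask | classifier mask]. Integer class labels assumed.
    n, d = X_tr.shape
    s_mask = chromosome[:n].astype(bool)
    f_mask = chromosome[n:n + d].astype(bool)
    c_mask = chromosome[n + d:].astype(bool)
    if not (s_mask.any() and f_mask.any() and c_mask.any()):
        return 0.0  # reject degenerate selections outright

    votes = []
    for learner, keep in zip(base_learners, c_mask):
        if keep:
            model = clone(learner).fit(X_tr[s_mask][:, f_mask], y_tr[s_mask])
            votes.append(model.predict(X_val[:, f_mask]))

    # Majority-vote combiner; its validation accuracy is the GA fitness,
    # so sample, feature and classifier selection are judged jointly.
    fused = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, np.array(votes))
    return float(np.mean(fused == y_val))

Because the fitness is computed only from the fused decision, no dimension is optimised in isolation; a feature that helps one selected classifier but harms the combiner is rejected.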
The remainder of this paper is organised as follows. Section 2 discusses the dimensions of selection in the classification process, covering in detail selection in data, classifiers and fusion systems. The following section introduces the multidimensional selection model and discusses its representation and selection algorithm. Extensive experimental results are shown in Section 4. Finally, conclusions and recommendations for future work are briefly drawn in Section 5.
2 DIMENSIONS OF SELECTION
2.1 Data
Until recently there was a common belief that all the available data should be used to build a classification model. This belief, although theoretically sound, has been gradually relaxed by extensive experimental findings related to feature selection (H. Ishibuchi and Nii, 2001), (Kuncheva and Jain, 1999). It was found that realistic learning systems cannot fully distinguish between good, representative data and bad data because they lack mechanisms for the accumulative and non-conflicting exploitation of all the data.
On practical grounds it turned out that, to avoid performance losses, the simplest thing to do is to filter out bad data and use only good data to build a classification model. Finding the most suitable training data breaks down into a variety of ways in which these data can be selected. Direct selection of the optimal data points is usually referred to as data editing (Kuncheva and Jain, 1999), where the aim may be either to obtain a compact data sample that retains maximum representativeness of the original data structure, or simply to supply the best input to the learning mechanism. The selection restrictions can be specified in many other ways beyond direct selection of samples. As the data is mapped onto the input space, the selection rules can be attributed to the space rather than to the data forming it. The input space can simply be segmented into many differently shaped subspaces. The shapes of the subspaces may take various generic forms or can be dictated by the classification methodology. In the dynamic classifier selection methodology (Giacinto and Roli, 1999), the shape of the subspace is dictated by the k-nearest neighbour rule, while in the Error-Correcting Output Codes (ECOC) method (Dietterich and Bakiri, 1995) the shape of the subspace is fully determined by the structure of classes in the data. In the most common scenario, the input space is divided along boundaries parallel or perpendicular to the feature axes, which means that selection applies to whole features or to particular ranges of their variability, respectively. The labelled character of the data used for classification adds a further dimension of potential selection.
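The dynamic classifier selection idea can be made concrete with a short sketch of its local-accuracy variant: for a query point, the subspace is the set of its k nearest validation neighbours, and the classifier that performs best inside that neighbourhood makes the decision. This is a generic illustration of the approach of Giacinto and Roli (1999), not their exact algorithm; the trained classifiers and the validation set are assumed to be given.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def dcs_local_accuracy(x, classifiers, X_val, y_val, k=10):
    # The query point's subspace is its k-nearest-neighbour region of the
    # validation set; the classifier most accurate there makes the decision.
    nn = NearestNeighbors(n_neighbors=k).fit(X_val)
    _, idx = nn.kneighbors(x.reshape(1, -1))
    region_X, region_y = X_val[idx[0]], y_val[idx[0]]
    local_acc = [np.mean(clf.predict(region_X) == region_y) for clf in classifiers]
    best = classifiers[int(np.argmax(local_acc))]
    return best.predict(x.reshape(1, -1))[0]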
All the features typically have open domains allowing unlimited variability over (−∞, +∞). However, there can be many reasons for limiting these domains by selecting a narrower range of valid feature variability. One such reason is filtering out outliers - samples lying far from the areas of high data concentration. To accommodate outliers, the classification model has to stretch its parameters such that a single distant data point has a much greater influence on the model than many points within dense regions of the input space. The domain can be limited to a single range or to multiple ranges of valid variability for each feature. In the special case where the domain range is reduced to nothing, the effect is equivalent to the exclusion of that feature.
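A minimal sketch of domain selection follows, using a hypothetical representation in which each feature index is mapped to a list of allowed (low, high) intervals; an absent entry means the full (−∞, +∞) domain is kept, and an empty list excludes the feature altogether, which is the special case described above.

import numpy as np

def apply_domain_selection(X, domains):
    # domains: {feature index: [(low, high), ...]}; missing -> full domain kept,
    # empty list -> feature excluded. Illustrative representation only.
    kept = [j for j in range(X.shape[1]) if domains.get(j, [(-np.inf, np.inf)])]
    rows = np.ones(len(X), dtype=bool)
    for j in kept:
        intervals = domains.get(j, [(-np.inf, np.inf)])
        in_any = np.zeros(len(X), dtype=bool)
        for low, high in intervals:
            in_any |= (X[:, j] >= low) & (X[:, j] <= high)
        rows &= in_any  # a sample must fall inside every kept feature's domain
    return X[rows][:, kept], rows

For example, domains = {0: [(0.0, 3.5)], 2: []} restricts feature 0 to the range [0, 3.5] and drops feature 2 entirely, removing any sample whose value of feature 0 falls outside the selected range.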
As mentioned above, feature selection is a special case of domain selection but, due to its simplicity, deserves separate treatment. Feature selection has two attractive aspects to consider. First of all, selecting some instead of all features significantly reduces the computational cost of classification algorithms, which are typically at least quadratically complex with respect to the number of features. Secondly, in practice many features do not contribute to classifier performance and, due to imperfect learning algorithms, can sometimes even cause its deterioration. Features can also be selected along with limits on their variability range. Such a scenario is equivalent to the selection of particular clusters or subspaces in the input data, such as the selection of classes of data.
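Feature selection itself can be evaluated in a wrapper fashion, scoring a candidate binary mask by the cross-validated accuracy of a classifier trained only on the selected columns. The sketch below assumes any scikit-learn-compatible estimator and is not tied to the selection algorithm used in this work.

import numpy as np
from sklearn.model_selection import cross_val_score

def feature_mask_score(mask, X, y, estimator, cv=5):
    # Wrapper-style evaluation: train and score using only the selected features.
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 0.0  # an empty feature set cannot be scored
    return cross_val_score(estimator, X[:, mask], y, cv=cv).mean()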
The emergence of classes of data adds another degree of freedom to the selection process related to data. However, rather than another dimension of selection, it appears to be a form of restriction on how the domains of individual features should be limited. Selection of classes of data is used in Error-Correcting Output Codes (ECOC), where the N-class problem is converted into a large number of 2-class problems. Selection with respect to classes is particularly attractive if there are expert classifiers which specialise in recognising a particular class or classes but are very weak at recognising other classes, in which case it makes sense to decompose the problem rather than aggregate performance over all classes.
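The class-decomposition idea can be sketched with a generic ECOC procedure: each column of a binary code matrix (rows indexed by class) defines one 2-class problem, and a test point is assigned to the class whose codeword is closest in Hamming distance to the predicted bits. This is the textbook scheme rather than the specific decomposition of Dietterich and Bakiri (1995); the base estimator is assumed to be any trainable binary classifier.

import numpy as np
from sklearn.base import clone

def train_ecoc(X, y, code_matrix, base_estimator):
    # code_matrix: rows = classes (in np.unique(y) order), columns = dichotomies,
    # entries in {0, 1}. Each column defines one binary relabelling of the data.
    classes = np.unique(y)
    dichotomizers = []
    for col in range(code_matrix.shape[1]):
        bit_of_class = dict(zip(classes, code_matrix[:, col]))
        y_bin = np.array([bit_of_class[label] for label in y])
        dichotomizers.append(clone(base_estimator).fit(X, y_bin))
    return dichotomizers, classes

def predict_ecoc(X, dichotomizers, code_matrix, classes):
    # Decode each predicted codeword to the class with minimum Hamming distance.
    codewords = np.column_stack([d.predict(X) for d in dichotomizers])
    dists = np.array([np.sum(codewords != row, axis=1) for row in code_matrix]).T
    return classes[dists.argmin(axis=1)]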
2.2 Classifiers
Classifier selection is probably the most intuitive form
of selection with respect to classifier fusion. There are