
two broad categories: wrappers (Kohavi and John, 1997), which employ a statistical re-sampling technique (such as cross-validation) with the actual target learning algorithm to estimate the accuracy of feature subsets. This approach has proved useful but is very slow to execute because the learning algorithm is called repeatedly. The other option, called filters, operates independently of any learning algorithm: undesirable features are filtered out of the data before induction begins. Filters use heuristics based on general characteristics of the data to evaluate the merit of feature subsets. As a consequence, filter methods are generally much faster than wrapper methods and, as such, are more practical for use on data of high dimensionality. LVF (Liu and Setiono, 1996) uses class consistency as an evaluation measure.
Chi2 (Liu and Setiono, 1995) performs selection by discretization. Relief (Kira and Rendell, 1992) works by randomly sampling an instance from the data and then locating its nearest neighbours from the same and the opposite class. Relief was originally defined for two-class problems and was later extended as ReliefF (Kononenko, 1994) to handle noise and multi-class data sets, while RReliefF handles regression problems. Other authors suggest neural networks as attribute selectors. In addition, learning procedures can be used to select attributes, such as ID3 (Quinlan, 1986), FRINGE (Pagallo and Haussler, 1990) and C4.5 (Quinlan, 1993), as well as methods based on correlations, such as CFS (Hall, 1997).
The most important characteristics of our feature selection algorithm, SOAP (Selection of Attributes by Projection) (Ruiz et al., 2002), are very similar to those of EOP.
3.2 Feature selection
In this paper, we propose a new feature selection criterion that is not based on measures calculated between attributes, nor on complex and costly distance calculations. The criterion is based on a single value, the Number of Label Changes (NLC), which relates each attribute to the label used for classification. It is calculated by projecting the data set elements onto the axis of the attribute (i.e., ordering the examples by that attribute), then traversing the axis from the smallest to the greatest attribute value and counting the number of label changes produced.
Consider the situation depicted in Figure 2: the projection of the examples onto the abscissa axis produces three ordered sequences {O; E; O}, corresponding to the examples {[1,3,5],[8,4,10,2,6],[7,9]}. Likewise, the projection onto the ordinate axis produces the sequences {O; E; O; E; O; E}, formed by the examples {[8],[7,5],[10,6],[9,3],[4,2],[1]}. We then calculate the Number of Label Changes: two for the first attribute and five for the second.
Figure 2: Results of applying SOAP (the ten examples projected onto both attribute axes, with class labels O and E).
We conclude that it is easier to classify by the attributes with the smallest number of label changes. Sorting the attributes in ascending order of NLC yields a ranking list with the best attributes, from the point of view of classification, at the top.
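The following short Python sketch (our notation, not part of the original algorithm description) reproduces the NLC counts of the Figure 2 example; the attribute values are placeholders, since only the projection order and the label sequence matter:

def count_label_changes(values, labels):
    # Order the examples by the attribute value, then count how often
    # the class label changes along the axis (the NLC of the attribute).
    ordered = [lab for _, lab in sorted(zip(values, labels))]
    return sum(a != b for a, b in zip(ordered, ordered[1:]))

# Label sequences along each axis in the Figure 2 example.
positions = list(range(10))                 # already in projection order
labels_x = list("OOOEEEEEOO")               # examples [1,3,5][8,4,10,2,6][7,9]
labels_y = list("OEEOOEEOOE")               # examples [8][7,5][10,6][9,3][4,2][1]
print(count_label_changes(positions, labels_x))   # 2
print(count_label_changes(positions, labels_y))   # 5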
We have experimented with eighteen databases from the UCI repository [3]. To show the performance of our method, we have used k-NN and C4.5 before and after applying SOAP. The results obtained (Ruiz et al., 2002) confirm the validity of the method.
Table 2: SOAP algorithm
Input: E, training set (N examples, M attributes)
Output: E', reduced set (N examples, K attributes)
for each attribute a
    sort E in increasing order of a
    count the label changes (NLC of a)
rank the attributes by ascending NLC
choose the first K attributes
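As an illustration only, the procedure in Table 2 can be written in a few lines of Python with NumPy (the function and parameter names below are ours, not part of the paper):

import numpy as np

def soap_select(X, y, k):
    # Rank the attributes by their Number of Label Changes (NLC) and
    # keep the k attributes with the fewest changes, as in Table 2.
    n_examples, n_attributes = X.shape
    nlc = np.empty(n_attributes, dtype=int)
    for a in range(n_attributes):
        order = np.argsort(X[:, a], kind="stable")   # sort by attribute a
        ordered_labels = y[order]
        nlc[a] = np.sum(ordered_labels[1:] != ordered_labels[:-1])
    ranking = np.argsort(nlc, kind="stable")         # ascending NLC
    selected = ranking[:k]
    return X[:, selected], selected

Sorting each of the M attributes dominates the cost of this sketch, which is consistent with the O(m × n log n) complexity mentioned in Section 4.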
4 INTEGRATION OF
REDUCTION TECHNIQUES
The size of a data set can be measured in two dimensions: the number of features and the number of instances. Both can be very large, and this size may cause serious problems for many data mining systems.
Our approach is to reduce the database in both directions, vertically and horizontally, by applying the aforementioned algorithms sequentially.
The resulting algorithm is very simple and efficient. The computational cost of both EOP and SOAP is O(m × n log n), the lowest in their category; therefore, the combined algorithm is efficient as well.
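A minimal sketch of this sequential composition, assuming a hypothetical eop_reduce function for the instance-selection stage (EOP's internals are not shown in this section) and the soap_select sketch above:

def reduce_dataset(X, y, k):
    # Vertical reduction first: EOP removes examples (placeholder call).
    X_v, y_v = eop_reduce(X, y)
    # Horizontal reduction next: SOAP keeps the k best-ranked attributes.
    X_vh, selected = soap_select(X_v, y_v, k)
    return X_vh, y_v, selected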
Figure 3 shows the process of reducing a database with two thousand examples and forty-one attributes, the last feature being the class. There are three possible labels: A, B and C. First, vertical reduction is applied with the EOP algorithm, and the number of examples decreases to three hundred and fifty. Then,