high number of irrelevant attributes. In this context,
Zhu et al. (2010) introduced two new definitions of multiclass feature relevance: full class relevant (FCR) and partial class relevant (PCR) features.
On the one hand, FCR features are useful for distin-
guishing any type of cancer. On the other hand, PCR
features only help to identify subsets of cancer types.
SD1, SD2 and SD3 are three-class synthetic datasets with 75 samples (25 per class) and 4000 irrelevant features, generated following the directions given in (Díaz-Uriarte and De Andres, 2006). The number of relevant features is 20, 40 and 60, respectively, divided into groups of 10. Within each group of 10 features, only one of them must be selected, since they are redundant with each other.
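As an illustration, the sketch below assembles a dataset with the same shape as SD1 (75 samples in 3 classes, 2 groups of 10 mutually redundant relevant features, 4000 irrelevant features). The function name and the Gaussian distributions are assumptions chosen for illustration; the actual generation procedure is the one described in (Díaz-Uriarte and De Andres, 2006).

```python
import numpy as np

def make_sd_like(n_groups=2, group_size=10, n_irrelevant=4000,
                 samples_per_class=25, n_classes=3, seed=0):
    """SD-like synthetic dataset: each group holds group_size noisy copies of
    one class-informative feature; the remaining features are pure noise.
    The distributions are illustrative, not those of the original reference."""
    rng = np.random.default_rng(seed)
    y = np.repeat(np.arange(n_classes), samples_per_class)   # class labels 0, 1, 2
    n_samples = y.size
    groups = []
    for _ in range(n_groups):
        base = y + rng.normal(scale=0.5, size=n_samples)      # class-informative signal
        # 10 redundant features: noisy duplicates of the same signal
        groups.append(np.column_stack(
            [base + rng.normal(scale=0.1, size=n_samples) for _ in range(group_size)]))
    irrelevant = rng.normal(size=(n_samples, n_irrelevant))   # irrelevant (noise) features
    X = np.hstack(groups + [irrelevant])
    return X, y

X, y = make_sd_like()      # SD1-like shape
print(X.shape)             # (75, 4020)
```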
To sum up, the characteristics of these three datasets are depicted in Table 1, which shows the number of features and samples, the relevant attributes that should be selected by the feature selection method, and the number of full class relevant (FCR) and partial class relevant (PCR) features. Notice that G_i means that the feature selection method must select only one feature within the i-th group of features.
Table 1: Characteristics of SD1, SD2 and SD3 datasets.

Dataset   No. of features   No. of samples   Relevant features   No. of FCR   No. of PCR
SD1       4020              75               G_1, G_2            20           –
SD2       4040              75               G_1 – G_4           30           10
SD3       4060              75               G_1 – G_6           –            60
Note that SD1 is the easiest dataset in which to detect the relevant features, since it contains only FCR features, whereas SD3 is the hardest, since it contains only PCR genes, which are more difficult to detect.
To assess the scalability of the mRMR method, different configurations of these datasets were used. In particular, the number of features ranges from $2^6$ to $2^{12}$ whilst the number of samples ranges from $3^2$ to $3^5$ (all pairwise combinations). Notice that the number of relevant features is fixed (2 for SD1, 4 for SD2 and 6 for SD3) and it is the number of irrelevant features that varies. When the number of samples increases, the new instances are randomly generated.
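The grid of configurations described above can be enumerated as follows (a minimal sketch; variable names are illustrative):

```python
from itertools import product

# All pairwise combinations of feature and sample counts used in the
# scalability study: features from 2^6 to 2^12, samples from 3^2 to 3^5.
feature_counts = [2 ** k for k in range(6, 13)]   # 64, 128, ..., 4096
sample_counts = [3 ** k for k in range(2, 6)]     # 9, 27, 81, 243
configurations = list(product(feature_counts, sample_counts))
print(len(configurations))                        # 7 x 4 = 28 configurations per dataset
```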
3.2 Evaluation Metrics
At this point, it is necessary to recall that mRMR does not return a subset of selected features, but a ranking of the features in which the most relevant one should be ranked first. The goal of this research is to assess the scalability of the mRMR feature selection method. For this purpose, some evaluation measures need to be defined, motivated by the measures proposed in (Zhang et al., 2009). One_error, coverage, ranking_loss, average_precision and training time were considered. In all measures, feat_sel is the ranking of features returned by the mRMR method, feat_rel is the subset of relevant features and feat_irr stands for the subset of irrelevant features. Notice that all measures mentioned below except training time are bounded between 0 and 1.
• The one_error measure evaluates if the top-ranked feature (the first selected in the ranking) is not in the set of relevant features.

$$ one\_error = \begin{cases} 1 & \text{if } feat\_sel(1) \notin feat\_rel \\ 0 & \text{otherwise} \end{cases} $$
• The coverage evaluates how many steps are needed, on average, to move down the ranking in order to cover all the relevant features. At worst, the last feature in the ranking would be relevant, so the coverage would be 1 (since this measure is bounded between 0 and 1).

$$ coverage = \frac{\max_i feat\_sel(feat\_rel(i))}{\#feat\_sel} $$
• The ranking_loss evaluates the number of irrelevant features that are ranked above the relevant ones. The fewer irrelevant features there are at the top of the ranking, the better ranked the relevant ones are.

$$ ranking\_loss = \frac{(coverage \cdot \#feat\_sel) - \#feat\_rel}{\#feat\_rel \cdot \#feat\_irr} $$
• The average_precision evaluates the mean fraction of relevant features ranked above a particular feature of the ranking.

$$ average\_precision = \frac{1}{\#feat\_rel} \sum_{i \,:\, feat\_sel(i) \in feat\_rel} \frac{|\{ j : feat\_sel(j) \in feat\_rel \wedge j < i \}|}{i} $$
• The training time is reported in seconds.
For example, suppose we have 4 relevant features, $x_1, \ldots, x_4$, 4 irrelevant features, $x_5, \ldots, x_8$, and the following ranking returned by mRMR: $x_5, x_3, x_8, x_1, x_4, x_2, x_7, x_6$. In this case, the one_error is 1, because the first feature in the ranking is not a relevant one. For calculating the coverage, it is necessary to move down 6 steps in the ranking to cover all the relevant features. Regarding the ranking_loss, there are 2 irrelevant features better ranked than the relevant ones. As for the average_precision, the number of relevant features ranked above each feature of the ranking is: 0, 0, 1, 1, 2, 3, 4, 4.
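These calculations can be reproduced with the short sketch below; note that average_precision is implemented following the formula as reconstructed above, which should be taken as an approximation of the measure in (Zhang et al., 2009).

```python
# Toy example from the text: 4 relevant and 4 irrelevant features.
relevant = {"x1", "x2", "x3", "x4"}
irrelevant = {"x5", "x6", "x7", "x8"}
ranking = ["x5", "x3", "x8", "x1", "x4", "x2", "x7", "x6"]   # ranking returned by mRMR

one_error = 0 if ranking[0] in relevant else 1

# 1-based positions of the relevant features in the ranking.
rel_positions = [i for i, f in enumerate(ranking, start=1) if f in relevant]

coverage = max(rel_positions) / len(ranking)

ranking_loss = (coverage * len(ranking) - len(relevant)) / (len(relevant) * len(irrelevant))

# For each relevant feature at position i, the fraction of relevant
# features ranked strictly above it, averaged over the relevant features.
average_precision = sum(
    sum(1 for j in rel_positions if j < i) / i for i in rel_positions
) / len(relevant)

print(one_error, coverage, ranking_loss, average_precision)   # 1 0.75 0.125 0.2875
```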