high number of irrelevant attributes. In this context,
Zhu et al. (2010) introduced two new definitions of multiclass feature relevance: full class relevant (FCR) and partial class relevant (PCR) features.
On the one hand, FCR features are useful for distin-
guishing any type of cancer. On the other hand, PCR
features only help to identify subsets of cancer types.
SD1, SD2 and SD3 are three-class synthetic datasets with 75 samples (25 per class) and 4000 irrelevant features, generated following the directions given in (Díaz-Uriarte and De Andres, 2006). The number of relevant features is 20, 40 and 60, respectively, divided into groups of 10. Within each group of 10 features, only one of them must be selected, since they are redundant with each other.
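As an illustration, the sketch below assembles a dataset with the same shape as SD1 (75 samples in 3 classes, 2 groups of 10 mutually redundant relevant features, 4000 irrelevant features). The function name and the Gaussian distributions are assumptions chosen for illustration; the actual generation procedure is the one described in (Díaz-Uriarte and De Andres, 2006).

```python
import numpy as np

def make_sd_like(n_groups=2, group_size=10, n_irrelevant=4000,
                 samples_per_class=25, n_classes=3, seed=0):
    """SD-like synthetic dataset: each group holds group_size noisy copies of
    one class-informative feature; the remaining features are pure noise.
    The distributions are illustrative, not those of the original reference."""
    rng = np.random.default_rng(seed)
    y = np.repeat(np.arange(n_classes), samples_per_class)   # class labels 0, 1, 2
    n_samples = y.size
    groups = []
    for _ in range(n_groups):
        base = y + rng.normal(scale=0.5, size=n_samples)      # class-informative signal
        # 10 redundant features: noisy duplicates of the same signal
        groups.append(np.column_stack(
            [base + rng.normal(scale=0.1, size=n_samples) for _ in range(group_size)]))
    irrelevant = rng.normal(size=(n_samples, n_irrelevant))   # irrelevant (noise) features
    X = np.hstack(groups + [irrelevant])
    return X, y

X, y = make_sd_like()      # SD1-like shape
print(X.shape)             # (75, 4020)
```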
To sum up, the characteristics of these three datasets are depicted in Table 1, which shows the number of features and samples, the relevant attributes that should be selected by the feature selection method, and the number of full class relevant (FCR) and partial class relevant (PCR) features. Notice that G_i means that the feature selection method must select only one feature within the i-th group of features.
Table 1: Characteristics of SD1, SD2 and SD3 datasets.

Dataset   No. of features   No. of samples   Relevant features   No. of FCR   No. of PCR
SD1       4020              75               G_1, G_2            20           –
SD2       4040              75               G_1 – G_4           30           10
SD3       4060              75               G_1 – G_6           –            60
Note that SD1 is the easiest dataset in which to detect the relevant features, since it contains only FCR features, whereas SD3 is the hardest, since it contains only PCR genes, which are more difficult to detect.
To assess the scalability of the mRMR method, different configurations of these datasets were used. In particular, the number of features ranges from $2^6$ to $2^{12}$ whilst the number of samples ranges from $3^2$ to $3^5$ (all pairwise combinations). Notice that the number of relevant features is fixed (2 for SD1, 4 for SD2 and 6 for SD3) and it is the number of irrelevant features that varies. When the number of samples increases, the new instances are randomly generated.
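The grid of configurations described above can be enumerated as follows (a minimal sketch; variable names are illustrative):

```python
from itertools import product

# All pairwise combinations of feature and sample counts used in the
# scalability study: features from 2^6 to 2^12, samples from 3^2 to 3^5.
feature_counts = [2 ** k for k in range(6, 13)]   # 64, 128, ..., 4096
sample_counts = [3 ** k for k in range(2, 6)]     # 9, 27, 81, 243
configurations = list(product(feature_counts, sample_counts))
print(len(configurations))                        # 7 x 4 = 28 configurations per dataset
```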
3.2 Evaluation Metrics
At this point, it is necessary to recall that mRMR does not return a subset of selected features, but a ranking of the features in which the most relevant one should be ranked first. The goal of this research is to assess the scalability of the mRMR feature selection method. For this purpose, some evaluation measures need to be defined, motivated by the measures proposed in (Zhang et al., 2009). One_error, coverage, ranking_loss, average_precision and training time were considered. In all measures, feat_sel is the ranking of features returned by the mRMR method, feat_rel is the subset of relevant features and feat_irr stands for the subset of irrelevant features. Notice that all measures mentioned below except training time are bounded between 0 and 1.
• The one_error measure evaluates if the top-ranked feature (the first selected in the ranking) is not in the set of relevant features.

$$ one\_error = \begin{cases} 1 & \text{if } feat\_sel(1) \notin feat\_rel \\ 0 & \text{otherwise} \end{cases} $$
• The coverage evaluates how many steps are needed, on average, to move down the ranking in order to cover all the relevant features. At worst, the last feature in the ranking would be relevant, so the coverage would be 1 (since this measure is bounded between 0 and 1).

$$ coverage = \frac{\max_i feat\_sel(feat\_rel(i))}{\#feat\_sel} $$
• The ranking_loss evaluates the number of irrelevant features that are ranked above the relevant ones. The fewer irrelevant features there are at the top of the ranking, the better ranked the relevant ones are.

$$ ranking\_loss = \frac{(coverage \cdot \#feat\_sel) - \#feat\_rel}{\#feat\_rel \cdot \#feat\_irr} $$
• The average_precision evaluates the mean fraction of relevant features ranked above a particular feature of the ranking.

$$ average\_precision = \frac{1}{\#feat\_rel} \sum_{i \,:\, feat\_sel(i) \in feat\_rel} \frac{|\{ j : feat\_sel(j) \in feat\_rel \wedge j < i \}|}{i} $$
• The training time is reported in seconds.
For example, suppose we have 4 relevant features, $x_1, \ldots, x_4$, 4 irrelevant features, $x_5, \ldots, x_8$, and the following ranking returned by mRMR: $x_5, x_3, x_8, x_1, x_4, x_2, x_7, x_6$. In this case, the one_error is 1, because the first feature in the ranking is not a relevant one. For calculating the coverage, it is necessary to move down 6 steps in the ranking to cover all the relevant features. Regarding the ranking_loss, there are 2 irrelevant features better ranked than the relevant ones. As for the average_precision, the number of relevant features ranked above each feature of the ranking is: 0, 0, 1, 1, 2, 3, 4, 4.
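These calculations can be reproduced with the short sketch below; note that average_precision is implemented following the formula as reconstructed above, which should be taken as an approximation of the measure in (Zhang et al., 2009).

```python
# Toy example from the text: 4 relevant and 4 irrelevant features.
relevant = {"x1", "x2", "x3", "x4"}
irrelevant = {"x5", "x6", "x7", "x8"}
ranking = ["x5", "x3", "x8", "x1", "x4", "x2", "x7", "x6"]   # ranking returned by mRMR

one_error = 0 if ranking[0] in relevant else 1

# 1-based positions of the relevant features in the ranking.
rel_positions = [i for i, f in enumerate(ranking, start=1) if f in relevant]

coverage = max(rel_positions) / len(ranking)

ranking_loss = (coverage * len(ranking) - len(relevant)) / (len(relevant) * len(irrelevant))

# For each relevant feature at position i, the fraction of relevant
# features ranked strictly above it, averaged over the relevant features.
average_precision = sum(
    sum(1 for j in rel_positions if j < i) / i for i in rel_positions
) / len(relevant)

print(one_error, coverage, ranking_loss, average_precision)   # 1 0.75 0.125 0.2875
```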