Performance Evaluation and Enhancement of Biclustering Algorithms

Jeffrey Dale

, America Nishimoto

and Tayo Obafemi-Ajayi

Department of Computer Science, Missouri State University, Springﬁeld, MO, U.S.A.

Engineering Program, Missouri State University, Springﬁeld, MO, U.S.A.

Keywords:

Biclustering, Evaluation, Gene Expression Pattern Recognition, Validation Measures.

Abstract:

In gene expression data analysis, biclustering has proven to be an effective method of ﬁnding local patterns

among subsets of genes and conditions. The task of evaluating the quality of a bicluster when ground truth

is not known is challenging. In this analysis, we empirically evaluate and compare the performance of eight

popular biclustering algorithms across 119 synthetic datasets that span a wide range of possible bicluster struc-

tures and patterns. We also present a method of enhancing performance (relevance score) of the biclustering

algorithms to increase conﬁdence in the signiﬁcance of the biclusters returned based on four internal validation

measures. The experimental results demonstrate that the Average Spearman’s Rho evaluation measure is the

most effective criteria to improve bicluster relevance with the proposed performance enhancement method,

while maintaining a relatively low loss in recovery scores.

1 INTRODUCTION

Biclustering is an effective unsupervised learning tool

for discovering patterns of co-regulated/co-expressed

genes across a subset of samples in gene expression

data analysis (Madeira and Oliveira, 2004; Pontes

et al., 2015a). As the name implies, clustering is

performed simultaneously on both the row and col-

umn dimensions to discover biclusters, which are de-

ﬁned as submatrices in which the group of rows be-

have similarly across the subset of columns contained

in the submatrix. Biclustering is a special case of

pattern-based clustering algorithms (Kriegel et al.,

2009).

In traditional clustering methods, the objective is

to subdivide the entire data matrix into subgroups, or

clusters, which consist of rows (examples) that ex-

hibit more homogeneous patterns across all columns

(features). In biclustering, these homogeneous sub-

groups, or biclusters, do not necessarily span all the

columns. This makes biclustering useful for identi-

fying possible relevant subspaces in the data. There

are two underlying assumptions in biclustering: (i)

the presence of irrelevant features, or of correlations

among subsets of features, may signiﬁcantly bias

the representation of clusters in the full-dimensional

space. By relaxing the constraint of global fea-

ture space, we could discover more meaningful sub-

groups; and (ii) different subsets of features may be

relevant for different clusters which implies that ob-

jects cluster in subspaces of the data, rather than

across an entire dimension.

In gene expression data analysis, such a problem

formulation is particularly useful because according

to the general understanding of cellular processes,

only a subset of genes is involved with a speciﬁc cel-

lular process, which becomes active only under some

experimental conditions (Xu and Wunsch II, 2011).

Usually, the expression levels of many genes are mea-

sured across a relatively small set of conditions or

samples, and the obtained gene expression data are

organized as a data matrix with rows corresponding to

genes and columns corresponding to samples or con-

ditions. However, such a practice is inherently limited

due to the existence of many uncorrelated genes with

respect to sample or condition clustering, or many

unrelated samples or conditions with respect to gene

clustering. Biclustering offers a solution to such prob-

lems by performing simultaneous clustering on both

dimensions as well as automatically integrating fea-

ture selection with clustering without any prior infor-

mation, so that the relations of clusters of genes (gen-

erally, features) and clusters of samples or conditions

(data objects) are established.

Note that the usefulness of biclustering methods

in discovering local patterns that exist among a sub-

set of rows and columns are also applicable to do-

mains beyond the context of gene expression data

202

Dale, J., Nishimoto, A. and Obafemi-Ajayi, T.

Performance Evaluation and Enhancement of Biclustering Algorithms.

DOI: 10.5220/0006662502020213

In Proceedings of the 7th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2018), pages 202-213

ISBN: 978-989-758-276-9

analysis such analysis of voting data (Hartigan, 1972)

or more recently, collaborative ﬁltering recommenda-

tion systems (Elnabarawy et al., 2016). In the past

two decades, there has been an inﬂux of multiple bi-

clustering algorithms proposed, as reviewed in (Preli

et al., 2006; Eren et al., 2012; Oghabian et al., 2014;

Pontes et al., 2015a; Roy et al., 2016). A recent re-

view by Pontes et al. (Pontes et al., 2015a) classiﬁes

30 different biclustering algorithms (which is still not

an exhaustive list of all methods proposed in litera-

ture) by their inherent method of determining optimal

biclusters. It separated the algorithms into two main

categories: those based on evaluation measures vs.

those non metric-based. Given that biclustering is an

unsupervised machine learning technique, a key issue

is how to evaluate/rank the goodness of the biclusters

returned by varied algorithms, especially in the situ-

ations where the algorithm return a large number of

biclusters. This task becomes particularly challeng-

ing in absence of ground truth.

The development of effective heuristic and suit-

able evaluation measures is of particular interest to the

biclustering research community (Cheng and Church,

2000; Pontes et al., 2007; Mukhopadhyay et al., 2009;

Ayadi et al., 2009; Pontes et al., 2010). These mea-

sures are based on inherent assumptions about pos-

sible bicluster patterns for gene expression data such

as shifting, scaling or a combination of both. Sev-

eral evaluation measures have been proposed for bi-

clustering algorithms (Pontes et al., 2015b). These

evaluation measures, also referred to as quality mea-

sures, attempt to quantify the goodness of the biclus-

ters. They can be regarded as internal validation mea-

sures as they are evaluating the biclusters based on

certain desired properties of possible patterns (shift-

ing, scaling, and combined). Pontes et al. in (Pontes

et al., 2015b) conducted a comparison analysis of 14

measures known in literature to assess their ability to

identify optimal biclusters based on shifting, scaling

or combined patterns. Their work identiﬁed two of

these measures, average Spearman’s rho (ASR) and

transposed virtual error (VE

) as been proﬁcient in

identifying all three types of biclusters.

In this work, we conduct a comparative empir-

ical analysis of the performance of a subset of the

30 algorithms recently identiﬁed in (Pontes et al.,

2015a), along with two improved algorithms that

were not included in the review, using a state of the

art benchmark synthetic dataset (Wang et al., 2016).

We propose an enhancement framework based on ap-

plication of two best-performing validation measures

(ASR and VE

) to enhance the performance of bi-

clustering methods speciﬁcally in terms of identify-

ing the optimal set of relevant biclusters returned by

the algorithms. In addition, we present a compar-

ative performance analysis of the proposed method

using two other commonly used measures: mean

squared residue (MSR) and scaling mean squared

residue (SMSR). We also apply a statistical measure

(Friedman’s test statistic (Conover and Iman, 1981))

to quantify the signiﬁcance of the improvements ob-

tained. The objective of this study is to provide em-

pirical evidence that can guide practical applications

of biclustering methods along with these measures in

discovering signiﬁcant and relevant biclusters. The

remainder of this paper is organized as follows. In

section 2, we present an overview of the algorithms

analyzed in this paper. Section 3 presents a descrip-

tion of the performance evaluation framework utilized

in this paper. The experimental results obtained is il-

lustrated and discussed in section 4 while the conclu-

sion is drawn in section 5.

2 BACKGROUND

2.1 Motivation

Our overall goal in this paper is twofold. The ﬁrst is

to evaluate empirically the performance of eight com-

monly used biclustering algorithms using benchmark

synthetic dataset that differ from previous surveys to

provide an insight into the overall performance of the

algorithms and an understanding of what types of ap-

plications it’s best suited for. Secondly, we are inter-

ested in improving the overall performance of these

algorithms using evaluation measures that have been

proposed in literature. In the context of this work,

an algorithm performs well if majority of the biclus-

ters returned by the method are relevant and if it dis-

covers (or retrieves) majority of the actual biclusters

present in the data. We formally quantify perfor-

mance based on relevance and recovery scores, as de-

ﬁned in section 3.1. Our hypothesis is that ASR and

will result in the most signiﬁcant improvement

given that they have been demonstrated to success-

fully identify biclusters of shifting, scaling and com-

bined patterns (known pattern concepts for gene ex-

pression data (Pontes et al., 2010)). We compare their

effect on the enhancement of these algorithms to MSR

and SMSR which have been demonstrated in litera-

ture as been effective internal validation measures for

only one class of bicluster patters, shifting and scaling

respectively.

It is commonly known that ﬁnding biclusters is an

NP-hard problem (Pontes et al., 2015b). Each algo-

rithm usually has its own internal method of guiding

its search for the optimal set of biclusters. Some are

Performance Evaluation and Enhancement of Biclustering Algorithms

203

based on using a heuristic search guided by evalua-

tion measures i.e. metric-based, of which MSR is

the most commonly used, while others are non-metric

based (Pontes et al., 2015a). Given that our goal is to

improve the outcome of these algorithms using evalu-

ation measures, we evaluate algorithms that belong to

both categories and analyze the effect of our proposed

method.

To ensure an unbiased systematic evaluation of

these methods, we selected algorithms that had freely

available implementations and have been readily

cited/used among the biclustering community. Two

of the methods described in this work were an ex-

tension/improvement of a prior method reviewed

among the 30 biclustering algorithms in (Pontes et al.,

2015a). In this work, we focused on the most re-

cent improved method (as in the case of UniBic and

BicPAMS described below). Table 1 presents an

overview of the 8 algorithms empirically evaluated

and analyzed in this work including the implemen-

tation source. They include both metric-based and

non-metric based approaches and span two decades.

To provide a context for the comparative analysis pre-

sented in this paper, we brieﬂy describe each biclus-

tering algorithm in a chronological order.

2.2 Review of Biclustering Algorithms

Cheng and Church (CC)

The Cheng and Church algorithm (Cheng and

Church, 2000) was the ﬁrst application of biclustering

to ﬁnding local similarity patterns in gene expression

data. CC is a deterministic greedy algorithm that ﬁnds

biclusters by minimizing the Mean Squared Residue

(MSR) score of a discovered submatrix. MSR is

an evaluation measure that is a measure of bicluster

homogeneity, as deﬁned in the next section. As a

heuristic-based method, it outputs a desired number

of biclusters k based on user deﬁned parameters k and

δ, which is the maximum acceptable MSR score.

Iterative Signature Algorithm (ISA)

The iterative signature algorithm (Bergmann et al.,

2003) is a non-deterministic method that discov-

ers biclusters even in the presence of noise and

overlapping biclusters. It deﬁnes the biclusters as

transcription modules (TM): a set of co-regulated

genes (rows) with relevant experimental conditions

(columns). Starting from a set of randomly selected

genes (or conditions), it iteratively reﬁnes the genes

and conditions until they match the deﬁnition of TM.

In the context of this work, we evaluate ISA as a non-

heuristic based method (Pontes et al., 2015a) though

in some others reviews (Preli

c et al., 2006; Eren

et al., 2012), it has been evaluated as a heuristic based

method due to its iterative greedy search approach.

Order-Preserving Submatrices

Algorithm (OPSM)

The Order-Preserving Submatrices Algorithm (Ben-

Dor et al., 2003) is a deterministic method for ﬁnding

biclusters that are deﬁned as order-preserving subma-

trices i.e. a set of rows and columns in the data matrix

in which all the values in the rows for the given set of

columns are strictly increasing (or similarly ordered

in the relaxed case). Using a probabilistic model de-

scribing biclusters hidden in otherwise random ma-

trices and statistical strategies, OPSM algorithm can

efﬁciently ﬁnd multiple, potentially overlapping bi-

clusters.

FLexible Overlapped biClustering

(FLOC)

The Flexible Overlapped Biclustering algorithm

(Yang et al., 2005) is a stochastic iterative based

method for ﬁnding biclusters, particularly overlap-

ping ones using a probabilistic model. It is an

evaluation-based approach that assesses the quality of

the biclusters using the mean residue function, similar

to the CC algorithm. It consists of two steps. In the

ﬁrst step, crude initial biclusters are constructed on a

probabilistic basis. The second step centers around

iteratively reﬁning these biclusters. This process in-

volves greedily removing rows or columns from the

bicluster in an effort to reduce the mean squared

residue score of the bicluster.

Factor Analysis for Bicluster Acquisition

(FABIA)

The Factor Analysis for Bicluster Acquisition method

(Hochreiter et al., 2010) is a generative multiplicative

model for discovering biclusters in expression data

by assuming a non-Gaussian signal distributions with

heavy tails. (This method was not included in the re-

view presented in (Pontes et al., 2015a).) In FABIA, a

bicluster is modeled as an outer product of two sparse

vectors. It is a fuzzy like clustering method that re-

turns probability of memberships. However, it can be

set to return crisp biclusters by setting user-deﬁned

threshold parameters.

ICPRAM 2018 - 7th International Conference on Pattern Recognition Applications and Methods

204

Table 1: Biclustering Algorithm Summary.

Algorithm (Year) Algorithm Type

Deterministic /

Implementation Source

Metric based

CC (2000) Greedy Search Yes / Yes (MSR) Python (Eren, 2013)

ISA (2003) Linear Algebra Yes / No R package isa2 (Cs

ardi et al., 2010)

OPSM (2003) Optimal Reordering Yes / No BicAT (Barkow et al., 2006)

FLOC (2005) Stochastic Greedy Search No / Yes (MSR) R Package BicARE (Gestraud, 2008)

FABIA (2010) Generative Biclustering No / No Bioconductor fabia (Hochreiter et al., 2010)

PPM

(2015) Probabilistic No / No Java (Chekouo and Murua, 2015)

UniBic

(2016) Graph-Based No / No C (Wang et al., 2016)

BicPAMS

(2017) Pattern-based No / No bicpams.com (Henriques et al., 2017)

Most recent pattern mining approach.

Improvement on the QUBIC algorithm.

Extension of Bayesian Biclustering Model.

sourceforge.net/projects/unibic/

Penalized Plaid Model (PPM)

The Penalized Plaid Model biclustering technique

(Chekouo and Murua, 2015) models biclusters using

a Bayesian framework. It is a modiﬁed extended ver-

sion of the Bayesian plaid model. The PPM method

fully accounts for a general overlapping structure,

which differs from other models that account for only

one dimensional overlapping such as in the Bayesian

Biclustering Model (Gu and Liu, 2008). Instead of us-

ing the sequential algorithm deﬁned in (Zhang, 2010),

the parameters in the Penalized Plaid model are found

all at once by a dedicated Markov chain Monte Carlo

sampler. It is a non-heuristic based approach.

UniBic

UniBic is an extension/improvement of the graph-

based biclustering method: QUBIC (Li et al., 2009).

In this work, we evaluate UniBic, which was not re-

viewed in (Pontes et al., 2015a) since it’s an improved

algorithm of QUBIC that was included in (Pontes

et al., 2015a). In QUBIC, the input data matrix is

initially transformed to a discrete integer rank matrix

prior to subsequent operations. A graph G is con-

structed based on this matrix in which nodes repre-

sent the rows (genes) and the edge weights are num-

ber of corresponding conditions (columns) between

two genes (rows). The biclustering problem is trans-

lated to ﬁnding heavy subgraphs in G.

UniBic (Wang et al., 2016) is very similar to

QUBIC with the exception of edge weight calcu-

lation. UniBic applies the longest common subse-

quence (LCS) algorithm to translate the input data

matrix to a rank matrix in which the rows are dis-

cretized as rank vectors. The n

smallest value in

each row is replaced with the integer n, with priority

in ties given to the leftmost value. Edge weight in the

graph is calculated as the magnitude of the maximal

LCS between nodes. UniBic demonstrates a strong

resilience to noise and can detect biclusters of both

shifting and scaling patterns.

Biclustering based on PAttern Mining

Software (BicPAMS)

BicPAMS (Henriques et al., 2017) is an aggregate of

state-of-the-art pattern mining approaches to the bi-

clustering problem. BicPAMS is the most recent pat-

tern mining algorithms, an improved version of prior

pattern mining biclustering algorithms since the ini-

tial publication of BicPAM (Henriques and Madeira,

2014a). Other prior versions of pattern-mining biclus-

tering algorithms that it extends include BicSPAM

(Henriques and Madeira, 2014b) (reviewed in (Pontes

et al., 2015a)), BiP (Henriques and Madeira, 2015),

and BicNET (Henriques and Madeira, 2016). Bic-

PAMS is a highly parametrized algorithm includ-

ing parameters relating to coherence of biclusters,

structure of biclusters, quality of biclusters, and ef-

ﬁciency of the program. BicPAMS was not reviewed

in (Pontes et al., 2015a). It is a non-heuristic based

algorithm.

3 METHODS

3.1 Evaluation Framework

To effectively evaluate the biclustering algorithms,

we utilize the benchmark synthetic data introduced

in (Wang et al., 2016) and generated with the

BiBench framework (Eren et al., 2012). The ad-

vantage of utilizing synthetic data in evaluation of

algorithm performance is that there is readily avail-

able ground-truth. However, there is always the con-

cern of whether the synthetic data generation cap-

tures the complexity of real applications. The bench-

mark data consist of 6 groups of square bicluster

Performance Evaluation and Enhancement of Biclustering Algorithms

205

structures (trend-preserving, column-constant, row-

constant, shift-scale (combined), shift, scale) as well

as 3 overlapping datasets and 3 narrow datasets: a to-

tal of 119 datasets. Square biclusters have the same

number of genes and conditions in each bicluster

while overlapping biclusters are biclusters that share

one or more genes or conditions. Narrow biclusters

contain many more genes than conditions. A com-

prehensive description of these bicluster types is pre-

sented in (Mukhopadhyay et al., 2010).

Given a bicluster B , let I denote a set of row vec-

tors in B and J, the corresponding set of column vec-

tors. Then, the element in the i

and j

column of B

is denoted by B

i j

. We index speciﬁc gene vectors or

condition vectors using capital letters. For example,

the gene corresponding to the i

row of B across all

conditions is denoted B

, while the condition corre-

sponding to the j

column of B across all genes is

denoted B

I j

To evaluate the performance of the algorithms,

we utilize the recovery and relevance scores, derived

from match score (Preli

c et al., 2006). Match Score

(MS) between two sets of biclusters S

and S

is de-

ﬁned as:

MS (S

, S

) =

∑

∈S

max

∈S

∩ B

∪ B

(1)

which reﬂects the average of the maximum similarity

for all biclusters B

in S

with respect to the biclusters

in S

. The intersection of two biclusters B

∈ S

and B

∈ S

denotes the set of rows common to both

and B

. Similarly, the union of two biclusters is

the set of rows that exist in either B

or B

or both.

The match score takes on values between 0 and 1,

inclusive. In the case that no rows of any bicluster

in S

are found in any bicluster in S

, |B

∩ B

| = 0

for all possible B

∈ S

, B

∈ S

. Subsequently, MS

= 0 (equation (1). Similarly, if the sets of biclusters

and S

are identical, then both |B

∩ B

| = |S

and |B

∪ B

| = |S

|, yielding a match score of one.

The match score is also referred to as similarity score

(Wang et al., 2016).

For a given dataset D, let S(A

) denote the set of

biclusters returned by applying a speciﬁc biclustering

algorithm A

on D, while G denotes the correspond-

ing set of known ground truth biclusters for D. The

relevance score, MS(S, G), is a measure of the ex-

tent to which the generated biclusters S(A

) are sim-

ilar to the ground truth biclusters in the gene (row)

dimension. The recovery score, given by MS(G, S),

quantiﬁes the proportion of the subset of G that were

retrieved by A

. A high relevance score implies that

a large percentage of the biclusters discovered by the

algorithm are signiﬁcant, while a high recovery score

indicates that a large percentage of the actual ground

truth biclusters are very similar to the ones returned

by the algorithm.

3.2 Internal Validation Measures

Relevance and recovery scores are both external

validation measures, as the computation is dependent

on prior knowledge of ground truth data. Internal

validation measures provide a means of evaluating

quality of biclusters obtained without the knowledge

of ground truth; which is very useful for real datasets

for which ground truth is unknown. In this section,

We formally describe the two evaluation measures

that are used in our performance enhancement

method: ASR and VE

. We also discuss two other

common internal validation measures that we utilize

for comparison analysis: MSR and SMSR.

Average Spearman’s Rho. The Average Spearman’s

Rho (ASR) (Ayadi et al., 2009) measure is an adapta-

tion of the Spearman’s Rho (Lehmann and D’abrera,

1975) correlation coefﬁcient to assess bicluster qual-

ity. Spearman’s Rho is deﬁned as

ρ(x, y) = 1 −

m(m

− 1)

∑

k=1

(r (x

) − r (y

))

(2)

for two vectors x and y of equal length m, where r(x

)

and r(y

) are the ranks of x

and y

, respectively. Let

gene

∑

i∈I

∑

j∈I, j>i

ρ(i, j)

|I| · (|I|− 1)

(3)

condition

∑

i∈J

∑

j∈J, j>i

ρ(i, j)

|J| · (|J| − 1)

(4)

ASR is deﬁned as

ASR(B) = 2 · max

{

gene

, ρ

condition

}

(5)

The ASR’s value is in the range [−1, 1], where both

−1 and 1 represent a perfect trend-preserving biclus-

ter. ASR is one of the few bicluster quality measures

that can detect both shifting and scaling patterns of

biclusters, as well as shift-scale (combined pattern)

biclusters (Pontes et al., 2015b).

Transposed Virtual Error. Transposed Virtual Er-

ror (VE

) (Pontes et al., 2010) is another bicluster

quality measure that correctly identiﬁes shift, scale,

and shift-scale biclusters. Transposed Virtual Error is

an improvement on Virtual Error (VE) (Pontes et al.,

2007), which does not identify shift-scale biclusters.

Both VE and VE

required standardized biclus-

ters. A bicluster is standardized by subtracting the

row mean from each element of the bicluster and di-

viding by the row standard deviation, i.e.

ICPRAM 2018 - 7th International Conference on Pattern Recognition Applications and Methods

206

B =

i j

− µ

, i = 1, 2, ..., |I|, j = 1, 2, ..., |J|

(6)

where µ

is the mean of row i in B and σ

is the

standard deviation of row i in B.

VE computes a virtual gene ρ, which is a vec-

tor imitating a gene whose entries are column means

across all genes in the bicluster. Explicitly, the stan-

dardized virtual gene is calculated for a standardized

bicluster

B as

|I|

∑

i=1

i j

, j = 1, 2, ..., |J| (7)

Finally, VE is deﬁned as

V E(B) =

|I| · |J|

|I|

∑

i=1

|J|

∑

j=1

i j

−

| (8)

To compute VE

, transpose the bicluster prior to cal-

culating VE. VE

computes a virtual condition ρ and

measures the deviation of conditions in the bicluster

from ρ. The virtual condition ρ is calculated as

|J|

∑

j=1

i j

, j = 1, 2, ..., |J| (9)

and VE

is calculated as

V E

(B) =

|I| · |J|

|I|

∑

i=1

|J|

∑

j=1

i j

−

| (10)

is equal to zero for perfect shifting or scaling

or shift-scale patterns.

Special Cases for VE

. Constant rows in expression

data pose an issue when computing VE

. When one

or more rows are constant, the standard deviation of

at least one row is zero, and thus the result of equation

(3.2) is undeﬁned.A constant row is highly unlikely

in real data applications, so a standard deviation of

zero should be a non-issue. For the context of this

work with synthetic data, VE

is set to one if any

zero-division errors occurred. This does produce

false negatives in the case that a constant row is part

of a constant bicluster.

Mean Squared Residue. The mean squared residue

score (MSR) describes how well a bicluster follows a

shifting pattern (Cheng and Church, 2000). MSR is

deﬁned as

MSR(B) =

|I| · |J|

|I|

∑

i=1

|J|

∑

j=1

i j

−b

I j

)

(11)

where the bicluster B consists of rows I and columns

J. Values b

and b

I j

denote the mean of the i

row and j

column, respectively, and b

denotes the

mean of all entries of the bicluster.

By design, biclusters that follow a perfect shifting

pattern have an MSR score of zero. Larger MSR

scores represent more deviation from a perfect

shifting pattern.

Scaling Mean Squared Residue. The Scaling Mean

Squared Residue (SMSR) is an evaluation measure

for biclusters that detects scaling patterns in biclusters

(Mukhopadhyay et al., 2009). SMSR is very similar

to MSR except that it is suited for biclusters with scal-

ing patterns while MSR is suited for shifting patterns.

SMSR is deﬁned as

SMSR(B ) =

|I| · |J|

|I|

∑

i=1

|J|

∑

j=1

· b

I j

− b

i j

· b

)

· b

I j

(12)

The SMSR score of a bicluster with a perfect scaling

pattern is zero. Neither MSR nor SMSR perform well

on shift-scale biclusters.

3.3 Enhancement Framework

Gene expression data matrices are usually very large.

It is not uncommon for these matrices to have tens of

thousands of rows (genes) and hundreds of columns

(samples). It is generally unknown how many biclus-

ters will be returned by a biclustering algorithm on

a given dataset. Some biclustering algorithms usu-

ally output a very large set of biclusters (based on al-

gorithm speciﬁc stop criterion) while some include a

user speciﬁed parameter to deﬁne the number of bi-

clusters to generate. A large portion of recent bi-

clustering algorithms use stochastic approaches, and

hence are not deterministic. This means that multi-

ple repetitions of the same experiment with such algo-

rithms do not necessarily yield identical results. Prop-

erties of algorithms analyzed in this paper, such as

determinism vs. non-determinism, are described in

Table 1. It is desirable for the discovered set of bi-

clusters to be a manageable number of highly relevant

since the discovered biclusters require signiﬁcant hu-

man effort for further evaluation to determine biolog-

ical signiﬁcance.

In this section, we present a method of improv-

ing the relevance score of any set of biclusters by us-

ing either of these two internal validation measures:

Average Spearman’s Rho and Transposed Virtual Er-

ror. This can be applied to both types of algorithms

i.e. the ones that have a deﬁned stop criterion as

well as the ones that require a user-speciﬁed parame-

ter of number of biclusters to generate. The strength

of the proposed framework is that it serves a ”ﬁlter” to

help detect highly relevant bicluster among a large set

Performance Evaluation and Enhancement of Biclustering Algorithms

207

of output biclusters. The framework can be applied

using any desired internal validation metric, though

from the results obtained, we recommend using the

best performing ones (VE

and ASR). The next step

is to determine an ensemble method for leveraging the

usefulness of both metrics.

The method of improving the relevance of a set of

biclusters S is described as follows:

1. Choose an internal validation measure M and a

number of desired biclusters n, where n < |S|.

2. Compute M(B ) for each bicluster B ∈ S .

3. Order each bicluster in B ∈ S from best to worst

according to M(B).

4. Retain the best n biclusters according to M.

It is important to note that while relevance scores are

improved with our method, recovery scores are nega-

tively impacted. By reducing the number |S|, the ini-

tial size of the output biclusters, the recovery score

will be less than or equal to that of the initial list.

Thus, our goal is to maximize the increase in rele-

vance scores while minimizing the decrease in recov-

ery scores. Ideally, we desire to ﬁlter out biclusters

with redundant or insigniﬁcant information.

4 EXPERIMENTAL RESULTS

AND ANALYSIS

4.1 Experimental Setup

The eight biclustering algorithms analyzed in this

work were set to their default parameters and con-

ducted using existing implementations (Table 1). To

ensure that the experiments presented in this work are

replicable, the source code is publicly available via

GitHub

along with detailed instructions on the spec-

iﬁcations of implementations. For algorithms (CC,

FLOC, and PPM)that required user-speciﬁed param-

eter k on the number of biclusters to generate, we set

k = 20. In addition, CC was set to return biclusters

with a maximum MSR score of 0.1. For OPSM, the

number of passed models between iterations used was

10. PPM was implemented using the recommended

parameters of the GPE method (Chekouo and Murua,

2015).

In the computation of ASR, we have to compute

Spearman’s Rho according to equation (2). This re-

quires us calculate the rank of each element in both

vectors x and y. When there are ties in the elements

github.com/clslabMSU/Biclustering-Algorithm-

Comparison

of x or y, ranking becomes problematic and subse-

quently results in Spearman’s Rho not being deﬁned.

There are different tie correction methods available to

alleviate this problem (Zar, 1998). The method of tie

correction implemented in this work was to assign all

tied values to the minimum rank.

The experimental results presented are two-fold.

In section 4.2, we present the results of the compar-

ative analysis of the eight algorithms using relevance

and recovery scores on the 119 benchmark datasets

while section 4.3 focuses on empirical evaluation of

the proposed enhancement framework. For the per-

formance enhancement evaluation, the desired num-

ber of biclusters n is set to 3g, where g is the actual

number of biclusters present in the dataset (based on

the ground truth information). Thus, the evaluation

results presented demonstrate the impact on both rel-

evance and recovery scores with a very minimal num-

ber of clusters selected - 3g. In actual practice, n can

be set to the number that the user is comfortable using

for further biological evaluation.

4.2 Performance Evaluation Results

Figure 1 illustrates the results of the performance of

the eight algorithms in terms of relevancy and recov-

ery scores for eight types of biclusters datasets: 6

types of square biclusters (trend-preserving, column-

constant, row-constant, shift-scale (combined), shift,

scale) with each type having 15 associated datasets,

as well as a set of 20 overlapping datasets and a set of

9 narrow datasets.

As can be observed from Figure 1a, OPSM is

the best performing algorithm on narrow biclusters

in terms of both relevance and recovery scores. This

is useful for practical gene expression datasets where

the number of conditions is much less than the num-

ber of genes. For the datasets with overlapping bi-

clusters (Figure 1b), BicPAMS performed the best

in terms of relevance, and UniBic performed best in

terms of recovery. For all the datasets, except for nar-

row, UniBic had very high recovery scores.

Table 2 illustrates the average rank of each algo-

rithm across all 119 synthetic datasets. The rank-

ing results for the performance evaluation results is

contained in the ﬁrst 4 columns after the algorithm

name column (i.e. before enhancement). The aver-

age rank is calculated by ranking each algorithm’s

performance on each dataset, with 1 being the best

performer and 8 (in this case) being the worst. We

rank the performance using relevance and recovery

scores, ASR, and VE

. The average rank of an algo-

rithm is simply the sum of its ranks for each dataset

divided by the number of datasets. The statistical sig-

ICPRAM 2018 - 7th International Conference on Pattern Recognition Applications and Methods

208

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Recovery

Relevance

Narrow Biclusters

ISA CC OPSM BicPAMS

UniBic FABIA FLOC PPM

(a) Narrow Biclusters

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Recovery

Relevance

Overlap

ISA CC OPSM BicPAMS

UniBic FABIA FLOC PPM

(b) Overlapping Biclusters

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Recovery

Relevance

Type 1 Biclusters

ISA CC OPSM BicPAMS

UniBic FABIA FLOC PPM

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Recovery

Relevance

Type 2 Biclusters

ISA CC OPSM BicPAMS

UniBic FABIA FLOC PPM

(d) Square Type 2 (column-constant) Biclusters

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Recovery

Relevance

Type 3 Biclusters

ISA CC OPSM BicPAMS

UniBic FABIA FLOC PPM

(e) Square Type 3 (row-constant) Biclusters

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Recovery

Relevance

Type 4 Biclusters

ISA CC OPSM BicPAMS

UniBic FABIA FLOC PPM

(f) Square Type 4 (shift-scale) Biclusters

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Recovery

Relevance

Type 5 Biclusters

ISA CC OPSM BicPAMS

UniBic FABIA FLOC PPM

(g) Square Type 5 (shift) Biclusters

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Recovery

Relevance

Type 6 Biclusters

ISA CC OPSM BicPAMS

UniBic FABIA FLOC PPM

(h) Square Type 6 (scale) Biclusters

Figure 1: Comparisons of recovery and relevance scores across biclustering algorithms on different types of bicluster datasets.

Performance Evaluation and Enhancement of Biclustering Algorithms

209

niﬁcance of these ranks is measured using Friedman

test statistic. The critical value of a chi-square distri-

bution with 7 degrees of freedom is 14.067, so Fried-

man statistics higher than 14.067 are considered sta-

tistically signiﬁcant. The p-values associated with the

Friedman statistics were all signiﬁcant: < 0.001.

From Table 2, one can observe that BicPAMS had

the best average rank in terms of relevance score be-

fore performance enhancement, and UniBic had the

best average rank in recovery score by a very large

margin. Similarly, OPSM had the best average rank

according to the ASR value (internal validation mea-

sure) in every dataset, hence an average rank of 1.0.

OPSM also had the best average rank according to

, closely followed by FLOC.

4.3 Performance Enhancement Results

UniBic and BicPAMS both tended to have high recov-

ery scores but low relevance scores according to the

experimental results in section 4.2. They also returned

a very large number of clusters as indicated in Table 3

which probably helped their recovery scores but de-

graded the performance in terms of relevance. Fig-

ure 2 demonstrates the effect of the performance en-

hancement (PE) framework on the performance of the

algorithms in terms of relevancy and recovery scores.

For each algorithm, the top n biclusters selected is

set to three times the actual number of ground truth

present in the dataset. The algorithms that beneﬁt the

most from the PE method are those that return a very

large number of biclusters, such as BicPAMS and

UniBic. When these algorithms are applied to real

gene expression datasets, the number of returned bi-

clusters is usually too large to manually examine. By

applying the PE method, results of these algorithms

are much more manageable, and each bicluster exam-

ined is more likely to contain biologically signiﬁcant

information. According to Table 3, PPM algorithm is

the only algorithm that actually returns less than this

parameter initially i.e. before enhancement. Given

that the returned number of biclusters is less, some

loss in recovery is inevitable however, it is interesting

to observe the effect on the relevance of the results.

Table 3 also demonstrates that, overall, applying

the PE framework using the ASR or VE

measure

yields a more signiﬁcant positive impact on the rel-

evance scores compared to MSR or SMSR measures.

We can observe from Figure 2 that relevance scores

were improved for six of the eight tested algorithms,

with the most dramatic improvement being on the

UniBic algorithm and the ASR quality measure. In

this case, the relevance of the UniBic biclusters were

increased by well over 50%. Naturally, removing bi-

clusters from a set will have a negative impact on the

recovery score of that set. The ASR validation mea-

sure showed the largest increase in relevance scores

among every algorithm except OPSM and FABIA,

thus indicating most superior performance compared

to VE

, SMSR and MSR. Table 4 summarizes the ef-

fect of the enhancement framework, based only on the

ASR measure according to the eight different types of

dataset tested.

The last two sets of four columns of Table 2

present the average ranking results of each algo-

rithm after applying the PE framework using the ASR

and VE

validation measures, respectively across all

datasets. For both evaluation measures, we observe

that the best performing algorithms are largely the

same as before we performed our enhancement. How-

ever, the average ranks have shifted slightly. Filtering

on the ASR measure has further set BicPAMS apart

from the competition in terms of relevance, lowering

its rank from 2.18 before enhancement to 1.62 after

ASR ﬁltering. UniBic still performs best in terms of

recovery, but by a smaller margin. Filtering by VE

has produced less compelling results. The average

ranks for recovery are much closer together than the

ranks before enhancement and the ranks using ASR

ﬁltering, which implies that the ranks of each algo-

rithm across all datasets were inconsistent. This is

reﬂected in the lower Friedman statistics of VE

ﬁl-

tering compared to the Friedman statistics before en-

hancement and with ASR ﬁltering.

For the evaluation (metric)-based methods, CC

and FLOC, which are based on MSR, applying ASR

to select the top n biclusters still improves the rele-

vance scores, even though the original mean number

of clusters returned by these algorithm is 20 which is

close to the mean top n that we select [12, 20]. Ac-

cording to Table 3, though recovery scores were hurt,

the algorithm with the highest recovery score was un-

changed after our enhancement, implying that the per-

centage loss was almost uniform across best perform-

ing algorithms. The PE framework results demon-

strate that the ASR quality measure tended to lead

to a larger increase in relevance scores, while main-

taining a relatively low loss in recovery scores. At

this point, it becomes a trade-off between obtaining

a manageable number of biclusters and losing accu-

racy. On larger datasets for real data applications,

it quickly becomes difﬁcult to inspect the biological

signiﬁcance of a large number of biclusters. Thus,

the proposed PE framework would be useful for gene

expression data analysis, in determining signiﬁcance

and relevance of the results.

ICPRAM 2018 - 7th International Conference on Pattern Recognition Applications and Methods

210

Table 2: Statistical comparison of average ranking of algorithm performance using Friedman test before and after enhancement

method.

Algorithm

Before Enhancement Filtering on ASR Filtering on VE

Rel. Rec. ASR VE

BicPAMS 2.18 2.75 3.39 3.75 1.62 3.62 1.92 2.12 1.67 3.91 2.14 1.62

CC 7.67 7.06 4.43 3.25 7.47 6.86 5.46 3.88 7.55 7.03 5.38 5.25

FABIA 6.10 4.87 7.42 8.00 6.60 5.84 7.48 7.12 6.56 5.69 7.50 8.00

FLOC 4.66 5.12 3.47 1.62 4.28 4.58 4.37 1.25 4.60 4.58 4.09 2.88

ISA 5.04 4.07 6.35 6.62 4.42 3.99 5.26 4.75 4.22 3.39 5.61 5.62

OPSM 2.77 5.42 1.00 1.38 3.73 4.55 1.22 3.50 3.46 4.16 1.03 3.88

PPM 4.09 5.39 5.52 6.00 4.86 4.49 6.46 6.12 4.49 4.22 6.40 7.00

UniBic 3.49 1.32 4.41 5.38 3.03 2.07 3.83 7.25 3.45 3.02 3.85 1.75

Friedman

276 290 265 335 289 260 313 390 192 163

280 425

Rec.: Recovery Score; Rel.: Relevance Score;

P-values for all Friedman tests < 0.001 with the exception of:

P-value = 0.004.

Table 3: Mean number of biclusters returned by algorithm.

Biclusters

Range of g

PE Exp.

Mean No. of biclusters returned by each algorithm.

n = 3g BicPAMS FABIA ISA OPSM PPM UniBic CC FLOC

Narrow 3 9 320 12 65 11 10 92 20 20

overlap 3 9 557 20 26 14 10 53 20 20

Type 1 [3 5] 12 243 20 48 10 10 42 20 20

Type 2 [3 5] 12 746 20 37 10 10 47 20 20

Type 3 [3 5] 12 712 19 46 9 10 61 20 20

Type 4 [3 5] 12 768 19 53 9 10 40 20 20

Type 5 [3 5] 12 353 20 35 10 10 56 20 20

Type 6 [3 5] 12 374 19 53 10 10 43 20 20

g: Mean number of biclusters in ground truth data;

PE Exp: Performance Enhancement Experiment.

Table 4: Best performing algorithm before and after enhancement (by ASR).

Type of dataset Best relevance before Best recovery before Best relevance after Best recovery after

Narrow Bicluster OPSM OPSM BicPAMS OPSM

Overlap Bicluster BicPAMS UniBic FLOC UniBic

Type 1 Biclusters UniBic UniBic BicPAMS UniBic

Type 2 Biclusters BicPAMS UniBic FLOC UniBic

Type 3 Biclusters FLOC UniBic FLOC UniBic

Type 4 Biclusters UniBic UniBic BicPAMS UniBic

Type 5 Biclusters BicPAMS UniBic BicPAMS UniBic

Type 6 Biclusters UniBic UniBic BicPAMS UniBic

-15%

-5%

15%

25%

35%

45%

55%

ISA CC OPSM BicPAMS UniBic FABIA FLOC PPM

Improvement in Relevance Scores Across Quality Measures

VET ASR MSR SMSR

(a) Improvement in Relevance Scores.

-50%

-45%

-40%

-35%

-30%

-25%

-20%

-15%

-10%

-5%

ISA CC OPSM BicPAMS UniBic FABIA FLOC PPM

Impact on Recovery Scores Across Quality Measures

VET ASR MSR SMSR

(b) Improvement in Recovery Scores.

Figure 2: Effect of applying evaluation measures to enhance performance of the biclustering algorithms.

Performance Evaluation and Enhancement of Biclustering Algorithms

211

5 CONCLUSION

In this paper, we presented a systematic compari-

son of eight popular biclustering algorithms, and ob-

jectively evaluated their performance using Recov-

ery and Relevance scores on 119 synthetic datasets.

We also ranked these eight algorithms using the aver-

age rank across each dataset, and veriﬁed the statis-

tical signiﬁcance of these ranks using the Friedman

statistic. Across the synthetic datasets used in our

experiment, we determined that UniBic was the best

performing algorithm in terms of recovery score and

BicPAMS was the best in terms of relevance, both

before and after the enhancement framework. The

datasets were highly skewed towards square biclus-

ters. It should be noted that for the narrow datasets,

which constituted a small fraction, OPSM had the best

relevance and recovery scores prior to the PE frame-

work. After the PE method, BicPAMS had the best

relevance performance. Thus, applying the PE frame-

work enabled BicPAMS to obtain a better perfor-

mance. It should also be noted that the biclusters hid-

den in these synthetic datasets are all sequential, that

is, all genes and conditions in each bicluster appear

consecutively. Future analysis would include perfor-

mance evaluation on non-sequential biclusters. We

evaluated the performance of our proposed enhance-

ment framework of improving relevance scores (and

signiﬁcance of) biclustering results using internal val-

idation measures. This new method of improvement

offers an option to improve the relevance of biclus-

tering results at the cost of recovery, a choice that we

believe will be valuable in the analysis of biological

signiﬁcance of biclusters found in real gene expres-

sion datasets.

REFERENCES

Ayadi, W., Elloumi, M., and Hao, J.-K. (2009). A bicluster-

ing algorithm based on a bicluster enumeration tree:

application to dna microarray data. BioData Mining,

2(1):9.

Barkow, S., Bleuler, S., Preli

c, A., Zimmermann, P., and

Zitzler, E. (2006). Bicat: a biclustering analysis tool-

box. Bioinformatics, 22(10):1282–1283.

Ben-Dor, A., Chor, B., Karp, R., and Yakhini, Z. (2003).

Discovering local structure in gene expression data:

the order-preserving submatrix problem. Journal of

Computational Biology, 10(3-4):373–384.

Bergmann, S., Ihmels, J., and Barkai, N. (2003). Iter-

ative signature algorithm for the analysis of large-

scale gene expression data. Physical Review E,

67(3):031902.

Chekouo, T. and Murua, A. (2015). The penalized bicluster-

ing model and related algorithms. Journal of Applied

Statistics, 42(6):1255–1277.

Cheng, Y. and Church, G. M. (2000). Biclustering of ex-

pression data. In Ismb, volume 8, pages 93–103.

Conover, W. J. and Iman, R. L. (1981). Rank transforma-

tions as a bridge between parametric and nonparamet-

ric statistics. The American Statistician, 35(3):124–

129.

ardi, G., Kutalik, Z., and Bergmann, S. (2010). Modular

analysis of gene expression data with r. Bioinformat-

ics, 26(10):1376–1377.

Elnabarawy, I., Wunsch, D. C., and Abdelbar, A. M.

(2016). Biclustering artmap collaborative ﬁltering

recommender system. In Neural Networks (IJCNN),

2016 IEEE International Joint Conference on, pages

2986–2991.

Eren, K. (2013). Cheng and church algorithm for

scikit learn. https://github.com/kemaleren/scikit-

learn/tree/cheng˙church.

Eren, K., Deveci, M., K

uc¸

uktunc¸, O., and C¸ ataly

urek,

U. V.

(2012). A comparative analysis of biclustering algo-

rithms for gene expression data. Brieﬁngs in Bioinfor-

matics, 14(3):279–292.

Gestraud, P. (2008). BicARE: Biclustering Analysis and Re-

sults Exploration. R package version 1.32.0.

Gu, J. and Liu, J. S. (2008). Bayesian biclustering of gene

expression data. BMC Genomics, 9(1):113–120.

Hartigan, J. A. (1972). Direct clustering of a data ma-

trix. Journal of the American Statistical Association,

67(337):123–129.

Henriques, R., Ferreira, F. L., and Madeira, S. C. (2017).

Bicpams: software for biological data analysis with

pattern-based biclustering. BMC Bioinformatics,

18(1):82.

Henriques, R. and Madeira, S. C. (2014a). Bicpam: Pattern-

based biclustering for biomedical data analysis. Algo-

rithms for Molecular Biology, 9(1):27.

Henriques, R. and Madeira, S. C. (2014b). Bicspam: ﬂexi-

ble biclustering using sequential patterns. BMC Bioin-

formatics, 15(1):130.

Henriques, R. and Madeira, S. C. (2015). Biclustering

with ﬂexible plaid models to unravel interactions be-

tween biological processes. IEEE/ACM Transac-

tions on Computational Biology and Bioinformatics,

12(4):738–752.

Henriques, R. and Madeira, S. C. (2016). Bicnet: Flexible

module discovery in large-scale biological networks

using biclustering. Algorithms for Molecular Biology,

11(1):14.

Hochreiter, S., Bodenhofer, U., Heusel, M., Mayr, A., Mit-

terecker, A., Kasim, A., Khamiakova, T., Van Sanden,

S., Lin, D., Talloen, W., et al. (2010). Fabia: fac-

tor analysis for bicluster acquisition. Bioinformatics,

26(12):1520–1527.

Kriegel, H.-P., Kr

oger, P., and Zimek, A. (2009). Clustering

high-dimensional data: A survey on subspace cluster-

ing, pattern-based clustering, and correlation cluster-

ing. ACM Transactions on Knowledge Discovery from

Data (TKDD), 3(1):1.

Lehmann, E. L. and D’abrera, H. (1975). Nonparametrics:

statistical methods based on ranks.

ICPRAM 2018 - 7th International Conference on Pattern Recognition Applications and Methods

212

Li, G., Ma, Q., Tang, H., Paterson, A. H., and Xu, Y. (2009).

Qubic: a qualitative biclustering algorithm for analy-

ses of gene expression data. Nucleic Acids Research,

37(15):e101–e101.

Madeira, S. C. and Oliveira, A. L. (2004). Bicluster-

ing algorithms for biological data analysis: a survey.

IEEE/ACM Transactions on Computational Biology

and Bioinformatics (TCBB), 1(1):24–45.

Mukhopadhyay, A., Maulik, U., and Bandyopadhyay, S.

(2009). A novel coherence measure for discovering

scaling biclusters from gene expression data. Jour-

nal of Bioinformatics and Computational Biology,

7(05):853–868.

Mukhopadhyay, A., Maulik, U., and Bandyopadhyay, S.

(2010). On biclustering of gene expression data. Cur-

rent Bioinformatics, 5(3):204–216.

Oghabian, A., Kilpinen, S., Hautaniemi, S., and Czeizler,

E. (2014). Biclustering methods: biological relevance

and application in gene expression analysis. PloS One,

9(3):e90801.

Pontes, B., Divina, F., Gir

aldez, R., and Aguilar-Ruiz, J. S.

(2007). Virtual error: a new measure for evolutionary

biclustering. In European Conference on Evolutionary

Computation, Machine Learning and Data Mining in

Bioinformatics, pages 217–226.

Pontes, B., Gir

aldez, R., and Aguilar-Ruiz, J. S. (2010).

Measuring the quality of shifting and scaling patterns

in biclusters. In PRIB, pages 242–252.

Pontes, B., Gir

aldez, R., and Aguilar-Ruiz, J. S. (2015a).

Biclustering on expression data: A review. Journal of

Biomedical Informatics, 57:163–180.

Pontes, B., Girldez, R., and Aguilar-Ruiz, J. S. (2015b).

Quality measures for gene expression biclusters. PloS

One, 10(3):e0115497.

Preli

c, A., Bleuler, S., Zimmermann, P., Wille, A.,

uhlmann, P., Gruissem, W., Hennig, L., Thiele, L.,

and Zitzler, E. (2006). A systematic comparison and

evaluation of biclustering methods for gene expres-

sion data. Bioinformatics, 22(9):1122–1129.

Roy, S., Bhattacharyya, D. K., and Kalita, J. K. (2016).

Analysis of gene expression patterns using bicluster-

ing. Microarray Data Analysis: Methods and Appli-

cations, pages 91–103.

Wang, Z., Li, G., Robinson, R. W., and Huang, X. (2016).

Unibic: Sequential row-based biclustering algorithm

for analysis of gene expression data. Scientiﬁc Re-

ports, 6:23466.

Xu, R. and Wunsch II, D. C. (2011). Bartmap: A vi-

able structure for biclustering. Neural Networks,

24(7):709–716.

Yang, J., Wang, H., Wang, W., and Yu, P. S. (2005). An

improved biclustering method for analyzing gene ex-

pression proﬁles. International Journal on Artiﬁcial

Intelligence Tools, 14(05):771–789.

Zar, J. H. (1998). Spearman rank correlation. Encyclopedia

of Biostatistics.

Zhang, J. (2010). A bayasian model for biclustering with

applications. Journal of the Royal Statistical Society,

59(4):635–656.

Performance Evaluation and Enhancement of Biclustering Algorithms

213