In this paper, we presented a systematic comparison
of eight popular biclustering algorithms and
objectively evaluated their performance using Recovery
and Relevance scores on 119 synthetic datasets.
We also ranked these eight algorithms by their average
rank across all datasets and verified the statistical
significance of these ranks using the Friedman
statistic. Across the synthetic datasets used in our
experiment, we found that UniBic was the
best-performing algorithm in terms of recovery score and
BicPAMS was the best in terms of relevance, both
before and after applying the enhancement framework. The
datasets were highly skewed towards square biclusters.
For the narrow datasets, which constituted a small
fraction, OPSM had the best relevance and recovery
scores prior to applying the PE framework, whereas
BicPAMS had the best relevance afterwards. Thus, on
these narrow datasets, applying the PE framework
enabled BicPAMS to overtake OPSM in relevance. It
should also be noted that the biclusters hidden in
these synthetic datasets are all sequential, that
is, all genes and conditions in each bicluster appear
consecutively. Future work will include performance
evaluation on non-sequential biclusters. We
evaluated the performance of our proposed enhancement
framework for improving the relevance scores (and
significance) of biclustering results using internal
validation measures. This method offers the option of
improving the relevance of biclustering results at the
cost of recovery, a trade-off that we believe will be
valuable when analyzing the biological significance of
biclusters found in real gene expression datasets.
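
To make the recovery/relevance trade-off concrete, the following is a minimal sketch of how the two scores can be computed. It assumes the commonly used match-score formulation, in which each bicluster is compared with its best counterpart by the Jaccard index over matrix cells; the exact definition used in our experiments may differ in detail. The function names and the toy biclusters are hypothetical and chosen only for illustration.

```python
from itertools import product


def cells(bicluster):
    """Expand a (rows, cols) bicluster into its set of matrix cells."""
    rows, cols = bicluster
    return set(product(rows, cols))


def jaccard(b1, b2):
    """Jaccard index between the cell sets of two biclusters."""
    c1, c2 = cells(b1), cells(b2)
    union = c1 | c2
    return len(c1 & c2) / len(union) if union else 0.0


def match_score(source, target):
    """Mean, over `source`, of each bicluster's best Jaccard match in `target`."""
    if not source or not target:
        return 0.0
    return sum(max(jaccard(s, t) for t in target) for s in source) / len(source)


def recovery(found, expected):
    """How well the hidden (expected) biclusters are recovered."""
    return match_score(expected, found)


def relevance(found, expected):
    """How well the found biclusters correspond to hidden ones."""
    return match_score(found, expected)


# Hypothetical toy data: one hidden 4x4 bicluster; the algorithm reports
# it exactly, plus one spurious 2x2 bicluster elsewhere in the matrix.
expected = [(range(0, 4), range(0, 4))]
found = [(range(0, 4), range(0, 4)), (range(10, 12), range(10, 12))]

rec = recovery(found, expected)
rel = relevance(found, expected)
rel_filtered = relevance(found[:1], expected)  # spurious bicluster removed
```

In this toy case the spurious bicluster lowers relevance without affecting recovery, and discarding it restores relevance. In practice, a filtering step may also discard biclusters that partially match a hidden one, which is how recovery can decrease.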

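The ranking procedure summarized above can be sketched in the same way. The code below is an illustration under stated assumptions, not the implementation used in our experiments: it assumes that a higher score means a better (lower) rank, that ties share the average rank, and it computes the Friedman statistic without a tie correction. The score table is hypothetical.

```python
def rank_row(scores):
    """Rank scores within one dataset: rank 1 is the highest score;
    tied scores share the average of the positions they occupy."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def average_ranks(score_table):
    """Average rank of each algorithm (column) over all datasets (rows)."""
    n = len(score_table)
    k = len(score_table[0])
    ranked = [rank_row(row) for row in score_table]
    return [sum(row[j] for row in ranked) / n for j in range(k)]


def friedman_statistic(score_table):
    """Friedman chi-square statistic (no tie correction) for n datasets
    and k algorithms, computed from the average ranks."""
    n = len(score_table)
    k = len(score_table[0])
    avg = average_ranks(score_table)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in avg) - k * (k + 1) ** 2 / 4)


# Hypothetical scores: 4 datasets (rows) x 3 algorithms (columns).
scores = [
    [0.90, 0.70, 0.50],
    [0.80, 0.60, 0.40],
    [0.95, 0.50, 0.70],
    [0.85, 0.65, 0.30],
]

avg_ranks = average_ranks(scores)
chi2 = friedman_statistic(scores)
```

Under the null hypothesis that all algorithms perform equally, the statistic is approximately chi-square distributed with k - 1 degrees of freedom when the number of datasets is large, so it can be compared against the corresponding critical value.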