5 CONCLUSION
In this paper, we presented a systematic comparison of eight popular
biclustering algorithms and objectively evaluated their performance
using Recovery and Relevance scores on 119 synthetic datasets. We also
ranked the eight algorithms by their average rank across the datasets
and verified the statistical significance of these ranks using the
Friedman statistic. Across the synthetic datasets used in our
experiment, UniBic was the best-performing algorithm in terms of
recovery score and BicPAMS was the best in terms of relevance, both
before and after applying the enhancement framework. The datasets were
highly skewed towards square biclusters. For the narrow datasets, which
constituted a small fraction of the total, OPSM had the best relevance
and recovery scores prior to applying the PE framework, whereas after
applying it BicPAMS had the best relevance. Thus, on the narrow
datasets, applying the PE framework allowed BicPAMS to overtake OPSM in
relevance. It should also be noted that the biclusters hidden in these
synthetic datasets are all sequential, that is, all genes and
conditions in each bicluster appear consecutively. Future work will
include performance evaluation on non-sequential biclusters. We also
evaluated the performance of our proposed enhancement framework for
improving the relevance scores (and significance) of biclustering
results using internal validation measures. This method offers an
option to improve the relevance of biclustering results at the cost of
recovery, a trade-off that we believe will be valuable when analysing
the biological significance of biclusters found in real gene expression
datasets.
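To make the trade-off between the two scores concrete, the following is a minimal sketch of how recovery and relevance could be computed. It assumes the match-score formulation described by Eren et al. (2012), in which each bicluster in one set is paired with its best Jaccard match (over matrix cells) in the other set; this section does not restate the definitions, so the formulation, the function names, and the toy biclusters below are all illustrative assumptions rather than the implementation used in our experiments.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A bicluster is represented as (row indices, column indices).
Bicluster = Tuple[FrozenSet[int], FrozenSet[int]]


def make_bicluster(rows: Iterable[int], cols: Iterable[int]) -> Bicluster:
    """Hypothetical helper: build a bicluster from row and column indices."""
    return frozenset(rows), frozenset(cols)


def cells(bic: Bicluster) -> Set[Tuple[int, int]]:
    """Return the set of (row, column) matrix cells covered by a bicluster."""
    rows, cols = bic
    return {(r, c) for r in rows for c in cols}


def jaccard(a: Bicluster, b: Bicluster) -> float:
    """Jaccard coefficient of two biclusters, measured over matrix cells."""
    cells_a, cells_b = cells(a), cells(b)
    union = cells_a | cells_b
    return len(cells_a & cells_b) / len(union) if union else 0.0


def match_score(first: List[Bicluster], second: List[Bicluster]) -> float:
    """Average, over `first`, of the best Jaccard match found in `second`."""
    if not first or not second:
        return 0.0
    return sum(max(jaccard(b1, b2) for b2 in second) for b1 in first) / len(first)


def recovery(expected: List[Bicluster], found: List[Bicluster]) -> float:
    """How well the expected (planted) biclusters are recovered."""
    return match_score(expected, found)


def relevance(expected: List[Bicluster], found: List[Bicluster]) -> float:
    """How well the found biclusters correspond to expected ones."""
    return match_score(found, expected)


# Toy data (assumed for illustration): one planted 4x4 bicluster, and a
# result set containing a partial hit plus one spurious bicluster.
expected = [make_bicluster(range(0, 4), range(0, 4))]
found = [
    make_bicluster(range(0, 4), range(0, 2)),      # covers half of the planted cells
    make_bicluster(range(10, 12), range(10, 12)),  # overlaps nothing
]

print(recovery(expected, found))   # 0.5
print(relevance(expected, found))  # 0.25

# Dropping the spurious bicluster raises relevance and leaves recovery unchanged.
filtered = found[:1]
print(recovery(expected, filtered))   # 0.5
print(relevance(expected, filtered))  # 0.5
```

In this toy case, discarding the spurious bicluster doubles relevance without affecting recovery. In general, any filtering step that also discards a bicluster matching a planted one would lower recovery, which is the kind of trade-off discussed above.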
REFERENCES
Ayadi, W., Elloumi, M., and Hao, J.-K. (2009). A
biclustering algorithm based on a bicluster enumeration
tree: application to DNA microarray data. BioData Mining,
2(1):9.
Barkow, S., Bleuler, S., Prelić, A., Zimmermann, P., and
Zitzler, E. (2006). BicAT: a biclustering analysis
toolbox. Bioinformatics, 22(10):1282–1283.
Ben-Dor, A., Chor, B., Karp, R., and Yakhini, Z. (2003).
Discovering local structure in gene expression data:
the order-preserving submatrix problem. Journal of
Computational Biology, 10(3-4):373–384.
Bergmann, S., Ihmels, J., and Barkai, N. (2003).
Iterative signature algorithm for the analysis of
large-scale gene expression data. Physical Review E,
67(3):031902.
Chekouo, T. and Murua, A. (2015). The penalized
biclustering model and related algorithms. Journal of
Applied Statistics, 42(6):1255–1277.
Cheng, Y. and Church, G. M. (2000). Biclustering of
expression data. In ISMB, volume 8, pages 93–103.
Conover, W. J. and Iman, R. L. (1981). Rank
transformations as a bridge between parametric and
nonparametric statistics. The American Statistician,
35(3):124–129.
Csárdi, G., Kutalik, Z., and Bergmann, S. (2010). Modular
analysis of gene expression data with R. Bioinformatics,
26(10):1376–1377.
Elnabarawy, I., Wunsch, D. C., and Abdelbar, A. M.
(2016). Biclustering ARTMAP collaborative filtering
recommender system. In Neural Networks (IJCNN),
2016 IEEE International Joint Conference on, pages
2986–2991.
Eren, K. (2013). Cheng and Church algorithm for
scikit-learn.
https://github.com/kemaleren/scikit-learn/tree/cheng_church.
Eren, K., Deveci, M., Küçüktunç, O., and Çatalyürek, Ü. V.
(2012). A comparative analysis of biclustering
algorithms for gene expression data. Briefings in
Bioinformatics, 14(3):279–292.
Gestraud, P. (2008). BicARE: Biclustering Analysis and
Results Exploration. R package version 1.32.0.
Gu, J. and Liu, J. S. (2008). Bayesian biclustering of gene
expression data. BMC Genomics, 9(1):113–120.
Hartigan, J. A. (1972). Direct clustering of a data
matrix. Journal of the American Statistical Association,
67(337):123–129.
Henriques, R., Ferreira, F. L., and Madeira, S. C. (2017).
BicPAMS: software for biological data analysis with
pattern-based biclustering. BMC Bioinformatics,
18(1):82.
Henriques, R. and Madeira, S. C. (2014a). BicPAM:
Pattern-based biclustering for biomedical data analysis.
Algorithms for Molecular Biology, 9(1):27.
Henriques, R. and Madeira, S. C. (2014b). BicSPAM:
flexible biclustering using sequential patterns. BMC
Bioinformatics, 15(1):130.
Henriques, R. and Madeira, S. C. (2015). Biclustering
with flexible plaid models to unravel interactions
between biological processes. IEEE/ACM Transactions on
Computational Biology and Bioinformatics,
12(4):738–752.
Henriques, R. and Madeira, S. C. (2016). BicNET: Flexible
module discovery in large-scale biological networks
using biclustering. Algorithms for Molecular Biology,
11(1):14.
Hochreiter, S., Bodenhofer, U., Heusel, M., Mayr, A.,
Mitterecker, A., Kasim, A., Khamiakova, T., Van Sanden,
S., Lin, D., Talloen, W., et al. (2010). FABIA: factor
analysis for bicluster acquisition. Bioinformatics,
26(12):1520–1527.
Kriegel, H.-P., Kröger, P., and Zimek, A. (2009).
Clustering high-dimensional data: A survey on subspace
clustering, pattern-based clustering, and correlation
clustering. ACM Transactions on Knowledge Discovery from
Data (TKDD), 3(1):1.
Lehmann, E. L. and D'Abrera, H. (1975). Nonparametrics:
statistical methods based on ranks.
ICPRAM 2018 - 7th International Conference on Pattern Recognition Applications and Methods