used consensus functions to the biclustering context.
Section 5 is devoted to evaluating these new consensus functions through several experiments. Finally, we summarize the main points resulting from our approach.
2 ENSEMBLE METHODS
The principle of ensemble methods is to construct a set of models and then to aggregate them into a single model. It is well known that these methods often perform better than a single model (Dietterich, 2000).
Ensemble methods first appeared in supervised learning problems, where a combination of classifiers is more accurate than a single classifier (Maclin, 1997). A pioneering method is boosting, whose most popular algorithm, AdaBoost, was developed mainly by Schapire (Schapire, 2003). The principle is to assign a weight to each training example; several classifiers are then learned iteratively, and between learning steps the weights of the examples are adjusted according to the results of the current classifier. The final classifier is a weighted vote of the classifiers constructed during the procedure.
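As an illustration, a minimal sketch of discrete AdaBoost is given below; the use of scikit-learn decision stumps as weak learners is an assumption of the sketch, not part of the original algorithm description.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_fit(X, y, n_rounds=50):
        """Discrete AdaBoost sketch; expects labels y in {-1, +1}."""
        n = len(y)
        w = np.full(n, 1.0 / n)                    # uniform initial weights
        learners, alphas = [], []
        for _ in range(n_rounds):
            stump = DecisionTreeClassifier(max_depth=1)  # assumed weak learner
            stump.fit(X, y, sample_weight=w)
            pred = stump.predict(X)
            err = w[pred != y].sum()               # weighted training error
            if err >= 0.5:                         # no better than chance: stop
                break
            alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
            w *= np.exp(-alpha * y * pred)         # up-weight misclassified examples
            w /= w.sum()
            learners.append(stump)
            alphas.append(alpha)
        return learners, alphas

    def adaboost_predict(X, learners, alphas):
        """Final classifier: weighted vote of the learned classifiers."""
        votes = sum(a * h.predict(X) for a, h in zip(alphas, learners))
        return np.sign(votes)

The multiplicative update increases the weight of misclassified examples, so each subsequent classifier concentrates on the cases its predecessors got wrong.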
Another popular type of ensemble method is bagging, proposed by Breiman (Breiman, 1996). The principle is to create a set of classifiers based on bootstrap samples of the original data. Random forests (Breiman, 2001) are the most famous application of bagging. They are a combination of tree predictors and have given very good results in many domains (Diaz-Uriarte and Alvarez de Andres, 2006).
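The bagging principle can be sketched as follows; decision trees and majority voting are assumptions made for the example, since any base classifier and aggregation rule could be used.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_fit(X, y, n_models=100, seed=0):
        """Train one tree per bootstrap sample of the original data."""
        rng = np.random.default_rng(seed)
        n = len(y)
        models = []
        for _ in range(n_models):
            idx = rng.integers(0, n, size=n)       # sample with replacement
            models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return models

    def bagging_predict(X, models):
        """Aggregate by majority vote; assumes integer class labels."""
        preds = np.array([m.predict(X) for m in models])  # (n_models, n_examples)
        return np.array([np.bincount(col).argmax() for col in preds.T])

Random forests follow the same scheme but additionally sample the candidate features at each split of each tree.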
Several works have shown that ensemble methods can also be used in unsupervised learning. Topchy et al. (Topchy et al., 2004b) showed theoretically that ensemble methods may improve clustering performance. The principle of boosting was exploited by Frossyniotis et al. (Frossyniotis et al., 2004) to provide a consistent partitioning of the data. At each iteration, the boost-clustering approach creates a new training set by weighted random sampling from the original data, and a simple clustering algorithm is applied to provide new clusters.
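For concreteness, one iteration of this scheme might look as follows; k-means is assumed as the simple clustering algorithm, and the rule that updates the example weights between iterations is deliberately omitted from the sketch.

    import numpy as np
    from sklearn.cluster import KMeans

    def boost_clustering_step(X, weights, k, seed=0):
        """One iteration sketch: weighted random sampling, then clustering.
        `weights` plays the role of the boosting distribution over examples."""
        rng = np.random.default_rng(seed)
        n = len(X)
        idx = rng.choice(n, size=n, replace=True, p=weights / weights.sum())
        km = KMeans(n_clusters=k, n_init=10).fit(X[idx])  # cluster the sample
        return km.predict(X)          # labels for every original example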
Dudoit and Fridlyand (Dudoit and Fridlyand, 2003) used bagging to improve the accuracy of clustering by reducing the variability of the results of the PAM algorithm (Partitioning Around Medoids) (van der Laan et al., 2003). Their method has been applied to leukemia and melanoma datasets and made it possible to distinguish the different tissue subtypes.
Strehl and Ghosh (Strehl and Ghosh, 2002) proposed an approach to combine multiple partitions obtained from different sources into a single one. They introduced heuristics based on a voting consensus. Each example is assigned to one cluster in each partition; an example therefore has as many assignments as there are partitions in the collection. In the aggregated partition, the example is assigned to the cluster to which it was most often assigned. One problem with this consensus is that it requires knowledge of the cluster correspondence between the different partitions.
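The sketch below illustrates this voting consensus; the Hungarian method (scipy's linear_sum_assignment) is assumed here as one possible way to solve the cluster-correspondence problem just mentioned, and every partition is assumed to use labels 0..k-1.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def align_to_reference(ref, part, k):
        """Relabel `part` so that its clusters best match those of `ref`."""
        contingency = np.zeros((k, k), dtype=int)
        for r, p in zip(ref, part):
            contingency[r, p] += 1    # co-occurrence of ref and part labels
        _, cols = linear_sum_assignment(-contingency)  # maximize total overlap
        relabel = np.empty(k, dtype=int)
        relabel[cols] = np.arange(k)  # part-cluster cols[i] becomes label i
        return relabel[part]

    def voting_consensus(partitions, k):
        """Assign each example to the cluster it was most often assigned to,
        after aligning every partition to the first one."""
        ref = partitions[0]
        aligned = np.array([ref] + [align_to_reference(ref, p, k)
                                    for p in partitions[1:]])
        return np.array([np.bincount(aligned[:, i], minlength=k).argmax()
                         for i in range(aligned.shape[1])])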
They also proposed a cluster-based similarity partitioning algorithm. The collection is used to compute a similarity matrix of the examples, where the similarity between two examples is the frequency of their co-association to the same cluster over the collection. The aggregated partition is computed by clustering the examples from this similarity matrix.
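This co-association idea can be sketched as follows; note that no cluster correspondence is needed. Strehl and Ghosh partition the induced similarity graph, and hierarchical clustering on the corresponding distance matrix is assumed here only to keep the example self-contained.

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    def cspa(partitions, k):
        """Cluster-based similarity partitioning sketch.
        partitions: (n_partitions, n_examples) labels; alignment is NOT needed."""
        P, n = partitions.shape
        S = np.zeros((n, n))
        for part in partitions:
            S += part[:, None] == part[None, :]  # 1 if co-assigned here
        S /= P                                   # co-association frequency
        model = AgglomerativeClustering(n_clusters=k, metric="precomputed",
                                        linkage="average")
        # (the `metric` parameter is named `affinity` in scikit-learn < 1.2)
        return model.fit_predict(1.0 - S)        # distance = 1 - similarity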
Fern and Brodley (Fern and Brodley, 2004) formalized the aggregation procedure as a bipartite graph partitioning. The collection is represented by a bipartite graph in which the examples and the clusters of the partitions form the two sets of vertices; an edge between an example and a cluster means that the example has been assigned to this cluster. The graph is partitioned, and each sub-graph represents an aggregated cluster.
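The construction might be sketched as below; Fern and Brodley partition the graph with dedicated graph-partitioning methods, and scikit-learn's spectral clustering of the bipartite adjacency matrix is assumed here as a stand-in.

    import numpy as np
    from sklearn.cluster import SpectralClustering

    def bipartite_consensus(partitions, k):
        """Sketch: vertices are the n examples plus every cluster of every
        partition; an edge links an example to each cluster containing it."""
        P, n = partitions.shape
        blocks = []
        for part in partitions:
            labels = np.unique(part)
            blocks.append((part[:, None] == labels[None, :]).astype(float))
        B = np.hstack(blocks)             # (n_examples, total n_clusters)
        m = B.shape[1]
        A = np.zeros((n + m, n + m))      # adjacency of the bipartite graph
        A[:n, n:] = B
        A[n:, :n] = B.T
        labels = SpectralClustering(n_clusters=k,
                                    affinity="precomputed").fit_predict(A)
        return labels[:n]                 # an aggregated cluster per example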
Topchy et al. (Topchy et al., 2004a) proposed to model the consensus of the collection with a multinomial mixture model. In the collection, each example is described by the set of labels of its assigned clusters in each partition. This can be seen as a new space in which the examples are defined, each dimension being a partition of the collection. The aggregated partition is computed by clustering the examples in this new space. Since the labels are discrete variables, a multinomial mixture model is used, and each component of the model represents an aggregated cluster.
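A compact EM sketch of this model follows; the Dirichlet initialization, the fixed number of iterations, and the smoothing constant are all assumptions made for the example.

    import numpy as np

    def mixture_consensus(partitions, k, n_iter=100, seed=0):
        """EM sketch for a mixture of multinomials over the label space.
        partitions: (n_partitions, n_examples); column i is the label
        vector of example i, one entry per partition."""
        rng = np.random.default_rng(seed)
        X = partitions.T                  # (n, P) examples in label space
        n, P = X.shape
        L = X.max() + 1                   # assumes labels 0..L-1
        pi = np.full(k, 1.0 / k)          # mixing proportions
        theta = rng.dirichlet(np.ones(L), size=(k, P))  # (k, P, L) label probas
        for _ in range(n_iter):
            # E-step: responsibilities, computed in log space for stability
            log_r = np.tile(np.log(pi), (n, 1))
            for j in range(P):
                log_r += np.log(theta[:, j, X[:, j]]).T  # (n, k)
            r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
            r /= r.sum(axis=1, keepdims=True)
            # M-step: update proportions and per-partition label distributions
            pi = r.mean(axis=0)
            for j in range(P):
                for l in range(L):
                    theta[:, j, l] = r[X[:, j] == l].sum(axis=0)
                theta[:, j] += 1e-6       # smoothing avoids log(0)
                theta[:, j] /= theta[:, j].sum(axis=1, keepdims=True)
        return r.argmax(axis=1)           # component index = aggregated cluster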
Some recent works have shown that the ensemble approach can also be useful in biclustering problems (Hanczar and Nadif, 2012). De Smet presented a method of ensemble biclustering for querying gene expression compendia from experimental lists (De Smet and Marchal, 2011). However, the ensemble approach is performed on only one dimension of the data (the gene dimension); the biclusters are then extracted from the gene consensus clusters. A bagging version of biclustering algorithms has been proposed and tested on microarray data (Hanczar and Nadif, 2010). Although this last method improves the performance of biclustering, in some cases it fails and returns empty biclusters, i.e. biclusters without examples or features. This is because the consensus function handles the sets of examples and features on a single dimension, as in the clustering context, whereas the consensus function must respect the two-dimensional structure of the biclusters. For this reason, the consensus functions mentioned above cannot be directly applied to biclustering problems. In this paper we adapt these consensus functions to the biclustering