Metrics for Clustering Comparison in Bioinformatics
Giovanni Rossi
Department of Computer Science and Engineering (DISI), University of Bologna
Mura Anteo Zamboni 7, Bologna, 40126, Italy
Keywords:
Hamming Distance, Partition Lattice, Hasse Diagram, Weighted Graph, Geodesic Distance, Path.
Abstract:
Developing from a concern in bioinformatics, this work analyses alternative metrics between partitions. From
both theoretical and applicative perspectives, a useful and interesting distance between any two partitions is
HD, which counts the number of atoms finer than either one but not both. While faithfully reproducing the
traditional Hamming distance between subsets, HD is very sensitive and computable through scalar products
between Boolean vectors. It properly deals with complements and axiomatically resembles the entropy-based
variation of information VI distance. Entire families of metrics (including HD and VI) are obtained as minimum-weight
paths in the weighted graph given by the Hasse diagram: submodular weighting functions yield path-based
distances visiting the join (of any two partitions), whereas supermodularity leads to visit the meet. This yields
an exact (rather than heuristic) approach to the consensus partition (combinatorial optimization) problem.
1 INTRODUCTION
Partitions or clusterings are key instruments in a vari-
ety of fields at the interface of computer science, ar-
tificial intelligence and engineering, including pattern
recognition/learning, web mining and bioinformatics.
Quantitative clustering comparison is essential for as-
sessing the proximity between and superiority among
diverse partitions of a given set (Meila, 2007; Pinto
Da Costa and Rao, 2004).
In bioinformatics, measuring the distance between clusterings of populations, either natural or experimental, is fundamental for sibling relationship reconstruction. Attention has apparently focused mostly on a single distance measure, here denoted by MMD, which relies on maximum matching (Konovalov, 2006; Berger-Wolf et al., 2007; Sheikh et al., 2010; Konovalov et al., 2005b; Konovalov et al., 2005a). After its appearance (Almudevar and Field, 1999), MMD was shown (Gusfield, 2002) to be computable via the assignment problem (Korte and Vygen, 2002, p. 236). Another partition distance recently tested in this setting (Brown and Dexter, 2012) is the variation of information VI, obtained axiomatically from information theory (Meila, 2007).
In this work, the distance between partitions is
measured in quite different ways, since the aim is to
have consistency and generalizations in terms of lat-
tice theory. The primary objective is to reproduce
the traditional Hamming distance between subsets,
given by the cardinality of their symmetric differ-
ence (Bollobas, 1986, p. 3). Such a benchmark is
extended from Boolean to geometric lattices by fo-
cusing on atoms and join-decompositions of lattice
elements (Aigner, 1997; Stern, 1999). While every subset admits a unique such decomposition, involving a number of atoms equal to the cardinality of the subset, a generic partition admits different join-decompositions, most of which are redundant. The number of atoms involved in the unique maximal join-decomposition of a partition is here defined to be the
size of that partition. The size is an integer-valued lat-
tice function, like the rank. In fact, the two coincide
for Boolean lattices but differ crucially for geometric
lattices. Roughly speaking, replacing the rank with
the size yields the Hamming distance HD between
partitions proposed below. While achieving combi-
natorial congruency, HD shares with VI important
characterizing axioms and is computed through sim-
ple scalar products between Boolean vectors, avoid-
ing any algorithmic issue. Finally, HD also has a large
range which provides fine measurement sensitivity.
The traditional Hamming distance between two
subsets is also the length of any shortest path between
them in the associated Hasse diagram, which is the
unit hypercube. This latter is a graph with subsets as
vertices and edges linking any two subsets whenever
one covers the other (in terms of set inclusion, see
(Bollobas, 1986; Godsil and Royle, 2001) and below).
In order to have an analog for the Hamming distance
between partitions defined here, it is necessary to look
at the lattice of partitions of an $n$-set as the polygon matroid of the complete graph $K_n$ on $n$ vertices (Aigner, 1997, pp. 259, 274). In other terms, partitions of an $n$-set may well be regarded as those graphs on $n$ vertices each of whose components is complete (i.e. a clique). In this way, the partition lattice is seen to be strictly included in the $\binom{n}{2}$-dimensional unit hypercube: the set $\{0,1\}^{\binom{n}{2}}$ of hypercube vertices identifies the $2^{\binom{n}{2}}$ distinct graphs on $n$ vertices, while linear dependence (Whitney, 1935) entails that partitions only span $B_n < 2^{\binom{n}{2}}$ hypercube vertices, where Bell number $B_n$ is the number of partitions of an $n$-set (Rota, 1964a; Graham et al., 1994). The convex hull of these $B_n$ vertices identifies a polytope, and the graph of this polytope (Grünbaum, 2001, pp. 212-16) is in fact the Hasse diagram of the partition lattice. Yet, while the covering relation between subsets assigns a unitary weight to each edge of the hypercube (Sebő and Tannier, 2004, p. 384), edges of the polytope of partitions must be weighted through the size (see above), as this latter quantifies precisely the number of hypercube edges that collapse into a unique edge of the polytope. With such a weighting, the Hamming distance between two partitions (like between two subsets) quantifies the minimum weight of a path connecting them.
Rossi, G.
Metrics for Clustering Comparison in Bioinformatics.
DOI: 10.5220/0005707102990308
In Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2016), pages 299-308
ISBN: 978-989-758-173-1
Copyright © 2016 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
The approach allows for generalizations in that
the size may be replaced with any alternative order-
preserving lattice function, such as the rank (or the
entropy of partitions). Then, polytope edges have
weights obtained as the difference between the greater
and the smaller value taken by the chosen order-
preserving function on the associated endpoints. Ac-
cordingly, the distance between two lattice elements
is the minimum weight of a path connecting them.
2 DISTANCES, LATTICES AND
GRAPHS
For a finite set $N = \{1, \ldots, n\}$, let $(2^N, \cap, \cup)$ and $(\mathcal{P}^N, \wedge, \vee)$ be the associated subset and partition lattices, ordered by inclusion $\supseteq$ and coarsening $\geq$, respectively. Both are atomic and atomistic; the former is distributive while the latter is geometric (Aigner, 1997; Stern, 1999). A graph $G = (V, E)$ consists of a vertex set $V = \{v_1, \ldots, v_m\}$ and an edge set $E \subseteq V_2$ included in the $\binom{m}{2}$-set $V_2 := \{\{v_i, v_j\} : 1 \leq i < j \leq m\}$ of unordered pairs of vertices. The complete graph on $m$ vertices (see above) is $K_m = (V, V_2)$. The Hamming distance $HD(P, Q)$ between partitions $P, Q \in \mathcal{P}^N$ proposed here aims at reproducing the traditional Hamming distance $|A \Delta B|$ between subsets $A, B \in 2^N$ while taking into account that partitions of an $n$-set $N$ are in fact graphs with vertex set $V = N$ whose components are each a complete subgraph (Aigner, 1997).
Distances within an ordered set must be measured via the order relation, while distances between elements of any set are called 'Hamming distances' when such elements are represented as arrays and the distance between two of them is the number of entries where their array representations differ. The Hamming distance between subsets $A, B \in 2^N$ is
$$|A \Delta B| = |A \setminus B| + |B \setminus A| = r(A \cup B) - r(A \cap B), \tag{1}$$
$r : 2^N \to \mathbb{Z}_+$ being the rank function: $r(A) = |A|$. It counts how many $i \in N$ are included in either $A$ or else $B$, but not in both. Elements $i \in N$, or 1-cardinal subsets $\{i\} \in 2^N$, are the atoms of lattice $(2^N, \cap, \cup)$, and (1) is a Hamming distance since subsets $A \in 2^N$ are represented as Boolean $n$-vectors $\chi_A \in \{0,1\}^n$ or vertices of the $n$-dimensional unit hypercube $[0,1]^n$. This is achieved through their characteristic function $\chi_A : N \to \{0,1\}$, defined by $\chi_A(i) = 1$ if $i \in A$ and $\chi_A(i) = 0$ if $i \in N \setminus A$. Thus the distance between any $A, B \in 2^N$ is the number $|A \Delta B|$ of entries where $\chi_A$ and $\chi_B$ differ (Bollobas, 1986; Aigner, 1997).
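As a small illustration of expression (1), the following sketch represents subsets as Boolean vectors and counts the entries where they differ. The function names and the sample sets are illustrative assumptions, not part of the original text.

```python
def characteristic_vector(subset, n):
    """Boolean n-vector chi_A of a subset A of N = {1, ..., n}."""
    return [1 if i in subset else 0 for i in range(1, n + 1)]


def hamming_subsets(a, b, n):
    """Number of entries where chi_A and chi_B differ."""
    chi_a = characteristic_vector(a, n)
    chi_b = characteristic_vector(b, n)
    return sum(x != y for x, y in zip(chi_a, chi_b))


# Hypothetical sample data: two subsets of N = {1, ..., 5}.
A, B, n = {1, 2, 3}, {3, 4}, 5
print(hamming_subsets(A, B, n), len(A ^ B), len(A | B) - len(A & B))
```

The three printed numbers are the array-based count, the cardinality of the symmetric difference, and the rank difference r(A ∪ B) − r(A ∩ B); they should coincide.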
A partition $P = \{A_1, \ldots, A_{|P|}\} \subset 2^N$ of $N$ is a collection of pairwise disjoint subsets, called blocks (or clusters), whose union yields $N$. Any subset $A \in 2^N$ has a unique complement $A^c = N \setminus A$. For all partitions $P \in \mathcal{P}^N$ and all non-empty subsets $\emptyset \subset A \subseteq N$, let $P^A = \{B \cap A : B \in P, \emptyset \neq B \cap A\}$ denote the partition of $A$ induced by $P$. Two non-Hamming distances between partitions are now briefly introduced. Maximum matching distance $MMD(P, Q)$ between two partitions $P, Q \in \mathcal{P}^N$ is
$$MMD(P, Q) = \min\{|A^c| : \emptyset \subset A \subseteq N, P^A = Q^A\}. \tag{2}$$
This is the minimum number of elements $i \in N$ that must be deleted in order for the two residual induced partitions to coincide. Also, $MMD(P, Q)$ "is the minimum number of elements that must be moved between clusters of P so that the resulting partition equals Q" (Gusfield, 2002, p. 160). It is computable as a maximum matching or assignment problem (Korte and Vygen, 2002, chapter 11). In a graph a matching is a set of pairwise disjoint edges, i.e. the endpoints are all different vertices. Now consider the bipartite graph $G = (P \cup Q, E)$ with $|P| + |Q|$ vertices, one for each block of each partition, and join any two of them $A \in P$ and $B \in Q$ with an edge $\{A, B\} \in E$ if $A \cap B \neq \emptyset$. In addition, let $|A \cap B|$ be the weight of the edge. Then, determining $MMD(P, Q)$ amounts to finding a maximum-weight matching $E^*$ in $G$, that is one where the sum $\sum_{(A,B) \in E^*} |A \cap B|$ of edge weights is maximal. In fact, the minimum number $MMD(P, Q)$ of elements that must be removed for the two residual partitions to coincide is the sum $\sum_{(A,B) \in E^*} |A \Delta B|$ over all selected edges of the cardinality of the symmetric difference between the associated endpoints.
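The assignment formulation can be sketched for small partitions by brute force. The sketch below assumes the standard reading of that formulation, in which the elements retained are those lying in the intersections of matched blocks, so that the distance equals n minus the maximum total matching weight. The function name `mmd`, the padding with empty blocks and the exhaustive search over permutations are illustrative choices, suitable only for partitions with few blocks.

```python
from itertools import permutations


def mmd(p, q):
    """Maximum matching distance between partitions p and q (lists of sets)."""
    n = sum(len(block) for block in p)
    # Pad the partition with fewer blocks by empty blocks (assumption: this
    # turns the maximum-weight matching into a square assignment problem).
    k = max(len(p), len(q))
    p = list(p) + [set()] * (k - len(p))
    q = list(q) + [set()] * (k - len(q))
    # Brute-force assignment: largest total overlap over all one-to-one matchings.
    best = max(
        sum(len(p[i] & q[j]) for i, j in enumerate(perm))
        for perm in permutations(range(k))
    )
    return n - best


print(mmd([{1, 2}, {3}], [{1}, {2, 3}]))
```

For larger inputs a polynomial-time assignment solver would replace the permutation loop; that substitution is not shown here.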
Another important measure of the distance between two partitions $P, Q$ is the variation of information $VI(P, Q)$, obtained axiomatically from information theory (see (Meila, 2007, expressions (15)-(22), pages 879-80)). Entropy $e(P) = -\sum_{A \in P} \frac{|A|}{n} \log \frac{|A|}{n}$ of a partition $P$ (binary logarithm) makes it possible to measure the distance between $P, Q \in \mathcal{P}^N$ as
$$VI(P, Q) = 2e(P \wedge Q) - e(P) - e(Q), \tag{3}$$
where $P \wedge Q$ is the coarsest partition finer than both $P$ and $Q$ (while $\vee$ is the 'finest-coarser-than' operator). Notice that while the range of MMD is $\{0, 1, \ldots, n-1\} \subset \mathbb{Z}_+$, VI ranges in a finite subset of interval $[0, \log n] \subset \mathbb{R}_+$.
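Expression (3) can be evaluated directly once the meet is available. The sketch below assumes partitions are given as lists of Python sets and computes the meet as the collection of non-empty pairwise block intersections; the function names are illustrative.

```python
from math import log2


def entropy(p, n):
    """Entropy e(P) with binary logarithm."""
    return -sum(len(a) / n * log2(len(a) / n) for a in p)


def meet(p, q):
    """Coarsest partition finer than both: non-empty pairwise block intersections."""
    return [a & b for a in p for b in q if a & b]


def vi(p, q):
    """Variation of information, expression (3)."""
    n = sum(len(a) for a in p)
    return 2 * entropy(meet(p, q), n) - entropy(p, n) - entropy(q, n)


# Hypothetical example on N = {1, 2, 3}: two atoms sharing one element.
print(vi([{1, 2}, {3}], [{1, 3}, {2}]))
```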
Apart from MMD and VI, there exist several other partition distance measures (see (Deza and Deza, 2013, sections 10.2 and 10.3, pp. 191-193) and (Day, 1981; Hubert and Arabie, 1985; Warrens, 2008; Mirkin, 1996)). One was proposed as the Hamming distance between (matrices representing) partitions (Meila, 2007; Mirkin and Cherny, 1970; Mirkin and Muchnik, 2008), and thus shall be briefly distinguished from the object of this paper. A binary relation $R$ on $N$ is a subset $R \subseteq N \times N$ of ordered pairs $(i, j)$ of elements $i, j \in N$. The collection of all such binary relations is subset lattice $(2^{N \times N}, \cap, \cup)$. If symmetry $(i, j) \in R \Rightarrow (j, i) \in R$ and transitivity $(i, j), (j, h) \in R \Rightarrow (i, h) \in R$ hold, then $R$ is an equivalence relation, or a partition of $N$ into equivalence classes: maximal subsets $A \in 2^N$ such that $(i, j), (j, i) \in R$ for all $i, j \in A$ are precisely its blocks. A binary relation $R$ is represented by a matrix $M^R \in \{0,1\}^{n \times n}$ with entries $M^R_{ij} = 1$ if $(i, j) \in R$ and $M^R_{ij} = 0$ if $(i, j) \notin R$. Let two equivalence relations $R, R'$ have associated partitions $P, P'$ and matrices $M^R, M^{R'}$. The distance $d(R, R')$ between subsets $R, R' \in 2^{N \times N}$ can be computed as $d(R, R') = |R \Delta R'| = |R \cup R'| - |R \cap R'|$. This is the number of 1s in matrix $M^{R \Delta R'} = M^R + M^{R'}$ modulo 2 (see (Aigner, 1997, p. 338)). While providing a distance between partitions $P$ and $P'$, this is a Hamming distance between certain subsets that correspond to partitions only in quite special cases, as lattice $(2^{N \times N}, \cap, \cup)$ contains $2^{n^2} - B_n$ elements, or binary relations, that do not correspond to partitions, or equivalence relations. The argument also applies when partitions are represented as Boolean $n \times n$-matrices through the complement of equivalence relations, namely apartness relations $R^c = (N \times N) \setminus R$ (Ellerman, 2013b; Ellerman, 2013a). This is detailed in subsection 4.1 below.
However regarded, partition lattice $(\mathcal{P}^N, \wedge, \vee)$ is compressed into a larger subset lattice, with which some elements are shared while some others are not. This feature is maintained even when partitions are decomposed as joins of atoms, for they generally admit several such decompositions. Nevertheless, when regarded from this perspective partition lattice $(\mathcal{P}^N, \wedge, \vee)$ is seen to be included in subset lattice $(2^{N_2}, \cap, \cup)$, with the two sharing the same $\binom{n}{2}$ atoms. In fact, $(2^{N_2}, \cap, \cup)$ is the minimal subset lattice including the partition lattice. Accordingly, the Hamming distance between partitions HD proposed below relies precisely on representing partitions as Boolean $\binom{n}{2}$-vectors, although only $B_n < 2^{\binom{n}{2}}$ distinct such vectors correspond to partitions. In particular, HD is the traditional Hamming distance $|E \Delta E'|$ between edge sets $E, E' \in 2^{N_2}$, with these latter corresponding to partitions only when in both graphs $G = (N, E)$, $G' = (N, E')$ each component is a complete subgraph.
3 HAMMING DISTANCE
The rank function $r : \mathcal{P}^N \to \mathbb{Z}_+$ for partitions is $r(P) = n - |P|$, where $P_\bot = \{\{1\}, \ldots, \{n\}\}$ is the bottom element: $r(P_\bot) = 0$. Atoms are immediately above, with rank 1, in the associated Hasse diagram. This latter is ordered by coarsening $\geq$, with coarser partitions in upper levels (Meila, 2007; Aigner, 1997; Stern, 1999) and $P \geq Q$ meaning that every block of $Q$ is included in some block of $P$. Hence atoms are those partitions consisting of $n - 1$ blocks, namely $n - 2$ singletons and one pair. These $\binom{n}{2}$ unordered pairs $\{i, j\} \in N_2$ are the same atoms as in subset lattice $(2^{N_2}, \cap, \cup)$. For notational convenience, let $[ij] \in \mathcal{P}^N$ denote the atom where the unique 2-cardinal block is (unordered) pair $\{i, j\} \in [ij]$.
Consider $\chi_N \in \{0,1\}^n$ as the $n$-vector with all entries equal to 1 and denote by $\langle x, y \rangle$ the scalar product between $x$ and $y$. The Hamming distance between subsets $A, B \in 2^N$ is
$$|A \Delta B| = |A| + |B| - 2|A \cap B| = \langle \chi_A, \chi_N \rangle + \langle \chi_B, \chi_N \rangle - 2\langle \chi_A, \chi_B \rangle. \tag{4}$$
In order to replace subsets $A$ with partitions $P$, let $\mathcal{P}^N_{(1)} = \{[ij] : 1 \leq i < j \leq n\}$ be the $\binom{n}{2}$-set of atoms of the partition lattice, i.e. $\mathcal{P}^N_{(1)} \cong N_2$. The analog of characteristic function $\chi_A$ is indicator function $I_P : \mathcal{P}^N_{(1)} \to \{0,1\}$ defined, for $P \in \mathcal{P}^N$, $[ij] \in \mathcal{P}^N_{(1)}$, by
$$I_P([ij]) = \begin{cases} 1 & \text{if } P \geq [ij], \\ 0 & \text{if } P \not\geq [ij]. \end{cases}$$
In words, if pair $\{i, j\}$ is included in some block $A$ of $P$ (i.e. $\{i, j\} \subseteq A \in P$), then partition $P$ is coarser than atom $[ij]$, and the corresponding position $I_P([ij])$ of indicator array $I_P$ has entry 1. Otherwise, $I_P([ij])$ equals 0. Top element $P^\top = \{N\}$ for partitions yields $I_{P^\top}$, i.e. the $\binom{n}{2}$-vector with all entries equal to 1. The number $s(P) = |\{[ij] : [ij] \leq P\}|$ of atoms finer than $P$ is (Rossi, 2011) the size $s : \mathcal{P}^N \to \mathbb{Z}_+$, that is
$$s(P) = \sum_{A \in P} \binom{|A|}{2} = \langle I_P, I_{P^\top} \rangle. \tag{5}$$
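The two sides of expression (5) can be compared numerically. The following sketch assumes partitions are lists of Python sets over N = {1, …, n}, with pairs {i, j} enumerated in lexicographic order; the function names and the sample partition are illustrative.

```python
from itertools import combinations
from math import comb


def indicator(p, n):
    """Boolean vector I_P: entry 1 iff the pair {i, j} lies inside a block of p."""
    return [int(any(i in a and j in a for a in p))
            for i, j in combinations(range(1, n + 1), 2)]


def size(p):
    """Size s(P) as the sum over blocks of binomial(|A|, 2)."""
    return sum(comb(len(a), 2) for a in p)


# Hypothetical sample partition of a 7-set.
P = [{1, 2, 3}, {4, 5, 6}, {7}]
top = indicator([set(range(1, 8))], 7)          # all-ones vector of the top element
scalar_product = sum(x * y for x, y in zip(indicator(P, 7), top))
print(scalar_product, size(P))
```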
While the cardinality of subsets $|A| = \langle \chi_A, \chi_N \rangle$ takes every integer value between 0 and $n$, the size of partitions $s(P) = \langle I_P, I_{P^\top} \rangle$ does not take every integer value between 0 and $\binom{n}{2}$. Minimally, this may be seen for $N = \{1, 2, 3\}$, as the $B_3 = 5$ partitions are the finest $\{\{1\}, \{2\}, \{3\}\}$ and coarsest $\{1, 2, 3\}$ ones, together with $\binom{3}{2} = 3$ atoms: $[12] = \{\{1, 2\}, \{3\}\}$, $[13] = \{\{1, 3\}, \{2\}\}$ and $[23] = \{\{2, 3\}, \{1\}\}$. Thus, there is no partition with size equal to 2, in that $\{1, 2, 3\} = [12] \vee [13] \vee [23]$ and $\{1, 2, 3\} = [12] \vee [23] = [12] \vee [13] = [13] \vee [23]$. Available sizes for $1 \leq n \leq 7$ are in Table 1 below.
Table 1: Available sizes of partitions of an $n$-set, $1 \leq n \leq 7$.
$|N| = n$    Available sizes $\{s(P) : P \in \mathcal{P}^N\}$
1            {0}
2            {0, 1}
3            {0, 1, 3}
4            {0, 1, 2, 3, 6}
5            {0, 1, 2, 3, 4, 6, 10}
6            {0, 1, 2, 3, 4, 6, 7, 10, 15}
7            {0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 15, 21}
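Since by (5) the size depends only on block cardinalities, the rows of Table 1 can be regenerated by listing the possible block-size profiles of an n-set. The sketch below does so; the function names are illustrative assumptions.

```python
from math import comb


def block_size_profiles(n, largest=None):
    """Yield the non-increasing lists of block cardinalities of partitions of an n-set."""
    if largest is None:
        largest = n
    if n == 0:
        yield []
        return
    for k in range(min(n, largest), 0, -1):
        for rest in block_size_profiles(n - k, k):
            yield [k] + rest


def available_sizes(n):
    """Sorted list of the values taken by the size s on partitions of an n-set."""
    return sorted({sum(comb(k, 2) for k in profile)
                   for profile in block_size_profiles(n)})


for n in range(1, 8):
    print(n, available_sizes(n))
```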
In atomistic lattices, such as $2^N$, $\mathcal{P}^N$ and $2^{N_2}$, every element admits a decomposition as a join of atoms (Aigner, 1997; Stern, 1999). While subsets $A \in 2^N$ and $E \in 2^{N_2}$ admit a unique such decomposition, namely $A = \bigcup_{i \in A} \{i\}$ and $E = \bigcup_{\{i,j\} \in E} \{i, j\}$, partitions generally admit several such decompositions. For $n = 3$ as above, the coarsest partition $\{1, 2, 3\}$ decomposes either as the join of any two atoms, or else as the join of all the three available atoms at once. In particular, the rank $r(P) = n - |P|$ of any partition $P$ is the minimum number of atoms involved in a join-decomposition of $P$, while the size $s(P)$ is the maximum number of atoms involved in a join-decomposition of $P$. The coarsest partition $\{1, 2, 3\}$ of a 3-cardinal set has rank $r(\{1, 2, 3\}) = 3 - 1 = 2$ and size $s(\{1, 2, 3\}) = 3 = \binom{3}{2}$.
Proposition 1. The size is strictly monotone: for all $P, Q \in \mathcal{P}^N$, if $P \geq Q$, $P \neq Q$, then $s(P) > s(Q)$.
Proof. If $P > Q$ (i.e. $P \geq Q$, $P \neq Q$), then every block $A \in P$ is the union of some blocks $B_1, \ldots, B_{|Q^A|} \in Q$, with $|Q^A| > 1$ for at least one block $A \in P$. Recall that $Q^A$ is the partition of $A$ induced by $Q$ (see the definition of MMD in section 2 above). The union of any two such blocks $B, B' \in Q$ increases the size by $\binom{|B| + |B'|}{2} - \left[\binom{|B|}{2} + \binom{|B'|}{2}\right] = |B||B'|$, which is strictly positive as blocks are non-empty.
In order to reproduce (1) and (4) above, Hamming distance $HD : \mathcal{P}^N \times \mathcal{P}^N \to \mathbb{Z}_+$ has to count the number of atoms finer than either one of any two partitions $P, Q \in \mathcal{P}^N$ but not finer than both, that is
$$HD(P, Q) = |\{[ij] : P \geq [ij] \not\leq Q\}| + |\{[ij] : P \not\geq [ij] \leq Q\}|.$$
In view of (5),
$$HD(P, Q) = s(P) + s(Q) - 2s(P \wedge Q) = \langle I_P, I_{P^\top} \rangle + \langle I_Q, I_{P^\top} \rangle - 2\langle I_P, I_Q \rangle. \tag{6}$$
Also $P \wedge Q = \bigvee_{P \geq [ij] \leq Q} [ij]$ is the maximal decomposition of $P \wedge Q$ as a join of atoms, thus
$$HD(P, Q) = \langle I_P, I_{P^\top} \rangle + \langle I_Q, I_{P^\top} \rangle - 2\langle I_{P \wedge Q}, I_{P^\top} \rangle.$$
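Expression (6) amounts to three scalar products between Boolean vectors. A minimal sketch, assuming partitions are lists of Python sets over N = {1, …, n} and using illustrative function names:

```python
from itertools import combinations


def indicator(p, n):
    """Boolean vector I_P indexed by pairs {i, j}: 1 iff some block of p contains both."""
    return [int(any(i in a and j in a for a in p))
            for i, j in combinations(range(1, n + 1), 2)]


def dot(x, y):
    """Scalar product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def hd(p, q, n):
    """Hamming distance between partitions via expression (6)."""
    ip, iq = indicator(p, n), indicator(q, n)
    ones = [1] * len(ip)                 # indicator vector of the top partition {N}
    return dot(ip, ones) + dot(iq, ones) - 2 * dot(ip, iq)


# Hypothetical example on N = {1, 2, 3, 4}.
P = [{1, 2}, {3, 4}]
Q = [{1, 2, 3}, {4}]
differing = sum(x != y for x, y in zip(indicator(P, 4), indicator(Q, 4)))
print(hd(P, Q, 4), differing)
```

The second printed value counts the entries where the two indicator vectors differ, which is the array-based reading of the same distance.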
3.1 HD and VI: Axioms
Following (Meila, 2007), attention is now placed on those axioms that characterize both partition distance measures HD and VI.
Proposition 2. HD is a metric: for all $P, P', Q \in \mathcal{P}^N$,
1. $HD(P, Q) = HD(Q, P)$,
2. $HD(P, Q) \geq 0$, with equality if and only if $P = Q$,
3. $HD(P, P') + HD(P', Q) \geq HD(P, Q)$.
Proof. The first condition is obvious. In view of proposition 1, the second one is also immediate as $\min\{s(P), s(Q)\} \geq s(P \wedge Q)$. In fact, $HD(P, Q)$ is the sum $[s(P) - s(P \wedge Q)] + [s(Q) - s(P \wedge Q)]$ of two non-negative integers, while $\min_{P \neq Q} HD(P, Q) = 1 = \min_{P \in \mathcal{P}^N, P \neq P_\bot} s(P)$. As for the third condition, known as triangle inequality, difference
$$HD(P, P') + HD(P', Q) - HD(P, Q) = 2[s(P') - s(P \wedge P') - s(P' \wedge Q) + s(P \wedge Q)]$$
must be shown to be $\geq 0$ for all $P, P', Q \in \mathcal{P}^N$. Since size $s(P \wedge Q)$ is given, $s(P') - [s(P \wedge P') + s(P' \wedge Q)]$ has to be minimized by suitably choosing $P'$. Now, sum $s(P \wedge P') + s(P' \wedge Q)$ is maximized when $P \wedge P' = P$ (or $P' \geq P$) and $P' \wedge Q = Q$ (or $P' \geq Q$). If $Q \leq P' \geq P$, then $P' = P \vee Q$ minimizes the whole difference. Thus HD satisfies triangle inequality as long as the size is supermodular: $s(P \vee Q) - s(P) - s(Q) + s(P \wedge Q) \geq 0$ for all $P, Q \in \mathcal{P}^N$. The simplest way to see that this is indeed the case is by focusing on Möbius inversion of lattice (or more generally poset) functions (Rota, 1964b; Aigner, 1997; Stern, 1999). By definition, the size $s : \mathcal{P}^N \to \mathbb{Z}_+$ has Möbius inversion $\mu^s : \mathcal{P}^N \to \mathbb{Z}$ given by $\mu^s(P) = 1$ if $P$ is an atom (i.e. $P = [ij] \in \mathcal{P}^N_{(1)}$), and $\mu^s(P) = 0$ otherwise. In fact, $s(P) = \sum_{Q \leq P} \mu^s(Q)$ for all $P \in \mathcal{P}^N$. Hence the size satisfies a condition which is sufficient (but not necessary) for supermodularity, as its Möbius inversion takes only values $\geq 0$. This completes the proof.
Triangle inequality is satisfied with equality by HD (and VI) when $P' = P \wedge Q$ (for VI, see (Meila, 2007, pp. 883, 888) properties 6, 10(A.2)).
Proposition 3. HD satisfies horizontal collinearity: $HD(P, P \wedge Q) + HD(P \wedge Q, Q) = HD(P, Q)$.
Proof. $HD(P, P \wedge Q) + HD(P \wedge Q, Q) = [s(P) - s(P \wedge Q)] + [s(Q) - s(P \wedge Q)]$ as well as $HD(P, Q) = s(P) + s(Q) - 2s(P \wedge Q)$.
Collinearity also applies to distances between partitions $P, Q$ that are comparable, i.e. either $P \geq Q$ or $Q \geq P$. First consider the case involving the top $P^\top$ and bottom $P_\bot$ elements (for VI, see (Meila, 2007, p. 888) property 10(A.1)).
Proposition 4. HD satisfies vertical collinearity: $HD(P_\bot, P) + HD(P, P^\top) = HD(P_\bot, P^\top)$.
Proof. $HD(P_\bot, P) + HD(P, P^\top) = s(P) + s(P^\top) - s(P) = s(P^\top)$ independently from $P$, as well as $HD(P_\bot, P^\top) = s(P^\top) = \binom{n}{2}$.
Vertical collinearity may be generalized for arbitrary comparable partitions $P^\top \geq P > Q \geq P_\bot$, in that $HD(Q, P') + HD(P', P) = HD(Q, P)$ for all $P'$ satisfying $Q \leq P' \leq P$.
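The metric axioms and horizontal collinearity can be checked exhaustively on a small ground set. The following sketch enumerates all partitions of an n-set and tests propositions 2 and 3 using HD(P, Q) = s(P) + s(Q) − 2s(P ∧ Q). It is a brute-force sanity check under the stated representation (lists of frozensets), not a substitute for the proofs, and all function names are illustrative.

```python
from math import comb


def set_partitions(elements):
    """Yield all partitions of a list as lists of frozenset blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for part in set_partitions(rest):
        for k in range(len(part)):                 # put `first` into an existing block
            yield part[:k] + [part[k] | {first}] + part[k + 1:]
        yield part + [frozenset({first})]          # or into a new singleton block


def size(p):
    return sum(comb(len(a), 2) for a in p)


def meet(p, q):
    return [a & b for a in p for b in q if a & b]


def hd(p, q):
    return size(p) + size(q) - 2 * size(meet(p, q))


def check_axioms(n):
    """True if symmetry, identity, triangle inequality and collinearity hold for all triples."""
    parts = list(set_partitions(list(range(1, n + 1))))
    for p in parts:
        for q in parts:
            d = hd(p, q)
            if d != hd(q, p) or (d == 0) != (set(p) == set(q)):
                return False
            m = meet(p, q)
            if hd(p, m) + hd(m, q) != d:            # horizontal collinearity
                return False
            if any(hd(p, x) + hd(x, q) < d for x in parts):   # triangle inequality
                return False
    return True


print(check_axioms(4))
```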
3.2 Complementation
The distance between the bottom and top elements considered by vertical collinearity leads to regard such lattice elements as complements, thereby focusing on the distance between other, generic complements. Maintaining the traditional Hamming distance between subsets as the fundamental benchmark, it must be taken into account that the subset and partition lattices are very different in terms of complementation. Every subset $A \in 2^N$ has a unique complement $A^c$, and the distance between any two such complements equals the distance between the bottom and top elements: $|A \Delta A^c| = n = |N \Delta \emptyset|$. Conversely, partitions $P$ generally have several and quite different complements, which are all those $Q$ such that $P \wedge Q = P_\bot$ as well as $P \vee Q = P^\top$. In this respect, MMD measures the distance between any two complements $P, Q$ solely through their cardinalities $|P|, |Q|$, while VI and HD provide a fine distinction between different complements, and also agree on which are closer and which are remoter. The issue may be exemplified as follows: for $N = \{1, \ldots, 7\}$, consider $P = 123|456|7$ and $P^* = 147|2|3|5|6$ and $P_* = 1|2|34|5|67$ (where vertical bar | separates blocks). Both $P^*$ and $P_*$ are complements of $P$, that is $P \wedge P^* = P \wedge P_* = P_\bot$ and $P \vee P^* = P \vee P_* = P^\top$. Here VI, HD and MMD are:
$VI(P, P_*) \simeq 1.93 < 2.04 \simeq VI(P, P^*)$,
$HD(P, P_*) = 8 < 9 = HD(P, P^*)$,
$MMD(P, P_*) = 4 = MMD(P, P^*)$.
For MMD this example generalizes as follows.
Proposition 5. For any two complements $P, Q \in \mathcal{P}^N$, it holds $MMD(P, Q) = \max\{r(P), r(Q)\}$.
Proof. If $P \wedge Q = P_\bot$, then edges $\{A, B\} \in E \subseteq P \times Q$ of the bipartite graph $G = (P \cup Q, E)$ defined in section 2 all have the same weight $1 = |A \cap B|$ (see above). Hence, a maximum-weight matching simply is one including the maximum number of feasible edges. In turn, such a number equals $\sum_{A \in P \vee Q} \min\{|P^A|, |Q^A|\}$, because each block (of either partition) can be the endpoint of at most one edge included in a matching. Also, the number of elements $i \in N$ that must be deleted for the two residual partitions to coincide is $\sum_{A \in P \vee Q} (|A| - \min\{|P^A|, |Q^A|\})$. On the other hand, $P \vee Q = P^\top$ entails
$$\sum_{A \in P \vee Q} (|A| - \min\{|P^A|, |Q^A|\}) = n - \min\{|P|, |Q|\} = \max\{r(P), r(Q)\}.$$
The class $c : \mathcal{P}^N \to \mathbb{Z}^n_+$ of partitions (Rota, 1964b) identifies the vector $c(P) = (c_1(P), \ldots, c_n(P))$ where $c_k(P)$ is the number of $k$-cardinal blocks of $P$, for $1 \leq k \leq n$. As shown by the above example, a partition generally has different complements with different classes. In this view, for all $P \in \mathcal{P}^N$ denote by $\mathcal{CO}(P) = \{Q : P \wedge Q = P_\bot, P \vee Q = P^\top\}$ the set of complements of $P$.
A modular element of the partition lattice (Aigner, 1997; Stern, 1999; Stanley, 1971) is any $P \in \mathcal{P}^N$ where all blocks are singletons apart from only one, at most, i.e. $\sum_{1 < k \leq n} c_k(P) \leq 1$. The sublattice $\mathcal{P}^N_{mod}$ of modular elements contains the bottom and top elements, and all partitions of the form $\{A\} \cup P_\bot^{A^c}$ with $1 < |A| < n$. Hence, $|\mathcal{P}^N_{mod}| = 2^n - n$, while $\mathcal{P}^N_{mod} = \mathcal{P}^N$ for $n \leq 3$ and $\mathcal{P}^N_{mod} \subset \mathcal{P}^N$ for $n > 3$.
Here, the main link between modular elements and complementation is that an element is modular if and only if no two of its complements are comparable (see (Stanley, 1971, Theorem 1)). Therefore, if $P \notin \mathcal{P}^N_{mod}$, then there are $Q, Q' \in \mathcal{CO}(P)$ such that $Q > Q'$. It thus seems important that the distance between $P$ and $Q$ differs from the distance between $P$ and $Q'$. The following result bounds the Hamming distance HD between a partition and its complements.
Proposition 6. For all $P \in \mathcal{P}^N$, if $Q \in \mathcal{CO}(P)$, then $s(P) + |P| - 1 \leq HD(P, Q) \leq s(P) + \binom{|P|}{2}$, where the upper bound is always tight, while the lower one is tight only if $c_1(P) \leq 2 + \sum_{1 < k \leq n} (k - 2)c_k(P)$.
Proof. If $Q \in \mathcal{CO}(P)$, then $HD(P, Q) = s(P) + s(Q)$. Hence $s(P) + \min\{s(Q) : Q \in \mathcal{CO}(P)\} \leq HD(P, Q)$ and $HD(P, Q) \leq s(P) + \max\{s(Q) : Q \in \mathcal{CO}(P)\}$. A complement of $P$ has join-decompositions minimally involving $|P| - 1$ atoms $[ij]_1, \ldots, [ij]_{|P|-1} \in \mathcal{P}^N_{(1)}$, with $|A_m \cap \{i, j\}_m| = 1 = |A_{m+1} \cap \{i, j\}_m|$, $1 \leq m < |P|$. Considering the upper bound first, observe that size $s([ij]_1 \vee \cdots \vee [ij]_{|P|-1})$ attains its maximum when $|\{i, j\}_m \cap \{i, j\}_{m+1}| = 1$ for all $1 \leq m < |P| - 1$, in which case $s([ij]_1 \vee \cdots \vee [ij]_m) = \binom{m+1}{2}$, $1 \leq m < |P|$. This complement $P^* = [ij]_1 \vee \cdots \vee [ij]_{|P|-1}$ always exists, whatever the class $c(P)$ of $P$, making the bound tight. In fact, $P^* \in \mathcal{P}^N_{mod}$ has $n - |P| + 1$ blocks, out of which $n - |P|$ are singletons, while the remaining one $B \in P^*$ is $|P|$-cardinal and satisfies $|B \cap A| = 1$ for all $A \in P$, i.e. $P^* = \{B\} \cup P_\bot^{B^c} \Rightarrow s(P^*) = \binom{|P|}{2}$. For the lower bound, note that size $s([ij]_1 \vee \cdots \vee [ij]_{|P|-1})$ attains its minimum, ideally, when $\{i, j\}_m \cap \{i, j\}_{m'} = \emptyset$, $1 \leq m < m' < |P|$, i.e. $s([ij]_1 \vee \cdots \vee [ij]_m) = m$ for all $1 \leq m < |P|$. This is not always possible as each $A \in P$ can have non-empty intersection with a number of pair-wise disjoint pairs $\{i, j\}_m$, $1 \leq m < |P|$ bounded above by $|A|$, entailing that the constraint is given by the number $c_1(P)$ of singletons $\{i\} \in P$. Specifically, nesting together $\sum_{1 < k \leq n} c_k(P)$ non-singleton blocks requires $\sum_{1 < k \leq n} c_k(P) - 1$ pairs $\{i, j\}_m$. If these latter have to be pair-wise disjoint, then the maximum number of elements $j \in N$ in non-singleton blocks available to match (into pair-wise disjoint pairs) those elements $\{i\} \in P$ in singleton blocks is precisely
$$\sum_{1 < k \leq n} k c_k(P) - 2\left(\sum_{1 < k \leq n} c_k(P) - 1\right).$$
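Proposition 7. If $2 + \sum_{1 < k \leq n} (k - 2)c_k(P) < c_1(P)$, then
$$\min_{P_* \in \mathcal{CO}(P)} s(P_*) = \left(n - \theta(P) \left\lfloor \frac{n}{\theta(P)} \right\rfloor\right) \binom{\lceil n / \theta(P) \rceil}{2} + \left[\theta(P) \left(\left\lfloor \frac{n}{\theta(P)} \right\rfloor + 1\right) - n\right] \binom{\lfloor n / \theta(P) \rfloor}{2},$$
where $\theta(P) = 1 + \sum_{1 < k \leq n} c_k(P)(k - 1)$.
Proof. If $2 + \sum_{1 < k \leq n} (k - 2)c_k(P) < c_1(P)$, then the above proof of proposition 6 entails that the maximum number $\max\{|Q| : Q \in \mathcal{CO}(P)\}$ of blocks of a complement of $P$ is $\theta(P)$. Among $\theta(P)$-cardinal partitions $P_*$, the size is minimized when $|B| \in \{\lfloor n / \theta(P) \rfloor, \lceil n / \theta(P) \rceil\}$ for all $B \in P_*$, where the number of $\lfloor n / \theta(P) \rfloor$-cardinal blocks is $\theta(P)(\lfloor n / \theta(P) \rfloor + 1) - n$, while the number of $\lceil n / \theta(P) \rceil$-cardinal blocks is $n - \theta(P) \lfloor n / \theta(P) \rfloor$.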
Proposition 8. Among complements $Q \in \mathcal{CO}(P)$, HD and VI have common minimizers and maximizers, i.e.
$$\arg\min_{Q \in \mathcal{CO}(P)} HD(P, Q) = \arg\min_{Q \in \mathcal{CO}(P)} VI(P, Q), \quad \arg\max_{Q \in \mathcal{CO}(P)} HD(P, Q) = \arg\max_{Q \in \mathcal{CO}(P)} VI(P, Q).$$
Proof. If $Q \in \mathcal{CO}(P)$, then $VI(P, Q)$ is minimized or maximized when $e(Q)$ is, respectively, maximized or minimized, as $VI(P, Q) = 2 \log n - e(P) - e(Q)$. Given this, if $P \in \mathcal{P}^N_{mod}$, then all $Q \in \mathcal{CO}(P)$ have the same rank. Otherwise, there are comparable complements, i.e. with different rank (see above). Thus, in general, among complements $Q \in \mathcal{CO}(P)$ entropy $e(Q)$ is minimized when $|Q|$ is minimized and, in addition, $Q \in \mathcal{P}^N_{mod}$. This is precisely where size $s(Q)$ is maximized. Similarly, $e(Q)$ is maximized when $|Q|$ is maximized and, in addition, $|B| \in \{\lfloor n / |Q| \rfloor, \lceil n / |Q| \rceil\}$ for all $B \in Q$. Again, this is where $s(Q)$ is minimized.
4 MINIMUM-WEIGHT PATHS
Hamming distance $|E \Delta E'|$ between edge sets $E, E' \in 2^{N_2}$ is the length of a shortest path between vertices $\chi_E, \chi_{E'} \in \{0,1\}^{\binom{n}{2}}$ of the $\binom{n}{2}$-dimensional unit hypercube $[0,1]^{\binom{n}{2}}$, where $\chi_E : N_2 \to \{0,1\}$ is the characteristic function defined above in section 2, i.e. $\chi_E(\{i, j\}) = 1$ if $\{i, j\} \in E$ and 0 otherwise. Recall that a polytope naturally defines a graph with its same vertices and edges (Brøndsted, 1983, p. 93), and the hypercube is perhaps the main example of polytope. In fact, the graph of hypercube $[0,1]^{\binom{n}{2}}$ is the Hasse diagram of Boolean lattice $(2^{N_2}, \cap, \cup)$, for its edges correspond to the covering relation, that is to say $\{E, E'\}$ is an edge of the hypercube if either $E \supset E'$, $|E| = |E'| + 1$ or else the converse, i.e. $E' \supset E$, $|E'| = |E| + 1$.
Clearly, a shortest path is a minimum-weight path as long as every edge has weight 1. This simple observation is the starting point towards an analogous view of the Hamming distance HD between partitions, namely as the weight of a minimum-weight path in the associated Hasse diagram. To this end, define the polytope of partitions $\mathbb{P}$ as the convex hull $\mathbb{P} := \mathrm{conv}(\{I_P : P \in \mathcal{P}^N\}) \subset [0,1]^{\binom{n}{2}}$ containing all convex combinations of the $B_n$ Boolean vectors defined by the indicator functions of partitions. Also denote by $G = (\mathcal{P}^N, E)$ the graph of polytope $\mathbb{P}$ or, equivalently, the Hasse diagram of partition lattice $(\mathcal{P}^N, \wedge, \vee)$. Edges correspond to the covering relation: $\{P, Q\} \in E$ if either $\{P' : Q \leq P' \leq P\} = \{P, Q\}$ or else $\{P' : P \leq P' \leq Q\} = \{P, Q\}$ (see above). Denote this relation by $P \gtrdot Q \Rightarrow \{P, Q\} \in E$. Finally, let $F \subset \mathbb{R}^{B_n}$ be the vector space of symmetric and order-preserving partition functions $f : \mathcal{P}^N \to \mathbb{R}$, that is to say, respectively, for all $P, Q \in \mathcal{P}^N$,
(a) $c(P) = c(Q) \Rightarrow f(P) = f(Q)$, and
(b) $P > Q \Rightarrow f(P) > f(Q)$ or else
(b') $P > Q \Rightarrow f(P) < f(Q)$.
Entropy, rank and size $e, r, s : \mathcal{P}^N \to \mathbb{R}$ are in $F$, with $e$ satisfying (b') and $r, s$ satisfying (b). For $f \in F$, let weights $w_f : E \to \mathbb{R}_{++}$ on edges $\{P, Q\} \in E$ of $G$ be
$$w_f(\{P, Q\}) = \max\{f(P), f(Q)\} - \min\{f(P), f(Q)\}.$$
For $P, Q \in \mathcal{P}^N$, let $Path(P, Q)$ contain all $P$-$Q$-paths $p(P, Q)$ in graph $G$. Any $p(P, Q) \in Path(P, Q)$ is a subgraph $p(P, Q) = (V^p_{P,Q}, E^p_{P,Q}) \subset G$ with vertex set $V^p_{P,Q} = \{P = P_0, P_1, \ldots, P_m = Q\}$ and edge set
$$E^p_{P,Q} = \{\{P_0, Q_0\}, \{P_1, Q_1\}, \ldots, \{P_{m-1}, Q_{m-1}\}\},$$
where $P_{k+1} = Q_k$, $0 \leq k < m$. Graph $G$ is connected, entailing $Path(P, Q) \neq \emptyset$ for all $P, Q \in \mathcal{P}^N$. The weight of $p(P, Q)$ is $w_f(p(P, Q)) = \sum_{0 \leq k < m} w_f(\{P_k, Q_k\})$.
Definition 9. For $f \in F$, minimum-$f$-weight partition distance $\delta_f : \mathcal{P}^N \times \mathcal{P}^N \to \mathbb{R}_+$ is
$$\delta_f(P, Q) := \min_{p(P,Q) \in Path(P,Q)} w_f(p(P, Q)). \tag{7}$$
Proposition 10. For $f \in F$ and $P, Q \in \mathcal{P}^N$, every minimum-$f$-weight $P$-$Q$-path visits $P \wedge Q$ or $P \vee Q$ or both, i.e. $V^p_{P,Q} \cap \{P \wedge Q, P \vee Q\} \neq \emptyset$ for all $p(P, Q)$ satisfying $w_f(p(P, Q)) = \delta_f(P, Q)$.
Proof. If $P \geq Q$, then $\{P \wedge Q, P \vee Q\} \subseteq V^p_{P,Q}$ for all $p(P, Q) \in Path(P, Q)$, with $P \gtrdot Q \Rightarrow \{P, Q\} = V^p_{P,Q}$. Otherwise, if $P \not\geq Q \not\geq P$, then any path $p(P, Q)$ visits some vertex $P'$ comparable with both $P, Q$ and, in particular, satisfying either $P' \geq P, Q$ or else $P, Q \geq P'$. Accordingly, $p(P, Q) = p(P, P') \cup p(P', Q)$, with $E^p_{P,P'} \cap E^p_{P',Q} = \emptyset$, for some $P$-$P'$-path $p(P, P')$ and $P'$-$Q$-path $p(P', Q)$, entailing that $w_f(p(P, Q))$ equals $w_f(p(P, P')) + w_f(p(P', Q))$. Finally, since $f$ is order-preserving and symmetric, $P' = P \vee Q$ minimizes $w_f(p(P, P')) + w_f(p(P', Q))$ over all vertices $P' \geq P, Q$ as well as $P' = P \wedge Q$ minimizes $w_f(p(P, P')) + w_f(p(P', Q))$ over all $P' \leq P, Q$.
Whether a minimum-$f$-weight path visits the join or else the meet of any two incomparable partitions clearly depends on $f$. A generic $f \in F$ may have associated minimum-weight paths visiting the meet of some incomparable partitions $P, Q$ and the join of some others $P', Q'$. Whether minimum-weight paths always visit the meet or else the join depends on whether $f$ or else $-f$ is supermodular. Note that if $-f$ is supermodular, then $f$ is submodular, i.e. $f(P \wedge Q) + f(P \vee Q) \leq f(P) + f(Q)$.
Proposition 11. Let $f \in F$ satisfy (b), i.e. $P > Q$ entails $f(P) > f(Q)$. If $f$ is supermodular, then the minimum-$f$-weight partition distance is
$$\delta_f(P, Q) = f(P) + f(Q) - 2f(P \wedge Q),$$
while if $f$ is submodular, then the minimum-$f$-weight partition distance is
$$\delta_f(P, Q) = 2f(P \vee Q) - f(P) - f(Q).$$
Proof. Supermodularity entails
$$2f(P \vee Q) - f(P) - f(Q) \ge f(P \vee Q) - f(P \wedge Q)$$
and $f(P \vee Q) - f(P \wedge Q) \ge f(P) + f(Q) - 2f(P \wedge Q)$, whereas submodularity entails
$$2f(P \vee Q) - f(P) - f(Q) \le f(P \vee Q) - f(P \wedge Q)$$
and $f(P \vee Q) - f(P \wedge Q) \le f(P) + f(Q) - 2f(P \wedge Q)$, for all $P, Q \in \mathcal{P}^N$.
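As a worked instance of the two chains of inequalities, consider $N = \{1, 2, 3, 4\}$ with $P = 12|34$ and $Q = 13|24$, whose meet is the finest partition and whose join is the coarsest one. The values below assume that the size counts the unordered pairs sharing a block and that the rank is $r(P) = n - |P|$; the example is illustrative and not taken from the original text.

```latex
% Worked instance: N = {1,2,3,4}, P = 12|34, Q = 13|24,
% so P \wedge Q = P_\bot (four singletons) and P \vee Q = P^\top (one block).
\begin{align*}
\text{size } s:\quad & s(P) = s(Q) = 2,\quad s(P \wedge Q) = 0,\quad s(P \vee Q) = 6,\\
& s(P) + s(Q) - 2s(P \wedge Q) = 4 \;\le\; 2s(P \vee Q) - s(P) - s(Q) = 8,\\
\text{rank } r:\quad & r(P) = r(Q) = 2,\quad r(P \wedge Q) = 0,\quad r(P \vee Q) = 3,\\
& 2r(P \vee Q) - r(P) - r(Q) = 2 \;\le\; r(P) + r(Q) - 2r(P \wedge Q) = 4.
\end{align*}
```

For the supermodular size the lighter route passes through the meet, while for the submodular rank it passes through the join.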
Since the size $s$ is supermodular (see proposition 2 above), Hamming distance HD is the minimum-$s$-weight partition distance, i.e. $HD(P, Q) = \delta_s(P, Q)$ for all $P, Q \in \mathcal{P}^N$. The rank $r$ of partitions being submodular (Aigner, 1997, pp. 259, 265, 274), minimum-$r$-weight distance is $\delta_r(P, Q) = |P| + |Q| - 2|P \vee Q|$. In fact, $w_r(\{P, Q\}) = 1$ for all edges $\{P, Q\} \in E$, hence $\delta_r$ is a shortest-path distance.
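Both closed forms only require the meet and the join of the two partitions. A minimal sketch follows, assuming partitions are encoded as frozensets of frozenset blocks; the function names are illustrative and not part of the original text.

```python
def part(*blocks):
    """Build a partition from its blocks, e.g. part({1, 2}, {3, 4})."""
    return frozenset(frozenset(b) for b in blocks)

def size(P):
    """Number of atoms finer than P (unordered pairs sharing a block)."""
    return sum(len(b) * (len(b) - 1) // 2 for b in P)

def meet(P, Q):
    """Coarsest partition finer than both: non-empty pairwise block intersections."""
    return frozenset(a & b for a in P for b in Q if a & b)

def join(P, Q):
    """Finest partition coarser than both: merge blocks that overlap."""
    blocks = [set(b) for b in P]
    for b in Q:
        touching = [a for a in blocks if a & b]
        merged = set(b).union(*touching)
        blocks = [a for a in blocks if not (a & b)] + [merged]
    return frozenset(frozenset(b) for b in blocks)

def hamming_distance(P, Q):
    """HD(P, Q) = s(P) + s(Q) - 2 s(P meet Q)."""
    return size(P) + size(Q) - 2 * size(meet(P, Q))

def rank_distance(P, Q):
    """delta_r(P, Q) = |P| + |Q| - 2 |P join Q|."""
    return len(P) + len(Q) - 2 * len(join(P, Q))

P, Q = part({1, 2}, {3, 4}), part({1, 3}, {2, 4})
print(hamming_distance(P, Q))   # 4
print(rank_distance(P, Q))      # 2
```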
Turning to entropy $e$, a simple example shows that VI distance does not correspond to the $e$-based minimum-weight distance.
Proposition 12. There are partitions $P, Q \in \mathcal{P}^N$ such that $2e(P \wedge Q) - e(P) - e(Q) > \delta_e(P, Q)$.
Proof. For two atoms $[ij], [ij'] \in \mathcal{P}^N_{(1)}$, with non-empty intersection $\{i, j\} \cap \{i, j'\} = \{i\}$, VI distance is
$$VI([ij], [ij']) = 2e([ij] \wedge [ij']) - e([ij]) - e([ij']) = 2\log n - 2\left(\log n - \frac{2}{n}\right) = \frac{4}{n},$$
while minimum-$e$-weight distance is
$$e([ij]) + e([ij']) - 2e([ij] \vee [ij']) = 2\left(\log n - \frac{2}{n}\right) - 2\left(\log n - \frac{3}{n}\log 3\right) = \frac{2}{n}(3\log 3 - 2) = \delta_e([ij], [ij']),$$
with $3\log 3 < 4$ entailing $VI([ij], [ij']) > \delta_e([ij], [ij'])$.
An alternative measure of partition entropy, called logical entropy, has been recently proposed (Ellerman, 2013a) in terms of distinctions or ordered pairs $(i, j) \in N \times N$, hence $(i, j) \neq (j, i)$. If distinctions are replaced with unordered pairs $\{i, j\} \in N_2$, then mutatis mutandis the non-normalized logical entropy of partitions $P$ is the analog of $\binom{n}{2} - s(P)$, providing a further minimum-weight partition distance. Also, since in information theory partitions are evaluated through functions $f$ such that $P > Q \Rightarrow f(P) < f(Q)$, the approach developed thus far may be applied to the upside-down Hasse diagram of the partition lattice, with co-atoms in place of atoms, as detailed below.
4.1 Distinctions, Co-atoms and Fields
A partition $P$ distinguishes between $i \in N$ and $j \in N \setminus i$ if $i \in A \in P$ while $j \in B \in P$ with $A \neq B$, and the set of such distinctions has been recently proposed as the logical analog of the complement of $P$, with the (normalized) number of distinctions providing a novel measure of the (logical) entropy of partitions (Ellerman, 2013b; Ellerman, 2013a). This is achieved through apartness relations $R^c$, which are the complement of
equivalence relations $R$, both being sets of ordered pairs $(i, j) \in N \times N$ (see section 2 above). In terms of atoms $[ij] \in \mathcal{P}^N_{(1)}$, the logical entropy $h : \mathcal{P}^N \to \mathbb{R}_+$ of partitions (Ellerman, 2013a, p. 127) is
$$h(P) = \frac{2|\{[ij] : P \not\geqslant [ij]\}|}{n^2} = \frac{2\left(\binom{n}{2} - s(P)\right)}{n^2} = \frac{n(n-1) - 2s(P)}{n^2},$$
with $h(P^{\top}) = 0 = s(P_{\bot})$ and $h(P_{\bot}) = \frac{n-1}{n} = \frac{2s(P^{\top})}{n^2}$.
Proposition 13. The minimum-$h$-weight distance is $\delta_h(P, Q) = 2h(P \wedge Q) - h(P) - h(Q)$.
Proof. Logical entropy $h$ is symmetric and satisfies $P > Q \Rightarrow h(P) < h(Q)$, hence $h \in F$. Also, apart from constant terms, $h$ varies with $-s$, which is submodular because $s$ is supermodular. That is to say,
$$h(P) + h(Q) = \frac{2}{n}\left(n - 1 - \frac{s(P) + s(Q)}{n}\right)$$
and
$$h(P \wedge Q) + h(P \vee Q) = \frac{2}{n}\left(n - 1 - \frac{s(P \wedge Q) + s(P \vee Q)}{n}\right).$$
Thus $s(P \wedge Q) + s(P \vee Q) \ge s(P) + s(Q)$ entails $h(P \wedge Q) + h(P \vee Q) \le h(P) + h(Q)$. Also, like in proposition 11 above but with reversed inequalities, $2h(P \wedge Q) - h(P) - h(Q) \le h(P \wedge Q) - h(P \vee Q)$ and $h(P \wedge Q) - h(P \vee Q) \le h(P) + h(Q) - 2h(P \vee Q)$.
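Substituting the expression of $h$ in terms of $s$ shows that the distance of proposition 13 is a rescaling of HD by $2/n^2$. The sketch below checks this with exact rational arithmetic; the encoding of partitions and the function names are assumptions made for illustration.

```python
from fractions import Fraction

def part(*blocks):
    """Build a partition from its blocks, e.g. part({1, 2}, {3, 4})."""
    return frozenset(frozenset(b) for b in blocks)

def size(P):
    """Number of atoms finer than P (unordered pairs sharing a block)."""
    return sum(len(b) * (len(b) - 1) // 2 for b in P)

def meet(P, Q):
    """Coarsest partition finer than both P and Q."""
    return frozenset(a & b for a in P for b in Q if a & b)

def logical_entropy(P):
    """h(P) = (n(n-1) - 2 s(P)) / n^2, with n the number of partitioned elements."""
    n = sum(len(b) for b in P)
    return Fraction(n * (n - 1) - 2 * size(P), n * n)

def delta_h(P, Q):
    """Closed form of proposition 13: 2 h(P meet Q) - h(P) - h(Q)."""
    return 2 * logical_entropy(meet(P, Q)) - logical_entropy(P) - logical_entropy(Q)

def hamming_distance(P, Q):
    return size(P) + size(Q) - 2 * size(meet(P, Q))

top = part({1, 2, 3, 4})
bottom = part({1}, {2}, {3}, {4})
P, Q = part({1, 2}, {3, 4}), part({1, 3}, {2, 4})
print(logical_entropy(top), logical_entropy(bottom))          # 0 3/4
print(delta_h(P, Q), Fraction(2 * hamming_distance(P, Q), 16)) # 1/2 1/2
```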
Reasoning in terms of ordered pairs results in a double counting, in that $(i, j) \in R^c \Leftrightarrow (j, i) \in R^c$ for all apartness relations $R^c$ and all $(i, j) \in N \times N$. Hence an analog logical entropy $\hat{h}$ of partitions may be defined in terms of unordered pairs $\{i, j\} \in N_2$ or atoms $[ij] \in \mathcal{P}^N_{(1)}$ by
$$\hat{h}(P) = \frac{\binom{n}{2} - s(P)}{\binom{n}{2}} = 1 - \frac{s(P)}{\binom{n}{2}}.$$
Again, $\hat{h} \in F$ and $\hat{h}(P^{\top}) = 0$ as well as $\hat{h}(P_{\bot}) = 1$. Therefore, the minimum-$\hat{h}$-weight distance is $\delta_{\hat{h}}(P, Q) = 2\hat{h}(P \wedge Q) - \hat{h}(P) - \hat{h}(Q)$ for all $P, Q \in \mathcal{P}^N$.
On the other hand, a distance between partitions also obtains by dealing directly with their associated set of distinctions: let $D_P = \{[ij] : P \not\geqslant [ij]\}$ and consider the distance between any two partitions $P, Q$ given by the traditional Hamming distance between their sets of (unordered) distinctions, i.e. $|D_P \Delta D_Q|$. In particular, $|D_P \Delta D_Q| = \frac{n^2}{2}\left(2h(P \wedge Q) - h(P) - h(Q)\right)$. In view of proposition 13 above, this is the non-normalized minimum-$h$-weight distance.
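The set of unordered distinctions can be built explicitly, which makes the identity between $|D_P \Delta D_Q|$ and HD easy to check on small cases. This is an illustrative sketch under the assumption that a distinction of $P$ is an unordered pair of elements lying in different blocks of $P$; names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def part(*blocks):
    """Build a partition from its blocks, e.g. part({1, 2}, {3, 4})."""
    return frozenset(frozenset(b) for b in blocks)

def size(P):
    """Number of atoms finer than P (unordered pairs sharing a block)."""
    return sum(len(b) * (len(b) - 1) // 2 for b in P)

def meet(P, Q):
    """Coarsest partition finer than both P and Q."""
    return frozenset(a & b for a in P for b in Q if a & b)

def distinctions(P):
    """Unordered pairs {i, j} whose elements lie in different blocks of P."""
    elements = sorted(x for b in P for x in b)
    same_block = {frozenset(p) for b in P for p in combinations(b, 2)}
    return {frozenset(p) for p in combinations(elements, 2)} - same_block

def unordered_logical_entropy(P):
    """Share of unordered pairs that are distinctions: 1 - s(P) / C(n, 2)."""
    n = sum(len(b) for b in P)
    return Fraction(len(distinctions(P)), n * (n - 1) // 2)

P, Q = part({1, 2}, {3, 4}), part({1, 3}, {2, 4})
print(len(distinctions(P) ^ distinctions(Q)))      # 4
print(size(P) + size(Q) - 2 * size(meet(P, Q)))    # 4
print(unordered_logical_entropy(P))                # 2/3
```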
A field of subsets is a set system $\mathcal{F} \subseteq 2^N$ which is closed under union, intersection and complementation, hence $A \cap B, A \cup B, A^c \in \mathcal{F}$ for all $A, B \in \mathcal{F}$. Every partition $P \in \mathcal{P}^N$ generates the field $\mathcal{F}_P := 2^P$ containing all subsets $B \in 2^N$ obtained as the union of blocks $A \in P$, with $\mathcal{F}_{P_{\bot}} = 2^N$ as well as $\mathcal{F}_{P^{\top}} = \{\emptyset, N\}$. There are $2^{n-1} - 1$ minimal fields that strictly include $\mathcal{F}_{P^{\top}}$; they are those $\mathcal{F}_A = \mathcal{F}_{A^c} = \{\emptyset, A, A^c, N\}$ with $\emptyset \subset A \subset N$. On the other hand, 2-cardinal partitions $\{A, A^c\} \in \mathcal{P}^N$ are the co-atoms (Aigner, 1997) of partition lattice $(\mathcal{P}^N, \wedge, \vee)$ ordered by coarsening. In fact, in information theory finer partitions are generally more valuable than coarser ones, and thus attention is placed on partition functions $f$ such as entropy $e$ or logical entropy $h$ satisfying $f(P) < f(Q)$ whenever $P > Q$. In this view, the partition lattice is often dealt with as ordered by refinement and thus with the upside-down Hasse diagram. Accordingly, a distance between partitions also obtains by counting co-atoms rather than atoms. Define the co-size $cs : \mathcal{P}^N \to \mathbb{Z}_+$ of partitions by $cs(P) = |\{\{A, A^c\} : P \leqslant \{A, A^c\}\}|$, with $cs(P_{\bot}) = 2^{n-1} - 1$ and $cs(P^{\top}) = 0$. In words, $cs(P)$ is the number of co-atoms coarser than $P$.
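The co-size can be computed by enumerating co-atoms and testing which ones are coarser than $P$. In the sketch below each co-atom $\{A, A^c\}$ is encoded by the side $A$ containing the least element, an encoding chosen here for convenience. The closed form $2^{|P|-1} - 1$ used in the check is not stated in the text; it follows because a co-atom coarser than $P$ splits the blocks of $P$ into two non-empty groups.

```python
from itertools import combinations

def part(*blocks):
    """Build a partition from its blocks, e.g. part({1, 2}, {3, 4})."""
    return frozenset(frozenset(b) for b in blocks)

def coatoms_coarser_than(P):
    """Co-atoms {A, A^c} coarser than P, each encoded by the side A holding the least element."""
    elements = sorted(x for b in P for x in b)
    first, others = elements[0], elements[1:]
    found = set()
    for r in range(len(others)):                  # r < n - 1 keeps A a proper subset of N
        for extra in combinations(others, r):
            A = frozenset((first,) + extra)
            # coarser than P: every block of P lies inside A or inside its complement
            if all(b <= A or not (b & A) for b in P):
                found.add(A)
    return found

def cosize(P):
    """cs(P): number of co-atoms coarser than P."""
    return len(coatoms_coarser_than(P))

bottom = part({1}, {2}, {3}, {4})
top = part({1, 2, 3, 4})
P = part({1, 2}, {3}, {4})
print(cosize(bottom), cosize(top), cosize(P))   # 7 0 3
```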
Proposition 14. The minimum-$cs$-weight partition distance is $\delta_{cs}(P, Q) = cs(P) + cs(Q) - 2cs(P \vee Q)$.
Proof. Denote by $\hat{\mu}^{cs} : \mathcal{P}^N \to \mathbb{Z}$ the Möbius inversion from above (Rota, 1964b; Aigner, 1997) of the co-size, with $cs(P) = \sum_{Q \geqslant P} \hat{\mu}^{cs}(Q)$ for all $P$. By definition, $\hat{\mu}^{cs}(P) = 1$ if $|P| = 2$ and 0 otherwise. Like for the size in proposition 2, this entails supermodularity, i.e. $cs(P \wedge Q) + cs(P \vee Q) \ge cs(P) + cs(Q)$. Also, $cs \in F$ with $cs(P) < cs(Q)$ whenever $P > Q$, thus
$$cs(P) + cs(Q) - 2cs(P \vee Q) \le cs(P \wedge Q) - cs(P \vee Q),$$
$$cs(P \wedge Q) - cs(P \vee Q) \le 2cs(P \wedge Q) - cs(P) - cs(Q)$$
for all $P, Q \in \mathcal{P}^N$.
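Denote by $(\{\mathcal{F}_P : P \in \mathcal{P}^N\}, \sqcap, \sqcup)$ the lattice whose elements are the $B_n$ fields of subsets $\mathcal{F}_P$, $P \in \mathcal{P}^N$ ordered by inclusion $\subseteq$. The meet and join are, respectively, $\mathcal{F}_P \sqcap \mathcal{F}_Q = \mathcal{F}_{P \vee Q}$ and $\mathcal{F}_P \sqcup \mathcal{F}_Q = \mathcal{F}_{P \wedge Q}$. The set of atoms is the collection $\{\mathcal{F}_{\{A,A^c\}} : \emptyset \subset A \subset N\}$ of minimal fields, i.e. $\mathcal{F}_P = \sqcup_{\{A,A^c\} \geqslant P} \mathcal{F}_{\{A,A^c\}}$ for all $\mathcal{F}_P$. Thus $\delta_{cs}(P, Q)$ is also an analog of the traditional Hamming distance between subsets:
$$\delta_{cs}(P, Q) = |\{\{A, A^c\} : \mathcal{F}_{\{A,A^c\}} \subseteq \mathcal{F}_P\}| + |\{\{A, A^c\} : \mathcal{F}_{\{A,A^c\}} \subseteq \mathcal{F}_Q\}| - 2|\{\{A, A^c\} : \mathcal{F}_{\{A,A^c\}} \subseteq (\mathcal{F}_Q \cap \mathcal{F}_P)\}|.$$
In words, this is the number of minimal fields $\mathcal{F}_{\{A,A^c\}}$ included in either $\mathcal{F}_P$ or else in $\mathcal{F}_Q$, but not in both.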
5 THE CONSENSUS PARTITION PROBLEM
Hamming distance between partitions HD first appeared in the mid '60s (Régnier, 1965) in terms of the consensus partition problem, which is important in many applicative scenarios concerned with statistical classification. From a combinatorial optimization perspective, after selecting a metric $\delta : \mathcal{P}^N \times \mathcal{P}^N \to \mathbb{R}_+$, an instance is an $m$-collection $P_1, \ldots, P_m \in \mathcal{P}^N$, $m \ge 2$, and the objective is to find a partition $\hat{P}$ minimizing the sum of its distances from the $m$ partitions: any $\hat{P}$ satisfying $\sum_{1 \le k \le m} \delta(\hat{P}, P_k) \le \sum_{1 \le k \le m} \delta(Q, P_k)$ for all
$Q \in \mathcal{P}^N$ is a consensus partition. For generic $\delta$, finding a solution $\hat{P}$ is typically hard. If $\delta = MMD$, then each distance $\delta(Q, P_k)$, $1 \le k \le m$ for any $Q \in \mathcal{P}^N$ is computable in $O(n^3)$ time (Korte and Vygen, 2002, p. 236), whereas if $\delta = HD$, then in view of expression (6) above (see section 3) $\delta(Q, P_k)$ is computable more rapidly through scalar products. Independently of the chosen metric $\delta$, the main issue is that the size $B_n = |\mathcal{P}^N|$ of the search space $\mathcal{P}^N$ makes all approaches relying on direct enumeration simply unviable, at least for relevant values of $n$. The problem is thus commonly interpreted in terms of heuristics (Pinto Da Costa and Rao, 2004; Celeux et al., 1989).
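The objective and the cost of direct enumeration can be made concrete with a brute-force search, which is feasible only for very small $n$ because the number of candidates is the Bell number $B_n$. The sketch below is an illustration under stated assumptions, not a method proposed in the text: partitions are frozensets of blocks, HD is computed as $s(P) + s(Q) - 2s(P \wedge Q)$, and the function names are hypothetical.

```python
def part(*blocks):
    """Build a partition from its blocks, e.g. part({1, 2}, {3, 4})."""
    return frozenset(frozenset(b) for b in blocks)

def all_partitions(elements):
    """Yield every partition of `elements`; there are B_n of them."""
    elements = list(elements)
    if not elements:
        yield frozenset()
        return
    first, rest = elements[0], elements[1:]
    for sub in all_partitions(rest):
        yield sub | {frozenset([first])}
        for block in sub:
            yield (sub - {block}) | {block | {first}}

def size(P):
    return sum(len(b) * (len(b) - 1) // 2 for b in P)

def meet(P, Q):
    return frozenset(a & b for a in P for b in Q if a & b)

def hamming_distance(P, Q):
    return size(P) + size(Q) - 2 * size(meet(P, Q))

def consensus_by_enumeration(instance, elements, dist):
    """Return the least total distance and every partition attaining it."""
    best, minimizers = None, []
    for Q in all_partitions(elements):
        total = sum(dist(Q, P) for P in instance)
        if best is None or total < best:
            best, minimizers = total, [Q]
        elif total == best:
            minimizers.append(Q)
    return best, minimizers

instance = [part({1, 2}, {3, 4}), part({1, 3}, {2, 4})]
best, minimizers = consensus_by_enumeration(instance, range(1, 5), hamming_distance)
print(best)   # 4
```

For this two-partition instance the meet of the instance elements, i.e. the partition into singletons, is among the minimizers, in line with the triangle inequality.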
Although the consensus problem is generally hard, the analysis conducted thus far identifies conditions where exact solutions are easy to find. If the chosen metric is a minimum-$f$-weight partition distance, i.e. $\delta = \delta_f$ with $f \in F$, and weighting function $f$ is either supermodular or else submodular (but not both, see below), then either the meet $\hat{P} = P_1 \wedge \cdots \wedge P_m$ or else the join $\hat{P} = P_1 \vee \cdots \vee P_m$ of instance elements are consensus partitions. The former case applies to Hamming distance or size-based $\delta_s = HD$ and to logical entropy-based $\delta_h$, while the latter applies to rank-based $\delta_r$ and to co-size-based $\delta_{cs}$. The computational burden thus reduces solely to assessing the $m$ distances between instance elements and their meet (or their join), with no need for search.
Proposition 15. If distances between partitions are measured by HD, then the meet of instance elements achieves consensus: for $Q, P_1, \ldots, P_m \in \mathcal{P}^N$,
$$\sum_{1 \le k \le m} HD(P_1 \wedge \cdots \wedge P_m, P_k) \le \sum_{1 \le k \le m} HD(Q, P_k).$$
Proof. Firstly note that for $m = 2$ this consensus condition is in fact a restatement of horizontal collinearity and triangle inequality (see propositions 2 and 3 above). Hence, in order to use induction, assume that the condition holds for some $m \ge 2$, and denote by $\hat{P}$ the solution or consensus partition of an $(m+1)$-instance $P_1, \ldots, P_m, P_{m+1}$. By assumption, $P_1 \wedge \cdots \wedge P_m$ is a solution of instance $P_1, \ldots, P_m$, thus novel solution $\hat{P}$ minimizes the sum of its distances from the previous solution $P_1 \wedge \cdots \wedge P_m$ and from the novel instance element $P_{m+1}$, i.e. for all $Q \in \mathcal{P}^N$,
$$HD(P_1 \wedge \cdots \wedge P_m, \hat{P}) + HD(\hat{P}, P_{m+1}) \le HD(P_1 \wedge \cdots \wedge P_m, Q) + HD(Q, P_{m+1}).$$
Then, horizontal collinearity and triangle inequality entail
$$HD(P_1 \wedge \cdots \wedge P_m, \hat{P}) + HD(\hat{P}, P_{m+1}) \ge HD(P_1 \wedge \cdots \wedge P_m, P_{m+1}),$$
with equality if $\hat{P} = P_1 \wedge \cdots \wedge P_m \wedge P_{m+1}$.
The consensus partition problem may be also framed in a novel manner through fuzzy modeling. A fuzzy subset of $N$ is a function $q : N \to [0, 1]$ or, geometrically, a point $q = (q_1, \ldots, q_n) \in [0, 1]^n$ in the $n$-dimensional unit hypercube, where $q_i = q(i)$, $i \in N$. A fuzzy partition is thus commonly intended as a partition $P$ with associated $|P|$ points $q^A \in [0, 1]^n$, $A \in P$ in the hypercube such that $q^A_i \in (0, 1]$ for all $i \in A$ and all $A \in P$. Also, a fuzzy or random graph with vertex set $N$ may be seen as one whose edge set is a fuzzy subset of $N_2$, i.e. a function $t : N_2 \to [0, 1]$ or, geometrically, a point in the $\binom{n}{2}$-dimensional unit hypercube, i.e. $t = \left(t_{\{i,j\}_1}, \ldots, t_{\{i,j\}_{\binom{n}{2}}}\right) \in [0, 1]^{\binom{n}{2}}$.
By looking at partitions of $N$ as graphs with vertex set $N$ each of whose components is complete (see above), the fuzzy consensus partition $t_{\mathcal{I}}$ associated with instance $\mathcal{I} \subseteq \mathcal{P}^N$ may be defined as the point in the interior of the polytope $\mathbb{P}$ of partitions (see above) corresponding to the center of the convex hull $conv(\{I_P : P \in \mathcal{I}\})$ consisting of all convex combinations of the indicator functions $I_P$, $P \in \mathcal{I}$ of instance elements. Then, the fuzzy consensus partition is a function ranging in the unit interval $[0, 1]$ and taking values on the atoms of $\mathcal{P}^N$, i.e. $t_{\mathcal{I}} : \mathcal{P}^N_{(1)} \to [0, 1]$ and
$$t_{\mathcal{I}}([ij]) = \frac{1}{|\mathcal{I}|} \sum_{P \in \mathcal{I}} I_P([ij])$$
for all atoms $[ij] \in \mathcal{P}^N_{(1)}$. In this framework, the strong patterns of instance $\mathcal{I}$ considered in (Pinto Da Costa and Rao, 2004) are the blocks of partition $P(t_{\mathcal{I}})$ obtained through defuzzification of $t_{\mathcal{I}}$ as follows: $P(t_{\mathcal{I}}) = \bigvee_{t_{\mathcal{I}}([ij]) = 1} [ij]$. In words, $P(t_{\mathcal{I}})$ obtains as the join of all atoms where the fuzzy consensus partition $t_{\mathcal{I}}$ attains its maximum, i.e. 1.
6 CONCLUSIONS
Measuring the distance between partitions has been an important topic in statistical classification since the '60s (Lerman, 1981; Régnier, 1965). This work considers the analog of the traditional Hamming distance between subsets by counting unordered pairs of partitioned elements. Counting ordered and/or unordered pairs is not new (see (Meila, 2007, Section 2.1)), but the Hamming distance HD is here analyzed from a novel geometric perspective. Special attention is placed on complements in comparison with two distances proposed in recent years, namely MMD and VI. Given its low computational complexity and fine measurement sensitivity, HD seems interesting for applications, especially in bioinformatics.
HD relies on the size, which counts the atoms finer
than partitions. While the cardinality (or rank) of sub-
sets is a valuation (i.e. supermodular and submodu-
lar), the size is supermodular. In fact, if f is a valu-
ation of the partition lattice, then it is constant, i.e.
f (P) = f (Q) for all partitions P, Q (Aigner, 1997).
Since the Hamming distance between $A, B \in 2^N$ is $|A \Delta B| = |A \cup B| - |A \cap B|$, it may seem reasonable to consider distances $\delta(P, Q)$ between $P, Q \in \mathcal{P}^N$ of the form $\delta(P, Q) = f(P \vee Q) - f(P \wedge Q)$ with $f \in F$. Yet, this clearly does not distinguish between different complements $Q, Q' \in CO(P)$ when $P \neq P_{\bot}, P^{\top}$.
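A small numerical case illustrates the point. The sketch below takes $f = s$ and two partitions $Q, Q'$ that are both complements of $P$, understood here as partitions whose meet with $P$ is the finest partition and whose join with $P$ is the coarsest one; this reading of $CO(P)$, the encoding and the function names are assumptions made for illustration.

```python
def part(*blocks):
    """Build a partition from its blocks, e.g. part({1, 2}, {3, 4})."""
    return frozenset(frozenset(b) for b in blocks)

def size(P):
    """Number of atoms finer than P (unordered pairs sharing a block)."""
    return sum(len(b) * (len(b) - 1) // 2 for b in P)

def meet(P, Q):
    """Coarsest partition finer than both P and Q."""
    return frozenset(a & b for a in P for b in Q if a & b)

def join(P, Q):
    """Finest partition coarser than both: merge blocks that overlap."""
    blocks = [set(b) for b in P]
    for b in Q:
        touching = [a for a in blocks if a & b]
        merged = set(b).union(*touching)
        blocks = [a for a in blocks if not (a & b)] + [merged]
    return frozenset(frozenset(b) for b in blocks)

def join_minus_meet(P, Q, f=size):
    """Candidate distance f(P join Q) - f(P meet Q)."""
    return f(join(P, Q)) - f(meet(P, Q))

def hamming_distance(P, Q):
    return size(P) + size(Q) - 2 * size(meet(P, Q))

P = part({1, 2}, {3}, {4})
Q1 = part({1}, {2, 3, 4})      # a complement of P
Q2 = part({1, 3}, {2, 4})      # another complement of P
print(join_minus_meet(P, Q1), join_minus_meet(P, Q2))     # 6 6
print(hamming_distance(P, Q1), hamming_distance(P, Q2))   # 4 3
```

The candidate distance returns the same value for both complements, whereas HD separates them.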
The geometric approach makes it possible to analyze further partition distances obtained by replacing the size with alternative partition functions such as entropy, rank and logical entropy, where these latter two are submodular. In general, any symmetric and order-preserving partition function $f$ provides a distance between partitions $P, Q$ by considering $f(P)$, $f(Q)$ and the values taken on their meet $f(P \wedge Q)$ or else on their join $f(P \vee Q)$. Specifically, $f$ defines weights on edges of the Hasse diagram of partitions such that the corresponding partition distance between any $P, Q$ is the weight of a lightest $P-Q$-path.
REFERENCES
Aigner, M. (1997). Combinatorial Theory. Springer.
Almudevar, A. and Field, C. (1999). Estimation of single-
generation sibling relationships based on DNA mark-
ers. Journal of Agricultural, Biological and Environ-
mental Statistics, 4(2):136–165.
Berger-Wolf, T. Y., Sheikh, S. I., DasGupta, B., Ashley,
M. V., Caballero, I. C., Chaovalitwongse, W., and Pu-
trevu, S. L. (2007). Reconstructing sibling relation-
ship in wild populations. Bioinf., 23(13):i49–i56.
Bollobas, B. (1986). Combinatorics. Set Systems, Hyper-
graphs, Families of Vectors, and Combinatorial Prob-
ability. Cambridge University Press.
Brøndsted, A. (1983). An introduction to convex polytopes. Springer.
Brown, D. G. and Dexter, D. (2012). Sibjoin: a fast heuristic
for half-sibling reconstruction. Algorithms in Bioin-
formatics, LNCS 7534:44–56.
Celeux, G., Diday, E., Govaert, G., Lechevalier, G., and
Ralambondrainy, H. (1989). Classification Automatique Des Données. Dunod.
Day, W. (1981). The complexity of computing metric dis-
tances between partitions. Math. Soc. Sc., 1(3):269–
287.
Deza, M. M. and Deza, E. (2013). Encyclopedia of Dis-
tances - Second Edition. Springer.
Ellerman, D. (2013a). An introduction to logical entropy
and its relation to Shannon entropy. International
Journal of Semantic Computing, 7(2):121–145.
Ellerman, D. (2013b). An introduction to partition logic.
Logic Journal of the IGPL, 22(1):94–125.
Godsil, C. and Royle, G. F. (2001). Algebraic Graph The-
ory. Springer.
Graham, R., Knuth, D., and Patashnik, O. (1994). Concrete
Mathematics. Addison-Wesley.
Grünbaum, B. (2001). Convex Polytopes. Springer.
Gusfield, D. (2002). Partition-distance: A problem and
class of perfect graphs arising in clustering. Informa-
tion Processing Letters, 82:159–164.
Hubert, L. and Arabie, P. (1985). Comparing partitions.
Journal of Classification, 2(1):193–218.
Konovalov, D. A. (2006). Accuracy of four heuristics for the
full sibship reconstruction problem in the presence of
genotype errors. Adv. Bioinf. Comp. Bio., 3:7–16.
Konovalov, D. A., Bajema, N., and Litow, B. (2005a). Modified Simpson $O(n^3)$ algorithm for the full sibship reconstruction problem. Bioinf., 21(20):3912–3917.
Konovalov, D. A., Litow, B., and Bajema, N. (2005b).
Partition-distance via the assignment problem. Bioinf.,
21(10):2463–2468.
Korte, B. and Vygen, J. (2002). Combinatorial Optimiza-
tion: Theory and Algorithms (2nd edition). Springer.
Lerman, I. C. (1981). Classification et Analyse Ordinale
des Données. Dunod.
Meila, M. (2007). Comparing clusterings - an information
based distance. J. of Mult. Analysis, 98(5):873–895.
Mirkin, B. G. (1996). Mathematical Classification and
Clustering. Kluwer Academic Press.
Mirkin, B. G. and Cherny, L. B. (1970). Measurement of
the distance between distinct partitions of a finite set
of objects. Aut. and Rem. Con., 31(5):786–792.
Mirkin, B. G. and Muchnik, I. (2008). Some topics of cur-
rent interest in clustering: Russian approaches 1960-
1985. Electronic Journal for History of Probability
and Statistics, 4(2):1–12.
Pinto Da Costa, J. F. and Rao, P. R. (2004). Central parti-
tion for a partition-distance and strong pattern graph.
REVSTAT - Statistical Journal, 2(2):127–143.
Régnier, S. (1965). Sur quelques aspects mathématiques des
problèmes de classification automatique. ICC Bulletin,
4:175–191. Reprinted in Mathématiques et Sciences
Humaines 82:13-29, 1983.
Rossi, G. (2011). Partition distances. arXiv:1106.4579v1.
Rota, G.-C. (1964a). The number of partitions of a set.
American Mathematical Monthly, 71:499–504.
Rota, G.-C. (1964b). On the foundations of combinatorial
theory I: theory of Möbius functions. Z. Wahrscheinlichkeitsrechnung
u. verw. Geb., 2:340–368.
Sebő, A. and Tannier, E. (2004). On metric generators of
graphs. Math. of Op. Res., 29(2):383–393.
Sheikh, S. I., Berger-Wolf, T. Y., Khokhar, A. A., Caballero,
I. C., Ashley, M. V., Chaovalitwongse, W., Chou,
C.-A., and DasGupta, B. (2010). Combinatorial re-
construction of half-sibling groups from microsatellite
data. J. Bioinf. Comp. Biol., 8(2):337–356.
Stanley, R. (1971). Modular elements of geometric lattices.
Algebra Universalis, (1):214–217.
Stern, M. (1999). Semimodular Lattices. Theory and Appli-
cations. Encyclopedia of Mathematics and its Appli-
cations 73. Cambridge University Press.
Warrens, M. J. (2008). On the equivalence of Chen’s Kappa
and the Hubert-Arabie adjusted Rand index. Journal
of Classification, 25(1):177–183.
Whitney, H. (1935). On the abstract properties of linear
dependence. Amer. J. of Math., 57:509–533.
ICPRAM 2016 - International Conference on Pattern Recognition Applications and Methods