Metrics for Clustering Comparison in Bioinformatics

Giovanni Rossi

Department of Computer Science and Engineering (DISI), University of Bologna

Mura Anteo Zamboni 7, Bologna, 40126, Italy

Keywords:

Hamming Distance, Partition Lattice, Hasse Diagram, Weighted Graph, Geodesic Distance, Path.

Abstract:

Motivated by a problem in bioinformatics, this work analyses alternative metrics between partitions. From
both theoretical and applied perspectives, a useful and interesting distance between any two partitions is
HD, which counts the number of atoms finer than either one but not both. While faithfully reproducing the
traditional Hamming distance between subsets, HD is very sensitive and is computable through scalar products
between Boolean vectors. It properly deals with complements and axiomatically resembles the entropy-based
variation of information (VI) distance. Entire families of metrics (including HD and VI) are obtained as minimal
paths in the weighted graph given by the Hasse diagram: submodular weighting functions yield path-based
distances visiting the join (of any two partitions), whereas supermodularity leads to visiting the meet. This yields
an exact (rather than heuristic) approach to the consensus partition (combinatorial optimization) problem.

1 INTRODUCTION

Partitions or clusterings are key instruments in a vari-

ety of ﬁelds at the interface of computer science, ar-

tiﬁcial intelligence and engineering, including pattern

recognition/learning, web mining and bioinformatics.

Quantitative clustering comparison is essential for as-

sessing the proximity between and superiority among

diverse partitions of a given set (Meila, 2007; Pinto

Da Costa and Rao, 2004).

In bioinformatics, measuring the distance between clusterings of populations, either natural or experimental,
is fundamental for sibling relationship reconstruction. Attention has apparently focused mostly on a single
distance measure, here denoted by MMD, which relies on maximum matching (Konovalov, 2006; Berger-Wolf et al.,
2007; Sheikh et al., 2010; Konovalov et al., 2005b; Konovalov et al., 2005a). After its appearance (Almudevar
and Field, 1999), MMD was shown (Gusfield, 2002) to be computable via the assignment problem (Korte and
Vygen, 2002, p. 236). Another partition distance recently tested in this setting (Brown and Dexter, 2012)
is the variation of information VI, obtained axiomatically from information theory (Meila, 2007).

In this work, the distance between partitions is

measured in quite different ways, since the aim is to

have consistency and generalizations in terms of lat-

tice theory. The primary objective is to reproduce

the traditional Hamming distance between subsets,

given by the cardinality of their symmetric differ-

ence (Bollobas, 1986, p. 3). Such a benchmark is

extended from Boolean to geometric lattices by fo-

cusing on atoms and join-decompositions of lattice

elements (Aigner, 1997; Stern, 1999). While every subset admits a unique such decomposition, involving a
number of atoms equal to the cardinality of the subset, a generic partition admits different
join-decompositions, most of which are redundant. The number of atoms involved in the unique maximal
join-decomposition of a partition is here defined to be the size of that partition. The size is an
integer-valued lattice function, like the rank. In fact, the two coincide for Boolean lattices but differ
crucially for geometric lattices. Roughly speaking, replacing the rank with the size yields the Hamming
distance HD between partitions proposed below. While achieving combinatorial congruency, HD shares important
characterizing axioms with VI and is computed through simple scalar products between Boolean vectors,
avoiding any algorithmic issue. Finally, HD also has a large range, which provides fine measurement sensitivity.

The traditional Hamming distance between two

subsets is also the length of any shortest path between

them in the associated Hasse diagram, which is the

unit hypercube. This latter is a graph with subsets as

vertices and edges linking any two subsets whenever

one covers the other (in terms of set inclusion, see

(Bollobas, 1986; Godsil and Royle, 2001) and below).

In order to have an analog for the Hamming distance

between partitions deﬁned here, it is necessary to look

at the lattice of partitions of an n-set as the polygon matroid of the complete graph $K_n$ on n vertices
(Aigner, 1997, pp. 259, 274). In other terms, partitions of an n-set may well be regarded as those graphs on
n vertices each of whose components is complete (i.e. a clique). In this way, the partition lattice is seen to
be strictly included in the $\binom{n}{2}$-dimensional unit hypercube: the set $\{0,1\}^{\binom{n}{2}}$ of
hypercube vertices identifies the $2^{\binom{n}{2}}$ distinct graphs on n vertices, while linear dependence
(Whitney, 1935) entails that partitions only span $B_n < 2^{\binom{n}{2}}$ hypercube vertices, where Bell
number $B_n$ is the number of partitions of an n-set (Rota, 1964a; Graham et al., 1994). The convex hull of
these $B_n$ vertices identifies a polytope, and the graph of this polytope (Grünbaum, 2001, pp. 212-16) is in
fact the Hasse diagram of the partition lattice. Yet, while the covering relation between subsets assigns a
unitary weight to each edge of the hypercube (Sebő and Tannier, 2004, p. 384), edges of the polytope
of partitions must be weighted through the size (see above), as the latter quantifies precisely the number
of hypercube edges that collapse into a unique edge of the polytope. With such a weighting, the Hamming
distance between two partitions (like between two subsets) quantifies the minimum weight of a path
connecting them.

Rossi, G.
Metrics for Clustering Comparison in Bioinformatics.
DOI: 10.5220/0005707102990308
In Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2016), pages 299-308
ISBN: 978-989-758-173-1

The approach allows for generalizations in that

the size may be replaced with any alternative order-

preserving lattice function, such as the rank (or the

entropy of partitions). Then, polytope edges have

weights obtained as the difference between the greater

and the smaller value taken by the chosen order-

preserving function on the associated endpoints. Ac-

cordingly, the distance between two lattice elements

is the minimum weight of a path connecting them.

2 DISTANCES, LATTICES AND GRAPHS

For a finite set $N = \{1, \ldots, n\}$, let $(2^N, \cap, \cup)$ and $(\mathcal{P}^N, \wedge, \vee)$ be the
associated subset and partition lattices, ordered by inclusion $\supseteq$ and coarsening $\geq$,
respectively. Both are atomic and atomistic; the former is distributive while the latter is geometric (Aigner,
1997; Stern, 1999). A graph $G = (V, E)$ consists of a vertex set $V = \{v_1, \ldots, v_m\}$ and an edge set
$E \subseteq V_2$ included in the $\binom{m}{2}$-set $V_2 := \{\{v_i, v_j\} : 1 \leq i < j \leq m\}$
of unordered pairs of vertices. The complete graph on m vertices (see above) is $K_m = (V, V_2)$. The Hamming
distance $HD(P, Q)$ between partitions $P, Q \in \mathcal{P}^N$ proposed here aims at reproducing the
traditional Hamming distance $|A \Delta B|$ between subsets $A, B \in 2^N$ while taking into account that
partitions of an n-set N are in fact graphs with vertex set V = N whose components are each a complete
subgraph (Aigner, 1997).

Distances within an ordered set must be measured via the order relation, while distances between elements
of any set are called 'Hamming distances' when such elements are represented as arrays and the distance
between two of them is the number of entries where their array representations differ. The Hamming distance
between subsets $A, B \in 2^N$ is

$|A \Delta B| = |A \setminus B| + |B \setminus A| = r(A \cup B) - r(A \cap B)$, (1)

$r : 2^N \to \mathbb{Z}_+$ being the rank function: $r(A) = |A|$. It counts how many $i \in N$ are included
in either A or else B, but not in both. Elements $i \in N$, or 1-cardinal subsets $\{i\} \in 2^N$, are the
atoms of lattice $(2^N, \cap, \cup)$, and (1) is a Hamming distance since subsets $A \in 2^N$ are represented
as Boolean n-vectors $\chi_A \in \{0,1\}^n$, i.e. vertices of the n-dimensional unit hypercube $[0,1]^n$.
This is achieved through their characteristic function $\chi_A : N \to \{0,1\}$, defined by $\chi_A(i) = 1$
if $i \in A$ and $\chi_A(i) = 0$ if $i \in N \setminus A$. Thus the distance between any $A, B \in 2^N$ is
the number $|A \Delta B|$ of entries where $\chi_A$ and $\chi_B$ differ (Bollobas, 1986; Aigner, 1997).

A partition $P = \{A_1, \ldots, A_{|P|}\} \subset 2^N$ of N is a collection of pairwise disjoint subsets,
called blocks (or clusters), whose union yields N. Any subset $A \in 2^N$ has a unique complement
$A^c = N \setminus A$. For all partitions $P \in \mathcal{P}^N$ and all non-empty subsets
$\emptyset \subset A \subseteq N$, let $P^A = \{B \cap A : B \in P, \emptyset \neq B \cap A\}$ denote the
partition of A induced by P. Two non-Hamming distances between partitions are now briefly introduced. The
maximum matching distance $MMD(P, Q)$ between two partitions $P, Q \in \mathcal{P}^N$ is

$MMD(P, Q) = \min\{|A^c| : \emptyset \subset A \subseteq N, P^A = Q^A\}$. (2)

This is the minimum number of elements $i \in N$ that must be deleted in order for the two residual induced
partitions to coincide. Also, $MMD(P, Q)$ "is the minimum number of elements that must be moved between
clusters of P so that the resulting partition equals Q" (Gusfield, 2002, p. 160). It is computable as
a maximum matching or assignment problem (Korte and Vygen, 2002, chapter 11). In a graph, a matching is a set
of pairwise disjoint edges, i.e. the endpoints are all different vertices. Now consider the bipartite graph
$G = (P \cup Q, E)$ with $|P| + |Q|$ vertices, one for each block of each partition, and join any two
of them $A \in P$ and $B \in Q$ with an edge $\{A, B\} \in E$ if $A \cap B \neq \emptyset$. In addition, let
$|A \cap B|$ be the weight of the edge. Then, determining $MMD(P, Q)$ amounts to finding a maximum-weight
matching $E^*$ in G, that is one where the sum $\sum_{(A,B) \in E^*} |A \cap B|$ of edge weights is
maximal. In fact, the minimum number $MMD(P, Q)$ of elements that must be removed for the two residual
partitions to coincide is the sum $\sum_{(A,B) \in E^*} |A \Delta B|$ over all selected edges of the
cardinality of the symmetric difference between the associated endpoints.
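The matching formulation of MMD can be illustrated with a minimal Python sketch. It is not the assignment-problem algorithm cited in the text: it assumes small partitions and simply tries every injective assignment of blocks, returning n minus the largest total overlap. The function name `mmd` and the list-of-sets representation of partitions are illustrative choices, not taken from the paper.

```python
from itertools import permutations


def mmd(p, q):
    """Maximum matching distance between two partitions of the same set.

    Each partition is a list of blocks (sets). Brute force: try every
    injective assignment of the blocks of the smaller partition to blocks
    of the larger one and keep the heaviest, with weight |A & B| per pair.
    """
    p = [frozenset(block) for block in p]
    q = [frozenset(block) for block in q]
    n = sum(len(block) for block in p)
    if len(p) > len(q):
        p, q = q, p
    best = 0
    for image in permutations(range(len(q)), len(p)):
        weight = sum(len(p[i] & q[j]) for i, j in enumerate(image))
        best = max(best, weight)
    # Elements outside the matched intersections are the ones to delete.
    return n - best


P = [{1, 2}, {3}]
Q = [{1}, {2, 3}]
print(mmd(P, Q))  # deleting element 2 makes the two partitions coincide
```

For partitions with many blocks the factorial enumeration is impractical, and a polynomial assignment solver would replace the loop over permutations.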

Another important measure of the distance between two partitions P, Q is the variation of information
$VI(P, Q)$, obtained axiomatically from information theory (see (Meila, 2007, expressions (15)-(22),
pages 879-80)). Entropy $e(P) = -\sum_{A \in P} \frac{|A|}{n} \log \frac{|A|}{n}$ of a partition P (binary
logarithm) makes it possible to measure the distance between $P, Q \in \mathcal{P}^N$ as

$VI(P, Q) = 2e(P \wedge Q) - e(P) - e(Q)$, (3)

where $P \wedge Q$ is the coarsest partition finer than both P and Q (while $\vee$ is the
'finest-coarser-than' operator). Notice that while the range of MMD is
$\{0, 1, \ldots, n-1\} \subset \mathbb{Z}_+$, VI ranges in a finite subset of the
interval $[0, \log n] \subset \mathbb{R}_+$.
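Expression (3) can be evaluated directly from block sizes. The following is a small sketch under the assumption that partitions are given as lists of Python sets over the same ground set; the helper names `entropy`, `meet` and `variation_of_information` are hypothetical.

```python
from math import log2


def entropy(p, n):
    """Entropy e(P) with binary logarithm, for a partition of an n-set."""
    return -sum(len(b) / n * log2(len(b) / n) for b in p)


def meet(p, q):
    """Coarsest partition finer than both p and q: non-empty intersections."""
    return [a & b for a in p for b in q if a & b]


def variation_of_information(p, q):
    n = sum(len(b) for b in p)
    return 2 * entropy(meet(p, q), n) - entropy(p, n) - entropy(q, n)


P = [{1, 2}, {3, 4}]
Q = [{1, 3}, {2, 4}]
print(variation_of_information(P, Q))  # here the meet is the finest partition
```

In this example e(P) = e(Q) = 1 and the meet has entropy log 4 = 2, so VI attains the upper end log n of its range.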

Apart from MMD and VI, there exist several other partition distance measures (see (Deza and Deza,
2013, sections 10.2 and 10.3, pp. 191-193) and (Day, 1981; Hubert and Arabie, 1985; Warrens, 2008;
Mirkin, 1996)). One was proposed as the Hamming distance between (matrices representing) partitions
(Meila, 2007; Mirkin and Cherny, 1970; Mirkin and Muchnik, 2008), and thus shall be briefly distinguished
from the object of this paper. A binary relation R on N is a subset $R \subseteq N \times N$ of
ordered pairs $(i, j)$ of elements $i, j \in N$. The collection of all such binary relations is the subset
lattice $(2^{N \times N}, \cap, \cup)$. If, together with reflexivity, symmetry
$(i, j) \in R \Rightarrow (j, i) \in R$ and transitivity $(i, j), (j, h) \in R \Rightarrow (i, h) \in R$
hold, then R is an equivalence relation, or a partition of N into equivalence classes: maximal subsets
$A \in 2^N$ such that $(i, j), (j, i) \in R$ for all $i, j \in A$ are precisely its blocks. A binary
relation R is represented by a matrix $M^R \in \{0,1\}^{n \times n}$ with entries $M^R_{ij} = 1$
if $(i, j) \in R$ and $M^R_{ij} = 0$ if $(i, j) \notin R$. Let two equivalence relations $R, R'$ have
associated partitions $P, P'$ and matrices $M^R, M^{R'}$. The distance $d(R, R')$ between subsets
$R, R' \in 2^{N \times N}$ can be computed as $d(R, R') = |R \Delta R'| = |R \cup R'| - |R \cap R'|$. This
is the number of 1s in matrix $M^{R \Delta R'} = M^R + M^{R'}$ modulo 2 (see (Aigner, 1997, p. 338)). While
providing a distance between partitions P and $P'$, this is a Hamming distance between certain subsets that
correspond to partitions only in quite special cases, as lattice $(2^{N \times N}, \cap, \cup)$ contains
$2^{n^2} - B_n$ elements, or binary relations, that do not correspond to partitions, or equivalence
relations. The argument also applies when partitions are represented as Boolean $n \times n$-matrices
through the complement of equivalence relations, namely apartness relations $R^c = (N \times N) \setminus R$
(Ellerman, 2013b; Ellerman, 2013a). This is detailed in subsection 4.1 below.

However regarded, partition lattice $(\mathcal{P}^N, \wedge, \vee)$ is embedded in a larger subset lattice,
with which some elements are shared while some others are not. This feature is maintained even when
partitions are decomposed as joins of atoms, for they generally admit several such decompositions.
Nevertheless, when regarded from this perspective partition lattice $(\mathcal{P}^N, \wedge, \vee)$ is seen
to be included in subset lattice $(2^{N_2}, \cap, \cup)$, with the two sharing the same $\binom{n}{2}$
atoms. In fact, $(2^{N_2}, \cap, \cup)$ is the minimal subset lattice including the partition lattice.
Accordingly, the Hamming distance between partitions HD proposed below relies precisely on representing
partitions as Boolean $\binom{n}{2}$-vectors, although only $B_n < 2^{\binom{n}{2}}$ distinct
such vectors correspond to partitions. In particular, HD is the traditional Hamming distance
$|E \Delta E'|$ between edge sets $E, E' \in 2^{N_2}$, with the latter corresponding to partitions only when
in both graphs $G = (N, E)$, $G' = (N, E')$ each component is a complete subgraph.

3 HAMMING DISTANCE

The rank function $r : \mathcal{P}^N \to \mathbb{Z}_+$ for partitions is $r(P) = n - |P|$, where
$P_\bot = \{\{1\}, \ldots, \{n\}\}$ is the bottom element: $r(P_\bot) = 0$. Atoms are immediately above, with
rank 1, in the associated Hasse diagram. The latter is ordered by coarsening $\geq$, with coarser partitions
in upper levels (Meila, 2007; Aigner, 1997; Stern, 1999) and $P \geq Q$ meaning that every block of Q is
included in some block of P. Hence atoms are those partitions consisting of n − 1 blocks, namely n − 2
singletons and one pair. These $\binom{n}{2}$ unordered pairs $\{i, j\} \in N_2$ are the same atoms as in
subset lattice $(2^{N_2}, \cap, \cup)$. For notational convenience, let $[ij] \in \mathcal{P}^N$ denote the
atom where the unique 2-cardinal block is (unordered) pair $\{i, j\} \in [ij]$.

Consider $\chi_N \in \{0,1\}^n$ as the n-vector with all entries equal to 1 and denote by
$\langle x, y \rangle$ the scalar product between x and y. The Hamming distance between subsets
$A, B \in 2^N$ is

$|A \Delta B| = |A| + |B| - 2|A \cap B| = \langle \chi_A, \chi_N \rangle + \langle \chi_B, \chi_N \rangle - 2\langle \chi_A, \chi_B \rangle$. (4)
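Expression (4) is easy to check numerically. The sketch below assumes subsets of N = {1, ..., n} given as Python sets; `char_vector`, `dot` and `hamming_subsets` are illustrative names, not notation from the paper.

```python
def char_vector(subset, n):
    """Boolean n-vector chi_A of a subset A of N = {1, ..., n}."""
    return [1 if i in subset else 0 for i in range(1, n + 1)]


def dot(x, y):
    """Scalar product <x, y> of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def hamming_subsets(a, b, n):
    """|A symmetric-difference B| computed through scalar products, as in (4)."""
    chi_a, chi_b, chi_n = char_vector(a, n), char_vector(b, n), [1] * n
    return dot(chi_a, chi_n) + dot(chi_b, chi_n) - 2 * dot(chi_a, chi_b)


A, B = {1, 2, 3}, {3, 4}
print(hamming_subsets(A, B, 5), len(A ^ B))  # both count the elements 1, 2, 4
```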

In order to replace subsets A with partitions P, let
$\mathcal{P}^N_{(1)} = \{[ij] : 1 \leq i < j \leq n\}$ be the $\binom{n}{2}$-set of atoms
of the partition lattice, i.e. $\mathcal{P}^N_{(1)} \sim N_2$. The analog of characteristic function
$\chi_A$ is indicator function $I_P : \mathcal{P}^N_{(1)} \to \{0,1\}$ defined, for
$P \in \mathcal{P}^N$, $[ij] \in \mathcal{P}^N_{(1)}$, by

$I_P([ij]) = 1$ if $P \geq [ij]$, and $I_P([ij]) = 0$ if $P \not\geq [ij]$.

In words, if pair $\{i, j\}$ is included in some block A of P (i.e. $\{i, j\} \subseteq A \in P$),
then partition P is coarser than atom $[ij]$, and the corresponding position $I_P([ij])$ of indicator array
$I_P$ has entry 1. Otherwise, $I_P([ij])$ equals 0. Top element $P^\top = \{N\}$ for partitions yields
$I_{P^\top}$, i.e. the $\binom{n}{2}$-vector with all entries equal to 1. The number
$s(P) = |\{[ij] : [ij] \leq P\}|$ of atoms finer than P is (Rossi, 2011) the size
$s : \mathcal{P}^N \to \mathbb{Z}_+$, that is

$s(P) = \sum_{A \in P} \binom{|A|}{2} = \langle I_P, I_{P^\top} \rangle$. (5)

While the cardinality of subsets $|A| = \langle \chi_A, \chi_N \rangle$ takes every integer value between
0 and n, the size of partitions $s(P) = \langle I_P, I_{P^\top} \rangle$ does not do the same between 0 and
$\binom{n}{2}$. Minimally, this may be seen for N = {1, 2, 3}, as the $B_3 = 5$ partitions are the finest
{{1}, {2}, {3}} and coarsest {1, 2, 3} ones, together with $\binom{3}{2} = 3$ atoms:
[12] = {{1, 2}, {3}}, [13] = {{1, 3}, {2}} and [23] = {{2, 3}, {1}}. Thus, there is no partition with
size equal to 2, in that {1, 2, 3} = [12] ∨ [13] ∨ [23] and {1, 2, 3} = [12] ∨ [23] = [12] ∨ [13] = [13] ∨ [23].
Available sizes for 1 ≤ n ≤ 7 are in Table 1 below.

Table 1: Available sizes of partitions of an n-set, 1 ≤ n ≤ 7.

|N| = n    Available sizes $\{s(P) : P \in \mathcal{P}^N\}$
1          {0}
2          {0, 1}
3          {0, 1, 3}
4          {0, 1, 2, 3, 6}
5          {0, 1, 2, 3, 4, 6, 10}
6          {0, 1, 2, 3, 4, 6, 7, 10, 15}
7          {0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 15, 21}
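The rows of Table 1 can be regenerated with a short sketch. Since the size depends only on the block cardinalities, it is enough to enumerate the integer partitions of n and sum the binomial coefficients; `integer_partitions` and `available_sizes` are illustrative helper names.

```python
from math import comb


def integer_partitions(n, largest=None):
    """Yield the non-increasing tuples of block sizes of partitions of an n-set."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for k in range(min(n, largest), 0, -1):
        for rest in integer_partitions(n - k, k):
            yield (k,) + rest


def available_sizes(n):
    """Sorted set of values s(P) = sum over blocks of C(|A|, 2)."""
    return sorted({sum(comb(k, 2) for k in sizes)
                   for sizes in integer_partitions(n)})


for n in range(1, 8):
    print(n, available_sizes(n))
```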

In atomistic lattices, such as $2^N$, $\mathcal{P}^N$ and $2^{N_2}$, every element admits a decomposition as
a join of atoms (Aigner, 1997; Stern, 1999). While subsets $A \in 2^N$ and $E \in 2^{N_2}$ admit a unique
such decomposition, namely $A = \cup_{i \in A} \{i\}$ and $E = \cup_{\{i,j\} \in E} \{i, j\}$, partitions
generally admit several such decompositions. For n = 3 as above, the coarsest partition {1, 2, 3}
decomposes either as the join of any two atoms, or else as the join of all the three available atoms at once.
In particular, the rank $r(P) = n - |P|$ of any partition P is the minimum number of atoms involved
in a join-decomposition of P, while the size $s(P)$ is the maximum number of atoms involved in a
join-decomposition of P. The coarsest partition {1, 2, 3} of a 3-cardinal set has rank
r({1, 2, 3}) = 3 − 1 = 2 and size $s(\{1, 2, 3\}) = 3 = \binom{3}{2}$.

Proposition 1. The size is strictly monotone: for all $P, Q \in \mathcal{P}^N$, if $P \geq Q$,
$P \neq Q$, then $s(P) > s(Q)$.

Proof. If $P > Q$ (i.e. $P \geq Q$, $P \neq Q$), then every block $A \in P$ is the union of some blocks
$B_1, \ldots, B_{|Q^A|} \in Q$, with $|Q^A| > 1$ for at least one block $A \in P$. Recall that $Q^A$ is the
partition of A induced by Q (see the definition of MMD in section 2 above). The union of any two such blocks
$B, B' \in Q$ increases the size by
$\binom{|B| + |B'|}{2} - \left[\binom{|B|}{2} + \binom{|B'|}{2}\right] = |B||B'|$, which is strictly
positive as blocks are non-empty.

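In order to reproduce (1) and (4) above, Hamming distance
$HD : \mathcal{P}^N \times \mathcal{P}^N \to \mathbb{Z}_+$ has to count the number of atoms finer than either
one of any two partitions $P, Q \in \mathcal{P}^N$ but not finer than both, that is

$HD(P, Q) = |\{[ij] : P \geq [ij] \not\leq Q\}| + |\{[ij] : P \not\geq [ij] \leq Q\}|$.

In view of (5),

$HD(P, Q) = s(P) + s(Q) - 2s(P \wedge Q) = \langle I_P, I_{P^\top} \rangle + \langle I_Q, I_{P^\top} \rangle - 2\langle I_P, I_Q \rangle$. (6)

Also $P \wedge Q = \vee_{P \geq [ij] \leq Q} [ij]$ is the maximal decomposition of $P \wedge Q$ as a join of
atoms, thus

$HD(P, Q) = \langle I_P, I_{P^\top} \rangle + \langle I_Q, I_{P^\top} \rangle - 2\langle I_{P \wedge Q}, I_{P^\top} \rangle$.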
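As a sketch of expression (6), the Boolean indicator vectors can be built explicitly and HD obtained through scalar products. Partitions are assumed to be lists of sets over {1, ..., n}; the names `indicator` and `hd` are illustrative.

```python
from itertools import combinations


def indicator(p, n):
    """Boolean C(n,2)-vector I_P: entry 1 iff pair {i, j} lies inside one block."""
    block_of = {i: k for k, block in enumerate(p) for i in block}
    return [1 if block_of[i] == block_of[j] else 0
            for i, j in combinations(range(1, n + 1), 2)]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def hd(p, q, n):
    """HD(P, Q) = <I_P, I_top> + <I_Q, I_top> - 2 <I_P, I_Q>, as in (6)."""
    i_p, i_q = indicator(p, n), indicator(q, n)
    i_top = [1] * len(i_p)
    return dot(i_p, i_top) + dot(i_q, i_top) - 2 * dot(i_p, i_q)


atom_12 = [{1, 2}, {3}]
atom_13 = [{1, 3}, {2}]
top = [{1, 2, 3}]
print(hd(atom_12, atom_13, 3), hd(atom_12, top, 3))
```

The componentwise product of I_P and I_Q is the indicator of the meet, which is why the last scalar product equals the size of P ∧ Q.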

3.1 HD and VI: Axioms

Following (Meila, 2007), attention is now placed on

those axioms that characterize both partition distance

measures HD and VI.

Proposition 2. HD is a metric: for all $P, P', Q \in \mathcal{P}^N$,
1. $HD(P, Q) = HD(Q, P)$,
2. $HD(P, Q) \geq 0$, with equality if and only if $P = Q$,
3. $HD(P, P') + HD(P', Q) \geq HD(P, Q)$.

Proof. The first condition is obvious. In view of proposition 1, the second one is also immediate as
$\min\{s(P), s(Q)\} \geq s(P \wedge Q)$. In fact, $HD(P, Q)$ is the sum
$[s(P) - s(P \wedge Q)] + [s(Q) - s(P \wedge Q)]$ of two non-negative integers, while
$\min_{P \neq Q} HD(P, Q) = 1 = \min_{P \in \mathcal{P}^N_{(1)}} s(P)$.
As for the third condition, known as triangle inequality, difference

$HD(P, P') + HD(P', Q) - HD(P, Q) = 2[s(P') - s(P \wedge P') - s(P' \wedge Q) + s(P \wedge Q)]$

must be shown to be ≥ 0 for all $P, P', Q \in \mathcal{P}^N$. Since size $s(P \wedge Q)$ is given,
$s(P') - [s(P \wedge P') + s(P' \wedge Q)]$ has to be minimized by suitably choosing $P'$. Now, sum
$s(P \wedge P') + s(P' \wedge Q)$ is maximized when $P \wedge P' = P$ (or $P' \geq P$) and
$P' \wedge Q = Q$ (or $P' \geq Q$). If $Q \leq P' \geq P$, then $P' = P \vee Q$ minimizes the whole
difference. Thus HD satisfies triangle inequality as long as the size is supermodular:
$s(P \vee Q) - s(P) - s(Q) + s(P \wedge Q) \geq 0$ for all $P, Q \in \mathcal{P}^N$. The simplest way to see
that this is indeed the case is by focusing on Möbius inversion of lattice (or more generally poset)
functions (Rota, 1964b; Aigner, 1997; Stern, 1999). By definition, the size
$s : \mathcal{P}^N \to \mathbb{Z}_+$ has Möbius inversion $\mu^s : \mathcal{P}^N \to \mathbb{Z}$ given by
$\mu^s(P) = 1$ if P is an atom (i.e. $P = [ij] \in \mathcal{P}^N_{(1)}$), and $\mu^s(P) = 0$ otherwise.
In fact, $s(P) = \sum_{Q \leq P} \mu^s(Q)$ for all $P \in \mathcal{P}^N$. Hence the size
satisfies a condition which is sufficient (but not necessary) for supermodularity, as its Möbius inversion
takes only values ≥ 0. This completes the proof.
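The two facts used in the proof, supermodularity of the size and the triangle inequality for HD, can be checked exhaustively on a small ground set. The sketch below is only a numerical sanity check for n = 4 (15 partitions), not a substitute for the proof; all helper names are illustrative.

```python
from itertools import product
from math import comb


def set_partitions(elements):
    """Yield every partition of a list, each as a list of frozensets."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in set_partitions(rest):
        # Put `first` into an existing block ...
        for k, block in enumerate(smaller):
            yield smaller[:k] + [block | {first}] + smaller[k + 1:]
        # ... or into a new singleton block.
        yield smaller + [frozenset([first])]


def meet(p, q):
    return [a & b for a in p for b in q if a & b]


def join(p, q):
    """Finest partition coarser than both: merge blocks of p linked by q."""
    blocks = [set(b) for b in p]
    for b in q:
        touched = [c for c in blocks if c & b]
        rest = [c for c in blocks if not c & b]
        blocks = rest + [set().union(*touched)]
    return [frozenset(c) for c in blocks]


def size(p):
    return sum(comb(len(b), 2) for b in p)


def hd(p, q):
    return size(p) + size(q) - 2 * size(meet(p, q))


parts = list(set_partitions([1, 2, 3, 4]))
supermodular = all(
    size(join(p, q)) + size(meet(p, q)) >= size(p) + size(q)
    for p, q in product(parts, repeat=2)
)
triangle = all(
    hd(p, x) + hd(x, q) >= hd(p, q)
    for p, x, q in product(parts, repeat=3)
)
print(len(parts), supermodular, triangle)
```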

Triangle inequality is satisfied with equality by HD (and VI) when $P' = P \wedge Q$ (for VI, see (Meila,
2007, pp. 883, 888) properties 6, 10(A.2)).

Proposition 3. HD satisfies horizontal collinearity:
$HD(P, P \wedge Q) + HD(P \wedge Q, Q) = HD(P, Q)$.

Proof. $HD(P, P \wedge Q) + HD(P \wedge Q, Q) = [s(P) - s(P \wedge Q)] + [s(Q) - s(P \wedge Q)]$ as well as
$HD(P, Q) = s(P) + s(Q) - 2s(P \wedge Q)$.

Collinearity also applies to distances between partitions P, Q that are comparable, i.e. either $P \geq Q$
or $Q \geq P$. Firstly consider the case involving the top $P^\top$ and bottom $P_\bot$ elements (for VI,
see (Meila, 2007, p. 888) property 10(A.1)).

Proposition 4. HD satisfies vertical collinearity:
$HD(P_\bot, P) + HD(P, P^\top) = HD(P_\bot, P^\top)$.

Proof. $HD(P_\bot, P) + HD(P, P^\top) = s(P) + s(P^\top) - s(P) = s(P^\top)$ independently
of P, as well as $HD(P_\bot, P^\top) = s(P^\top) = \binom{n}{2}$.

Vertical collinearity may be generalized for arbitrary comparable partitions
$P^\top \geq P \geq Q \geq P_\bot$, in that $HD(Q, P') + HD(P', P) = HD(Q, P)$ for all $P'$ satisfying
$Q \leq P' \leq P$.

3.2 Complementation

The distance between the bottom and top elements considered by vertical collinearity leads to regard
such lattice elements as complements, thereby focusing on the distance between other, generic complements.
Maintaining the traditional Hamming distance between subsets as the fundamental benchmark, it
must be taken into account that the subset and partition lattices are very different in terms of
complementation. Every subset $A \in 2^N$ has a unique complement $A^c$, and the distance between any two
such complements equals the distance between the bottom and top elements:
$|A \Delta A^c| = n = |N \Delta \emptyset|$. Conversely, partitions P generally have several and quite
different complements, which are all those Q such that $P \wedge Q = P_\bot$ as well as
$P \vee Q = P^\top$. In this respect, MMD measures the distance between any two complements P, Q
solely through their cardinalities |P|, |Q|, while VI and HD provide a fine distinction between different
complements, and also agree on which are closer and which are remoter. The issue may be exemplified as
follows: for N = {1, . . . , 7}, consider P = 123|456|7 and $P^* = 147|2|3|5|6$ and $P_* = 1|2|34|5|67$
(where vertical bar | separates blocks). Both $P^*$ and $P_*$ are complements of P, that is
$P \wedge P^* = P \wedge P_* = P_\bot$ and $P \vee P^* = P \vee P_* = P^\top$. Here VI, HD and MMD are:

$VI(P, P_*) \simeq 1.93 < 1.95 \simeq VI(P, P^*)$,
$HD(P, P_*) = 8 < 9 = HD(P, P^*)$,
$MMD(P, P_*) = 4 = MMD(P, P^*)$.

For MMD this example generalizes as follows.

Proposition 5. For any two complements $P, Q \in \mathcal{P}^N$ it holds
$MMD(P, Q) = \max\{r(P), r(Q)\}$.

Proof. If $P \wedge Q = P_\bot$, then edges $\{A, B\} \in E \subset P \times Q$ of the bipartite graph
$G = (P \cup Q, E)$ defined in section 2 all have the same weight $1 = |A \cap B|$ (see above).
Hence, a maximum-weight matching simply is one including the maximum number of feasible edges. In
turn, such a number equals $\sum_{A \in P \vee Q} \min\{|P^A|, |Q^A|\}$, because each block (of either
partition) can be the endpoint of at most one edge included in a matching. Also, the number of elements
$i \in N$ that must be deleted for the two residual partitions to coincide is
$\sum_{A \in P \vee Q} (|A| - \min\{|P^A|, |Q^A|\})$. On the other hand, $P \vee Q = P^\top$ entails

$\sum_{A \in P \vee Q} (|A| - \min\{|P^A|, |Q^A|\}) = n - \min\{|P|, |Q|\} = \max\{r(P), r(Q)\}$.
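The HD and MMD values of the seven-element example, and the identity of proposition 5 on that example, can be reproduced with a brief sketch. It assumes partitions as lists of sets and uses a brute-force matching that is only suitable for a handful of blocks; the names `P_up` and `P_low` stand for the two complements 147|2|3|5|6 and 1|2|34|5|67 and are illustrative.

```python
from itertools import permutations
from math import comb


def size(p):
    return sum(comb(len(b), 2) for b in p)


def meet(p, q):
    return [a & b for a in p for b in q if a & b]


def hd(p, q):
    return size(p) + size(q) - 2 * size(meet(p, q))


def rank(p):
    return sum(len(b) for b in p) - len(p)


def mmd(p, q):
    """n minus the weight of a maximum-weight matching (brute force)."""
    n = sum(len(b) for b in p)
    small, large = (p, q) if len(p) <= len(q) else (q, p)
    best = max(
        sum(len(small[i] & large[j]) for i, j in enumerate(image))
        for image in permutations(range(len(large)), len(small))
    )
    return n - best


P = [{1, 2, 3}, {4, 5, 6}, {7}]
P_up = [{1, 4, 7}, {2}, {3}, {5}, {6}]
P_low = [{1}, {2}, {3, 4}, {5}, {6, 7}]

print(hd(P, P_low), hd(P, P_up))            # HD distinguishes the complements
print(mmd(P, P_low), mmd(P, P_up))          # MMD does not
print(max(rank(P), rank(P_low)), max(rank(P), rank(P_up)))
```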

The class $c : \mathcal{P}^N \to \mathbb{Z}^n_+$ of partitions (Rota, 1964b) identifies the vector
$c(P) = (c_1(P), \ldots, c_n(P))$ where $c_k(P)$ is the number of k-cardinal blocks of P, for
1 ≤ k ≤ n. As shown by the above example, a partition generally has different complements with
different classes. In this view, for all $P \in \mathcal{P}^N$ denote by
$CO(P) = \{Q : P \wedge Q = P_\bot, P \vee Q = P^\top\}$ the set of complements of P.

A modular element of the partition lattice (Aigner, 1997; Stern, 1999; Stanley, 1971) is any
$P \in \mathcal{P}^N$ where all blocks are singletons apart from only one, at most, i.e.
$\sum_{1 < k \leq n} c_k(P) \leq 1$. The sublattice $\mathcal{P}^N_{mod}$ of modular elements contains the
bottom and top elements, and all partitions of the form $\{A\} \cup P_\bot^{A^c}$ with 1 < |A| < n. Hence,
$|\mathcal{P}^N_{mod}| = 2^n - n$, while $\mathcal{P}^N_{mod} = \mathcal{P}^N$ for n ≤ 3 and
$\mathcal{P}^N_{mod} \subset \mathcal{P}^N$ for n > 3.

Here, the main link between modular elements and complementation is that an element is modular
if and only if no two of its complements are comparable (see (Stanley, 1971, Theorem 1)). Therefore,
if $P \notin \mathcal{P}^N_{mod}$, then there are $Q, Q' \in CO(P)$ such that $Q > Q'$. It thus seems
important that the distance between P and Q differs from the distance between P and $Q'$. The following
result bounds the Hamming distance HD between a partition and its complements.

Proposition 6. For all $P \in \mathcal{P}^N$, if $Q \in CO(P)$, then
$s(P) + |P| - 1 \leq HD(P, Q) \leq s(P) + \binom{|P|}{2}$, where the upper bound is always tight, while the
lower one is tight only if $c_1(P) \leq 2 + \sum_{1 < k \leq n} (k - 2) c_k(P)$.


Proof. If $Q \in CO(P)$, then $HD(P, Q) = s(P) + s(Q)$. Hence
$s(P) + \min\{s(Q) : Q \in CO(P)\} \leq HD(P, Q)$ and $HD(P, Q) \leq s(P) + \max\{s(Q) : Q \in CO(P)\}$. A
complement of P has join-decompositions minimally involving |P| − 1 atoms
$[ij]_1, \ldots, [ij]_{|P|-1} \in \mathcal{P}^N_{(1)}$, with
$|A_m \cap \{i, j\}_m| = 1 = |A_{m+1} \cap \{i, j\}_m|$, 1 ≤ m < |P|.
Considering the upper bound first, observe that size $s([ij]_1 \vee \cdots \vee [ij]_{|P|-1})$ attains its
maximum when $|\{i, j\}_m \cap \{i, j\}_{m+1}| = 1$ for all 1 ≤ m < |P| − 1, in which case
$s([ij]_1 \vee \cdots \vee [ij]_m) = \binom{m+1}{2}$, 1 ≤ m < |P|.
This complement $P^* = [ij]_1 \vee \cdots \vee [ij]_{|P|-1}$ always exists, whatever the class c(P) of P,
making the bound tight. In fact, $P^* \in \mathcal{P}^N_{mod}$ has n − |P| + 1 blocks, out of
which n − |P| are singletons, while the remaining one $B \in P^*$ is |P|-cardinal and satisfies
$|B \cap A| = 1$ for all $A \in P$, i.e. $P^* = \{B\} \cup P_\bot^{B^c} \Rightarrow s(P^*) = \binom{|P|}{2}$.
For the lower bound, note that size $s([ij]_1 \vee \cdots \vee [ij]_{|P|-1})$ attains its minimum, ideally,
when $\{i, j\}_m \cap \{i, j\}_{m'} = \emptyset$, 1 ≤ m < m' < |P|, i.e.
$s([ij]_1 \vee \cdots \vee [ij]_m) = m$ for all 1 ≤ m < |P|. This is not always possible as each $A \in P$
can have non-empty intersection with a number of pairwise disjoint pairs $\{i, j\}_m$, 1 ≤ m < |P|, bounded
above by |A|, entailing that the constraint is given by the number $c_1(P)$ of singletons $\{i\} \in P$.
Specifically, nesting together $\sum_{1 < k \leq n} c_k(P)$ non-singleton blocks requires
$\sum_{1 < k \leq n} c_k(P) - 1$ pairs $\{i, j\}_m$. If the latter have to be pairwise disjoint, then the
maximum number of elements $j \in N$ in non-singleton blocks available to match (into pairwise disjoint
pairs) those elements $\{i\} \in P$ in singleton blocks is precisely
$\sum_{1 < k \leq n} k c_k(P) - 2\left(\sum_{1 < k \leq n} c_k(P) - 1\right)$.

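Proposition 7. If $2 + \sum_{1 < k \leq n} (k - 2) c_k(P) < c_1(P)$, then

$\min_{P_* \in CO(P)} s(P_*) = \left(n - \theta(P) \lfloor \tfrac{n}{\theta(P)} \rfloor\right) \binom{\lceil n/\theta(P) \rceil}{2} + \left[\theta(P)\left(\lfloor \tfrac{n}{\theta(P)} \rfloor + 1\right) - n\right] \binom{\lfloor n/\theta(P) \rfloor}{2}$,

where $\theta(P) = 1 + \sum_{1 < k \leq n} c_k(P)(k - 1)$.

Proof. If $2 + \sum_{1 < k \leq n} (k - 2) c_k(P) < c_1(P)$, then the above proof of proposition 6 entails
that the maximum number $\max\{|Q| : Q \in CO(P)\}$ of blocks of a complement of P is $\theta(P)$. Among
$\theta(P)$-cardinal partitions $P_*$, the size is minimized when
$|B| \in \{\lfloor n/\theta(P) \rfloor, \lceil n/\theta(P) \rceil\}$ for all $B \in P_*$, where the number
of $\lfloor n/\theta(P) \rfloor$-cardinal blocks is $\theta(P)(\lfloor n/\theta(P) \rfloor + 1) - n$, while
the number of $\lceil n/\theta(P) \rceil$-cardinal blocks is $n - \theta(P) \lfloor n/\theta(P) \rfloor$.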
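The closed form of proposition 7 can be compared with an exhaustive search on one small instance. The sketch below assumes the reading of the formula with floor and ceiling of n/θ(P), and checks it only for P = 12|3|4|5, which satisfies the hypothesis; it says nothing about other partitions. All function names are illustrative.

```python
from math import comb


def set_partitions(elements):
    """Yield every partition of a list, each as a list of frozensets."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in set_partitions(rest):
        for k, block in enumerate(smaller):
            yield smaller[:k] + [block | {first}] + smaller[k + 1:]
        yield smaller + [frozenset([first])]


def meet(p, q):
    return [a & b for a in p for b in q if a & b]


def join(p, q):
    blocks = [set(b) for b in p]
    for b in q:
        touched = [c for c in blocks if c & b]
        rest = [c for c in blocks if not c & b]
        blocks = rest + [set().union(*touched)]
    return [frozenset(c) for c in blocks]


def size(p):
    return sum(comb(len(b), 2) for b in p)


def theta(p):
    return 1 + sum(len(b) - 1 for b in p if len(b) > 1)


def hypothesis_holds(p):
    singletons = sum(1 for b in p if len(b) == 1)
    return 2 + sum(len(b) - 2 for b in p if len(b) > 1) < singletons


def min_size_formula(p):
    n, t = sum(len(b) for b in p), theta(p)
    lo, hi = n // t, -(-n // t)  # floor and ceiling of n / theta
    return (n - t * lo) * comb(hi, 2) + (t * (lo + 1) - n) * comb(lo, 2)


def complements(p):
    n = sum(len(b) for b in p)
    for q in set_partitions(sorted(set().union(*p))):
        if len(meet(p, q)) == n and len(join(p, q)) == 1:
            yield q


P = [frozenset({1, 2}), frozenset({3}), frozenset({4}), frozenset({5})]
brute_force = min(size(q) for q in complements(P))
print(hypothesis_holds(P), theta(P), min_size_formula(P), brute_force)
```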

Proposition 8. Among complements $Q \in CO(P)$, HD and VI have common minimizers and maximizers, i.e.

$\arg\min_{Q \in CO(P)} HD(P, Q) = \arg\min_{Q \in CO(P)} VI(P, Q)$, and
$\arg\max_{Q \in CO(P)} HD(P, Q) = \arg\max_{Q \in CO(P)} VI(P, Q)$.

Proof. If $Q \in CO(P)$, then $VI(P, Q)$ is minimized or maximized when $e(Q)$ is, respectively, maximized
or minimized, as $VI(P, Q) = 2 \log n - e(P) - e(Q)$. Given this, if $P \in \mathcal{P}^N_{mod}$, then all
$Q \in CO(P)$ have the same rank. Otherwise, there are comparable complements, i.e. with different rank (see
above). Thus, in general, among complements $Q \in CO(P)$ entropy $e(Q)$ is minimized when |Q| is minimized
and, in addition, $Q \in \mathcal{P}^N_{mod}$. This is precisely where size $s(Q)$ is maximized. Similarly,
$e(Q)$ is maximized when |Q| is maximized and, in addition,
$|B| \in \{\lfloor n/|Q| \rfloor, \lceil n/|Q| \rceil\}$ for all $B \in Q$. Again, this is where $s(Q)$ is
minimized.

4 MINIMUM-WEIGHT PATHS

Hamming distance $|E \Delta E'|$ between edge sets $E, E' \in 2^{N_2}$ is the length of a shortest path
between vertices $\chi_E, \chi_{E'} \in \{0,1\}^{\binom{n}{2}}$ of the $\binom{n}{2}$-dimensional unit
hypercube $[0,1]^{\binom{n}{2}}$, where $\chi_E : N_2 \to \{0,1\}$ is the characteristic function defined
above in section 2, i.e. $\chi_E(\{i, j\}) = 1$ if $\{i, j\} \in E$ and 0 otherwise. Recall that a polytope
naturally defines a graph with its same vertices and edges (Brøndsted, 1983, p. 93), and the hypercube is
perhaps the main example of polytope. In fact, the graph of hypercube $[0,1]^{\binom{n}{2}}$
is the Hasse diagram of Boolean lattice $(2^{N_2}, \cap, \cup)$, for its edges correspond to the covering
relation, that is to say $\{E, E'\}$ is an edge of the hypercube if either $E \supset E'$,
$|E| = |E'| + 1$ or else the converse, i.e. $E' \supset E$, $|E'| = |E| + 1$.

Clearly, a shortest path is a minimum-weight path as long as every edge has weight 1. This simple
observation is the starting point towards an analogous view of the Hamming distance HD between partitions,
namely as the weight of a minimum-weight path in the associated Hasse diagram. To this end,
define the polytope of partitions $\mathbb{P}$ as the convex hull
$\mathbb{P} := conv(\{I_P : P \in \mathcal{P}^N\}) \subset [0,1]^{\binom{n}{2}}$ containing
all convex combinations of the $B_n$ Boolean vectors defined by the indicator functions of partitions. Also
denote by $G = (\mathcal{P}^N, E)$ the graph of polytope $\mathbb{P}$ or, equivalently, the Hasse diagram of
partition lattice $(\mathcal{P}^N, \wedge, \vee)$. Edges correspond to the covering relation:
$\{P, Q\} \in E$ if either $\{P' : Q \leq P' \leq P\} = \{P, Q\}$ or else
$\{P' : P \leq P' \leq Q\} = \{P, Q\}$ (see above). Denote this relation by
$P \gtrdot Q \Leftrightarrow \{P, Q\} \in E$. Finally, let $\mathbb{F} \subset \mathbb{R}^{\mathcal{P}^N}$
be the vector space of symmetric and order-preserving partition functions
$f : \mathcal{P}^N \to \mathbb{R}$, that is to say, respectively, for all $P, Q \in \mathcal{P}^N$,
(a) $c(P) = c(Q) \Rightarrow f(P) = f(Q)$, and
(b) $P > Q \Rightarrow f(P) > f(Q)$ or else
(b') $P > Q \Rightarrow f(P) < f(Q)$.


Entropy, rank and size $e, r, s : \mathcal{P}^N \to \mathbb{R}$ are in $\mathbb{F}$, with
e satisfying (b') and r, s satisfying (b). For $f \in \mathbb{F}$, let
weights $w_f : E \to \mathbb{R}_+$ on edges $\{P, Q\} \in E$ of G be

$w_f(\{P, Q\}) = \max\{f(P), f(Q)\} - \min\{f(P), f(Q)\}$.

For $P, Q \in \mathcal{P}^N$, let $Path(P, Q)$ contain all P − Q-paths $p(P, Q)$ in graph G. Any
$p(P, Q) \in Path(P, Q)$ is a subgraph $p(P, Q) = (V^{P,Q}, E^{P,Q}) \subset G$ with vertex set
$V^{P,Q} = \{P = P_0, P_1, \ldots, P_m = Q\}$ and edge set
$E^{P,Q} = \{\{P_0, Q_0\}, \{P_1, Q_1\}, \ldots, \{P_{m-1}, Q_{m-1}\}\}$,
where $P_{k+1} = Q_k$, 0 ≤ k < m. Graph G is connected, entailing $Path(P, Q) \neq \emptyset$ for all
$P, Q \in \mathcal{P}^N$. The weight of $p(P, Q)$ is
$w_f(p(P, Q)) = \sum_{0 \leq k < m} w_f(\{P_k, Q_k\})$.

Definition 9. For $f \in \mathbb{F}$, the minimum-f-weight partition distance
$\delta_f : \mathcal{P}^N \times \mathcal{P}^N \to \mathbb{R}_+$ is

$\delta_f(P, Q) := \min_{p(P,Q) \in Path(P,Q)} w_f(p(P, Q))$. (7)

Proposition 10. For $f \in \mathbb{F}$ and $P, Q \in \mathcal{P}^N$, every minimum-f-weight P − Q-path
visits $P \wedge Q$ or $P \vee Q$ or both, i.e. $V^{P,Q} \cap \{P \wedge Q, P \vee Q\} \neq \emptyset$ for
all $p(P, Q)$ satisfying $w_f(p(P, Q)) = \delta_f(P, Q)$.

Proof. If $P \geq Q$, then $\{P \vee Q, P \wedge Q\} \subseteq V^{P,Q}$ for all
$p(P, Q) \in Path(P, Q)$, with $P \gtrdot Q \Rightarrow \{P, Q\} = V^{P,Q}$.
Differently, if $P \not\geq Q \not\geq P$, then any path $p(P, Q)$ visits some vertex $P'$ comparable with
both P, Q and, in particular, satisfying either $P' \geq P, Q$ or else $P, Q \geq P'$. Accordingly,
$p(P, Q) = p(P, P') \cup p(P', Q)$, with $E^{P,P'} \cap E^{P',Q} = \emptyset$, for some P − P'-path
$p(P, P')$ and P' − Q-path $p(P', Q)$, entailing that $w_f(p(P, Q))$ equals
$w_f(p(P, P')) + w_f(p(P', Q))$. Finally, since f is order-preserving and symmetric, $P' = P \vee Q$
minimizes $w_f(p(P, P')) + w_f(p(P', Q))$ over all vertices $P' \geq P, Q$ as well as $P' = P \wedge Q$
minimizes $w_f(p(P, P')) + w_f(p(P', Q))$ over all $P' \leq P, Q$.

Whether a minimum- f -weight path visits the join

or else the meet of any two incomparable partitions

clearly depends on f . A generic f ∈ F may have

associated minimum-weight paths visiting the meet

of some incomparable partitions P, Q and the join of

some others $P', Q'$. Whether minimum-weight paths always visit the meet or else the join depends on

whether f or else − f is supermodular. Note that

if f is supermodular, then − f is submodular, i.e.

− f (P ∧ Q) − f (P ∨ Q) ≤ − f (P) − f (Q).
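The two modularity conditions can be checked exhaustively on a small lattice. The sketch below is an illustration under assumptions (4-element ground set, partitions as frozensets of blocks, hypothetical helper names); it tests $f(P \wedge Q) + f(P \vee Q) \geq f(P) + f(Q)$ for the size $s$ and the reverse inequality for the rank $r(P) = n - |P|$.

```python
def set_partitions(elements):
    """Yield every partition of `elements` as a frozenset of frozenset blocks."""
    elements = list(elements)
    if not elements:
        yield frozenset()
        return
    first, rest = elements[0], elements[1:]
    for smaller in set_partitions(rest):
        blocks = list(smaller)
        for i, block in enumerate(blocks):
            yield frozenset(blocks[:i] + [block | {first}] + blocks[i + 1:])
        yield frozenset(blocks + [frozenset({first})])


def meet(p, q):
    """Coarsest partition finer than both: non-empty block intersections."""
    return frozenset(a & b for a in p for b in q if a & b)


def join(p, q):
    """Finest partition coarser than both: merge blocks of p linked by q."""
    merged = [set(a) for a in p]
    for b in q:
        hit = [m for m in merged if m & b]
        merged = [m for m in merged if not (m & b)] + [set().union(*hit)]
    return frozenset(frozenset(m) for m in merged)


def size(p):
    return sum(len(a) * (len(a) - 1) // 2 for a in p)


def rank(p):
    return sum(len(a) for a in p) - len(p)  # n - |P|


ALL = list(set_partitions(range(4)))


def is_supermodular(f):
    return all(f(meet(p, q)) + f(join(p, q)) >= f(p) + f(q)
               for p in ALL for q in ALL)


def is_submodular(f):
    return is_supermodular(lambda p: -f(p))


print(is_supermodular(size), is_submodular(rank))  # True True
```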

Proposition 11. Let $f \in \mathbb{F}$ satisfy (b), i.e. $P > Q$ entails $f(P) > f(Q)$. If $f$ is supermodular, then the minimum-$f$-weight partition distance is

$$\delta_f(P, Q) = f(P) + f(Q) - 2f(P \wedge Q),$$

while if $f$ is submodular, then the minimum-$f$-weight partition distance is

$$\delta_f(P, Q) = 2f(P \vee Q) - f(P) - f(Q).$$

Proof. Supermodularity entails

2 f (P ∨ Q) − f (P) − f (Q) ≥ f (P ∨ Q) − f (P ∧ Q)

and f (P ∨ Q)− f (P ∧Q) ≥ f (P)+ f (Q)−2 f (P ∧Q),

whereas submodularity entails

2 f (P ∨ Q) − f (P) − f (Q) ≤ f (P ∨ Q) − f (P ∧ Q)

and f (P ∨ Q)− f (P ∧Q) ≤ f (P)+ f (Q)−2 f (P ∧Q),

for all $P, Q \in \mathcal{P}^N$.

Since the size $s$ is supermodular (see proposition 2 above), Hamming distance HD is the minimum-$s$-weight partition distance, i.e. $HD(P, Q) = \delta_s(P, Q)$ for all $P, Q \in \mathcal{P}^N$. The rank $r$ of partitions being submodular (Aigner, 1997, pp. 259, 265, 274), minimum-$r$-weight distance is $\delta_r(P, Q) = |P| + |Q| - 2|P \vee Q|$. In fact, $w_r(\{P, Q\}) = 1$ for all edges $\{P, Q\} \in E$, hence $\delta_r$ is a shortest-path distance.
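Both closed forms avoid any path search. A minimal sketch, assuming partitions are stored as frozensets of frozenset blocks (the helper names `blocks`, `meet`, `join`, `hamming_distance` and `rank_distance` are hypothetical):

```python
def blocks(*groups):
    """Hypothetical helper: build a partition from iterables of elements."""
    return frozenset(frozenset(g) for g in groups)


def meet(p, q):
    """P ∧ Q: non-empty pairwise intersections of blocks."""
    return frozenset(a & b for a in p for b in q if a & b)


def join(p, q):
    """P ∨ Q: merge the blocks of p that are linked by a block of q."""
    merged = [set(a) for a in p]
    for b in q:
        hit = [m for m in merged if m & b]
        merged = [m for m in merged if not (m & b)] + [set().union(*hit)]
    return frozenset(frozenset(m) for m in merged)


def size(p):
    """s(P): number of unordered pairs lying inside a common block."""
    return sum(len(a) * (len(a) - 1) // 2 for a in p)


def hamming_distance(p, q):
    """HD(P, Q) = s(P) + s(Q) - 2 s(P ∧ Q)."""
    return size(p) + size(q) - 2 * size(meet(p, q))


def rank_distance(p, q):
    """delta_r(P, Q) = |P| + |Q| - 2 |P ∨ Q|."""
    return len(p) + len(q) - 2 * len(join(p, q))


P = blocks([1, 2], [3, 4])
Q = blocks([1, 3], [2, 4])
print(hamming_distance(P, Q), rank_distance(P, Q))  # 4 2
```

For this pair the size-based path passes through the meet (the finest partition), while the rank-based path passes through the join (the coarsest partition).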

Turning to entropy e, a simple example shows

that VI distance does not correspond to the e-based

minimum-weight distance.

Proposition 12. There are partitions $P, Q \in \mathcal{P}^N$ such that $2e(P \wedge Q) - e(P) - e(Q) > \delta_e(P, Q)$.

Proof. For two atoms $[ij], [ij'] \in \mathcal{P}^N_{(1)}$, with non-empty intersection $\{i, j\} \cap \{i, j'\} = \{i\}$, VI distance is

$$VI([ij], [ij']) = 2e([ij] \wedge [ij']) - e([ij]) - e([ij']) = 2\log n - 2\left(\log n - \frac{2}{n}\right),$$

while minimum-$e$-weight distance is

$$e([ij]) + e([ij']) - 2e([ij] \vee [ij']) = 2\left(\log n - \frac{2}{n}\right) - 2\left(\log n - \frac{3\log 3}{n}\right) = \frac{2}{n}(3\log 3 - 2) = \delta_e([ij], [ij']),$$

with $3\log 3 < 4$ entailing $VI([ij], [ij']) > \delta_e([ij], [ij'])$.

An alternative measure of partition entropy, called logical entropy, has been recently proposed (Ellerman, 2013a) in terms of distinctions or ordered pairs $(i, j) \in N \times N$, hence $(i, j) \neq (j, i)$. If distinctions are replaced with unordered pairs $\{i, j\} \in N_2$, then mutatis mutandis the non-normalized logical entropy of partitions $P$ is the analog of $\binom{n}{2} - s(P)$, providing a further minimum-weight partition distance. Also, since in information theory partitions are evaluated through functions $f$ such that $P > Q \Rightarrow f(P) < f(Q)$, the approach developed thus far may be applied to the upside-down Hasse diagram of the partition lattice, with co-atoms in place of atoms, as detailed below.

4.1 Distinctions, Co-atoms and Fields

A partition P distinguishes between i ∈ N and j ∈ N\i

if i ∈ A ∈ P while j ∈ B ∈ P with A 6= B, and the

set of such distinctions has been recently proposed as

the logical analog of the complement of P, with the

(normalized) number of distinctions providing a novel

measure of the (logical) entropy of partitions (Ellerman, 2013b; Ellerman, 2013a). This is achieved through apartness relations $R^c$, which are the complement of


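equivalence relations $R$, both being sets of ordered pairs $(i, j) \in N \times N$ (see section 2 above). In terms of atoms $[ij] \in \mathcal{P}^N_{(1)}$, the logical entropy $h : \mathcal{P}^N \to \mathbb{R}_+$ of partitions (Ellerman, 2013a, p. 127) is

$$h(P) = \frac{2\,|\{[ij] : P \not\geqslant [ij]\}|}{n^2} = \frac{2\left(\binom{n}{2} - s(P)\right)}{n^2} = \frac{n(n-1) - 2s(P)}{n^2},$$

with $h(P^\top) = 0 = s(P_\bot)$ and $h(P_\bot) = \frac{n-1}{n} = \frac{2s(P^\top)}{n^2}$.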

Proposition 13. The minimum-$h$-weight distance is $\delta_h(P, Q) = 2h(P \wedge Q) - h(P) - h(Q)$.

Proof. Logical entropy h is symmetric and satisﬁes

P > Q ⇒ h(P) < h(Q), hence h ∈ F. Also, apart from

constant terms, h varies with −s, which is submodu-

lar because s is supermodular. That is to say,

$$h(P) + h(Q) = \frac{2}{n}\left(n - 1 - \frac{s(P) + s(Q)}{n}\right)$$

and

$$h(P \wedge Q) + h(P \vee Q) = \frac{2}{n}\left(n - 1 - \frac{s(P \wedge Q) + s(P \vee Q)}{n}\right).$$

Thus s(P ∧ Q) + s(P ∨ Q) ≥ s(P) + s(Q) entails

h(P ∧ Q) + h(P ∨ Q) ≤ h(P) + h(Q). Also, like in

proposition 11 above but with reversed inequalities,

2h(P ∧ Q) − h(P) − h(Q) ≤ h(P ∧ Q) − h(P ∨ Q) and

h(P ∧ Q) − h(P ∨ Q) ≤ h(P)+ h(Q) − 2h(P ∨ Q).

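Reasoning in terms of ordered pairs results in a double counting, in that $(i, j) \in R^c \Rightarrow (j, i) \in R^c$ for all apartness relations $R^c$ and all $(i, j) \in N \times N$. Hence an analog logical entropy $\bar{h}$ of partitions may be defined in terms of unordered pairs $\{i, j\} \in N_2$ or atoms $[ij] \in \mathcal{P}^N_{(1)}$:

$$\bar{h}(P) = \frac{\binom{n}{2} - s(P)}{\binom{n}{2}} = 1 - \frac{s(P)}{\binom{n}{2}}.$$

Again, $\bar{h} \in \mathbb{F}$ and $\bar{h}(P^\top) = 0$ as well as $\bar{h}(P_\bot) = 1$. Therefore, the minimum-$\bar{h}$-weight distance is $\delta_{\bar{h}}(P, Q) = 2\bar{h}(P \wedge Q) - \bar{h}(P) - \bar{h}(Q)$ for all $P, Q \in \mathcal{P}^N$.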

On the other hand, a distance between partitions also obtains by dealing directly with their associated set of distinctions: let $D_P = \{[ij] : P \not\geqslant [ij]\}$ and consider the distance between any two partitions $P, Q$ given by the traditional Hamming distance between their sets of (unordered) distinctions, i.e. $|D_P \Delta D_Q|$. In particular, $|D_P \Delta D_Q| = \frac{n^2}{2}\left(2h(P \wedge Q) - h(P) - h(Q)\right)$. In view of proposition 13 above, this is the non-normalized minimum-$h$-weight distance.
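The identity between the Hamming distance on distinction sets and the rescaled logical-entropy expression can be verified numerically. The sketch below is illustrative only: it assumes the ordered-pair normalization $h(P) = (n(n-1) - 2s(P))/n^2$ given above, uses exact rational arithmetic, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def same_block_pairs(p):
    """Atoms [ij] finer than p: unordered pairs sharing a block."""
    return {frozenset(c) for a in p for c in combinations(sorted(a), 2)}


def distinctions(p, ground):
    """D_P: unordered pairs separated by p."""
    all_pairs = {frozenset(c) for c in combinations(sorted(ground), 2)}
    return all_pairs - same_block_pairs(p)


def logical_entropy(p, n):
    """h(P) = (n(n-1) - 2 s(P)) / n^2, counting ordered distinctions."""
    s = len(same_block_pairs(p))
    return Fraction(n * (n - 1) - 2 * s, n * n)


def meet(p, q):
    return frozenset(a & b for a in p for b in q if a & b)


N = {1, 2, 3, 4}
n = len(N)
P = frozenset({frozenset({1, 2}), frozenset({3, 4})})
Q = frozenset({frozenset({1, 2, 3}), frozenset({4})})

hamming = len(distinctions(P, N) ^ distinctions(Q, N))
via_entropy = Fraction(n * n, 2) * (
    2 * logical_entropy(meet(P, Q), n)
    - logical_entropy(P, n)
    - logical_entropy(Q, n)
)
print(hamming, via_entropy)  # 3 3
```

The three pairs counted are $\{3,4\}$, $\{1,3\}$ and $\{2,3\}$, each kept together by exactly one of the two partitions.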

A field of subsets is a set system $\mathcal{F} \subseteq 2^N$ which is closed under union, intersection and complementation, hence $A \cap B, A \cup B, A^c \in \mathcal{F}$ for all $A, B \in \mathcal{F}$. Every partition $P \in \mathcal{P}^N$ generates the field $\mathcal{F}_P := 2^P$ containing all subsets $B \in 2^N$ obtained as the union of blocks $A \in P$, with $\mathcal{F}_{P_\bot} = 2^N$ as well as $\mathcal{F}_{P^\top} = \{\emptyset, N\}$. There are $2^{n-1} - 1$ minimal fields that strictly include $\mathcal{F}_{P^\top}$; they are those $\mathcal{F}_A = \mathcal{F}_{A^c} = \{\emptyset, A, A^c, N\}$ with $\emptyset \subset A \subset N$. On the other hand, 2-cardinal partitions $\{A, A^c\} \in \mathcal{P}^N$ are the co-atoms (Aigner, 1997) of partition lattice $(\mathcal{P}^N, \wedge, \vee)$ ordered by coarsening. In fact, in information theory finer partitions are generally more valuable than coarser ones, and thus attention is placed on partition functions $f$ such as entropy $e$ or logical entropy $h$ satisfying $f(P) < f(Q)$ whenever $P > Q$. In this view, the partition lattice is often dealt with as ordered by refinement and thus with the upside-down Hasse diagram. Accordingly, a distance between partitions also obtains by counting co-atoms rather than atoms. Define the co-size $cs : \mathcal{P}^N \to \mathbb{Z}_+$ of partitions by $cs(P) = |\{\{A, A^c\} : P \leqslant \{A, A^c\}\}|$, with $cs(P_\bot) = 2^{n-1} - 1$ and $cs(P^\top) = 0$. In words, $cs(P)$ is the number of co-atoms coarser than $P$.

Proposition 14. The minimum-$cs$-weight partition distance is $\delta_{cs}(P, Q) = cs(P) + cs(Q) - 2cs(P \vee Q)$.

Proof. Denote by $\hat{\mu} : \mathcal{P}^N \to \mathbb{Z}$ the Möbius inversion from above (Rota, 1964b; Aigner, 1997) of the co-size, with $cs(P) = \sum_{Q \geqslant P} \hat{\mu}(Q)$ for all $P$. By definition, $\hat{\mu}(P) = 1$ if $|P| = 2$ and 0 otherwise. Like for the size in proposition 2, this entails supermodularity, i.e. $cs(P \wedge Q) + cs(P \vee Q) \geq cs(P) + cs(Q)$. Also, $cs \in \mathbb{F}$ with $cs(P) < cs(Q)$ whenever $P > Q$, thus $cs(P) + cs(Q) - 2cs(P \vee Q) \leq cs(P \wedge Q) - cs(P \vee Q)$ and $cs(P \wedge Q) - cs(P \vee Q) \leq 2cs(P \wedge Q) - cs(P) - cs(Q)$ for all $P, Q \in \mathcal{P}^N$.

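Denote by $(\Im, \sqcap, \sqcup)$ the lattice whose elements are the $\mathcal{B}_n$ fields of subsets $\mathcal{F}_P$, $P \in \mathcal{P}^N$ ordered by inclusion $\supseteq$. The meet and join are, respectively, $\mathcal{F}_P \sqcap \mathcal{F}_Q = \mathcal{F}_{P \vee Q}$ and $\mathcal{F}_P \sqcup \mathcal{F}_Q = \mathcal{F}_{P \wedge Q}$. The set of atoms is the collection $\{\mathcal{F}_{\{A,A^c\}} : \emptyset \subset A \subset N\}$ of minimal fields, i.e. $\mathcal{F}_P = \sqcup_{\{A,A^c\} \geqslant P} \mathcal{F}_{\{A,A^c\}}$ for all $\mathcal{F}_P \in \Im$. Thus $\delta_{cs}(P, Q)$ is also an analog of the traditional Hamming distance between subsets:

$$\delta_{cs}(P, Q) = |\{\{A, A^c\} : \mathcal{F}_{\{A,A^c\}} \subseteq \mathcal{F}_P\}| + |\{\{A, A^c\} : \mathcal{F}_{\{A,A^c\}} \subseteq \mathcal{F}_Q\}| - 2|\{\{A, A^c\} : \mathcal{F}_{\{A,A^c\}} \subseteq (\mathcal{F}_P \cap \mathcal{F}_Q)\}|.$$

In words, this is the number of minimal fields $\mathcal{F}_{\{A,A^c\}}$ included in either $\mathcal{F}_P$ or else in $\mathcal{F}_Q$, but not in both.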
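The co-size and the associated distance can be computed by listing, for each partition, the co-atoms coarser than it. The following sketch is one possible way to do this under assumptions (frozenset representation, hypothetical names `coatoms_above`, `co_size`, `join`); it compares the closed form of proposition 14 with the count of co-atoms lying above exactly one of the two partitions.

```python
from itertools import combinations


def coatoms_above(p):
    """Co-atoms {A, A^c} coarser than p: split the blocks of p into two groups."""
    ordered = sorted(p, key=min)           # deterministic block order
    anchor, others = ordered[0], ordered[1:]
    result = set()
    for k in range(len(others)):           # A^c must keep at least one block
        for chosen in combinations(others, k):
            a = frozenset(anchor.union(*chosen))
            rest = frozenset().union(*(b for b in others if b not in chosen))
            result.add(frozenset({a, rest}))
    return result


def co_size(p):
    """cs(P): number of co-atoms coarser than P, i.e. 2^(|P|-1) - 1."""
    return len(coatoms_above(p))


def join(p, q):
    merged = [set(a) for a in p]
    for b in q:
        hit = [m for m in merged if m & b]
        merged = [m for m in merged if not (m & b)] + [set().union(*hit)]
    return frozenset(frozenset(m) for m in merged)


P = frozenset({frozenset({1, 2}), frozenset({3}), frozenset({4})})
Q = frozenset({frozenset({1, 3}), frozenset({2}), frozenset({4})})

closed_form = co_size(P) + co_size(Q) - 2 * co_size(join(P, Q))
exactly_one = len(coatoms_above(P) ^ coatoms_above(Q))
print(closed_form, exactly_one)  # 4 4
```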

5 THE CONSENSUS PARTITION PROBLEM

Hamming distance between partitions HD first appears in the mid '60s (Rénier, 1965) in terms of the

consensus partition problem, which is important in

many applicative scenarios concerned with statistical

classification. From a combinatorial optimization perspective, after selecting a metric $\delta : \mathcal{P}^N \times \mathcal{P}^N \to \mathbb{R}_+$, an instance is an $m$-collection $P_1, \ldots, P_m \in \mathcal{P}^N$, $m \geq 2$, and the objective is to find a partition $\bar{P}$ minimizing the sum of its distances from the $m$ partitions: any $\bar{P}$ satisfying $\sum_{1 \le k \le m} \delta(\bar{P}, P_k) \leq \sum_{1 \le k \le m} \delta(Q, P_k)$ for all


$Q \in \mathcal{P}^N$ is a consensus partition. For generic $\delta$, finding a solution $\bar{P}$ is typically hard. If $\delta = MMD$, then each distance $\delta(Q, P_k)$, $1 \le k \le m$ for any $Q \in \mathcal{P}^N$ is computable in $O(n^3)$ time (Korte and Vygen, 2002, p. 236), whereas if $\delta = HD$, then in view of expression (6) above (see section 3) $\delta(Q, P_k)$ is computable

more rapidly through scalar products. Independently

from the chosen metric δ, the main issue is that the

size $\mathcal{B}_n = |\mathcal{P}^N|$ of the search space $\mathcal{P}^N$ makes all approaches relying on direct enumeration simply unviable, at least for relevant values of $n$. The problem

is thus commonly interpreted in terms of heuristics

(Pinto Da Costa and Rao, 2004; Celeux et al., 1989).

Although the consensus problem is generally hard, the analysis conducted thus far still identifies conditions where exact solutions are easy to find. If the chosen metric is a minimum-$f$-weight partition distance, i.e. $\delta = \delta_f$ with $f \in \mathbb{F}$, and weighting function $f$ is either supermodular or else submodular (but not both, see below), then either the meet $\bar{P} = P_1 \wedge \cdots \wedge P_m$ or else the join $\bar{P} = P_1 \vee \cdots \vee P_m$ of instance elements are consensus partitions. The former case applies to Hamming distance or size-based $\delta_s = HD$ and to logical entropy-based $\delta_h$, while the latter applies to rank-based $\delta_r$ and to co-size-based $\delta_{cs}$. The computational burden thus reduces solely to assessing the $m$ distances between instance elements and their meet (or their join), with no search needed.

Proposition 15. If distances between partitions are measured by HD, then the meet of instance elements achieves consensus: for $Q, P_1, \ldots, P_m \in \mathcal{P}^N$,

$$\sum_{1 \le k \le m} HD(P_1 \wedge \cdots \wedge P_m, P_k) \leq \sum_{1 \le k \le m} HD(Q, P_k).$$

Proof. Firstly note that for $m = 2$ this consensus condition is in fact a restatement of horizontal collinearity and triangle inequality (see propositions 2 and 3 above). Hence, in order to use induction, assume that the condition holds for some $m \geq 2$, and denote by $\bar{P}$ the solution or consensus partition of an $(m+1)$-instance $P_1, \ldots, P_m, P_{m+1}$. By assumption, $P_1 \wedge \cdots \wedge P_m$ is a solution of instance $P_1, \ldots, P_m$, thus novel solution $\bar{P}$ minimizes the sum of its distances from the previous solution $P_1 \wedge \cdots \wedge P_m$ and from the novel instance element $P_{m+1}$, i.e. for all $Q \in \mathcal{P}^N$,

$$HD(P_1 \wedge \cdots \wedge P_m, \bar{P}) + HD(\bar{P}, P_{m+1}) \leq HD(P_1 \wedge \cdots \wedge P_m, Q) + HD(Q, P_{m+1}).$$

Then, horizontal collinearity and triangle inequality entail

$$HD(P_1 \wedge \cdots \wedge P_m, \bar{P}) + HD(\bar{P}, P_{m+1}) \geq HD(P_1 \wedge \cdots \wedge P_m, P_{m+1}),$$

with equality if $\bar{P} = P_1 \wedge \cdots \wedge P_m \wedge P_{m+1}$.
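The base case $m = 2$ of the induction can be checked by brute force on a small ground set. The sketch below is restricted to that base case and makes no claim about larger instances; it assumes a 4-element set, computes HD as the symmetric difference of same-block pair sets, and uses hypothetical helper names.

```python
from itertools import combinations


def set_partitions(elements):
    """Yield every partition of `elements` as a frozenset of frozenset blocks."""
    elements = list(elements)
    if not elements:
        yield frozenset()
        return
    first, rest = elements[0], elements[1:]
    for smaller in set_partitions(rest):
        blocks = list(smaller)
        for i, block in enumerate(blocks):
            yield frozenset(blocks[:i] + [block | {first}] + blocks[i + 1:])
        yield frozenset(blocks + [frozenset({first})])


def pair_set(p):
    """Atoms [ij] finer than p."""
    return {frozenset(c) for a in p for c in combinations(sorted(a), 2)}


def hd(p, q):
    """Hamming distance: atoms finer than exactly one of p, q."""
    return len(pair_set(p) ^ pair_set(q))


def meet(p, q):
    return frozenset(a & b for a in p for b in q if a & b)


ALL = list(set_partitions(range(4)))
P1 = frozenset({frozenset({0, 1, 2}), frozenset({3})})
P2 = frozenset({frozenset({0, 1}), frozenset({2, 3})})


def total(q):
    """Sum of distances from q to the two instance elements."""
    return hd(q, P1) + hd(q, P2)


meet_cost = total(meet(P1, P2))
best_cost = min(total(q) for q in ALL)
print(meet_cost, best_cost, hd(P1, P2))  # 3 3 3
```

For two instance elements the meet lies on a shortest path between them, so its total cost equals $HD(P_1, P_2)$, which no other partition can beat by the triangle inequality.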

The consensus partition problem may also be framed in a novel manner through fuzzy modeling. A fuzzy subset of $N$ is a function $q : N \to [0, 1]$ or, geometrically, a point $q = (q_1, \ldots, q_n) \in [0, 1]^n$ in the $n$-dimensional unit hypercube, where $q_i = q(i)$, $i \in N$. A fuzzy partition is thus commonly intended as a partition $P$ with associated $|P|$ points $q^A \in [0, 1]^n$, $A \in P$ in the hypercube such that $q^A_i \in (0, 1]$ for all $i \in A$ and all $A \in P$. Also, a fuzzy or random graph with vertex set $N$ may be seen as one whose edge set is a fuzzy subset of $N_2$, i.e. a function $t : N_2 \to [0, 1]$ or, geometrically, a point in the $\binom{n}{2}$-dimensional unit hypercube, i.e. $t = \left(t_{\{i,j\}_1}, \ldots, t_{\{i,j\}_{\binom{n}{2}}}\right) \in [0, 1]^{\binom{n}{2}}$.

By looking at partitions of $N$ as graphs with vertex set $N$ each of whose components is complete (see above), the fuzzy consensus partition $t_{\mathcal{I}}$ associated with instance $\mathcal{I} \subseteq \mathcal{P}^N$ may be defined as the point in the interior of the polytope $\mathbb{P}$ of partitions (see above) corresponding to the center of the convex hull $\mathrm{conv}(\{I_P : P \in \mathcal{I}\})$ consisting of all convex combinations of the indicator functions $I_P$, $P \in \mathcal{I}$ of instance elements. Then, the fuzzy consensus partition is a function ranging in the unit interval $[0, 1]$ and taking values on the atoms of $\mathcal{P}^N$, i.e. $t_{\mathcal{I}} : \mathcal{P}^N_{(1)} \to [0, 1]$ and

$$t_{\mathcal{I}}([ij]) = \frac{1}{|\mathcal{I}|} \sum_{P \in \mathcal{I}} I_P([ij])$$

for all atoms $[ij] \in \mathcal{P}^N_{(1)}$. In this framework, the strong patterns of instance $\mathcal{I}$ considered in (Pinto Da Costa and Rao, 2004) are the blocks of partition $P(t_{\mathcal{I}})$ obtained through defuzzification of $t_{\mathcal{I}}$ as follows: $P(t_{\mathcal{I}}) = \vee_{t_{\mathcal{I}}([ij]) = 1} [ij]$. In words, $P(t_{\mathcal{I}})$ obtains as the join of all atoms where the fuzzy consensus partition $t_{\mathcal{I}}$ attains its maximum, i.e. 1.
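The averaging and defuzzification steps can be sketched as follows. This is an illustration under assumptions: indicator functions are stored as dictionaries keyed by unordered pairs, values are exact fractions, and the names `indicator`, `fuzzy_consensus` and `defuzzify` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def indicator(p, ground):
    """I_P: 1 on atoms [ij] finer than p, 0 on the other atoms."""
    inside = {frozenset(c) for a in p for c in combinations(sorted(a), 2)}
    return {frozenset(c): int(frozenset(c) in inside)
            for c in combinations(sorted(ground), 2)}


def fuzzy_consensus(instance, ground):
    """t_I([ij]): average of the indicator functions of instance elements."""
    inds = [indicator(p, ground) for p in instance]
    return {atom: Fraction(sum(i[atom] for i in inds), len(inds))
            for atom in inds[0]}


def defuzzify(t, ground):
    """Join of the atoms where t equals 1 (blocks are the strong patterns)."""
    blocks = [{x} for x in ground]
    for atom, value in t.items():
        if value == 1:
            hit = [b for b in blocks if b & atom]
            blocks = [b for b in blocks if not (b & atom)] + [set().union(*hit)]
    return frozenset(frozenset(b) for b in blocks)


N = {1, 2, 3, 4}
instance = [
    frozenset({frozenset({1, 2, 3}), frozenset({4})}),
    frozenset({frozenset({1, 2}), frozenset({3, 4})}),
    frozenset({frozenset({1, 2}), frozenset({3}), frozenset({4})}),
]
t = fuzzy_consensus(instance, N)
strong = defuzzify(t, N)
print(t[frozenset({1, 2})], t[frozenset({3, 4})])  # 1 1/3
print(sorted(sorted(b) for b in strong))           # [[1, 2], [3], [4]]
```

Only the pair $\{1, 2\}$ is kept together by every instance element, so it is the only atom entering the join.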

6 CONCLUSIONS

Measuring the distance between partitions is an im-

portant topic in statistical classiﬁcation since the ’60s

(Lerman, 1981; Rénier, 1965). This work considers

the analog of the traditional Hamming distance be-

tween subsets by counting unordered pairs of parti-

tioned elements. Counting ordered and/or unordered

pairs is not new (see (Meila, 2007, Section 2.1)),

but the Hamming distance HD is here analyzed from

a novel geometric perspective. Special attention is

placed on complements in comparison with two dis-

tances proposed in recent years, namely MMD and

VI. Given its low computational complexity and ﬁne

measurement sensitivity, HD seems interesting for ap-

plications, especially in bioinformatics.

HD relies on the size, which counts the atoms ﬁner

than partitions. While the cardinality (or rank) of sub-

sets is a valuation (i.e. supermodular and submodu-

lar), the size is supermodular. In fact, if f is a valu-

ation of the partition lattice, then it is constant, i.e.

f (P) = f (Q) for all partitions P, Q (Aigner, 1997).


Since the Hamming distance between $A, B \in 2^N$ is $|A \Delta B| = |A \cup B| - |A \cap B|$, it may seem reasonable to consider distances $\delta(P, Q)$ between $P, Q \in \mathcal{P}^N$ of the form $\delta(P, Q) = f(P \vee Q) - f(P \wedge Q)$ with $f \in \mathbb{F}$. Yet, this clearly does not distinguish between different complements $Q, Q' \in \mathcal{CO}(P)$ when $P \neq P_\bot, P^\top$.

The geometric approach enables the analysis of further partition distances obtained by replacing the size

with alternative partition functions such as entropy,

rank and logical entropy, where these latter two are

submodular. In general, any symmetric and order-

preserving partition function f provides a distance be-

tween partitions P, Q by considering f (P), f (Q) and

the values taken on their meet f (P ∧ Q) or else on

their join f (P ∨ Q). Speciﬁcally, f deﬁnes weights on

edges of the Hasse diagram of partitions such that the

corresponding partition distance between any P, Q is

the weight of a lightest P − Q-path.

REFERENCES

Aigner, M. (1997). Combinatorial Theory. Springer.

Almudevar, A. and Field, C. (1999). Estimation of single-

generation sibling relationships based on DNA mark-

ers. Journal of Agricultural, Biological and Environ-

mental Statistics, 4(2):136–165.

Berger-Wolf, T. Y., Sheikh, S. I., DasGupta, B., Ashley,

M. V., Caballero, I. C., Chaovalitwongse, W., and Pu-

trevu, S. L. (2007). Reconstructing sibling relation-

ship in wild populations. Bioinf., 23(13):i49–i56.

Bollobas, B. (1986). Combinatorics. Set Systems, Hyper-

graphs, Families of Vectors, and Combinatorial Prob-

ability. Cambridge University Press.

Brøndsted, A. (1983). An introduction to convex polytopes. Springer.

Brown, D. G. and Dexter, D. (2012). Sibjoin: a fast heuristic

for half-sibling reconstruction. Algorithms in Bioin-

formatics, LNCS 7534:44–56.

Celeux, G., Diday, E., Govaert, G., Lechevalier, G., and Ralambondrainy, H. (1989). Classification Automatique Des Données. Dunod.

Day, W. (1981). The complexity of computing metric dis-

tances between partitions. Math. Soc. Sc., 1(3):269–

287.

Deza, M. M. and Deza, E. (2013). Encyclopedia of Dis-

tances - Second Edition. Springer.

Ellerman, D. (2013a). An introduction to logical entropy

and its relation to Shannon entropy. International

Journal of Semantic Computing, 7(2):121–145.

Ellerman, D. (2013b). An introduction to partition logic.

Logic Journal of the IGPL, 22(1):94–125.

Godsil, C. and Royle, G. F. (2001). Algebraic Graph The-

ory. Springer.

Graham, R., Knuth, D., and Patashnik, O. (1994). Concrete

Mathematics. Addison-Wesley.

Grünbaum, B. (2001). Convex Polytopes. Springer.

Gusﬁeld, D. (2002). Partition-distance: A problem and

class of perfect graphs arising in clustering. Informa-

tion Processing Letters, 82:159–164.

Hubert, L. and Arabie, P. (1985). Comparing partitions.

Journal of Classiﬁcation, 2(1):193–218.

Konovalov, D. A. (2006). Accuracy of four heuristics for the

full sibship reconstruction problem in the presence of

genotype errors. Adv. Bioinf. Comp. Bio., 3:7–16.

Konovalov, D. A., Bajema, N., and Litow, B. (2005a). Modified Simpson $O(n^3)$ algorithm for the full sibship reconstruction problem. Bioinf., 21(20):3912–3917.

Konovalov, D. A., Litow, B., and Bajema, N. (2005b).

Partition-distance via the assignment problem. Bioinf.,

21(10):2463–2468.

Korte, B. and Vygen, J. (2002). Combinatorial Optimiza-

tion: Theory and Algorithms (2nd edition). Springer.

Lerman, I. C. (1981). Classification et Analyse Ordinale des Données. Dunod.

Meila, M. (2007). Comparing clusterings - an information based distance. J. of Mult. Analysis, 98(5):873–895.

Mirkin, B. G. (1996). Mathematical Classiﬁcation and

Clustering. Kluwer Academic Press.

Mirkin, B. G. and Cherny, L. B. (1970). Measurement of

the distance between distinct partitions of a ﬁnite set

of objects. Aut. and Rem. Con., 31(5):786–792.

Mirkin, B. G. and Muchnik, I. (2008). Some topics of cur-

rent interest in clustering: Russian approaches 1960-

1985. Electronic Journal for History of Probability

and Statistics, 4(2):1–12.

Pinto Da Costa, J. F. and Rao, P. R. (2004). Central parti-

tion for a partition-distance and strong pattern graph.

REVSTAT - Statistical Journal, 2(2):127–143.

Rénier, S. (1965). Sur quelques aspects mathématiques des problèmes de classification automatique. ICC Bulletin, 4:175–191. Reprinted in Mathématiques et Sciences Humaines 82:13–29, 1983.

Rossi, G. (2011). Partition distances. arXiv:1106.4579v1.

Rota, G.-C. (1964a). The number of partitions of a set.

American Mathematical Monthly, 71:499–504.

Rota, G.-C. (1964b). On the foundations of combinatorial theory I: theory of Möbius functions. Z. Wahrscheinlichkeitsrechnung u. verw. Geb., 2:340–368.

Sebő, A. and Tannier, E. (2004). On metric generators of graphs. Math. of Op. Res., 29(2):383–393.

Sheikh, S. I., Berger-Wolf, T. Y., Khokhar, A. A., Caballero,

I. C., Ashley, M. V., Chaovalitwongse, W., Chou,

C.-A., and DasGupta, B. (2010). Combinatorial re-

construction of half-sibling groups from microsatellite

data. J. Bioinf. Comp. Biol., 8(2):337–356.

Stanley, R. (1971). Modular elements of geometric lattices.

Algebra Universalis, (1):214–217.

Stern, M. (1999). Semimodular Lattices. Theory and Appli-

cations. Encyclopedia of Mathematics and its Appli-

cations 73. Cambridge University Press.

Warrens, M. J. (2008). On the equivalence of Chen’s Kappa

and the Hubert-Arabie adjusted Rand index. Journal

of Classiﬁcation, 25(1):177–183.

Whitney, H. (1935). On the abstract properties of linear

dependence. Amer. J. of Math., 57:509–533.
