Multiple Segmentation of Image Stacks
Jonathan Smets and Manfred Jaeger
Department for Computer Science, Aalborg University, Aalborg, Denmark
Keywords:
Segmentation, Multiple Clustering, Probabilistic Models.
Abstract:
We propose a method for the simultaneous construction of multiple image segmentations by combining a
recently proposed “convolution of mixtures of Gaussians” model with a multi-layer hidden Markov random
field structure. The resulting method constructs for a single image several alternative segmentations that
capture different structural elements of the image. We also apply the method to collections of images with
identical pixel dimensions, which we call image stacks. Here it turns out that the method is able to both
identify groups of similar images in the stack, and to provide segmentations that represent the main structures
in each group.
1 INTRODUCTION
Traditional clustering methods construct a single
(possibly hierarchical) partitioning of the data. However, when clustering is used as an explorative data analysis tool, there may not be a single optimal solution characterized as the optimum of a unique underlying score function. Rather, there can be multiple
distinct clusterings that each represent a meaningful
view of the data. This observation has led to a re-
cent research trend of developing methods for multi-
ple clustering (or multi-view clustering). The general
goal of these methods is to automatically construct
several clusterings that represent alternative and com-
plementary views of the data (see (Müller et al., 2012) for a recent overview, and the proceedings of the MultiClust workshop series for current developments).
Perhaps the most typical application area for mul-
tiple clustering is document data (e.g. collections of
news articles or web pages). For example, the stan-
dard benchmark WebKB dataset consists of university
webpages that can be alternatively clustered accord-
ing to page-type (e.g. personal homepage or course
page), or the different universities the pages are taken
from. Turning to image data, previously used bench-
mark sets are the CMU and the Yale Face Images
data, which consist of portrait images of different
persons in several poses, and accordingly can be clus-
tered according to persons or poses (Cui et al., 2007;
Jain et al., 2008). In this setting, each image is a
data-point, and (multiple) clustering means grouping
images. When, instead, one views as a data-point a
single image pixel, then multiple clustering becomes
multiple image segmentation.
Relatively little work has been done on finding
multiple, alternative image segmentations. (Kim and
Zabih, 2002) developed a quite specific factorial
Markov random field model in which an image is
modeled as an overlay of several layers, and each
layer corresponds to a binary segmentation. (Qi and
Davidson, 2009) apply a general multiple clustering
approach to a variety of datasets, including images.
Their multiple clustering approach falls into the cat-
egory of iterative multiple clustering, where given an
initial (primary) clustering, a single alternative clus-
tering is constructed. Our approach, on the other
hand, falls into the category of simultaneous multi-
ple clustering methods, where an arbitrary number of
different clusterings is constructed at the same time,
and without any priority ordering among the cluster-
ings. Finally, (Kato et al., 2003) generate alternative
segmentations based on color and texture features, re-
spectively. However, the objective here is not to pro-
vide different, alternative segmentations, but to com-
bine the two segmentations into a single one.
It is worth emphasizing that multiple clustering
in the sense here considered is different from the
construction of cluster ensembles (Strehl and Ghosh,
2003). In the latter, numerous clusterings are built
in order to overcome the convergence to only locally
optimal solutions of clustering algorithms, and to con-
struct out of a collection of clusterings a single con-
sensus clustering. The multiple segmentations in the
sense of (Hoiem et al., 2005; Russell et al., 2006) are
segmentation analogues of cluster ensembles, not of
multiple clusterings in our sense.
In this paper we develop a method for constructing
multiple segmentations of images and image stacks,
which we define as a collection of images with equal
pixel dimensions. The most important type of image stack is the collection of frames in a video sequence. However, we can also consider other such
collections of pixel-aligned images. As we will see in
the experimental section, multiple clustering of such
image stacks can give results that combine elements
of clustering at the image and at the pixel level. For
the design of our method we build on the convolution
of mixtures of Gaussians model of (Jain et al., 2008)
which we customize for the segmentation setting by
combining it with a Markov Random Field structure
to account for the spatial dimension of the data.
Our approach is intended as a general method that
can be applied to image data of quite different types,
and that thereby is a quite general tool for explorative
image data analysis. For more specialized application
tasks, our general method may serve as a basis, but
will presumably require additional modifications and
adaptations.
2 THE CONVOLUTIONAL
CLUSTERING MODEL
Probabilistic clustering approaches are based on la-
tent variable models where a data point x is assumed
to be sampled from a joint distribution P(X, L | θ)
of an observed data variable X and a latent variable
L ∈ {1,...,k}, governed by parameters θ (throughout
this paper we use bold symbols to denote tuples of
variables, parameters, etc.; when talking about ran-
dom variables, then uppercase letters stand for the
variables, and lowercase letters for concrete values of
the variables). Clustering then is performed by learn-
ing the parameters θ, and assigning x to the cluster
with index i for which P(X = x,L = i | θ) is maximal.
This probabilistic paradigm is readily generalized
to multiple clustering models. One only needs to
design a model P(X, L | θ) containing multiple latent variables L = L_1, ..., L_m. Then the joint assignment L_1 = i_1, ..., L_m = i_m (abbreviated L = i) maximizing P(X = x, L_1 = i_1, ..., L_m = i_m | θ) defines the cluster indices for x in m distinct clusterings. Models for multiple clustering that are based on multiple
latent variables include the factorial Hidden Markov
Model (Ghahramani and Jordan, 1997), the factorial
Markov Random Fields of (Kim and Zabih, 2002),
convolution of mixtures of Gaussians (Jain et al.,
2008), the latent tree models of (Poon et al., 2010), and the factorial logistic model of (Jaeger et al., 2011).

Figure 1: Multi-layer Hidden Markov Random Field (latent variables L_{i,1}, L_{i,2} and observed variables X_i for each pixel i).
2.1 The Probabilistic Model
Our model is structurally identical to the factorial Markov Random Field model of (Kim and Zabih, 2002). Figure 1 shows the structure of such a multi-layer hidden Markov random field: with each pixel i ∈ I (I the set of all pixels) are associated m latent variables L_{i,·} = L_{i,1}, ..., L_{i,m} and a vector of observed variables X_i. For k = 1, ..., m the variables L_{·,k} = L_{1,k}, ..., L_{|I|,k} take values in the set {1, ..., n_k}, so that the kth segmentation will consist of n_k segments.
For this paper we assume that in the case of single image analysis, X_i is simply the 3-dimensional vector (R_i, G_i, B_i) of rgb-values at pixel i. In the case of image stacks with N images, X_i will be a 3·N-dimensional vector containing the rgb-values of all images in the stack. We denote with |X_i| the dimension of X_i. Though we do not explore this in the current paper, we note that X_i could also contain differently defined observed features of pixel i.
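For concreteness, the feature construction just described can be written down in a few lines; the following sketch (our own Python/NumPy helper, not part of the implementation described in Section 2.3.3) assembles the per-pixel vectors X_i for a stack of N pixel-aligned rgb images, and reduces to the plain rgb features for N = 1.

import numpy as np

def stack_features(images):
    """Build per-pixel feature vectors X_i for an image stack.

    images: list of N rgb images, each an (H, W, 3) array with identical
            pixel dimensions.
    Returns an (H*W, 3*N) array whose row i is X_i, the concatenated
    rgb values of pixel i across all N images of the stack.
    """
    H, W, _ = images[0].shape
    assert all(img.shape == (H, W, 3) for img in images)
    # concatenate along the channel axis, then flatten the spatial grid
    stacked = np.concatenate([img.astype(float) for img in images], axis=2)
    return stacked.reshape(H * W, 3 * len(images))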
For every k = 1, ..., m, the latent variables L_{·,k} form a Markov random field with a square grid structure. The distribution of X_i depends conditionally on the latent variables L_{i,·}.
The marginal distribution P(L | θ) is defined as a product of m Potts models defined by a common temperature parameter T:

$$P(L = l \mid \theta) = P(L = l \mid T) = \frac{1}{Z}\prod_{k=1}^{m} e^{-V(L_{\cdot,k}=l_{\cdot,k})/T}$$

where Z is the normalization constant, and

$$V(L_{\cdot,k}=l_{\cdot,k}) = \sum_{i,j:\, i\sim j} I(l_{i,k} \neq l_{j,k})$$

with I(l_{i,k} ≠ l_{j,k}) = 1 if l_{i,k} ≠ l_{j,k}, and = 0 otherwise; here i ∼ j ranges over pairs of neighboring pixels in the grid.
For the conditional distribution P(X | L, θ) the
model of Figure 1 implies conditional independence
ICPRAM2014-InternationalConferenceonPatternRecognitionApplicationsandMethods
6
for different pixels of the observed pixel features X_i given the latent pixel variables L_{i,·}. Moreover, we assume that the conditional model P(X_i | L_{i,·}, θ) is identical for all i. It is defined as the convolution
of m mixtures of Gaussians as follows. For k = 1, ..., m and j = 1, ..., n_k let µ_{k,j} ∈ ℝ^{|X_i|}. Writing µ_k = µ_{k,1}, ..., µ_{k,n_k}, we obtain for every k a distribution for a variable Z_{i,k} defined as a mixture of Gaussians

$$P(Z_{i,k}\mid L_{i,k},\mu_k) = \sum_{j=1}^{n_k} N(\mu_{k,j},\mathbf{1})\, I(L_{i,k}=j),$$

where 1 stands for the unit covariance matrix. For two distributions P(Y), P(Z) of two k-dimensional real random variables Y, Z, we denote with P(Y) ∗ P(Z) their convolution, i.e., the distribution of the sum X = Y + Z. The final model for X_i now is defined as the m-fold convolution:

$$P(X_i\mid L_{i,\cdot},\mu_1,\dots,\mu_m) = P(Z_{i,1}\mid L_{i,1},\mu_1) \ast \cdots \ast P(Z_{i,m}\mid L_{i,m},\mu_m).$$
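Read generatively, the convolution simply says that each layer contributes an independent Gaussian term and the contributions are added; the following sketch (our own code, assuming NumPy and 0-based segment labels) draws one pixel feature vector X_i from the model, given the labels l_{i,·} and the mean vectors.

import numpy as np

def sample_pixel(labels, means, rng=None):
    """Draw X_i from the convolution-of-mixtures model.

    labels: length-m sequence with labels[k] = l_{i,k} in {0, ..., n_k - 1}
    means:  length-m list, means[k] is an (n_k, d) array holding the
            layer-k means mu_{k,1}, ..., mu_{k,n_k} (d = |X_i|)
    Each layer contributes Z_{i,k} ~ N(mu_{k, l_{i,k}}, 1); their sum is X_i,
    i.e. exactly the m-fold convolution of the per-layer mixtures.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = means[0].shape[1]
    x = np.zeros(d)
    for k, l in enumerate(labels):
        x += rng.normal(loc=means[k][l], scale=1.0)  # Z_{i,k}, unit covariance
    return x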
Combining the models for L and X | L, we now obtain

$$\log P(L = l, X = x \mid \mu, T) \;\propto\; -\frac{1}{T}\sum_{k=1}^{m}\sum_{i,j:\, i\sim j} I(l_{i,k}\neq l_{j,k}) \;-\; \sum_{i\in I}\Big\|\, x_i - \sum_{k=1}^{m}\mu_{k,l_{i,k}}\Big\|^2 \qquad (1)$$
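As a concrete reading of (1), the sketch below (our own helper; it assumes a 4-neighbor grid, NumPy label maps and 0-based labels) evaluates the Potts term and the data term for a candidate joint labeling.

import numpy as np

def log_likelihood(x, labels, means, beta):
    """Evaluate the log-likelihood (1) up to additive constants.

    x:      (H, W, d) observed pixel features
    labels: (H, W, m) integer label maps, labels[..., k] = l_{i,k}
    means:  list of m arrays, means[k] of shape (n_k, d)
    beta:   1/T, the inverse temperature of the Potts prior
    """
    H, W, m = labels.shape
    # Potts term: number of disagreeing 4-neighbor pairs, summed over layers
    disagree = 0
    for k in range(m):
        lk = labels[..., k]
        disagree += np.sum(lk[1:, :] != lk[:-1, :])   # vertical neighbors
        disagree += np.sum(lk[:, 1:] != lk[:, :-1])   # horizontal neighbors
    # data term: squared distance of x_i to the sum of its layer means
    recon = np.zeros_like(x, dtype=float)
    for k in range(m):
        recon += means[k][labels[..., k]]
    data = np.sum((x - recon) ** 2)
    return -beta * disagree - data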
2.2 The Regularization Term
Maximizing the log-likelihood (1) alone is a sound
approach to probabilistic multiple segmentation.
However, (Jain et al., 2008) suggest adding to the likelihood the regularization term

$$\lambda \sum_{\substack{k,k'=1,\dots,m\\ k\neq k'}}\;\sum_{\substack{j=1,\dots,n_k\\ j'=1,\dots,n_{k'}}} \big(\mu_{k,j}\cdot\mu_{k',j'}\big)^2 \qquad (2)$$
Here λ ≥ 0 is a weight parameter that regulates the strength of the influence of the regularization term. This penalty term is minimized when the means µ_k, µ_{k'} corresponding to different segmentations lie in orthogonal subspaces. The rationale given for
this regularization term is twofold. First, the like-
lihood function (1) does not have a unique maxi-
mum. Indeed, taking the case m = 2, the two solutions (µ_{1,1}, ..., µ_{1,n_1}, µ_{2,1}, ..., µ_{2,n_2}, T) and (µ_{1,1} + c, ..., µ_{1,n_1} + c, µ_{2,1} − c, ..., µ_{2,n_2} − c, T) (c ∈ ℝ³) define the same distribution, and therefore have the same
likelihood score. Second, the likelihood alone does
not give an explicit reward for the distinctness, or
complementarity, of the resulting multiple cluster-
ings. Following other approaches to multiple clus-
tering, it is hoped that encouraging the means corre-
sponding to different clusterings to lie in orthogonal
subspaces will lead to a greater diversity of those clus-
terings.
We argue that the form and justification for this
particular regularization term are slightly flawed, and
that it should be replaced by a modified version. First,
we note that the non-uniqueness of the optimal so-
lution for (1) is not a real problem as long as two
different optimal solutions define the same multiple
segmentation. This, however, is exactly the case for
the two solutions distinguished by the offset vector c
as described above. Second, regularization with (2)
is not invariant under simple shifts of the coordinate
system: adding a constant vector z to all data-points x_i should have no effect on the optimal segmentation, which should be characterized by also adding z to all model parameters µ_{k,j}. Since (2) is not invariant under addition of a constant to all µ_{k,j}, this is not the behavior one obtains with this regularization term. We therefore propose to modify (2) so as to reward means µ_k, µ_{k'} that lie in orthogonal affine sub-spaces, rather than orthogonal linear sub-spaces. Thus, we propose
the following regularization term:
$$\lambda \sum_{\substack{k,k'=1,\dots,m\\ k\neq k'}}\;\sum_{\substack{j,h=1,\dots,n_k:\, j<h\\ j',h'=1,\dots,n_{k'}:\, j'<h'}} \left(\frac{\mu_{k,j}-\mu_{k,h}}{\|\mu_{k,j}-\mu_{k,h}\|}\cdot\frac{\mu_{k',j'}-\mu_{k',h'}}{\|\mu_{k',j'}-\mu_{k',h'}\|}\right)^2 . \qquad (3)$$
Thus, we reward solutions in which normalized
difference vectors between the means of different lay-
ers are orthogonal, rather than the means themselves.
The term (3) now is invariant under adding, respec-
tively subtracting, a constant vector c to all means of
two different layers, and hence we again have the non-
uniqueness of optimal solutions as for the pure like-
lihood (1). However, as argued above, we do not see
this as a problem.
One small practical problem arises when we de-
fine our objective function as the sum of (1) and (3):
the likelihood term (1) increases in magnitude lin-
early with the number of pixels. The regularization
term, on the other hand, only increases as a function
of the number of layers and the number of segments
per layer. The choice of an appropriate tradeoff pa-
rameter λ between likelihood and regularization term,
thus, would depend on the number of pixels. In order
MultipleSegmentationofImageStacks
7
to get a more uniform scale for λ across different ex-
periments, we therefore normalize the regularization
term with the factor |I| /K, where K is the number of
terms in the sum (3).
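With these conventions the normalized regularization term can be computed directly from the mean vectors. The sketch below (our own code, following (3) and the |I|/K normalization described above) assumes n_k ≥ 2 and pairwise distinct means within each layer, and leaves the sign convention for combining the penalty with (1) to the caller.

import numpy as np
from itertools import combinations

def regularization(means, num_pixels, lam):
    """Regularization term (3), normalized by the factor |I| / K.

    means: list of m arrays, means[k] of shape (n_k, d)
    The term sums, over pairs of different layers and pairs of segments
    within each layer, the squared dot products of the normalized
    difference vectors; it is small when those difference vectors are
    (near-)orthogonal across layers.
    """
    # normalized difference vectors mu_{k,j} - mu_{k,h}, j < h, per layer
    diffs = []
    for mu in means:
        d = [(mu[j] - mu[h]) / np.linalg.norm(mu[j] - mu[h])
             for j, h in combinations(range(len(mu)), 2)]
        diffs.append(np.array(d))
    total, K = 0.0, 0
    for k in range(len(means)):
        for kp in range(len(means)):
            if k == kp:
                continue
            dots = diffs[k] @ diffs[kp].T    # all pairwise dot products
            total += np.sum(dots ** 2)
            K += dots.size
    return lam * (num_pixels / K) * total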
We remark that the probabilistic model (1) alone also has some built-in capability to encourage a diversity in the parameters µ_k for different layers, and hence, in the different segmentations. This is because having two layers with very similar means µ_k does not allow a much better fit to the data than a single layer with those means. Exploiting the full parameter space of the model to obtain a good fit to the data, thus, will tend to lead to some diversity in the parameters µ_k.
For this reason, in our experiments, we also pay par-
ticular attention to the case λ = 0, i.e., segmentation
according to the pure probabilistic model (1).
2.3 Clustering Algorithm
We take the model parameter β := 1/T and the reg-
ularization parameter λ as user-defined inputs that
may be varied in an iterative data exploration pro-
cess. Large values of β mean that high emphasis is
put on segmentations with large connected segments
and smooth boundaries. Larger values of λ mean that
diversity of segmentations as measured by the regu-
larization term (3) is more strictly enforced.
Thus, the only model parameters we have to fit are the mean vectors µ_k. Our goal, then, is to maximize a score function S(µ_1, ..., µ_m, l) which is given as the sum of (1) and (3).
We use a typical 2-phase iterative process for this optimization: in a MAP-step we compute for a current setting of the µ_k the most probable assignment L = l for the latent variables according to the likelihood function (1) (since (3) does not depend on l, we can ignore it in this phase). In an M(aximization)-step we recompute for the current setting L = l the µ_k optimizing S(µ_1, ..., µ_m, l). This well-known clustering approach (sometimes referred to as hard EM) has also been proposed for image segmentation in (Chen et al., 2010).
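The control flow of this alternation can be summarized in a few lines. In the sketch below (our own interfaces), map_step stands in for the α-expansion computation of Section 2.3.1, m_step for the gradient ascent of Section 2.3.2, and the stopping rule is the 2% relative score improvement criterion mentioned with the runtime figures in Section 3.

def fit(x, init_means, beta, lam, map_step, m_step, score, tol=0.02):
    """Hard-EM style alternation between MAP and M steps.

    map_step(x, means, beta)           -> most probable labeling l
    m_step(x, labels, means, lam)      -> means maximizing S for fixed l
    score(x, labels, means, beta, lam) -> value of the score S
    Iterates until the relative score improvement drops below tol (2%).
    """
    means = init_means
    labels = map_step(x, means, beta)
    prev = score(x, labels, means, beta, lam)
    while True:
        means = m_step(x, labels, means, lam)
        labels = map_step(x, means, beta)
        cur = score(x, labels, means, beta, lam)
        if abs(cur - prev) < tol * abs(prev):
            return labels, means
        prev = cur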
2.3.1 MAP-step
For the MAP-step we make use of the α-expansion
algorithm of (Boykov et al., 2001; Kolmogorov and
Zabin, 2004; Boykov and Kolmogorov, 2004). This
algorithm provides solutions to segmentation prob-
lems characterized by an energy function E for seg-
mentations s, which are of the form
$$E(s) = \sum_{i,j:\, i\sim j} V_{i,j}(s(i), s(j)) + \sum_{i} D_i(s(i)), \qquad (4)$$

where s(i) is the segment label of pixel i, V_{i,j} is a penalty function for discontinuities in s, and D_i is any non-negative function measuring the discrepancy of the label assignment s(i) with the observed data for i. It is shown in (Boykov et al., 2001) that if V_{i,j}(s(i), s(j)) is a metric on the label space, then the α-expansion algorithm is guaranteed to find a solution s whose energy is within a constant factor of the globally minimal energy.
Up to a change of sign (and a corresponding change from a minimization to a maximization objective) our likelihood function (1) has the form (4) for the m-dimensional label space ×_{k=1}^{m} {1, ..., n_k} (i.e. s(i) = (l_{i,1}, ..., l_{i,m})), with V_{i,j}(s(i), s(j)) = ∑_{k=1}^{m} I(l_{i,k} ≠ l_{j,k}) and D_i(s(i)) = ‖x_i − ∑_{k=1}^{m} µ_{k,l_{i,k}}‖². Furthermore, it is straightforward to see that our V_{i,j} is a metric on the m-dimensional label space.

To use the α-expansion algorithm we flatten our m-dimensional label space to a one-dimensional label space with ∏_{k=1}^{m} n_k different labels. Thus, our method has a complexity that is exponential in the number of layers. On the other hand, the α-expansion algorithm in practice is quite efficient as a function of the number of pixels. It is reputed to show linear complexity in practice (Boykov et al., 2001), which was confirmed by our observations in our experiments.
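To make the reduction concrete, the sketch below (our own helper; in an actual run the resulting costs would be handed to an α-expansion solver such as the gco-v3.0 library of Section 2.3.3, whose API we do not reproduce here) enumerates the flattened label space and the per-pixel data costs D_i.

import numpy as np
from itertools import product

def flatten_labels_and_costs(x, means):
    """Enumerate composite labels and their data costs D_i.

    x:     (P, d) pixel features, one row per pixel (P = |I|)
    means: list of m arrays, means[k] of shape (n_k, d)
    Returns
      combos: list of composite labels (l_1, ..., l_m), one per flat label
      costs:  (P, prod_k n_k) array, costs[i, c] = ||x_i - sum_k mu_{k, l_k}||^2
    """
    combos = list(product(*[range(len(mu)) for mu in means]))
    # reconstruction vector (sum of one mean per layer) for every composite label
    recon = np.array([sum(means[k][l] for k, l in enumerate(c)) for c in combos])
    # squared Euclidean distance of every pixel to every reconstruction
    diff = x[:, None, :] - recon[None, :, :]
    costs = np.sum(diff ** 2, axis=2)
    return combos, costs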
2.3.2 M-step
The M-step is performed by gradient ascent, leading
to a local maximum of the score function given the
current segmentation L = l.
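For λ = 0 the score reduces, for fixed labels, to the data term of (1) (the Potts term does not depend on µ), and its gradient has a simple closed form; the sketch below (our own simplification, which omits the gradient of the regularization term (3) that the full M-step also has to include, and uses a fixed step size purely for illustration) shows such a gradient-ascent update.

import numpy as np

def m_step_gradient(x, labels, means, step=1e-4, iters=100):
    """Gradient ascent on the data term of (1) for fixed labels (lambda = 0).

    x:      (P, d) pixel features
    labels: (P, m) integer label matrix, labels[i, k] = l_{i,k}
    means:  list of m arrays, means[k] of shape (n_k, d); updated copies returned
    """
    means = [mu.astype(float) for mu in means]
    P, m = labels.shape
    for _ in range(iters):
        # residual r_i = x_i - sum_k mu_{k, l_{i,k}}
        recon = np.zeros_like(x, dtype=float)
        for k in range(m):
            recon += means[k][labels[:, k]]
        resid = x - recon
        # d/d mu_{k,j} of -sum_i ||r_i||^2  is  2 * sum_{i: l_{i,k}=j} r_i
        for k in range(m):
            for j in range(len(means[k])):
                grad = 2.0 * resid[labels[:, k] == j].sum(axis=0)
                means[k][j] += step * grad
    return means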
2.3.3 Implementation
The algorithm is implemented in Matlab, using the α-
expansion implementation provided by the gco-v3.0
library available on http://vision.csd.uwo.ca/code/.
3 EXPERIMENTS
In all our experiments we construct multiple segmen-
tations with the same number of segments in each
layer. We therefore refer to a multiple segmentation
with m layers and k segments in each layer as an (m,k)-
segmentation.
3.1 Single Images
Our first experiment establishes the baseline result
that the segmentation method works as intended
when the input closely fits the underlying model-
ing assumption. To this end we construct the image
ICPRAM2014-InternationalConferenceonPatternRecognitionApplicationsandMethods
8
shown in Figure 2 (c) as the overlay of the two im-
ages (a) and (b), and use our method to construct
(2,3)-segmentations from the single input image (c).
First setting λ = β = 0, we performed 200 runs of the
algorithm with different random initializations. The
highest-scoring solution that was found consists of the segmentations (d) and (e). In these figures, the color of the jth segment in the kth layer is set to µ̃_{k,j}, where µ̃_{k,j} is obtained from µ_{k,j} by applying min-max normalization to re-scale the components of all the mean vectors µ_k (k = 1, ..., m) into the interval [0..255] of proper rgb-values. Essentially the same optimal result was found in 9 out of the 200 runs. In the remaining runs the algorithm converged to local optima, an example of which is shown by (f) and (g). These results were clearly identified by the algorithm as sub-optimal by being associated with significantly lower score function values.
With increasing λ parameter the results in this ex-
periment deteriorated. At λ = 5000 the “correct” so-
lution was not found in 200 restarts. This is not very
surprising, since for this image with λ = β = 0 the
correct solution is clearly distinguished as the solution
that can achieve a perfect score of 0 on the remaining
Euclidean part of the likelihood term (1).
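The images of Figure 2 are not reproduced here, but an input of this kind is easy to generate; the sketch below (our own code, under the assumption that the overlay is a pixel-wise addition of two piecewise-constant layer images, which is what makes a data-term score of 0 attainable) builds such a synthetic test image.

import numpy as np

def synthetic_overlay(H, W, colors_a, colors_b):
    """Build a test image that satisfies the additive model exactly.

    One layer is piecewise constant in vertical stripes, the other in
    horizontal stripes; the observed image is their pixel-wise sum, so a
    (2, n)-segmentation can in principle reach a perfect data-term score.
    colors_a: (n_a, 3) array of per-segment contributions of the first layer
    colors_b: (n_b, 3) array of per-segment contributions of the second layer
    """
    a_labels = (np.arange(W) * len(colors_a)) // W      # vertical stripes
    b_labels = (np.arange(H) * len(colors_b)) // H      # horizontal stripes
    layer_a = colors_a[a_labels][None, :, :].repeat(H, axis=0)   # (H, W, 3)
    layer_b = colors_b[b_labels][:, None, :].repeat(W, axis=1)   # (H, W, 3)
    return layer_a + layer_b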
Figure 2: Baseline: overlay image.
Next, we perform a series of experiments on the
butterflies image by M.C. Escher, shown in Figure 3,
which has previously been used in (Qi and Davidson,
2009). The size of this image is 402x401 pixels.
We first compute (2,3)-segmentations with vary-
ing values of λ (and β = 0). Figure 4 shows the high-
est scoring results (in 20 restarts) obtained for λ =
0,1000,10000. In all cases, essentially the same two
segmentations are computed: one that corresponds
to the main colors of the three types of butterflies
Figure 3: Escher’s butterflies.
Figure 4: Escher (2,3)-segmentations, varying λ (results shown for λ = 0, 1000, 10000).
in the image, and one that captures the finer struc-
ture of the borders between the butterflies, as well
as the shading inside the butterflies. The main ef-
fect of the regularization term here is not a differ-
ence in the segmentations, but only a difference in
the means associated with the segments: for the high
value λ = 10000, the means in the second segmenta-
tion all have a strong green component, whereas the
means of the first component only have weak green
components. This makes the means of the two com-
ponents lie in near-orthogonal affine spaces. A similar
color-separation does not appear at λ = 0.
A common way to measure dissimilarity of two clusterings L_1, L_2 is normalized mutual information

$$\mathrm{NMI}(L_1,L_2) = \frac{\mathrm{MI}(L_1,L_2)}{\sqrt{H(L_1)\,H(L_2)}},$$

where MI is the mutual information and H(·) the entropy of L_1, L_2, as determined by the empirical joint distribution of L_1, L_2 defined by the cluster assignments of the pixels. Low values of NMI indicate statistical independence, and hence dissimilarity of clusterings. Furthermore, a justification given by (Jain
et al., 2008) for the regularization term (2) is that it
induces a bias towards statistically independent clus-
terings. This justification carries over to our modified
version (3). Therefore, the NMI as an evaluation mea-
sure is quite consistent with our objective function.
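For reference, the NMI of two segmentations can be computed directly from the label maps via their empirical joint distribution; a small sketch (our own code, assuming 0-based integer labels) is:

import numpy as np

def nmi(labels1, labels2):
    """Normalized mutual information of two segmentations.

    labels1, labels2: integer label arrays of identical shape
                      (one label per pixel, labels starting at 0).
    """
    l1, l2 = labels1.ravel(), labels2.ravel()
    joint = np.zeros((l1.max() + 1, l2.max() + 1))
    np.add.at(joint, (l1, l2), 1)            # empirical joint counts
    p = joint / joint.sum()
    p1, p2 = p.sum(axis=1), p.sum(axis=0)    # marginals
    nz = p > 0
    mi = np.sum(p[nz] * np.log(p[nz] / np.outer(p1, p2)[nz]))
    h1 = -np.sum(p1[p1 > 0] * np.log(p1[p1 > 0]))
    h2 = -np.sum(p2[p2 > 0] * np.log(p2[p2 > 0]))
    return mi / np.sqrt(h1 * h2)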
However, while low values of the regularization
term can be due to statistical independence of the
segmentations, this is not a strict correlation. As
discussed above, the increasing weight of the regu-
larization term in Figure 4 only leads to a shift of
the mean rgb-vectors without a noticeable change in
the segmentations. This leads to an improvement in the value of the regularization term from 8.28·10⁶ at λ = 1000 to 1.82·10⁶ at λ = 10000 (at λ = 0 no regularization term is computed). However, the NMI values for the three solutions of Figure 4 are 8.4·10⁻³, 5.4·10⁻², 7.1·10⁻² for λ = 0, 1000, 10000,
respectively. Thus, the NMI values are even slightly
increasing for larger λ-values.
We note at this point that NMI values have to
be used with caution when assessing dissimilarity of
image segmentations (rather than other types of data
clusterings): NMI is a function only of cluster mem-
bership of pixels. However, for segmentations one
is perhaps more interested in the borders defined be-
tween segments, than in the global grouping of pix-
els into segments. Figures 5-7 illustrate this issue.
Figure 5 shows a modified version of Escher’s but-
terflies in which we have superimposed an additional
square grid structure on the butterfly image. Figure 6
shows a hypothetical (2,4)-segmentation (not computed by our method) of this image. Both segmentations identify the grid structure: the first one dividing the structure according to columns (and background), the second according to rows (and background). For the non-background pixels, row and column membership are independent random variables. The mutual information of the two segmentations therefore reduces to −P(b) log P(b) − (1 − P(b)) log(1 − P(b)), where P(b) is the probability of background pixels (i.e. the relative image area covered by background). In the limit where the size of the squares is increased and P(b) → 0, the mutual information of the two segmentations, thus, goes to zero (and so does the nor-
malized mutual information). This shows that dissim-
ilarity as measured by low mutual information need
not correspond to the kind of complementarity we
may be looking for in different segmentations. Fig-
Figure 5: Butterflies with squares
Figure 6: Segmentations with low mutual information.
ure 7 shows the (2,4)-segmentation actually obtained
by our method. The result shown is for λ = 0, but re-
sults for higher λ-values are similar. Clearly, we will
not obtain segmentations similar to those in Figure 6,
since these would score very poorly in the likelihood
term (1), and their low NMI score also would not be
reflected in a low value of the regularization term.
We see, thus, that neither need there be a good
correspondence between low NMI values and com-
plementarity of segmentations in the intuitive sense,
nor does the regularization term necessarily induce a
strong bias towards low NMI solutions. Fortunately,
as Figure 7 shows, the likelihood score alone is quite
successful in producing segmentations that are com-
plementary in an intuitively meaningful sense.
In the next experiment we keep λ = 0 fixed, and
vary β = 1000, 16000. As the results in Figure 8 show,
the effect is quite consistent with expectations: the al-
ready fairly smooth first segmentation remains quite
stable (even though some further smoothing of the
borders occurs), whereas the smoothing of the ini-
tially rather fragmented second segmentation leads
to an eventual dissolving of the structure, including
the elimination of one of the three segments (we note
Figure 7: Actual (2,4)-segmentation for butterflies with
squares.
ICPRAM2014-InternationalConferenceonPatternRecognitionApplicationsandMethods
10
Figure 8: Escher (2,3)-segmentations, varying β (results shown for β = 1000, 16000).
Figure 9: Escher (3,2)-segmentation.
that we here always manually label segmentations as
“first” and “second” to facilitate the comparison; the
algorithm may return either segmentation with index
1 or 2).
Finally, we perform a (3,2)-segmentation with λ = β = 0. The result is shown in Figure 9. The first segmentation again is based on the main underlying color
distribution, isolating the blue butterflies from the
rest. The last segmentation again represents mostly
the border structure and shading. Finally, the segmen-
tation in the middle is mostly identifying the green
butterflies, but also represents some structure. (Qi and
Davidson, 2009) present a (2,2)-segmentation for the
butterfly image obtained from their iterative cluster-
ing method. Their two segmentations are quite simi-
lar in nature to the first two in Figure 9.
3.2 Image Stacks
As a first experiment with an image stack, we used the
collection of 25 flag-images shown in Figure 10 (each
at a resolution of 150× 75 pixels).
Again setting λ = β = 0, the highest scoring (2,3)-
segmentation is shown at the bottom of Figure 10.
Here we now depict the different segments using arbi-
trarily chosen greyscale values. The means µ_{k,j} characterizing segments now are 3·25-dimensional vectors that can be interpreted as an average color se-
Figure 10: Stack of flag images.
quence for pixels in a segment. Taking for visualiza-
tion the average over all colors in the sequence typi-
cally leads to all segments represented by very similar
brownish colors (although, curiously, in this particu-
lar case the average colors for the segmentation with
the vertical stripes yield a somewhat washed-out look-
ing French flag). The same “correct” solution here
was found in 9 out of 50 random restarts.
A second image stack we constructed consists of
10 images each of trains and horses, as shown in Fig-
ure 11. We performed (2,3)-segmentation with λ = 0
and β = 50. The highest scoring result within 400
runs is shown at the bottom of Figure 11. The method
identifies the main structures in the two groups of im-
ages also in this somewhat more diverse collection of
images. The results in the different runs were rela-
tively stable, with other high-scoring solutions simi-
lar to the top-scoring one. Results with lower scores
often separated the two groups of images less clearly,
or contained segmentations in which one segment was
reduced to very few pixels.
In all our experiments results were quite robust
under variations of the λ and β parameters. Good re-
sults are typically already obtained at the baseline set-
ting λ = β = 0. Note that β = 0 means that the Markov
random field structure of the model is ignored, and
that the MAP step could be implemented in a much
simplified manner. In applications where smooth and
contiguous segments are required, settings of β > 0
will be needed. The impact of the λ parameter on the
segmentations was rather small. It appears that larger
values of λ affected the placement of the mean pa-
rameters representing the different segments, but not
MultipleSegmentationofImageStacks
11
Figure 11: Stack of Horse and Train images.
so much the resulting segmentations themselves.
We close this section with some information on
the runtimes of our experiments: a single run of a
(2,3) or (3,2)-segmentation of the 402x401 pixel but-
terfly image takes about 1 minute on average, with
an average of about 8 iterations of MAP and M steps
until our termination criterion is met, namely that the score
improvement in one iteration is less than 2%. The
same experiments with the image at twice the resolu-
tion take about twice as long. The average runtime for
the Horse-Train image stack also is about 1 minute.
The higher dimensionality of the feature vector here
is offset by the smaller number of pixels at the res-
olution of 151x151 for the images in the stack. For
the Horse-Train stack most of the computation time
(about 90%) is taken by the M step, which is more
affected by the dimensionality of the feature vector.
For the butterfly image, on the other hand, most of the
time (approx. 70%) is spent on the MAP step.
4 CONCLUSIONS
We have introduced a method for constructing mul-
tiple segmentations of image stacks by combining
the convolution of mixtures of Gaussians model (Jain
et al., 2008) with a multi-layer Markov Random field.
While novel in this form, the resulting model is a quite
straightforward combination of existing components.
The main original contribution of this paper is the first
dedicated investigation of multiple clustering for im-
age segmentation, and the introduction of (multiple)
segmentation of image stacks. We note that the latter
is different from cosegmentation (Rother et al., 2006)
and standard video segmentation, where also “stacks”
of images are segmented simultaneously, but where a
separate segmentation is computed for each image (or
frame).
We have conducted a range of experiments that
demonstrate that the method is able to produce mean-
ingful results in a broad variety of datasets. Applied
to single images, it is able to identify the structures
of multiple constituent components. Applied to im-
age stacks, it can perform a simultaneous clustering
at the image and at the pixel level. All these results
were obtained using only the basic rgb pixel features.
No task-specific preprocessing or feature engineer-
ing was needed to obtain our results. One can thus
conclude that the proposed method provides a useful
baseline approach for explorative image analysis.
For more specific application purposes or data
analysis objectives, it will be necessary to construct
more specific pixel features. One possible such appli-
cation domain is multiple segmentation of video se-
quences. The frames of a video can obviously be seen
as an image stack. Using only the rgb pixel features
our method is not very well adapted to video analysis,
since it does not take into account the temporal order
of the frames. New pixel features that capture some
of the temporal dynamics of the pixel values can be
constructed, for example, simply by considering the
variance of the pixel’s rgb values, or by constructing
features that describe the trajectory of the pixel’s rgb
values in rgb-space. Performing multiple segmenta-
tion of video sequences based on such features is a
topic for future work.
In this paper we have also tried to evaluate the
usefulness of regularization terms along the lines pro-
posed in (Jain et al., 2008) for stimulating diversity in
the multiple segmentations. Our results lead to some
doubts both with regard to the effectiveness of the regularization term to produce segmentations with low mutual information, and with regard to the usefulness
of mutual information as a measure for diversity in
image segmentations. On the other hand, our results
ICPRAM2014-InternationalConferenceonPatternRecognitionApplicationsandMethods
12
indicate that the likelihood term (1) alone is quite ca-
pable of identifying the most relevant, distinct seg-
mentations.
REFERENCES
Boykov, Y. and Kolmogorov, V. (2004). An experi-
mental comparison of min-cut/max-flow algorithms
for energy minimization in vision. Pattern Analy-
sis and Machine Intelligence, IEEE Transactions on,
26(9):1124–1137.
Boykov, Y., Veksler, O., and Zabih, R. (2001). Fast ap-
proximate energy minimization via graph cuts. Pat-
tern Analysis and Machine Intelligence, IEEE Trans-
actions on, 23(11):1222–1239.
Chen, S., Cao, L., Wang, Y., Liu, J., and Tang, X. (2010).
Image segmentation by map-ml estimations. Image
Processing, IEEE Transactions on, 19(9):2254–2264.
Cui, Y., Fern, X., and Dy, J. (2007). Non-redundant multi-
view clustering via orthogonalization. In Proceedings
of the Seventh IEEE International Conference on Data Mining (ICDM 2007), pages 133–142.
Ghahramani, Z. and Jordan, M. (1997). Factorial hidden
Markov models. Machine Learning, 29(2-3):245–
273.
Hoiem, D., Efros, A., and Hebert, M. (2005). Geometric
context from a single image. In Computer Vision,
2005. ICCV 2005. Tenth IEEE International Confer-
ence on, volume 1, pages 654–661 Vol. 1.
Jaeger, M., Lyager, S. P., Vandborg, M. W., and Wohlge-
muth, T. (2011). Factorial clustering with an ap-
plication to plant distribution data. In Proceed-
ings of the 2nd MultiClust Workshop: Discover-
ing, Summarizing and Using Multiple Clusterings,
pages 31–42. Online proceedings http://dme.rwth-
aachen.de/en/MultiClust2011.
Jain, P., Meka, R., and Dhillon, I. S. (2008). Simultaneous
unsupervised learning of disparate clusterings. Statis-
tical Analysis and Data Mining, 1(3):195–210.
Kato, Z., Pong, T.-C., and Qiang, S. G. (2003). Unsuper-
vised segmentation of color textured images using a
multilayer mrf model. In Proceedings or the IEEE
International Conference on Image Processing (ICIP
2003), volume 1, pages 961–964. IEEE.
Kim, J. and Zabih, R. (2002). Factorial Markov random
fields. In Heyden, A., Sparr, G., Nielsen, M., and Jo-
hansen, P., editors, Computer Vision – ECCV 2002,
volume 2352 of Lecture Notes in Computer Science,
pages 321–334. Springer Berlin Heidelberg.
Kolmogorov, V. and Zabin, R. (2004). What energy func-
tions can be minimized via graph cuts? Pattern Anal-
ysis and Machine Intelligence, IEEE Transactions on,
26(2):147–159.
Müller, E., Günnemann, S., Färber, I., and Seidl, T. (2012).
Discovering multiple clustering solutions: Grouping
objects in different views of the data. In Proceedings
of 28th International Conference on Data Engineer-
ing (ICDE-2012), pages 1207–1210.
Poon, L. K. M., Zhang, N. L., Chen, T., and Wang, Y.
(2010). Variable selection in model-based clustering:
To do or to facilitate. In Proceedings of the 27th In-
ternational Conference on Machine Learning (ICML-
2010), pages 887–894.
Qi, Z. and Davidson, I. (2009). A principled and flexi-
ble framework for finding alternative clusterings. In
Proceedings of the 15th ACM SIGKDD international
conference on Knowledge discovery and data mining
(KDD-09), pages 717–725.
Rother, C., Minka, T., Blake, A., and Kolmogorov, V.
(2006). Cosegmentation of image pairs by histogram
matching - incorporating a global constraint into MRFs.
In Computer Vision and Pattern Recognition, 2006
IEEE Computer Society Conference on, volume 1,
pages 993–1000. IEEE.
Russell, B., Freeman, W., Efros, A., Sivic, J., and Zisser-
man, A. (2006). Using multiple segmentations to dis-
cover objects and their extent in image collections. In
Computer Vision and Pattern Recognition, 2006 IEEE
Computer Society Conference on, volume 2, pages
1605–1614.
Strehl, A. and Ghosh, J. (2003). Cluster ensembles – a
knowledge reuse framework for combining multiple
partitions. J. Mach. Learn. Res., 3:583–617.
MultipleSegmentationofImageStacks
13