Normalised Diffusion Cosine Similarity and Its Use for Image Segmentation

Jan Gaura and Eduard Sojka

VŠB - Technical University of Ostrava, Faculty of Electrical Engineering and Computer Science,

17. listopadu 15, 708 33 Ostrava-Poruba, Czech Republic

Keywords:

Diffusion Distance, Cosine Similarity, Image Segmentation.

Abstract:

In many image-segmentation algorithms, measuring the distances is a key problem since the distance is often

used to decide whether two image points belong to a single or, respectively, to two different image segments.

The usual Euclidean distance need not be the best choice. Measuring the distances along the surface that is

deﬁned by the image function seems to be more relevant in more complicated images. Geodesic distance, i.e.

the shortest path in the corresponding graph, or the k shortest paths can be regarded as the simplest methods.

It might seem that the diffusion distance should provide better properties, since all the paths (not only a limited number of them) are taken into account. In this paper, we first show that the diffusion distance has properties that make it difficult to use for image segmentation, which extends the recent observations of some other authors. Afterwards, we propose a new measure, called normalised diffusion cosine similarity, that is more suitable. We present the corresponding theory as well as the experimental results.

1 INTRODUCTION

Measuring the distance is an important problem in

clustering and image segmentation. The distance is

used as a quantity that makes it possible to decide

whether two image pixels belong to one or two dif-

ferent clusters (image segments). The Euclidean dis-

tance (i.e. the direct straight-line distance) need not

be the best choice. In images, the image points form a

certain surface in some space. Measuring the distance

along this surface promises better results.

The geodesic distance (Papadimitriou, 1985; Surazhsky et al., 2005) measures the length of the shortest path lying entirely on the surface. The problem is that the geodesic distance can be influenced significantly by relatively small disturbances in the image, since only one (and "thin") path on the surface determines the distance. In (Eppstein, 1998), the possibility of computing the k shortest paths is discussed. This can be viewed as an attempt to take into consideration a connection that is not thin but has a certain width, which reduces the influence of disturbances and noise.

The resistance distance is a metric on graphs (Klein and Randić, 1993; Babić et al., 2002). The resistance distance between two vertices of a graph is equal to the effective resistance between the corresponding nodes in an equivalent electrical network (a regular grid in this case). The resistances of edges in the network increase with increasing local image contrast. Intuitively, the resistance distance explores all the existing paths between two points, whereas the geodesic distance explores only the shortest of them.

It was shown that the resistance distance is equivalent to the so-called commute-time distance (Fouss et al., 2007; Yen et al., 2007; Qiu and Hancock, 2007), which is the distance based on summing the diffusion distance in time. Diffusion is a process during which a certain substance, e.g. heat or electric charge, diffuses from the places of its greater concentration to the places where the concentration is lower. The mathematical description can be built on the diffusion equation (i.e. can be physically based) or on the Markov matrices describing the random walker technique (Grady, 2006). The diffusion maps were systematically introduced in (Nadler et al., 2005; Coifman and Lafon, 2006). Although further papers have appeared, e.g. (Lipman et al., 2010), almost nothing is reported about a successful use of the diffusion distance for image segmentation. This can be regarded as surprising since, at first glance, the method should have useful properties: for measuring every distance, it examines many paths on the image surface.

In this paper, we show that the diffusion distance


Gaura J. and Sojka E..

Normalised Diffusion Cosine Similarity and Its Use for Image Segmentation.

DOI: 10.5220/0005220601210129

In Proceedings of the International Conference on Pattern Recognition Applications and Methods (ICPRAM-2015), pages 121-129

ISBN: 978-989-758-076-5

 2015 SCITEPRESS (Science and Technology Publications, Lda.)

need not be beneﬁcial for measuring distances in im-

age segmentation. The reason is that the inﬂuence

of different sizes of image segments may overshadow

the inﬂuence of the edges between them (i.e. the dif-

ferences in brightness or colour). This ﬁnding extends

the observations of some other authors that appeared

recently (von Luxburg et al., 2014). We introduce a

new measure called normalised diffusion cosine simi-

larity in which the mentioned problem is signiﬁcantly

reduced. The computational technique (as well as the time complexity) remains similar to that usually presented for the diffusion distance, i.e. it is based on the spectral decomposition of the Laplacian matrix.

The paper is organised as follows. In the following

section, we recall the needed theoretical background.

In Section 3, the problems of diffusion distance are

explained. The new similarity is introduced in Section

4. Section 5 is devoted to the experimental results.

The concluding remarks are given in Section 6.

2 DIFFUSION DISTANCE AND CLUSTERING

The diffusion-based methods are usually formulated

by making use of the diffusion equation

$$\frac{\partial f(t,x)}{\partial t} = \operatorname{div}\bigl(g(f(t,x), x)\,\nabla f(t,x)\bigr), \qquad (1)$$

where $f(t,x)$ is a potential function (e.g. concentration, temperature, charge) evolving in time; $g(\cdot)$ is a diffusion coefficient (generally, it is a function). In some applications, the coefficient does not depend on $f(t,x)$. If $g(\cdot)$ reduces to a constant $G$, the right-hand side of Eq. (1) reduces to $G\nabla^2 f(t,x)$. In our context, $f(t,x)$ has the meaning of evolving image brightness or colour. The process of evolving starts at $t = 0$; $f(0,x)$ is a given input image.

In the discrete case, the problem is formulated in a

graph (Sharma et al., 2011). The diffusion properties

are represented by edge weights that can be under-

stood as proximity between the neighbouring nodes

connected by the corresponding edge. The weights

may again be considered evolving in time or constant.

In this paper, we follow the latter option. The diffu-

sion equation can now be written in the form of

$$\frac{\partial \vec{f}(t)}{\partial t} = L\,\vec{f}(t), \qquad (2)$$

where $L$ is the Laplacian matrix containing the weights of edges; $\vec{f}(t)$ is a vector whose entries correspond to the potential in the particular graph nodes, i.e. $\vec{f}(t) = (f_1(t), \ldots, f_n(t))^{\top}$ (we suppose a graph with $n$ nodes). The weight, denoted by $w_{i,j}$, of the edge connecting the nodes $i$ and $j$ is often considered according to the formula

$$w_{i,j} = e^{-\frac{c_{i,j}^{2}}{2\sigma^{2}}}, \qquad (3)$$

where $c_{i,j}$ denotes the grey-scale or colour contrast between the nodes.
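As an illustration of Eq. (3), the following sketch assembles the Laplacian of a small grey-scale image. It rests on several assumptions that the text above does not fix: a 4-connected pixel grid, weights of the form exp(−c²/(2σ²)) with c taken as the brightness difference of neighbouring pixels, and the sign convention L = D − W (degree matrix minus weight matrix). The function name `grid_laplacian` is hypothetical.

```python
import numpy as np

def grid_laplacian(image, sigma):
    """Laplacian L = D - W of the 4-connected pixel grid (assumed convention).

    Edge weights follow the assumed Gaussian form exp(-c^2 / (2 sigma^2)),
    where c is the brightness difference between two neighbouring pixels.
    """
    h, w = image.shape
    n = h * w
    W = np.zeros((n, n))
    for r in range(h):
        for c in range(w):
            i = r * w + c
            for dr, dc in ((0, 1), (1, 0)):          # right and lower neighbour
                rr, cc = r + dr, c + dc
                if rr < h and cc < w:
                    j = rr * w + cc
                    contrast = image[r, c] - image[rr, cc]
                    W[i, j] = W[j, i] = np.exp(-contrast ** 2 / (2.0 * sigma ** 2))
    return np.diag(W.sum(axis=1)) - W

# Illustrative 1 x 4 image with a unit brightness step in the middle.
image = np.array([[0.0, 0.0, 1.0, 1.0]])
L = grid_laplacian(image, sigma=0.5)
print(L)
```

The dense matrix is used only for readability; for images of realistic size a sparse representation would be the more natural choice.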

The solution of Eq. (2) can be found in the form

of (Sharma et al., 2011)

$$\vec{f}(t) = H(t)\,\vec{f}(0), \qquad (4)$$

where $H(t)$ is a diffusion matrix. The entry $h_t(p,q)$ of $H(t)$ expresses the amount of a substance that is transported from the $q$-th node into the $p$-th node (or vice versa since $h_t(p,q) = h_t(q,p)$) during the time interval $[0,t]$. It can be shown that the following formula for $H(t)$ ensures that Eq. (2) is satisfied

$$H(t) = \sum_{k=1}^{n} e^{-\lambda_k t}\,\vec{u}_k \vec{u}_k^{\top}, \qquad (5)$$

where $\lambda_k$ and $\vec{u}_k$, respectively, stand for the $k$-th eigenvalue and the $k$-th eigenvector of $L$. Let $u_{i,k}$ be the $i$-th entry of the $k$-th eigenvector. For each graph vertex, the vector of new coordinates can be introduced

$$\vec{x}_i(t) = \left( e^{-\lambda_1 t} u_{i,1},\; e^{-\lambda_2 t} u_{i,2},\; \ldots,\; e^{-\lambda_n t} u_{i,n} \right). \qquad (6)$$

If the coordinates are assigned in this way, we call it

diffusion map (Coifman and Lafon, 2006; Lafon and

Lee, 2006). This vector can be used for clustering the

vertices, which will be discussed later. By making use

of this vector, the entries of the diffusion matrix can

be expressed as the following dot product

$$h_{2t}(p,q) = \langle \vec{x}_p(t), \vec{x}_q(t) \rangle. \qquad (7)$$
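The spectral construction of the diffusion matrix and of the diffusion map can be sketched as follows. The sketch assumes a symmetric positive semi-definite Laplacian L = D − W, for which the decaying solution of the diffusion equation is exp(−tL); this sign convention is an assumption chosen to match the exponents e^{−λ_k t} in Eq. (5). The function names are hypothetical. The final check relies on the fact that, for orthonormal eigenvectors, the dot product of two rows of the map at time t equals the corresponding entry of the diffusion matrix at time 2t.

```python
import numpy as np

def diffusion_map(L, t):
    """Rows are the coordinate vectors x_i(t) = (exp(-lambda_k t) * u_{i,k})_k."""
    lam, U = np.linalg.eigh(L)        # ascending eigenvalues, eigenvectors in columns
    return U * np.exp(-lam * t)       # broadcasting scales column k by exp(-lambda_k t)

def diffusion_matrix(L, t):
    """H(t) = sum_k exp(-lambda_k t) u_k u_k^T."""
    lam, U = np.linalg.eigh(L)
    return (U * np.exp(-lam * t)) @ U.T

# Path graph with three nodes and unit weights (illustrative data only).
L = np.array([[ 1.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])
X = diffusion_map(L, t=0.7)
print(np.allclose(X @ X.T, diffusion_matrix(L, t=1.4)))
```

In practice only the eigenpairs with the smallest eigenvalues contribute noticeably for larger t, which is why a truncated decomposition is commonly used.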

The square of the diffusion distance is defined as the sum of the squared differences of the concentrations caused by putting a unit concentration into the $p$-th node and into the $q$-th node, respectively, which corresponds to the formula

$$d_t^2(p,q) = \sum_{i=1}^{n} \left[ h_t(i,p) - h_t(i,q) \right]^2. \qquad (8)$$

After some effort, the following formula can be deduced from Eq. (8)

$$d_t^2(p,q) = h_{2t}(p,p) - 2h_{2t}(p,q) + h_{2t}(q,q) = \| \vec{x}_p(t) - \vec{x}_q(t) \|^2, \qquad (9)$$

which shows that introducing the coordinates accord-

ing to Eq. (6) may be seen as creating a diffusion map,

which is a map created in a similar sense as in (Tenen-

baum et al., 2000), where the idea was presented that

measuring the distance along the data manifold in

ICPRAM2015-InternationalConferenceonPatternRecognitionApplicationsandMethods

122

some space can be done by transforming the prob-

lem into a new space in such a way that the Euclidean

distance in the new space is equal to the distance mea-

sured on the data manifold in the original space.
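The step from the sum of squared concentration differences to the Euclidean form can be sketched as follows. The derivation assumes that H(t) is symmetric and that H(t)H(t) = H(2t), which holds for the spectral form of H(t) when the eigenvectors of L are orthonormal; the notation h_t(p,q) for the entries of H(t) is our reading of the formulas.

```latex
\begin{align*}
d_t^2(p,q)
  &= \sum_{i=1}^{n} \bigl[ h_t(i,p) - h_t(i,q) \bigr]^2 \\
  &= \sum_{i=1}^{n} h_t(p,i)\,h_t(i,p)
     - 2 \sum_{i=1}^{n} h_t(p,i)\,h_t(i,q)
     + \sum_{i=1}^{n} h_t(q,i)\,h_t(i,q) \\
  &= [H(t)H(t)]_{p,p} - 2\,[H(t)H(t)]_{p,q} + [H(t)H(t)]_{q,q} \\
  &= h_{2t}(p,p) - 2\,h_{2t}(p,q) + h_{2t}(q,q) \\
  &= \langle \vec{x}_p(t), \vec{x}_p(t) \rangle
     - 2 \langle \vec{x}_p(t), \vec{x}_q(t) \rangle
     + \langle \vec{x}_q(t), \vec{x}_q(t) \rangle
   = \| \vec{x}_p(t) - \vec{x}_q(t) \|^2 .
\end{align*}
```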

Diffusion clustering is based on the idea of using the coordinates introduced in Eq. (6) for clustering the graph nodes, i.e. the image pixels (Nadler et al., 2005; Lafon and Lee, 2006; Huang et al., 2011). The time t can be used to set the desired level of detail. Often, the k-means clustering method is mentioned in this context (Lafon and Lee, 2006; Huang et al., 2011). It is believed that far fewer than n coordinates are needed in practice.

3 THE PROBLEMS OF DIFFUSION DISTANCE

In this section, we show that the diffusion distance has properties that make it difficult to use for image segmentation. We show that the value of diffusion distance between two image points does not necessarily give a good clue as to whether or not they belong to one image segment. We note that a certain criticism in a similar sense has already been published for the commute-time distance. In (von Luxburg et al., 2014), the authors came to the conclusion that the commute-time distance in a graph does not reflect its structure correctly if the graph is large. We continue in this direction and show some further problems that are relevant for image segmentation. We also show that the problems appear not only for the commute-time (resistance) distance, but also for the diffusion distance, i.e. they cannot be avoided by a suitable choice of time.

Consider two points, denoted by p and q, in an image. We study two situations (Fig. 1): (i) Both points are placed in an image containing one rectangular area with a constant brightness; the size of the image is w × h pixels. (ii) The size of the image is w × h pixels again, but the image area is now split by a vertical line into two halves (areas); the brightness is constant inside each area; the difference of brightness between the areas is equal to 1; each of the points is placed in one area. The Euclidean distance between p and q measured in the xy plane is denoted by a (Fig. 1). We call these situations "without edge" and "with edge", respectively. Clearly, from the point of view of image segmentation, these two situations are substantially different. In the second case, we expect two image segments and a big distance between p and q. In the first case, only one image segment and a small distance between p and q are expected.

Figure 1: Two points (p, q) placed into an image containing a single area (left image) or two areas (right image).

Figure 2: The dependence of diffusion distance on the length of the edge between the areas (two panels, t = 25 and t = 100; horizontal axis: area size [px], from 30×11 to 30×51; vertical axis: diffusion distance; curves: with edge for σ = 0.4, 0.5, 0.6, 0.7 and without edge for σ = 0.5). The distance is computed for the problem from Fig. 1 with/without the edge, for a = 15, for various values of t and σ, and for the increasing value of h (the length of the edge between the areas); the width of the areas (the value of w) remains constant. It can be seen that, for one value of t and σ, the value of distance depends on h.

A simple theoretical consideration might be useful for obtaining a first intuitive overview. We compute the distance $d_t(p,q)$ by making use of the formula from Eq. (8) for both mentioned cases. If we consider all possible sizes of image (from small to infinitely big) and all possible values of time ($0 \le t < \infty$), we can easily see that the values of distance vary between 0 and $\sqrt{2}$ in both cases. (We note that $\sqrt{2}$ is the distance between every two distinct points for $t = 0$.) It follows that there is a danger that, from the value of diffusion distance itself, it will not be clear whether it was obtained for the case (i) or (ii).
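The two limits of this range can be checked numerically on a toy graph. The sketch below assumes the Laplacian convention L = D − W and computes the diffusion distance as the Euclidean distance between rows of the spectral diffusion map; the path graph and the function name `diffusion_distance` are illustrative choices, not taken from the experiments reported here.

```python
import numpy as np

def diffusion_distance(L, t, p, q):
    """Euclidean distance between the diffusion-map rows of nodes p and q."""
    lam, U = np.linalg.eigh(L)
    X = U * np.exp(-lam * t)          # row i holds the coordinates of node i
    return float(np.linalg.norm(X[p] - X[q]))

# Path graph with four nodes and unit weights (illustrative data only).
L = np.array([[ 1.0, -1.0,  0.0,  0.0],
              [-1.0,  2.0, -1.0,  0.0],
              [ 0.0, -1.0,  2.0, -1.0],
              [ 0.0,  0.0, -1.0,  1.0]])

# At t = 0 the rows of U are orthonormal, so the distance is sqrt(2);
# for large t on a connected graph all rows converge and the distance tends to 0.
for t in (0.0, 1.0, 100.0):
    print(t, diffusion_distance(L, t, 0, 3))
```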

For a more detailed insight, we present a computational simulation of the problem (Fig. 1). Various image sizes, values of time, and values of σ (Eq. (3)) are considered. The results presented in Figs. 2, 3, and 4 show that the diffusion distance between p and q depends on the length of the edge between the areas (Fig. 2), on the size of the areas (Fig. 3), and on the distance of the points in the xy plane (Fig. 4). Special attention should be paid to the fact that, for some area sizes, the diffusion distance between points lying in one area (case (i)) may be greater than the distance between points lying in two areas (case (ii)). In Fig. 2, for example, we can see that for t = 100 and σ = 0.5, the distance for (w = 30, h = 11) in the case (i) is greater than the distance for (w = 30, h = 31) in the case (ii). As can be seen, the problem increases with the increasing value of σ. We note that the value of σ must be big enough with respect to the noise intensity that is expected.

Figure 3: The dependence of diffusion distance on the area size (two panels, t = 25 and t = 100; horizontal axis: area size [px], from 26×11 to 66×51; vertical axis: diffusion distance; curves: with edge for σ = 0.4, 0.5, 0.6, 0.7 and without edge for σ = 0.5). The distance is computed for the problem from Fig. 1 with/without the edge, for a = 15, for various values of t and σ, and for the increasing length of the edge between the areas and the increasing width of the areas (both w and h are changing in this case). The value of distance depends on the size.

Figure 4: The dependence of diffusion distance on the distance in the xy plane (two panels, t = 25 and t = 100; horizontal axis: a [px]; vertical axis: diffusion distance; curves: with edge for σ = 0.4, 0.5, 0.6, 0.7 and without edge for σ = 0.5). The distance is computed for the problem from Fig. 1 with/without the edge, for a constant image size (w = 50, h = 51), for various values of t and σ, and for the changing distance in the xy plane (the value of a in pixels that is shown on the horizontal axis). The value of diffusion distance depends on the value of a.

In image segmentation, the neighbouring segments may be of different sizes, which has not been

taken into account in the above mentioned simulation

(Fig. 1). Therefore, we created another set of test

cases to show that the diffusion distance depends on

the difference in size and on the mutual position of

the segments in which the points are placed. The set

is depicted in Fig. 5. We measure the diffusion dis-

tance between the points p and q lying in the areas

of various shapes. Two cases are considered for each

shape: (i) the points are placed in a single area, (ii)

the big area is split into two areas by inserting the

vertical splitting line (dashed line in Fig. 5); the dif-

ference of brightness between both areas is equal to 1.

The distance between p, q in the xy plane was a = 19

in all cases. Naturally, we would expect that the dis-

tances measured in the cases with two areas (with the

edge) will always be greater than the distances for the

cases with only one area. We could also hope that

the distances for all test cases with only one area will

remain more or less constant (similarly, for the test

cases with two areas). The computational simulation

showed that this is not always true. The resulting dis-

tances for each case are shown in Fig. 6. It can be seen

that the classical diffusion distance does not provide

the ordering in the sense that the distances measured

between the points lying in one image segment should

always be less than the distances measured between

the points lying in two different segments. It follows that the value of distance does not give the information that is needed for segmentation if we do not have any a priori knowledge about the size of segments or if the sizes may vary.

Figure 5: Various configurations of image segments used for testing the suitability of distance measuring methods. The configurations presented here are referred to as cases 1-8 in the text.

We also use this test set for evaluating the quality of measuring the distance and for comparing the classical diffusion distance with the new measure that is introduced in the next section. We introduce the discriminative capability of a distance measuring method, which is defined by the following formula

$$D(t) = \frac{|\mu_{\mathrm{without}}(t) - \mu_{\mathrm{with}}(t)|}{\sqrt{\sigma^2_{\mathrm{without}}(t) + \sigma^2_{\mathrm{with}}(t)}}, \qquad (10)$$

where $\mu_{\mathrm{without}}(t)$ and $\sigma^2_{\mathrm{without}}(t)$ stand for the mean value and variance, respectively, of the distance for the cases without edge. Similarly, $\mu_{\mathrm{with}}(t)$ and $\sigma^2_{\mathrm{with}}(t)$ stand for the corresponding values for the cases with the edge. (We note that the mentioned values are all dependent on time.) The higher the value of D(t), the better the method. The formula in Eq. (10) simply reflects the fact that we would welcome it if the distance measured for any case without edge were less than the distance measured for any case with edge. The computational simulation gave D(100) = 0.74 for the case without noise (Fig. 6). The discriminative capability again shows the unsatisfactory behaviour of the classical diffusion distance. As can be seen, the intervals corresponding to the cases with and without the edge overlap, which again indicates that the diffusion distance cannot reliably distinguish between the two cases.
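The discriminative capability is straightforward to evaluate once the means and variances of both groups are known. The sketch below assumes that the denominator is the square root of the sum of the two variances; this reading is an assumption, adopted because it reproduces, up to rounding, the value D(100) = 0.74 from the means and variances reported in the caption of Fig. 6. The function name is hypothetical.

```python
import math

def discriminative_capability(mean_a, var_a, mean_b, var_b):
    """|mean_a - mean_b| / sqrt(var_a + var_b) (assumed form of the measure)."""
    return abs(mean_a - mean_b) / math.sqrt(var_a + var_b)

# Means and variances as reported in the caption of Fig. 6 (diffusion distance, t = 100).
d = discriminative_capability(0.0294, 0.31e-4, 0.02398, 0.22e-4)
print(round(d, 2))
```

The measure is symmetric in the two groups, so the order in which the "with edge" and "without edge" statistics are passed does not matter.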

For a visual illustration of the behaviour of diffusion distance, we finally present an example in Fig. 7. For a synthetic image with noise, the diffusion distances from the image center point to all other pixels are computed. The parameters were set as follows. The ideal values of brightness were 0.0, 0.6, and 1.0, respectively. Gaussian noise with $\sigma_{\mathrm{noise}} = 0.075$ was added. The value of $\sigma$ from Eq. (3) was $\sigma = 0.15$. The diffusion distance was computed for t = 250.

Figure 6: Diffusion distance for various test cases from Fig. 5 without noise, for the situation with and without the edge, respectively, for t = 100 and σ = 0.5 (horizontal axis: distance). The value of discriminative capability is D(100) = 0.74 ($\mu_{\mathrm{with}} = 0.0294$, $\sigma^2_{\mathrm{with}} = 0.31 \times 10^{-4}$, $\mu_{\mathrm{without}} = 0.02398$, $\sigma^2_{\mathrm{without}} = 0.22 \times 10^{-4}$).

Figure 7: For an image with noise (left image), the diffusion distances from the image center point to all other pixels are computed and depicted by brightness (right image); the dark areas correspond to a small distance from the center point. Notice the highly changing distances inside the upper bright object area, especially in the left part having the shape of a vertical strip. The distance step expected along the boundary between the upper and lower object parts can be better seen in the central area of the edge between the parts; in the left area, the distance difference is less convincing.

4 NORMALISED DIFFUSION COSINE SIMILARITY

In this section, we propose an improvement that re-

duces the problems with the diffusion distance that

have been mentioned in the previous section. We

ﬁrstly deﬁne our approach. Then we explain why it

should be better than the diffusion distance.

For a given image, we introduce the diffusion cosine similarity between $p$ and $q$ at the time $t$ as follows

$$s_t(p,q) = \frac{h_{2t}(p,q)}{\sqrt{h_{2t}(p,p)\,h_{2t}(q,q)}}. \qquad (11)$$

By substituting from Eq. (7), it can be easily seen that

$$s_t(p,q) = \frac{\langle \vec{x}_p(t), \vec{x}_q(t) \rangle}{\sqrt{\langle \vec{x}_p(t), \vec{x}_p(t) \rangle}\,\sqrt{\langle \vec{x}_q(t), \vec{x}_q(t) \rangle}} = \frac{\langle \vec{x}_p(t), \vec{x}_q(t) \rangle}{\|\vec{x}_p(t)\|\,\|\vec{x}_q(t)\|}. \qquad (12)$$

The value of $s_t(p,q)$ is equal to the cosine of the angle between the vectors $\vec{x}_p(t)$ and $\vec{x}_q(t)$. Since the value of $h_{2t}(p,q)$ is always non-negative, the value of $s_t(p,q)$ varies in the range $[0, 1]$.
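The cosine form of the similarity can be sketched directly on the rows of the diffusion map. The sketch assumes the Laplacian convention L = D − W and coordinates of the form exp(−λ_k t) u_{i,k}; the function name and the toy graph are illustrative only.

```python
import numpy as np

def diffusion_cosine_similarity(L, t, p, q):
    """Cosine of the angle between the diffusion-map rows of nodes p and q."""
    lam, U = np.linalg.eigh(L)
    X = U * np.exp(-lam * t)                       # row i holds x_i(t)
    return float(X[p] @ X[q] / (np.linalg.norm(X[p]) * np.linalg.norm(X[q])))

# Path graph with three nodes and unit weights (illustrative data only).
L = np.array([[ 1.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])
print(diffusion_cosine_similarity(L, t=0.5, p=0, q=2))
```

Because only the angle between the two coordinate vectors enters, a common rescaling of both vectors leaves the value unchanged, which is one way to see why this quantity behaves differently from the Euclidean diffusion distance.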

To obtain a normalised cosine similarity, we evaluate the diffusion cosine similarity twice: firstly, for the given image, and secondly, for the corresponding reference image, which is an image of the same size as the given input image, but with a constant brightness everywhere. The normalised cosine similarity is then the ratio between the similarity in the given image and the similarity in the reference image. We note that this ratio is only computed if the similarity in the reference image is not close to zero. Otherwise, the normalised cosine similarity is set to zero too, which means that it cannot be computed reliably. Since the diffusion cosine similarity in the given image is not greater than the similarity in the reference image, the maximal possible value of the normalised diffusion cosine similarity is 1.
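The normalisation step can be sketched on a one-dimensional "image" (a chain of pixels), which keeps the example small. The chain graph, the Gaussian weight form exp(−c²/(2σ²)), the convention L = D − W, the threshold `eps` for "close to zero", and all function names are assumptions made for illustration; the reference image is modelled as a chain of the same length with constant brightness.

```python
import numpy as np

def path_laplacian(values, sigma):
    """Laplacian D - W of a pixel chain with assumed Gaussian contrast weights."""
    values = np.asarray(values, dtype=float)
    w = np.exp(-np.diff(values) ** 2 / (2.0 * sigma ** 2))   # weights of consecutive pixels
    W = np.diag(w, 1) + np.diag(w, -1)
    return np.diag(W.sum(axis=1)) - W

def cosine_similarity(L, t, p, q):
    """Cosine between the diffusion-map rows of nodes p and q."""
    lam, U = np.linalg.eigh(L)
    X = U * np.exp(-lam * t)
    return float(X[p] @ X[q] / (np.linalg.norm(X[p]) * np.linalg.norm(X[q])))

def normalised_cosine_similarity(values, sigma, t, p, q, eps=1e-12):
    """Similarity in the given image divided by the similarity in the reference image."""
    s_image = cosine_similarity(path_laplacian(values, sigma), t, p, q)
    s_ref = cosine_similarity(path_laplacian(np.zeros(len(values)), sigma), t, p, q)
    return s_image / s_ref if s_ref > eps else 0.0   # unreliable ratio is reported as 0

flat = [0.5] * 6                        # one area, no edge
step = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]   # two areas separated by a unit edge
print(normalised_cosine_similarity(flat, 0.5, 2.0, 1, 4))
print(normalised_cosine_similarity(step, 0.5, 2.0, 1, 4))
```

For the flat chain the given image coincides with the reference image, so the ratio equals 1; for the step the weakened middle edge lowers the similarity and the ratio drops below 1.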

We should now explain why the normalised diffusion cosine similarity is better than the diffusion distance. The reason is simple. The value of the normalised cosine similarity itself tells more clearly whether or not two points are close to each other (i.e. belong to one image segment). No additional information is needed. The value of 1.0 expresses the maximal possible concordance; decreasing values mean increasing difference. We stress the following properties. The normalised cosine similarity is independent of the length of the edge along which two areas touch (see Fig. 8 and compare it with Fig. 2 for the diffusion distance). The normalised cosine similarity is much less dependent on the total size of the area (see Fig. 9 and compare it with Fig. 3). The normalised cosine similarity is much less dependent on the distance between the points in the xy plane (see Fig. 10 and compare it with Fig. 4). Further results will be presented in the next section.

For completeness, it should be pointed out that the idea of using the cosine similarity in a related area is not completely new. In (Brand, 2005), the author mentions the use of cosine similarity in the context of maximizing satisfaction and profit and in connection with the commute-time distance. The author, however, does not present an analysis similar to ours (focused on the use in the area of image segmentation), nor does he use the normalisation.

5 EXPERIMENTAL RESULTS

We start with the tests using synthetic images; tests with real-life images are presented afterwards. As a first experiment, we evaluate the discriminative capability introduced in Eq. (10) based on computing the distance or similarity in various area configurations (Fig. 5). For the diffusion distance, the results have already been presented in Fig. 6. For the new method, the values are stated in Fig. 11. Notice that the intervals into which the values of normalised diffusion cosine similarity fall for the cases with and without the edge, respectively, do not overlap, which makes the discriminative capability very good.

Figure 8: The dependency of normalised diffusion cosine similarity on the length of the edge between the areas (two panels, t = 25 and t = 100; horizontal axis: area size [px], from 30×11 to 30×51; vertical axis: normalised diffusion cosine similarity; curves: with edge for σ = 0.4, 0.5, 0.6, 0.7 and without edge for σ = 0.5). The similarity is computed for the problem from Fig. 1 with/without the edge, for a = 15, for various values of t and σ, and for the increasing length of the edge between the areas (the value of h); the width of areas remains constant. In contrast to the diffusion distance, the new similarity does not depend on the edge length in this test environment (compare with Fig. 2).

Figure 9: The dependency of normalised diffusion cosine similarity on the area size (two panels, t = 25 and t = 100; horizontal axis: area size [px], from 26×11 to 66×51; vertical axis: normalised diffusion cosine similarity; curves: with edge for σ = 0.4, 0.5, 0.6, 0.7 and without edge for σ = 0.5). The similarity is computed for the problem from Fig. 1 with/without the edge, for a = 15, for various values of t and σ, and for the increasing length of the edge between the areas and the increasing width of the areas (w and h are changing). The dependence of similarity on the area size is much smaller than in the case of diffusion distance (Fig. 3).

Figure 10: The dependency of normalised diffusion cosine similarity on the distance in the xy plane (two panels, t = 25 and t = 100; horizontal axis: a [px]; vertical axis: normalised diffusion cosine similarity; curves: with edge for σ = 0.4, 0.5, 0.6, 0.7 and without edge for σ = 0.5). The similarity is computed for the problem from Fig. 1 with/without the edge, for a constant image size (w = 50, h = 51), for various values of t and σ, and for the increasing distance in the xy plane (the value of a on the horizontal axis). Due to the normalisation, the similarity depends on a much less than the diffusion distance (Fig. 4).

Figure 11: Normalised diffusion cosine similarity for the test cases from Fig. 5 without noise, for the situation with and without edge, respectively, for t = 100 and σ = 0.5 (horizontal axis: similarity). The value of discriminative capability is D(100) = 26.9 ($\mu_{\mathrm{with}} = 0.669$, $\sigma^2_{\mathrm{with}} = 0.71 \times 10^{-4}$, $\mu_{\mathrm{without}} = 0.995$, $\sigma^2_{\mathrm{without}} = 0.75 \times 10^{-4}$), which is a substantial improvement in comparison to the value for the diffusion distance shown in Fig. 6.

Figure 12: Normalised diffusion cosine similarity for the same situation as in Fig. 11, but with Gaussian noise ($\sigma_{\mathrm{noise}} = 0.2$) added to the test images (horizontal axis: similarity). The resulting discriminative capability is D(100) = 4.164 ($\mu_{\mathrm{with}} = 0.626$, $\sigma^2_{\mathrm{with}} = 0.234 \times 10^{-2}$, $\mu_{\mathrm{without}} = 0.854$, $\sigma^2_{\mathrm{without}} = 0.651 \times 10^{-3}$), which is still better than the value for the diffusion distance without noise.

Naturally, the behaviour of every method is also important in the presence of noise. We carried out the same test for noisy images too. Gaussian noise was added to all test images (Fig. 5). For each test image, 1000 samples were used in the simulation (see Fig. 12 for further details). Even with a relatively big amount of noise, the results of the new method (Fig. 12) were better than the results obtained for the diffusion distance without noise.

As a further example, we also present the result

for the image from Fig. 7. The normalised diffusion

cosine similarity is depicted in Fig. 13. The similar-

ity is measured between the image center point and all

remaining pixels and is depicted as brightness. Since

big similarity corresponds to a small distance, we also

present an inverse image for more convenient compar-

ison with the result for diffusion distance (Fig. 7). Al-

though we do not present any quantitative evaluation

in this case, we believe that the result of the new sim-

ilarity may be regarded as visually better (Fig. 13).

Figure 13: The value of the normalised diffusion cosine similarity between the image center point and all remaining pixels is computed under exactly the same conditions as in Fig. 7 for the diffusion distance. In the left image, the similarity is depicted by brightness (the bright places indicate a high similarity). For more convenient comparison with the result for the diffusion distance, the inverse image is also presented (right image).

In the rest of this section, we focus on real-life images and their seeded (interactive) segmentation (Sinop and Grady, 2007). For this purpose, the similarity (proximity) between a pixel and an area should be defined. Let $\sum_{\mathrm{top}\,Q}\{\text{collection}\}$ stand for the sum of the biggest $Q$ elements from a collection of real numbers. The normalised diffusion cosine proximity, denoted by $\tilde{s}_t(p,A)$, between a point $p$ and an area $A$ can be introduced by the formula

$$\tilde{s}_t(p,A) = \sum_{\mathrm{top}\,Q} \{\tilde{s}_t(p,q)\}_{q \in A}. \qquad (13)$$

The formula simply reflects the fact that p and A may be regarded as close if at least a certain number of points exist in A that are close to p. The formula can also be easily adapted for the diffusion distance (instead of the Q points with the biggest similarities, the Q points with the smallest distances are considered).
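The point-to-area proximity can be sketched in a few lines, assuming the similarities between p and all points of A have already been computed. The function names and the sample values are hypothetical.

```python
import heapq

def proximity_to_area(similarities_to_area, Q):
    """Sum of the Q largest similarities between a point p and the points of an area A."""
    return sum(heapq.nlargest(Q, similarities_to_area))

def distance_to_area(distances_to_area, Q):
    """Distance counterpart: sum of the Q smallest distances."""
    return sum(heapq.nsmallest(Q, distances_to_area))

# Illustrative similarities between p and five seed points of A.
similarities = [0.91, 0.15, 0.88, 0.40, 0.95]
print(proximity_to_area(similarities, Q=3))   # sums 0.95, 0.91 and 0.88
```

If A contains fewer than Q points, `heapq.nlargest` simply returns all of them, so the sum then runs over the whole area.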

We note that the segmentation algorithm itself is

not the direct focus of our work, i.e. we do not pro-

pose a new algorithm. Instead, by making use of a

certain algorithm, we demonstrate the properties of

the new similarity measure that might be useful in var-

ious known or future algorithms based on measuring

the distance or similarity. The algorithm used for test-

ing should make it possible to present the properties

of the new measure clearly; understanding the results

should not be made more difﬁcult due to the proper-

ties of algorithm. For this reason, we use the simple

one-step seeded segmentation.

The seeds of the objects and the background are defined manually. Once this is done, the distance to both seeds is computed for every remaining image pixel. If a pixel is closer to the object seed than to the background seed, it is marked as an object pixel; otherwise, it is marked as a background pixel. The algorithm is easily adapted to the use of similarity instead of distance: a pixel is then marked as an object pixel if its similarity to the object seed is the higher of the two.
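The one-step seeded segmentation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the pairwise similarity is supplied by the caller, the point-to-area value follows the top-Q rule of Eq. 13, and the toy similarity used in the example is a hypothetical placeholder rather than the normalised diffusion cosine similarity. All names are our own.

```python
import heapq

def top_q_sum(values, q_count):
    """Sum of the q_count largest values (the point-to-area rule of Eq. 13)."""
    return sum(heapq.nlargest(q_count, values))

def seeded_segmentation(pixels, object_seed, background_seed, similarity, q_count):
    """One-step seeded segmentation driven by a pairwise similarity.

    `similarity(p, q)` is a caller-supplied stand-in for the measure under test.
    Seed pixels keep their manual labels; every other pixel gets the label of
    the seed area it is more similar to (ties go to the background).
    """
    labels = {p: "object" for p in object_seed}
    labels.update({p: "background" for p in background_seed})
    for p in pixels:
        if p in labels:
            continue
        s_obj = top_q_sum([similarity(p, q) for q in object_seed], q_count)
        s_bg = top_q_sum([similarity(p, q) for q in background_seed], q_count)
        labels[p] = "object" if s_obj > s_bg else "background"
    return labels

# Toy 1-D "image": a dark run followed by a bright run.
intensity = [0.10, 0.15, 0.20, 0.80, 0.85, 0.90]

def toy_similarity(i, j):
    # Hypothetical placeholder, NOT the normalised diffusion cosine similarity.
    return 1.0 / (1.0 + abs(intensity[i] - intensity[j]))

labels = seeded_segmentation(range(len(intensity)), [0], [5], toy_similarity, q_count=1)
print([labels[i] for i in range(len(intensity))])
```

For a distance measure, the same loop applies with the Q smallest distances summed and the comparison reversed.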

Several real-life images from the Berkeley Segmentation Dataset (Martin et al., 2001) were used (Fig. 14) and processed as follows. The images were converted to greyscale, and the intensity values were normalised into the interval [0,1]. The normalised images were then slightly smoothed by anisotropic diffusion filtering; the filtered images used for further processing are also shown in Fig. 14. For all images, we used σ = 0.07 (Eq. 3), t = 150, Q = 10, and 750 eigenvectors. The image size was 160 × 240 pixels. Suitable values of t and Q were determined experimentally by visual evaluation of the results. For the diffusion distance as well as for the new similarity, the best results were obtained for 100 ≤ t ≤ 200 and 4 ≤ Q ≤ 50.
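The first two preprocessing steps can be sketched as below. The paper does not state which greyscale conversion or which normalisation was used, so the channel average and the min-max rescaling shown here are assumptions made for illustration; the anisotropic diffusion filtering step is omitted, and the function names are our own.

```python
def to_greyscale(rgb_image):
    """Average the three channels of each pixel (one possible conversion)."""
    return [[sum(px) / 3.0 for px in row] for row in rgb_image]

def normalise(image):
    """Linearly rescale intensities into the interval [0, 1] (min-max, assumed)."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:
        # A constant image has no contrast; map it to all zeros.
        return [[0.0 for _ in row] for row in image]
    return [[(v - lo) / (hi - lo) for v in row] for row in image]

# Hypothetical 2 x 2 RGB image with 8-bit channel values.
rgb = [[(0, 0, 0), (255, 255, 255)],
       [(30, 60, 90), (120, 120, 120)]]
grey = normalise(to_greyscale(rgb))
print(grey)
```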

The results of the segmentation using the new nor-

malised diffusion cosine similarity, the diffusion dis-

tance, and the cosine similarity mentioned in (Brand,

2005) are shown in Fig. 14. The new similarity gives visually better results than the other two measures. Naturally, all the results could be improved by adjusting the positions of the seeds. However, we did not do so since we wanted to show the properties of the distance/similarity measures clearly.

Finally, we note that we did not aim to compare the above segmentation algorithms with other state-of-the-art approaches. Instead, we used them to show that the theoretical findings and expectations presented earlier are correct and useful in practice (see the conclusions for a discussion of our main goals and contributions).

6 CONCLUSIONS

Measuring the distances along the surface that is de-

ﬁned by the image function seems to be useful in

more complicated situations. The use of geodesic dis-

tance is often mentioned in this context, but its disad-

vantages are known. One would intuitively say that

the diffusion distance should have good properties

for the mentioned purpose. We showed (including

the computational simulations of the situations that

are important for segmentation) that the diffusion dis-

tance need not be useful since the presence of edges

may be overshadowed by the varying size of image segments (and the size is often not known in advance).

We proposed a new measure called normalised diffu-

sion cosine similarity that suffers from these problems

to a much lesser extent. We have also demonstrated

that it can be used in image segmentation algorithms.

We believe that the geodesic distance and diffu-

sion distance (resistance or commute-time distance)

are two opposite approaches. While the geodesic dis-

tance only searches for the shortest path between the

points, the diffusion distance takes into account all

possible paths. The idea of simultaneously examining

more paths seems to be generally useful. The question remains, however, how exactly this should be done.

The diffusion distance does not seem to be the best

solution. We believe that a certain gap exists in this

area and that the corresponding efﬁcient methods will

probably be developed in the future. We intended this

paper as a certain step in this direction rather than a

paper proposing a new segmentation method for everyday use. That is why we did not aim at comparing the algorithm mentioned in the previous section with various other state-of-the-art algorithms. The goal was to show that some alternatives exist in the area of the diffusion-like distances that may have a chance to be developed into useful and practical tools. We hope that introducing the new normalised diffusion cosine similarity can be regarded as a step in this direction.

Figure 14: One-step seeded segmentation: the source images (the first row); the seeds for the objects and the background (the second row); the filtered images that were used for further processing (the third row); the normalised diffusion cosine similarity of pixels to the object seeds, the bright areas correspond to a high similarity (the fourth row); the objects extracted by making use of the new similarity (the fifth row); the diffusion distance from the object seeds, the dark areas correspond to a small distance (the sixth row); the objects extracted by making use of the diffusion distance (the seventh row); the objects extracted by making use of the cosine similarity mentioned in (Brand, 2005) (the last row).

ACKNOWLEDGEMENTS

This work was partially supported by SGS grant No. SP2014/170, VŠB - Technical University of Ostrava, Czech Republic.

REFERENCES

Babić, D., Klein, D. J., Lukovits, I., Nikolić, S., and Trinajstić, N. (2002). Resistance-distance matrix: A computational algorithm and its application. International Journal of Quantum Chemistry, 90(1):166–176.

Brand, M. (2005). A random walks perspective on maximizing satisfaction and profit. In Proceedings of the 2005 SIAM International Conference on Data Mining, pages 12–19. SIAM.

Coifman, R. R. and Lafon, S. (2006). Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5–30.

Eppstein, D. (1998). Finding the k shortest paths. SIAM J. Computing, 28(2):652–673.

Fouss, F., Pirotte, A., Renders, J.-M., and Saerens, M. (2007). Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Trans. on Knowl. and Data Eng., 19(3):355–369.

Grady, L. (2006). Random walks for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 28(11):1768–1783.

Huang, H., Yoo, S., Qin, H., and Yu, D. (2011). A robust clustering algorithm based on aggregated heat kernel mapping. In Cook, D. J., Pei, J., Wang, W., Zaïane, O. R., and Wu, X., editors, ICDM, pages 270–279. IEEE.

Klein, D. J. and Randić, M. (1993). Resistance distance. Journal of Mathematical Chemistry, 12(1):81–95.

Lafon, S. and Lee, A. B. (2006). Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning and data set parameterization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28:1393–1403.

Lipman, Y., Rustamov, R. M., and Funkhouser, T. A. (2010). Biharmonic distance. ACM Trans. Graph., 29(3):27:1–27:11.

Martin, D., Fowlkes, C., Tal, D., and Malik, J. (2001). A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. 8th Intl Conf. Computer Vision, pages 416–423.

Nadler, B., Lafon, S., Coifman, R. R., and Kevrekidis, I. G. (2005). Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators. In Advances in Neural Information Processing Systems 18, pages 955–962. MIT Press.

Papadimitriou, C. H. (1985). An algorithm for shortest-path motion in three dimensions. Information Processing Letters, 20(5):259–263.

Qiu, H. and Hancock, E. R. (2007). Clustering and embedding using commute times. IEEE Trans. Pattern Anal. Mach. Intell., 29(11):1873–1890.

Sharma, A., Horaud, R., Cech, J., and Boyer, E. (2011). Topologically-Robust 3D Shape Matching Based on Diffusion Geometry and Seed Growing. In CVPR ’11 - IEEE Conference on Computer Vision and Pattern Recognition, pages 2481–2488, Colorado Springs, United States. IEEE Computer Society Press.

Sinop, A. K. and Grady, L. (2007). A seeded image segmentation framework unifying graph cuts and random walker which yields a new algorithm. In IEEE 11th International Conference on Computer Vision, ICCV 2007, Rio de Janeiro, Brazil, October 14-20, 2007, pages 1–8. IEEE.

Surazhsky, V., Surazhsky, T., Kirsanov, D., Gortler, S. J., and Hoppe, H. (2005). Fast exact and approximate geodesics on meshes. ACM Trans. Graph., 24(3):553–560.

Tenenbaum, J. B., de Silva, V., and Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319.

von Luxburg, U., Radl, A., and Hein, M. (2014). Hitting and commute times in large random neighborhood graphs. Journal of Machine Learning Research, 15:1751–1798.

Yen, L., Fouss, F., Decaestecker, C., Francq, P., and Saerens, M. (2007). Graph nodes clustering based on the commute-time kernel. In Proceedings of the 11th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD’07, pages 1037–1045, Berlin, Heidelberg. Springer-Verlag.