A Dense Medial Descriptor for Image Analysis

Matthew van der Zwan, Yuri Meiburg and Alexandru Telea (1,2)

(1) Scientific Visualization and Computer Graphics, University of Groningen, Nijenborgh 9, Groningen, The Netherlands

(2) University of Medicine and Pharmacy Carol Davila, Bucharest, Romania

Keywords:

Medial Axes, Image Segmentation, Shape Analysis.

Abstract:

We present dense medial descriptors, a new technique which generalizes the well-known medial axes to en-

code and manipulate whole 2D grayvalue images, rather than binary shapes. To compute our descriptors, we

ﬁrst reduce an image to a set of threshold-sets in luminance space. Next, we compute a simpliﬁed represen-

tation of each threshold-set using a noise-resistant medial axis transform. Finally, we use these medial axis

transforms to perform a range of operations on the input image, from perfect reconstruction to segmentation,

simpliﬁcation, and artistic effects. Our pipeline can robustly handle any 2D grayscale image, is easy to use,

and allows an efﬁcient CPU or GPU-based implementation. We demonstrate our dense medial descriptors

with several image-processing applications.

1 INTRODUCTION

Skeletons, or medial axes, are well-known 2D shape

descriptors used in many applications in shape analy-

sis and classiﬁcation, shape recognition, shape match-

ing, topological analysis, image registration, and path

planning. Medial axis structures, augmented with dis-

tance information from the medial axis to its corre-

sponding shape, generate the so-called Medial Axis

Transform (MAT), which is a true dual for the input

shape. In other words, the MAT can be used for the

exact reconstruction and also for the simpliﬁcation of

shapes at user-speciﬁed levels of detail. 2D skeletons

and MATs have been extended to three dimensions

to create surface and curve skeletons and their corre-

sponding medial surface transform (MST), which al-

low processing of 3D shapes analogously to their 2D

counterparts. In this paper, we focus on 2D skeletons

and MATs.

However powerful, skeletons and MATs have the

crucial limitation that they require as input a digital

shape, i.e. a closed boundary which divides the em-

bedding space into inside and outside regions. This

limits their direct application to datasets containing

pre-segmented shapes. However, in many applica-

tions, one has continuous ﬁelds as inputs, such as

grayscale or color 2D images or 3D scalar volumes

such as CT or MRI scans. Although pre-segmenting

such datasets into binary shapes and further using

skeletons to analyze such shapes is possible, this is a

non-trivial process which requires a priori knowledge

on the nature and position of the shapes of interest.

Moreover, since skeletons require binary shapes, they

cannot directly handle fuzzy shapes whose boundaries are defined by a range of scalar values. Eliminating this limitation, i.e. enabling skeletal and MAT descriptors to directly handle grayscale images, can

open new ways for using the analytic power of such

descriptors for image segmentation, editing, and clas-

siﬁcation applications.

In this paper, we present a framework for representing and manipulating 2D images using a new descriptor: dense medial axes. Our framework operates in three steps. First, we decompose a grayscale image into several so-called threshold sets $T_i$, i.e. sets of pixels whose values exceed given scalar values $v_i$. Next, we compute a simplified medial axis transform of each threshold set $T_i$, using a suitable simplification value $\tau_i$. Finally, we use the medial transforms to perform several types of image processing operations on the initial image, ranging from perfect image reconstruction to image simplification, segmentation, editing, and artistic painting effects. The set of threshold-values $v_i$ and medial simplification values $\tau_i$ effectively create a two-dimensional scale-space in which we encode the image luminance variations and shapes present in the image, respectively. We propose an efficient GPU-based implementation of our dense medial descriptors, which can compute these and the associated image processing operations in real-time on mega-pixel images. We demonstrate our framework with several image processing applications.

Zwan M., Meiburg Y. and Telea A.

A Dense Medial Descriptor for Image Analysis.

DOI: 10.5220/0004279202850293

In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP-2013), pages 285-293

ISBN: 978-989-8565-47-1

© 2013 SCITEPRESS (Science and Technology Publications, Lda.)

The structure of this paper is as follows. Sec-

tion 2 overviews related work on 2D medial descrip-

tors. Section 3 details the three steps of our frame-

work: threshold-set computation (Sec. 3.1), medial

transform computation (Sec. 3.2), and image recon-

struction (Sec. 3.3). Section 4 presents ways in which

we can parameterize the above-described steps of our

pipeline to achieve several types of image processing

operations, and illustrates these with examples. Sec-

tion 5 discusses our framework. Section 6 concludes

the paper.

2 RELATED WORK

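Given a two-dimensional binary shape $\Omega \subset \mathbb{R}^2$ with boundary $\partial\Omega$, we first define its distance transform $DT_{\partial\Omega} : \Omega \to \mathbb{R}^+$ as

$$DT_{\partial\Omega}(x \in \Omega) = \min_{y \in \partial\Omega} \|x - y\| \tag{1}$$

The skeleton, or medial axis, of $\Omega$ is next defined as

$$S(\partial\Omega) = \{\, x \in \Omega \mid \exists f_1, f_2 \in \partial\Omega,\ f_1 \neq f_2,\ \|x - f_1\| = \|x - f_2\| = DT_{\partial\Omega}(x) \,\}, \tag{2}$$

where $f_1$ and $f_2$ are the contact points with $\partial\Omega$ of the maximally inscribed disc in $\Omega$ centered at $x$, also called feature transform (FT) points (Strzodka and Telea, 2004) or spoke vectors (Stolpner et al., 2009), where the feature transform is defined as

$$FT_{\partial\Omega}(x \in \Omega) = \operatorname*{argmin}_{y \in \partial\Omega} \|x - y\|. \tag{3}$$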
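As an illustration of Eqns. 1 and 3, the following is a minimal brute-force sketch in pure Python. It is not the GPU method used in this paper; the function names and the choice of 4-connectivity for detecting boundary pixels are assumptions made only for this example.

```python
import math

def boundary_pixels(shape):
    """Foreground pixels with at least one 4-neighbour outside the shape."""
    h, w = len(shape), len(shape[0])
    border = []
    for y in range(h):
        for x in range(w):
            if not shape[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not shape[ny][nx]:
                    border.append((y, x))
                    break
    return border

def distance_and_feature_transform(shape):
    """Return (DT, FT) as dicts keyed by foreground pixel (y, x)."""
    border = boundary_pixels(shape)
    dt, ft = {}, {}
    for y, row in enumerate(shape):
        for x, inside in enumerate(row):
            if inside:
                # Eqn. 3: one boundary pixel at minimal distance (first found).
                nearest = min(border, key=lambda b: math.hypot(y - b[0], x - b[1]))
                ft[(y, x)] = nearest
                # Eqn. 1: the distance to that boundary pixel.
                dt[(y, x)] = math.hypot(y - nearest[0], x - nearest[1])
    return dt, ft

# A filled 5x5 square: the centre lies 2 pixels away from the boundary.
square = [[1] * 5 for _ in range(5)]
dt, ft = distance_and_feature_transform(square)
print(dt[(2, 2)], ft[(2, 2)])
```

Note that the centre of the square has several equidistant boundary pixels; the sketch stores only one of them, whereas Eqn. 2 needs at least two distinct ones to classify a point as a skeleton point.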

The skeleton, together with the distance transform,

form the Medial Axis Transform (MAT), which can

be used to exactly reconstruct the input shape Ω (Sid-

diqi and Pizer, 2009; Telea and van Wijk, 2002).

Two-dimensional MAT techniques can be clas-

siﬁed into three groups. Geometric methods use a

polygonized version of ∂Ω to compute its Voronoi

diagram and the skeleton as a subset thereof (Og-

niewicz and Kubler, 1995). Thinning methods it-

eratively remove ∂Ω pixels while preserving con-

nectivity (Palagyi and Kuba, 1999). Pixel re-

moval in distance-to-boundary order enforces cen-

teredness (Pudney, 1998). Such methods are sim-

pler than geometric methods and they also directly

use a pixel-based image representation. Distance field methods find the MAT along singularities of $DT_{\partial\Omega}$

(Rumpf and Telea, 2002; Telea and van Wijk,

2002; Wan et al., 2001; Siddiqi et al., 2002; Hes-

selink and Roerdink, 2008), and can be efﬁciently

implemented on GPUs (Strzodka and Telea, 2004;

Sud et al., 2005; van Dortmont et al., 2006; Cao

et al., 2010). General-ﬁeld methods use ﬁelds

smoother (with less singularities) than distance trans-

forms (Ahuja and Chuang, 1997; Cornea et al., 2005;

Hassouna and Farag, 2009), thus are more robust for

noisy shapes. Foskey et al. compute the θ-SMA, an

approximate simpliﬁed medial axis, using the angle

between feature vectors (Foskey et al., 2003). The

θ-SMA can get disconnected along the so-called lig-

ature branches. An accuracy comparison of different

ﬁeld methods for 2D distance and feature transforms

is given in (Reniers and Telea, 2007).

Clear, or regularized, skeletons are extracted from

noisy shapes by thresholding importance measures

to prune skeleton pixels caused by small shape details (Shaked and Bruckstein, 1998). One of the simplest, and most effective, such measures is the collapsed boundary length metric $\rho : S \to \mathbb{R}^+$, which ranks skeleton pixels $x$ by the boundary length, along $\partial\Omega$, between their feature points (Ogniewicz and Kubler, 1995; Costa and Cesar, 2000; Telea and van Wijk, 2002), i.e.

$$\rho(x \in S) = \|\partial\Omega(f_1, f_2)\|, \tag{4}$$

where $f_1$ and $f_2$ are the feature points of the skeleton point $x$. The metric $\rho$ increases monotonically

on skeletons of genus 0 shapes from their periphery

to their center, so thresholding it is guaranteed to di-

rectly yield a connected skeleton (Costa and Cesar,

2000; Telea and van Wijk, 2002).
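The importance-based pruning described above can be sketched as follows. This is a hedged illustration, not the cited implementations: it assumes that the boundary is a single closed curve parameterized by arc length, and that the collapsed length is the shorter of the two arcs between the feature points. The function and argument names are hypothetical.

```python
def collapsed_boundary_length(arc_a, arc_b, perimeter):
    """Shorter of the two boundary arcs between two feature points.

    arc_a, arc_b: arc-length positions of the feature points on the
    closed boundary; perimeter: total boundary length.
    """
    d = abs(arc_a - arc_b) % perimeter
    return min(d, perimeter - d)

def prune(skeleton_rho, threshold):
    """Keep the skeleton pixels whose importance rho reaches the threshold."""
    return {p for p, rho in skeleton_rho.items() if rho >= threshold}

# Feature points at arc positions 10 and 95 on a boundary of length 100.
print(collapsed_boundary_length(10, 95, 100))
```

Because the metric grows from the skeleton periphery towards its center for genus 0 shapes, raising `threshold` removes peripheral branches first.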

However effective for shape registration (Sundar

et al., 2003), matching (Bai and Latecki, 2008; van

Eede et al., 2006), classiﬁcation (Costa and Cesar,

2000), and recognition (Macrini et al., 2008), skele-

tons can only be computed for binary shapes Ω.

Given a grayscale image, Ω can be computed using

various segmentation methods (Kass et al., 1988; Co-

maniciu and Meer, 2002; Shi and Malik, 2000; Li

et al., 2010). Simpler, but more generic, segmentation

methods include classical level-sets and threshold-

sets (Sethian, 2002), which are nested structures that

capture all image points whose grayvalue is equal to,

or respectively larger than, a given value. However,

segmentation poses two problems. First, we need to

know, in advance, which shapes we are searching for

in a given image. Secondly, this approach only de-

livers skeletons of the image subset captured by the

segmentation. In other words, we do not have a me-

dial descriptor for the entire image, e.g., we cannot

talk about the skeleton of a fuzzy shape.

VISAPP2013-InternationalConferenceonComputerVisionTheoryandApplications

286

reconstructed image I

input grayscale image I threshold sets T

255

medial axis transforms

,DT

)

reconstructed layers T

thresholding

salience metric σ

relevance metric r

island removal metric ε

skeletonization

reconstruction

interpolation

~ ~

Figure 1: Dense medial descriptor computation pipeline.

3 PROPOSED FRAMEWORK

We propose to join the grayvalue information present

in threshold set descriptors with the shape information

delivered by medial descriptors in a single new dense

medial descriptor (DMD), as follows (see also Fig. 1).

First, we reduce an image to a set of threshold-sets

(Sec. 3.1). Secondly, we compute a simpliﬁed MAT

for each threshold-set (Sec. 3.2). Finally, we use the

threshold-set grayvalues and computed MATs to gen-

erate our DMD (Sec. 3.3), which we can use next for

various image processing operations (Sec. 4).

3.1 Threshold Set Computation

Given a grayscale image $I : \mathbb{R}^2 \to \mathbb{R}^+$ and a grayvalue $v \in \mathbb{R}^+$, we define the threshold-set

$$T(v) = \{\, x \in \mathbb{R}^2 \mid I(x) \geq v \,\}. \tag{5}$$

By definition, threshold-sets for increasing grayscale values are nested 2D structures. For a digital n-bit-per-pixel image, we thus have $2^n$ threshold sets, or layers, $T_i = T(i)$, $0 \leq i < 2^n$. Here and further, we use a value of $n = 8$. Further, from each layer, we remove foreground and background islands whose area is smaller than a fraction ε = 3% of the layer's area $|T_i|$. This further simplifies our medial descriptors used to encode the layers (Sec. 3.2).
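The island removal step can be sketched as below. This is a minimal pure-Python illustration under stated assumptions: islands are taken to be 4-connected components, the layer area is the foreground pixel count, and the helper names are hypothetical rather than taken from our implementation.

```python
from collections import deque

def components(mask, value):
    """4-connected components of pixels equal to `value`, as lists of (y, x)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    found = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or mask[y][x] != value:
                continue
            comp, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and mask[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            found.append(comp)
    return found

def remove_islands(layer, eps=0.03):
    """Flip foreground and background islands smaller than eps * layer area."""
    area = sum(map(sum, layer))
    out = [row[:] for row in layer]
    for value in (1, 0):
        for comp in components(layer, value):
            if len(comp) < eps * area:
                for y, x in comp:
                    out[y][x] = 1 - value
    return out

# An 8x8 block with a one-pixel hole, plus an isolated one-pixel speck.
layer = [[0] * 12 for _ in range(12)]
for y in range(1, 9):
    for x in range(1, 9):
        layer[y][x] = 1
layer[4][4] = 0      # background island (hole)
layer[10][10] = 1    # foreground island (speck)
cleaned = remove_islands(layer, eps=0.03)
```

In this example the hole is filled and the speck is deleted, while the large block and the surrounding background are kept.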

The density of layer borders is proportional with

the probability of having an edge in the image: Low-

density areas indicate that consecutive layers are far

apart, thus the image is relatively ﬂat. High-density

areas indicate close consecutive layers, i.e. quickly

varying grayvalues. We will further use this observation in the reconstruction pass (Sec. 3.3). Threshold-sets are nested, i.e. $\forall i < j,\ T_j \subset T_i$. This observation is important, as it implies that if we remove a pixel from $T_i$, this pixel will get the (darker) grayvalue of its closest parent layer.
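The nesting property and its consequence for pixel removal can be checked with a short sketch. The helper names are hypothetical, and the reconstruction used here is the simplest possible one (each pixel takes the index of the brightest layer containing it), which is an assumption made for illustration.

```python
def threshold_set(image, v):
    """T(v): binary mask of the pixels whose grayvalue is at least v (Eqn. 5)."""
    return [[1 if p >= v else 0 for p in row] for row in image]

def is_subset(a, b):
    """True if every foreground pixel of mask a is also foreground in mask b."""
    return all(pa <= pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def reconstruct(layers):
    """Each pixel gets the index of the brightest layer that contains it."""
    h, w = len(layers[0]), len(layers[0][0])
    return [[max(i for i, t in enumerate(layers) if t[y][x]) for x in range(w)]
            for y in range(h)]

image = [[0, 10, 200], [255, 10, 37]]
layers = [threshold_set(image, i) for i in range(256)]

nested = all(is_subset(layers[i + 1], layers[i]) for i in range(255))
original = reconstruct(layers)      # identical to the input image

layers[200][0][2] = 0               # remove one pixel from layer T_200
edited = reconstruct(layers)        # that pixel falls back to layer 199
```

The edited pixel becomes one grayvalue darker, since layer 199 is now the brightest layer that still contains it.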

3.2 Simpliﬁed Medial Axis

A threshold-set $T_i$ can be seen as an arbitrary ‘slice’ in the image grayvalue space. Geometrically speaking, $T_i$ can have any shape, e.g. a collection of noisy disconnected components. We propose here to use MATs to capture the essence of the shape of a $T_i$ and remove its spurious details.

For this, we compute the distance transform $DT_i = DT(T_i)$ and simplified skeleton $S_i = S(T_i)$, following Eqns. 1 and 2 respectively. We compute $DT_i$ using the GPU-based exact Euclidean method of Cao et al. (Cao et al., 2010). This method also computes the feature transform of a shape (Eqn. 3). Hence, it is trivial to modify this method to determine, for each point $x \in T_i$, its two feature points, and next, following a simple arc-length parameterization of $\partial T_i$, analogous to (Telea and van Wijk, 2002), the collapsed boundary length at $x$.

As noted earlier, $T_i$ can contain a large amount of geometrical and topological noise. Once we have $S_i$ and $DT_i$, we can remove these easily. As skeleton importance, we use the so-called salience metric

$$\sigma : S \to \mathbb{R}^+, \quad \sigma = \rho / DT, \tag{6}$$

equal to the collapsed boundary length ρ (Eqn. 4)

divided by the distance transform (Eqn. 1) (Telea,

2012). This metric has the desirable property that

it removes small-scale boundary noise, but it keeps

salient features, such as cusps or dents. Figure 2 illus-

trates this: The input image is an 8-bit noisy human

brain CT (a), from which we select the threshold-set

corresponding to the level 132 (b). Next, we remove

small foreground and background islands (as men-

tioned in Sec. 3.1), compute the simpliﬁed MAT of

this threshold-set, regularized by the saliency metric

σ, and reconstruct this set from this skeleton. The

result (c) captures the main shape described by the

threshold-set (b), ignoring small-scale details such as

specks, holes, and boundary noise.
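The effect of the salience metric (Eqn. 6) can be illustrated with a small sketch. The sample values of ρ and DT below are invented for the example, and the function names are hypothetical; the sketch assumes DT > 0 at every skeleton pixel.

```python
def salience(rho, dt):
    """sigma = rho / DT for one skeleton pixel (Eqn. 6); assumes dt > 0."""
    return rho / dt

def simplify(skeleton, tau):
    """Keep the skeleton samples (position, rho, dt) whose salience reaches tau."""
    return [(p, rho, dt) for p, rho, dt in skeleton if salience(rho, dt) >= tau]

skeleton = [
    ((5, 5), 6.0, 40.0),   # small boundary wiggle on a thick part: sigma = 0.15
    ((2, 9), 30.0, 5.0),   # branch into a sharp, thin cusp: sigma = 6.0
]
kept = simplify(skeleton, tau=1.0)
```

Dividing by the distance transform makes a boundary detail count relative to the local thickness of the shape, which is why the wiggle is pruned while the cusp branch survives.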

ADenseMedialDescriptorforImageAnalysis

287

a) b) c)

Figure 2: Salience metric for skeleton simpliﬁcation: (a)

grayvalue image; (b) threshold set; (c) threshold set recon-

structed from saliency skeleton.

3.3 Image Reconstruction

So far, we have reduced our input grayvalue image $I$ to a set of threshold-sets $T_i$, each having an MAT $(S_i, DT_i)$. We can reconstruct a simplified version $\tilde{T}_i$ of each $T_i$ (in increasing order of $i$, i.e. from dark to light layers) from its corresponding MAT $(S_i, DT_i)$ by either executing a Fast-Marching-Method (FMM) ‘inflation’ of $S_i$ outwards until each point $x \in S_i$ reaches the distance $DT_i(x)$ (Sethian, 2002; Telea and van Wijk, 2002), or alternatively drawing discs centered at $x$ with radius $DT_i(x)$, and coloring each layer $\tilde{T}_i$ with the grayvalue $i$. The first approach (FMM) works best on a CPU, while the second scales better on the GPU. However, this method is basically a nearest-neighbor (zero-order interpolation) reconstruction of $I$, so it shows intensity-banding effects around the boundaries $\partial\tilde{T}_i$ of the layers $\tilde{T}_i$. A better way is as follows: For each two consecutive layers $i$ and $i+1$, where $i \in [0, 255]$, we reconstruct the layers $\tilde{T}_i$ and $\tilde{T}_{i+1}$ as above (either on the CPU or GPU), and next, we set the grayvalue $v(x)$ of each pixel $x$ located between the boundaries $\partial\tilde{T}_i$ and $\partial\tilde{T}_{i+1}$ as

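$$v(x) = \frac{1}{2}\left[\min\left(\frac{DT_i(x)}{DT_{i+1}(x)},\, 1\right) v_i + \max\left(1 - \frac{DT_{i+1}(x)}{DT_i(x)},\, 0\right) v_{i+1}\right] \tag{7}$$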

This achieves a smooth distance-based interpolation between the boundaries $\partial\tilde{T}_i$ (with grayvalue $v_i$) and $\partial\tilde{T}_{i+1}$ (with grayvalue $v_{i+1}$) in the Hausdorff sense. Applying Eqn. 7 at all pixels $x$ of the image space yields our final reconstructed image $\tilde{I}$. Examples of reconstruction are discussed next in Sec. 4.1.
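The disc-based (zero-order) reconstruction described in this section can be sketched on the CPU as follows. This is a simplified illustration, not our GPU implementation: the MAT is assumed to be given as a list of skeleton samples with radii, a pixel is taken to be covered when its center lies within the radius, and the function names are hypothetical.

```python
def reconstruct_layer(mat, height, width):
    """Union of discs; mat is a list of ((y, x), radius) skeleton samples."""
    layer = [[0] * width for _ in range(height)]
    for (cy, cx), r in mat:
        reach = int(r)
        for y in range(max(0, cy - reach), min(height, cy + reach + 1)):
            for x in range(max(0, cx - reach), min(width, cx + reach + 1)):
                if (y - cy) ** 2 + (x - cx) ** 2 <= r * r:
                    layer[y][x] = 1
    return layer

def paint(image, layer, grayvalue):
    """Overwrite the pixels covered by a layer; call from dark to light layers."""
    for y, row in enumerate(layer):
        for x, inside in enumerate(row):
            if inside:
                image[y][x] = grayvalue

# Two nested layers, each described by a single skeleton sample.
image = [[0] * 7 for _ in range(7)]
paint(image, reconstruct_layer([((3, 3), 3)], 7, 7), 100)   # darker, larger layer
paint(image, reconstruct_layer([((3, 3), 1)], 7, 7), 200)   # brighter, smaller layer
```

Painting in increasing grayvalue order lets brighter layers overwrite darker ones, which produces the banded, nearest-neighbor result that the interpolation of Eqn. 7 is meant to smooth.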

4 APPLICATIONS

We next present several applications of our dense me-

dial descriptor.

4.1 Reconstruction

Overall, the interpretation of our reconstruction technique is simple: Given several layers $T_i$ with corresponding MATs $(S_i, DT_i)$, we can reconstruct a smooth version of the original image $I$ from which the $T_i$ have been produced. Thus, $I$ is integrally encoded in our dense MAT $(S_i, DT_i)$. For instance, if we encode all layers $T_i$ present in the original image, and do not simplify the resulting MATs $S_i$ at all, the reconstruction presented in Sec. 3.3 is an exact copy of the original image, by definition, i.e. since we encode all luminance layers and since an unsimplified skeleton exactly preserves the shape of each layer. Thus, our dense medial descriptor (DMD) can encode the full input information, if desired. However, our DMD can also be used to simplify the input image $I$ in several ways, as follows.

First, we can encode only the relevant layers $T_i$. A layer is deemed relevant if its removal from the reconstruction (Sec. 3.3) causes too large a difference between the original image $I$ and the reconstructed image (Eqn. 7). Indeed, the advantage of our reconstruction scheme is that it allows us to easily remove, or keep, layers $T_i$ in the reconstruction process. As such, given a typical image with 255 layers, we can decide on-the-fly which layers are relevant for the reconstruction or not, depending on application-specific metrics.

The simplest of such metrics is the relevance of a layer: Given all layers $T_i$, we can remove those which contribute less to reconstructing an image close to the input image $I$. To compare the reconstruction $\tilde{I}$ with the original image $I$, we use the well-known mean structural similarity index (SSIM) metric (Wang et al., 2004). Figure 3 illustrates this. Here, we have removed the layers least relevant to the reconstructed image (according to SSIM) and plotted the SSIM metric. We see that we can remove around 30 to 50%

of the 255 layers of an 8-bit image without a percep-

tual decrease in image quality: Details such as salient

sharp boundaries, highlights, and even global small-

scale patterns (such as the mandrill’s hair structure)

are well preserved. Accordingly, this means we can

compress an image, by the same layer removal ratio,

i.e. 60% (a), 23% (b), 78% (c), and 37% (d) respec-

tively, with little perceptual loss. Consequently, if we

accept the implied perceptual difference, between the

input image and our simpliﬁcation, techniques such

as JPEG encoding can be subsequently applied atop

of our simpliﬁcation.

4.2 Segmentation

Segmenting an image into its salient shapes has countless applications in medical imaging, computer vision, and image classification. We show below how our DMD representation can be used for image segmentation. Given an image $I$ with grayvalues $0 \leq i < 256$, we compute, for each layer $T_i$, its relevance $r_i$ as being the difference between $I$ and the reconstruction using all layers except $T_i$. Next, we select for reconstruction only the $k$ most relevant layers, where $k$ is a small user-supplied value. This will keep the most ‘salient’ shapes present in the image. Moreover, since the shape of each layer is simplified by island removal (Sec. 3.1) and boundary jaggies removal (Sec. 3.2), the resulting reconstruction will have simpler shapes.

Figure 3: Image reconstruction accuracy (vs SSIM) while removing layers. (a) input image (cameraman); (b) reconstruction (MSSIM=0.84, 102 layers removed); (c) input image (mandrill); (d) reconstruction (MSSIM=0.55, 61 layers removed); (e) input image (peppers); (f) reconstruction (MSSIM=0.73, 198 layers removed); (g) input image (landscape); (h) reconstruction (MSSIM=0.69, 96 layers removed).
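The layer relevance and the selection of the k most relevant layers can be sketched as follows. This is a simplified stand-in, not our implementation: it uses a zero-order reconstruction without medial simplification, and it measures the difference by the mean absolute grayvalue error rather than by SSIM. All function names are hypothetical.

```python
def reconstruct(image, kept_levels):
    """Zero-order reconstruction: each pixel gets the largest kept level <= its value."""
    levels = sorted(kept_levels)
    return [[max((v for v in levels if v <= p), default=0) for p in row]
            for row in image]

def difference(a, b):
    """Mean absolute grayvalue difference (a stand-in for SSIM)."""
    n = len(a) * len(a[0])
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)) / n

def relevance(image, levels):
    """r_i: error of the reconstruction that uses every level except i."""
    return {i: difference(image, reconstruct(image, [v for v in levels if v != i]))
            for i in levels}

def most_relevant(image, levels, k):
    """The k levels whose removal would hurt the reconstruction the most."""
    r = relevance(image, levels)
    return sorted(sorted(levels, key=lambda i: r[i], reverse=True)[:k])

image = [[10, 10, 10], [50, 52, 200]]
levels = [10, 50, 52, 200]
selected = most_relevant(image, levels, k=2)
```

In this toy image, level 52 is almost redundant next to level 50, so it receives a low relevance, whereas the isolated bright level 200 receives the highest one.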

Figure 4 shows an example. Statistics of the input image (a) are displayed in Fig. 4e. The area $|T_i|$ of a layer is computed as the number of pixels that have precisely grayvalue $i$. We notice, as expected, that the darkest 20% layers are empty and brighter layers have increasingly fewer pixels (highlights are smaller than darker zones). The number of shapes (connected

components) per layer is relatively large for the layers

having a large area, which indicates that the image is

non-trivial to segment. The relevance r shows several

local peaks: These are layers which are (1) globally

signiﬁcant for the image representation and (2) more

signiﬁcant than layers having similar grayvalues. We

select the k = 6 most signiﬁcant such layers as shown

in Figure 4b. Here, we do not use the linear inter-

polation of consecutive layer grayvalues (Eqn. 7), so

the result shows a luminance-quantization-like seg-

mentation of the image. In contrast, using linear in-

terpolation (Fig. 4c) blurs the reconstruction where

the original image has low contrast (since, as ex-

plained in Sec. 3.1, layers in such zones have far-apart

boundaries) but keeps sharp luminance edges visible

(since these correspond to high-density layer bound-

aries). This yields a fuzzy segmentation of the in-

put image. For comparison purposes, Fig. 4d shows

the result of mean-shift segmentation (Comaniciu and

Meer, 2002) applied on the input image. Although the

produced segments are not identical to ours (Fig. 4b),

the overall segmentation impression is similar.
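The per-layer area statistic used above amounts to a grayvalue histogram. The sketch below computes it, together with the sizes of the full threshold sets of Eqn. 5, which follow as suffix sums of that histogram. The function names are hypothetical, and an 8-level toy image is used for brevity.

```python
def layer_areas(image, levels=256):
    """Number of pixels having exactly grayvalue i, for each i."""
    hist = [0] * levels
    for row in image:
        for p in row:
            hist[p] += 1
    return hist

def threshold_set_sizes(hist):
    """Number of pixels with grayvalue >= i, i.e. suffix sums of the histogram."""
    sizes, total = [0] * len(hist), 0
    for i in range(len(hist) - 1, -1, -1):
        total += hist[i]
        sizes[i] = total
    return sizes

image = [[0, 3], [3, 7]]
hist = layer_areas(image, levels=8)
sizes = threshold_set_sizes(hist)
```

The suffix sums never increase with i, which is another way to see that threshold sets are nested.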

The exact selection of the k most important layers

is not critical for the segmentation results. Figure 5

shows an example. Here, we reconstructed the input

image (a) by using the 60% most relevant layers. The

result (b) is quite similar with the results of mean shift

segmentation (c), see e.g. the segments corresponding

to the house walls, window panes, and bush. Our seg-

ments are less jagged than the ones produced by mean

shift and still preserve their salient sharp corners, see

e.g. the areas marked in red on the ﬁgure. The reason

for this is the working of the salience regularization

metric for medial axes, which, as explained, elimi-

nates small boundary jaggies but keeps salient cor-

ners unchanged. On the other hand, our result (b) has

a slightly more fuzzy segmentation aspect than mean

shift (c). If a more clear-cut segmentation is desired

(less segments), fewer most-relevant layers can be se-

lected, as shown in the earlier example (Fig. 4).

Figure 6 shows a final segmentation example. The input image (a) shows a skin lesion (naevus) photograph taken with a Handyscope mobile dermatology device in the framework of a digital dermatology skin-cancer screening project. A typical network naevus structure is visible herein. The central part of the naevus has a slightly darker, and denser, network pattern, which is only visible on the original high-resolution 1936 by 2592 pixels image. The marked boundary (in green) shows the segmentation of the lesion as manually drawn by a dermatology expert atop of this image. We processed the input image, without the manually-drawn segmentation, to obtain the result in Figure 6b. Here, we see the three most relevant layers segmented from the input image, i.e., the lesion’s extent atop of the healthy skin (A), and two regions corresponding to the darker and denser central area (B,C). This figure was obtained with relatively low island-removal and salience values, ε = 0.03 and σ = 2 (Secs. 3.1 and 3.2). If we increase these values to ε = 0.05 and σ = 5, more small-scale islands and also jaggies on the layers’ boundaries get removed. Figure 6c shows the result: The lesion’s outer boundary has now been considerably smoothed. Note, also, that this layer is indeed by far the most relevant of all the image’s layers, as indicated by its large relevance value (Fig. 6d (A)), and its shape is quite similar to the manual segmentation. The lesion’s inner layers are also simplified, but to a lesser extent.

Figure 4: Segmentation example. (a) input image; (b) 5 most relevant layers selected; (c) reconstruction; (d) mean shift segmentation comparison; (e) layer statistics visualization (see Sec. 4.2). Plot (e) shows, per grayvalue $i$: layer area $|T_i|$, layer boundary $|\partial T_i|$, shapes/layer, layer relevance, and the selected layers.

Figure 5: Segmentation example. (a) input image; (b) our method (60% layers); (c) mean shift segmentation (see Sec. 4.2).

The target users (dermatology medical specialists)

noted that the tool can be very useful as a guided aid to

their manual work rather than an automatic segmentation technique: The relevance values suggest salient

structures in the input images. Seeing such values,

users select them in the relevance metric-bar, visual-

ize the corresponding structures, and decide whether

these are useful segments of the case under analysis.

4.3 Artistic Editing

Our method can also be used to generate painting-like effects from a given (sharp) photograph, similar to the artistic edge and corner preserving smoothing effect of Papari et al. (Papari et al., 2007). By increasing the skeleton saliency metric σ (Eqn. 6), we eliminate small-scale jaggies of all threshold-sets, i.e. isophote contours, while keeping their sharp corners. The reconstruction (Sec. 3.3) interpolates between these simplified contours, yielding effects akin to painting. Figure 7 (c,f) illustrates this on two complex images with fine-grained detail. The resulting images, where MATs have been simplified by a saliency value of σ = 0.4 and we kept 65% of the original threshold-sets, show a painting-like effect of the input forest images, where small-scale details are ‘clustered’ into larger shapes (due to the skeleton simplification), but the contrast is not unnecessarily blurred (due to keeping a significant number of the original grayvalues, or threshold-sets). As such, salient details such as the dark thin trees and light spots are well preserved, but small-scale and weak-contrast details, such as the foliage, are simplified. The painting effect is strikingly similar to the results produced by the method of Papari et al., see Fig. 7 (b,e).

Figure 6: Skin image segmentation. (a) original image showing manual segmentation; (b) detail-preserving automatic segmentation; (c) simplified automatic segmentation with corresponding relevance metric (d).

Figure 7: Original images: color example (a) and grayscale example (d). Painting-like effects obtained using the method of (Papari et al., 2007) (b,e) and our method (c,f).

5 DISCUSSION

Below we discuss several aspects of our method.

Robustness. We use medial axes for saliency-based simplification and encoding of image layers. Although medial axes are known to be unstable and not robust to noise, we should stress that this does not affect our method. Indeed, we use regularized medial axes, i.e. eliminate noisy branches by means of the salience metric (Eqn. 6). As explained in detail in (Telea and van Wijk, 2002; Telea, 2012), this regularization produces medial axes which are robust to arbitrary boundary noise for shapes of arbitrary genus. Also, we should note that the medial axes are exact, i.e. precisely centered in their shapes and pixel-thin, by construction, given the exact Euclidean DT we use (Cao et al., 2010) and the underlying skeletonization algorithm (Telea and van Wijk, 2002).

Figure 8: Original color image (a). Simplified representation using our method in the RGB space (b) and HSV space (c).

Speed. Our method relies on the fast computation of distance transforms and skeletons (Secs. 3.2, 3.3). On the CPU, we have used for this the method presented in (Telea and van Wijk, 2002), which is worst-case $O(n \log \sqrt{n})$ for an image of $n$ pixels. On the GPU, using the method in (Cao et al., 2010), we achieve a complexity of $O(n)$. For images of $512^2$ pixels, our CPU method takes about 1 minute on a PC at 2.5 GHz, while on an Nvidia 330 GTM GPU, we take 1 to 2 seconds. The memory complexity is $O(n)$, as we only need to store a fixed set of 256 MATs per image.

Parameters. Our method selects a subset of relevant threshold-sets from the 256 possible sets, and then computes simplified MATs for each such threshold-set according to the specified saliency. Hence, saliency and relevance (σ, r) create a two-dimensional scale-space for the input image. Selecting fewer threshold-sets (high r) emphasizes fewer high-relevance structures in the image (Sec. 3.3). Simplifying each MAT (high σ) reduces the border detail of such contours (Sec. 3.2). The third and final parameter is the size ε of the foreground and background islands to be removed (Sec. 3.1). For typical applications, setting ε to values between 3 and 5% of the area of a layer achieves the desired effect, i.e. removal of small isolated bright or dark specks.

Color Images. Applying our DMD representation to

color images is trivial. For this, we apply the entire

pipeline (threshold sets, medial axes, and simpliﬁed

reconstruction) to each channel of a color image.

Figure 8 illustrates this. As visible, choosing either an

RGB or HSV color space does not create signiﬁcant

differences, as both hues and luminances are well

preserved. Computing DMDs for color images is

three times slower than for grayscale images, given

that we process each color channel independently.
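The per-channel processing of color images can be sketched as below. The `quantize` function is only a placeholder for the full grayscale pipeline (threshold sets, medial axes, reconstruction), and both function names are hypothetical.

```python
def per_channel(image, process):
    """Apply a grayscale `process` to each channel of an image of 3-tuples."""
    h, w = len(image), len(image[0])
    channels = [[[image[y][x][c] for x in range(w)] for y in range(h)]
                for c in range(3)]
    done = [process(ch) for ch in channels]
    return [[tuple(done[c][y][x] for c in range(3)) for x in range(w)]
            for y in range(h)]

def quantize(channel, step=64):
    """Placeholder for the grayscale pipeline: keep every `step`-th layer."""
    return [[(p // step) * step for p in row] for row in channel]

rgb = [[(255, 130, 10), (64, 63, 200)]]
simplified = per_channel(rgb, quantize)
```

The same wrapper applies to an HSV image, provided the input is converted to HSV beforehand and back to RGB afterwards; since `process` runs once per channel, the cost is three times that of a grayscale image.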

Applications. We have illustrated our method with

applications in image segmentation, simpliﬁcation,

and artistic manipulation. For all such use-cases,

there exist obviously more specialized methods which

yield better results. Our purpose in selecting these

use-cases was mainly to illustrate the versatility of our

framework, i.e. the fact that the proposed DMD rep-

resentation can be seen as a potential, simple, alter-

native for a wide spectrum of image processing tasks.

As such, we see the DMD as a low-level descriptor

atop of which more advanced manipulations can be

built, and not as an end-user instrument by itself.

6 CONCLUSIONS

We have presented dense medial descriptors, a new

representation that encodes shape and luminance in-

formation in grayvalue images. To allow using me-

dial descriptors for such images, we ﬁrst decompose

an image into all its possible threshold-sets, and then

encode each such set using classical medial axes reg-

ularized by a corner-preserving saliency metric. The

resulting descriptor allows an exact reconstruction of

the initial image using distance-based interpolation

techniques, and also an application-dependent simpli-

ﬁcation by eliminating shapes, or shape details, of low

interest or relevance. We have implemented our de-

scriptor using GPU-based techniques to achieve near-

real-time performance. We demonstrate our proposal

with applications in image simpliﬁcation, segmenta-

tion, and artistic painting effects.

Many possible extensions of our proposal ex-

ist. First, we can exploit the topological informa-

tion present in our dense medial axes, e.g. branching

or looping structures, to perform higher-level image

VISAPP2013-InternationalConferenceonComputerVisionTheoryandApplications

292

analysis tasks such as fuzzy object recognition. Sec-

ondly, we can exploit the spatial and topological re-

lations of medial axes of consecutive image layers to

perform new types of image editing, e.g. fuzzy object

deformation, and also to study new methods for im-

age compression. Finally, generalizing our method to

3D scalar volumes is an interesting avenue to explore.

ACKNOWLEDGEMENTS

This project was co-ﬁnanced by the research grant

PN-II-RU-TE-2011-3-2049 “Image-assisted diagno-

sis and prognosis of cutaneous melanocitary tumors”

offered by ANCS, Romania.

REFERENCES

Ahuja, N. and Chuang, J. (1997). Shape representation us-

ing a generalized potential ﬁeld model. IEEE TPAMI,

19(2):169–176.

Bai, X. and Latecki, L. (2008). Path similarity skeleton

graph matching. IEEE TPAMI, 30(7):1282–1292.

Cao, T., Tang, K., Mohamed, A., and Tan, T. (2010). Paral-

lel banding algorithm to compute exact distance trans-

form with the GPU. In Proc. SIGGRAPH I3D Symp.,

pages 134–141.

Comaniciu, D. and Meer, P. (2002). Mean shift: A robust

approach toward feature space analysis. IEEE TPAMI,

24(5):603–619.

Cornea, N., Silver, D., Yuan, X., and Balasubramanian, R.

(2005). Computing hierarchical curve-skeletons of 3D

objects. Visual Comput., 21(11):945–955.

Costa, L. and Cesar, R. (2000). Shape analysis and classiﬁ-

cation. CRC Press.

Foskey, M., Lin, M., and Manocha, D. (2003). Efﬁcient

computation of a simpliﬁed medial axis. In Proc.

Shape Modeling, pages 135–142.

Hassouna, M. and Farag, A. (2009). Variational curve

skeletons using gradient vector ﬂow. IEEE TPAMI,

31(12):2257–2274.

Hesselink, W. and Roerdink, J. (2008). Euclidean skeletons of digital image and volume data in linear time

by the integer medial axis transform. IEEE TPAMI,

30(12):2204–2217.

Kass, M., Witkin, A., and Terzopoulos, D. (1988). Snakes:

Active contour models. IJCV, 1(4):321–331.

Li, C., Xu, C., Gui, C., and Fox, M. (2010). Distance regu-

larized level set evolution and its application to image

segmentation. IEEE TIP, 19(12):3243–3254.

Macrini, D., Siddiqi, K., and Dickinson, S. (2008). From

skeletons to bone graphs: Medial abstraction for ob-

ject recognition. In Proc. CVPR, pages 324–332.

Ogniewicz, R. L. and Kubler, O. (1995). Hierarchic voronoi

skeletons. Patt. Recog., 28(3):343–359.

Palagyi, K. and Kuba, A. (1999). Directional 3D thinning

using 8 subiterations. In Proc. DGCI, volume 1568,

pages 325–336. Springer LNCS.

Papari, G., Petkov, N., and Campisi, P. (2007). Artistic

edge and corner preserving smoothing. IEEE TIP,

16(10):2449–2462.

Pudney, C. (1998). Distance-ordered homotopic thinning:

A skeletonization algorithm for 3D digital images.

CVIU, 72(3):404–413.

Reniers, D. and Telea, A. (2007). Tolerance-based fea-

ture transforms. In Advances in Comp. Graphics and

Comp. Vision (eds. J. Jorge et al.), pages 187–200.

Springer.

Rumpf, M. and Telea, A. (2002). A continuous skeletoniza-

tion method based on level sets. In Proc. VisSym,

pages 151–158.

Sethian, J. (2002). Level Set Methods and Fast Marching

Methods. Cambridge Univ. Press.

Shaked, D. and Bruckstein, A. (1998). Pruning medial axes.

CVIU, 69(2):156–169.

Shi, J. and Malik, J. (2000). Normalized cuts and image

segmentation. IEEE TPAMI, 22(8):888–905.

Siddiqi, K., Bouix, S., Tannenbaum, A., and Zucker, S.

(2002). Hamilton-Jacobi skeletons. IJCV, 48(3):215–

231.

Siddiqi, K. and Pizer, S. (2009). Medial Representations:

Mathematics, Algorithms and Applications. Springer.

Stolpner, S., Whitesides, S., and Siddiqi, K. (2009). Sam-

pled medial loci and boundary differential geometry.

In Proc. IEEE 3DIM, pages 87–95.

Strzodka, R. and Telea, A. (2004). Generalized distance

transforms and skeletons in graphics hardware. In

Proc. VisSym, pages 221–230.

Sud, A., Foskey, M., and Manocha, D. (2005). Homotopy-

preserving medial axis simpliﬁcation. In Proc. SPM,

pages 103–110.

Sundar, H., Silver, D., Gagvani, N., and Dickinson, S.

(2003). Skeleton based shape matching and retrieval.

In Proc. SMI, pages 130–138.

Telea, A. (2012). Feature preserving smoothing of shapes

using saliency skeletons. Visualization in Medicine

and Life Sciences, pages 155–172.

Telea, A. and van Wijk, J. J. (2002). An augmented fast

marching method for computing skeletons and center-

lines. In Proc. VisSym, pages 251–259.

van Dortmont, M., van de Wetering, H., and Telea, A.

(2006). Skeletonization and distance transforms of

3D volumes using graphics hardware. In Proc. DGCI,

pages 617–629. Springer LNCS.

van Eede, M., Macrini, D., Telea, A., and Sminchisescu, C.

(2006). Canonical skeletons for shape matching. In

Proc. ICPR, pages 542–550.

Wan, M., Dachille, F., and Kaufman, A. (2001). Distance-

ﬁeld based skeletons for virtual navigation. In Proc.

IEEE Visualization, pages 239–246.

Wang, Z., Bovik, A. C., Sheikh, H. R., and Simoncelli, E. P.

(2004). Image quality assessment: From error visibil-

ity to structural similarity. IEEE TIP, 13(4):600–612.

ADenseMedialDescriptorforImageAnalysis

293