THE POTENTIAL OF CONTOUR GROUPING FOR IMAGE

CLASSIFICATION

Christoph Rasche

Laboratorul de Analiza si Prelucrarea Imaginilor, Bucharest Politechnica University, Bucuresti, Romania

Keywords:

Image classiﬁcation, Contour description, Contour grouping.

Abstract:

An image classiﬁcation system is introduced, that is predominantly based on a description of contours and

their relations. A contour is described by geometric parameters characterizing its global aspects (arc or al-

ternating) and its local aspects (degree of curvature, edginess, symmetry). To express the relation between

contours, we use a multi-dimensional vector, whose parameters describe distances between contour points and

the contours’ local aspects. This allows comparing for instance L features or parallel contours with a simple

distance measure. The approach has been evaluated on two image collections (Caltech 101 and Corel) and

shows a reasonable categorization performance, yet its future lies in exploiting the preprocessing to understand

’parts’ of the image.

1 INTRODUCTION

Recent approaches to image classiﬁcation have used

a variety of methods for their success. For instance

Oliva and Torralba use the Fourier transform to pre-

process gray-scale images of outdoor scenes (urban

and natural), whose spectra are then classiﬁed (Oliva

and Torralba, 2001); the group by Perona uses the

principal component analysis to classify rigid ob-

jects with clear silhouettes (the Caltech-101 collec-

tion (Fergus et al., 2007); others achieve compara-

ble performances using histograms of selected fea-

tures (Perronnin et al., 2006), systematic image his-

togramming (Lazebnik et al., 2006) and inspiration

by Gestalt laws (Bileschi and Wolf, 2007). These ap-

proaches are good at discriminating image categories,

but once the image is classiﬁed, their preprocessing

output does not allow to analyze the structure, for

instance to understand parts of the image or to de-

termine the orientation of the recognized object. To

carry out such a structural analysis it required a novel

processing of the image. For instance, in case of the

spatial envelope system by Oliva and Torralba, a pre-

processing based on local orientations was developed,

that allows for a visual search (Torralba et al., 2006),

an effort which appears to be a move toward a struc-

tural description. Clearly, a structural description is

still the most promising approach to a complete scene

understanding system.

The idea of structural description is typically as-

sociated with an exact reconstruction of the image,

starting for instance with image segmentation. This

direction is perceived as little promising for the task

of image classiﬁcation given the wave of above men-

tioned attempts to ’directly’ classify (see also explicit

arguments by Oliva and Torralba in (Oliva and Tor-

ralba, 2001)). Furthermore, contours often appear

fragmented and seemingly do not allow for a straight-

forward assignment to their ’parts’. Still, contour and

structural-description approaches show their promise

in object-search systems, for instance as templates

of a ’Cubist’ representation (Nelson and Selinger,

1998) (see also (Shotton et al., 2008; Opelt and Prinz,

2006; Zheng et al., 2007); others develop learning al-

gorithms for edge detection for speciﬁc image sets

(Dollr et al., 2006); or try to use contour informa-

tion to detect junctions in natural images (Maire et al.,

2008). In this study, a structural description is pur-

sued that describes contour geometry and their rela-

tions very accurately; elaborate image segmentation

is avoided with the consequence of producing fre-

quent accidental detections, which however are made

negligible by a matching process using redundant cat-

egory representations. But because structure is de-

scribed exhaustively, it potentially allows a detailed

image analysis without the need of a novel prepro-

cessing.

One way to relate contours is to use the

symmetric-axis transform (Blum, 1973).

481

Rasche C. (2010).

THE POTENTIAL OF CONTOUR GROUPING FOR IMAGE CLASSIFICATION.

In Proceedings of the International Conference on Computer Vision Theory and Applications, pages 481-486

DOI: 10.5220/0002815904810486

 SciTePress

Though conceptually elegant, it suffers from sus-

ceptibility to speckled noise which leads to distor-

tions of the sym-axes, and it generates local rela-

tions only, meaning the sym-axes are formed only be-

tween immediately neighboring contours. However,

what it also requires are global relations (or group-

ings) between contours, whereby irrelevant contours

may lie in-between (e.g. speckled noise). The study

by Bileschi and Wolf is a step toward that direction

(Bileschi and Wolf, 2007): it relies on ﬁnding group-

ing principles from pixel correlations, that however

are time intensive (ca. 80 secs/image). Instead, a

grouping by contours would be less intensive and po-

tentially more powerful, yet has been hardly pursued.

That is the novelty of this study. To pursue such a

contour-based approach, it requires a method which

can reliably identify contours. We thereby use the

method described in (Rasche, 2009), that is summa-

rized in subsection 2.1. Subsection 2.2 explains the

grouping procedure tested in this study.

2 MODEL

2.1 Contour Description, Partitioning

and Extraction

The contour description is derived from distance dis-

tributions that in turn are obtained from systematic

measurements along the contour. For an arbitrary

contour a so-called local/global (LG) space is cre-

ated, which is a description analogous to the scale

space (ﬁne/coarse space) but does not involve low-

pass ﬁltering (Rasche, 2009). The contour’s global

geometry is classiﬁed into either arc (a) or alternat-

ing (w), whereby the values are scalar and express the

strength of these aspects. The contour’s local aspects

are described by the curvature parameter (b), that ex-

presses the circularity and amplitude of the arc and

alternating contour respectively; the edginess param-

eters, that expresses the sharpness of a curve (L fea-

ture or bow); the symmetry parameter, that expresses

the ’eveness’ of the contour.

Contours are partitioned as follows: if a contour

contains an ’end’ - a turn of 180 degrees - it is par-

titioned at its point of highest curvature. After ap-

plication of this rule, any contour appears either as

elongated in a coarse sense and can thus be classiﬁed

as either alternating or curved (w or a). An excep-

tion to this rule are smooth arcs, whose arc length is

larger than 180 degrees; they are extracted separately.

Exemplifying these two partitioning steps on the Ω

shape: the shape is havened and its circular part is

extracted.

An alternating contour may span several objects

(or parts) and that can be very characteristic to a

category (as for instance the vertical wiggly contour

for a person). Yet, its individual curved and straight

segments can form potentially useful groupings with

other contours of the structure. Thus, further parti-

tioning for the purpose of grouping meant also losing

potential category speciﬁcity. In this study, such al-

ternating contours are not further partitioned but any

straight or reasonably smooth, curved segment of suf-

ﬁcient arc length is extracted from it. Such elemen-

tary segments can be identiﬁed using the LG space.

For a wiggly, natural contour such segments hardly

exist, but many object silhouettes contain multiple

such segments. Thus, the decomposition process does

not strictly partition the contours into separate seg-

ments, but will create partially overlapping segments

to some extent. Taking the Ω shape as the example

again, it is partitioned into 5 segments: one smooth

arc segment; two L features representing the corners;

and two straight segments (if of sufﬁcient arc length).

The left graph in Figure 1 shows an example out-

put of this decomposition. The long smooth arc out-

lining the wheel shows multiple segment extractions

(straight and curved) because a) the segment is an el-

lipse, and b) due to the aliasing problem and the asso-

ciated difﬁculty of discriminating between a smooth

arc and a circularly aligned (open) polygon, that is

discriminating between circle and hexagon for in-

stance. Filtering techniques could resolve this latter

issue (Lowe, 1989), but would introduce additional

computation time, which we think can be avoided as

we merely intend to ﬁnd the semantic content of the

image and not a precise reconstruction. In addition to

those contours (denoted as c), we use the symmetric-

axis descriptors a as in (Rasche, 2009).

2.2 Grouping

Relating all contours with each other is excessive and

leads to an unspeciﬁc structural description. Thus,

there need to be some constraints that reduce the num-

ber of all possible relations to a smaller set of mean-

ingful groupings. Such constraints have already been

described by Lowe for the purpose of determining the

orientation of objects in 3D space (Lowe, 1985). For

instance, closely spaced contour endpoints or paral-

lel lines are ’salient’ groupings which point to certain

object poses. In case of a description for an arbitrary

structure, the issue of grouping is more complexas es-

sentially any spatial arrangement of two contours can

be very category speciﬁc. Thus, the aim is therefore

to ﬁnd criterions that eliminate irrelevant pairings and

VISAPP 2010 - International Conference on Computer Vision Theory and Applications

482

50 100 150 200 250 300

100

150

200

250

Smooth Arcs and Straight Segments (img=8904)

50 100 150 200 250 300

100

150

200

250

Figure 1: Decomposition output for an image (wheel chair from Caltech-101 collection). Left: Contour endpoints are marked

as small black circles; squares and circles denote straight and curved segments respectively (their size reﬂects segment length -

not curvature). Overlapping circles result from the global-to-local identiﬁcation of elementary segments. Right: Smooth arcs

(thick gray [magenta]) and straight segments (thick stippled) after application of a simple smoothness criterion to individual

contours.

keep the potentially category-characteristic pairs. In

this study, we tested the following criterions:

1) Smoothness: only reasonably smooth contours

were allowed for pairs, by choosing segments (c)

with a low edginess value (see Figure 1 right graph).

See also (Felzenszwalb and McAllester, 2006) for a

method of ﬁnding salient curves.

2) Nearest Choice: For a given contour, only the

two most proximal contours are admitted as pairs.

Proximity is determined for the two endpoints and the

center point of a contour.

Those two criterions reduce the number of pair-

ings substantially, but they may already eliminate

some category-characteristic pairs, in particular the

nearest choice criterion as there could exist distinct

pairs on a global scale. We therefore used one pair-

ing that is salient independent of the intersegment dis-

tance of the two segments:

3) Closure: A contour pair that appeared as round

or as encapsulating an area, e.g. two curved segments

lying on opposite sides of a circle.

A contour-pair vector is created consisting of the

following parameters: the distance between the prox-

imal endpoints (d

); the distance between the center

points (d

); the distance between the distal endpoints

), average contour length (l), the asymmetry of

contour lengths (y); the curvature values for the two

contours (b

and b

; obtained from the contour de-

scription):

p(o, d

, d

, l, y, b

, b

). (1)

We also tested a texture descriptor t consisting of

the appearance dimensions of the sym-axis descrip-

tor only. And we also tested a ’cluster’ descriptor r,

which expresses the geometry of a contour cluster in a

statistical sense. Those descriptors are not explained

further for reason of brevity.

3 IMPLEMENTATION &

EVALUATION

The closure criterion was implemented by choosing

pairs whose distance between the center points was

larger than the distances between the end points by a

factor of 1.1; this is a simple but somewhat loose rule,

selecting also a small number of ’distorted’ pairings

(e.g. segments not facing each other symmetrically).

The preprocessing was carried out for four (spa-

tial) scales (σ=1,2,3,5). The average operation dura-

tion for a Caltech image at scale σ = 1 using Mat-

lab on an Intel 2GHz was: 1300ms for creating the

contour-pair vectors; 940ms for the cluster vectors;

another ca. 8 seconds were used for contour ex-

traction, description and sym-axes generation (total

ca. 10 secs). For scale σ = 5, the average image-

preprocessing duration was 4.8 seconds; for all 4

scales it was ca. 30 seconds.

THE POTENTIAL OF CONTOUR GROUPING FOR IMAGE CLASSIFICATION

483

Faces

Faces easy

Leopards

Motorbikes

accordion

airplanes

anchor

ant

barrel

bass

beaver

binocular

bonsai

brain

brontosaurus

buddha

butterfly

camera

cannon

car side

ceiling fan

cellphone

chair

chandelier

cougar body

cougar face

crab

crayfish

crocodile

crocodile head

cup

dalmatian

dollar bill

dolphin

dragonfly

electric guitar

elephant

emu

euphonium

ewer

CALTECH, LRE, s =3

Figure 2: Category-speciﬁc contour pairs for the ﬁrst 40 categories of the Caltech-101 collection (using two training images).

The thin dotted line connects a pair; contour thickness corresponds to the descriptor weight. A pair is obtained by comparing

the individual vectors of two images (of the same category).

The model was evaluated on the Caltech 101 col-

lection (Fergus et al., 2007) and the COREL collec-

tion (see e.g. (Rasche, 2009) for its use). In a learning

phase, category-speciﬁc descriptors (the ’Cubist’ rep-

resentation) were determined by ﬁnding similar de-

scriptors amongst two images of the same category

(see Figure 2 for contour pairs). Descriptor weights

were set by ’cross-correlating’ the Cubist representa-

tions and determining how often (rare) they occur in

any other categories.

In a testing (categorization) phase, the descriptors

of each Cubist representation are matched against the

descriptors of a test image. More speciﬁcally, the

descriptors v

of a test image were matched against

the category-speciﬁc descriptors v

of a category, re-

sulting in a distance matrix D

. The shortest dis-

tance for each category-speciﬁc descriptor was se-

lected d

= max

and multiplied with the descrip-

tor weights. A simple integration across descriptors

and scales, followed by a maximum search decided

on the preferred category.

For two training images, the correct-

categorization performance was ca. 19 percent;

the average ranking value was ca. 16, that is the

rank number at which the correct category appears

(1=correct categorization; 51=random ranking). The

left graph in Figure 3 shows the average ranking

for all categories: about half of the categories were

ranked among the top 18; the last 10 categories

seemed randomly ranked (value around 51).

The average descriptor weight was highest for the

area vector, likely because the descriptor contained

the largest number of dimensions (12). Given the

small number of dimensions for the contour pair vec-

tor, its average weight was relatively high.

The performance for individual descriptors and

scales, as well as knock-out (leave-one-out) simula-

tions is depicted in Figure 4. The contour-pair de-

scriptor had the largest impact on performance, which

is evidenced by the large individual performance (ca.

13 percent, see ’Descriptor Individual’) and by the

signiﬁcant performance decrease for a knock-out sim-

ulation (ca. 10 percent, see ’Descriptor KnockOut’).

Lower (spatial) scales were more effectivethan higher

spatial scales, e.g. 14 and 15 percent for scales 1 and

For the COREL collection, the categorization per-

formance was slightly lower (17 percent, two training

images) and the performance pattern for the various

tests looked similar.

A learning process for 5 images was also tested

for the Caltech collection only. Firstly, category-

speciﬁc descriptors for each pair of images are de-

termined, followed by pooling the ones that were dif-

ferent amongst image pairs. The number of features

increased by ca. 40 percent, but the performance

increased only by 25 percent (Figure 4b), a rather

marginal increase. However the performance of in-

dividual descriptors increased by several folds except

for the pairing vector.

VISAPP 2010 - International Conference on Computer Vision Theory and Applications

484

20 40 60 80 100

100

Ordered Categories

Ranking no./Distinctness

Ranking

1 2 3 5

0.89

0.9

0.91

0.92

0.93

0.94

0.95

0.96

0.97

0.98

Scale

Average Weight

Avg. Weight/Descriptor

Figure 3: Performance analysis for 2 training images.

Left: Average ranking with corresponding standard devi-

ation (dotted). Right: Average weight per descriptor and

scale. Error bars: standard deviation of crossfolds.

Figure 5 shows the 7 most similar categories for the

10 best- and worst-ranking categories. Even for the

worst-ranking categories the similar categories can

show structural similarities; sometimes, the similar

categories correspond to the same super-ordinate cat-

egory, e.g. animal categories would select other ani-

mals as similar categories.

4 DISCUSSION

The overall categorization performance (19 percent

for 2 training images) is not quite comparable yet to

other categorization attempts (up to 50 percent for 2

training images, see citations in introduction), but the

study demonstrates the power of expressing contours

as vectors and relating them by simple vector calcula-

tion. Furthermore and more importantly, the present

approach bears the possibility to interpret the prepro-

cessing output if a more detailed understanding of the

image is desired, e.g. the parameters describe very ac-

curately the geometry and spatial location of contours

and areas. This is a speciﬁcity that none of the other

image classiﬁcation systems provides.

One potentially simple way to increase the per-

formance is to combine our approach with an ap-

proach that is based on appearance, e.g. (Perronnin

et al., 2006; Lowe, 2004). However, to implement a

complete image understanding system, it requires the

structural thoroughness as pursued here. There are

many other sites, where the system can be improved

but we expect the largest performance increase from

the following improvements: a) reﬁned grouping, for

c a p t r1

Percent

Descriptor KnockOut

c a p t r1

Percent

Descriptor Individual

1 2 3 5

Percent

Scale Knock Out

1 2 3 5

Percent

Scale Individual

c a p t r1

Percent

Descriptor KnockOut

c a p t r1

Percent

Descriptor Individual

1 2 3 5

Percent

Scale Knock Out

1 2 3 5

Percent

Scale Individual

a 2 training images

b 5 training images

Figure 4: Performance (correct categorization) for 2 and

5 training images (a and b) for descriptor knock-out (up-

per left), individual descriptors (upper right), spatial scale

knock-out (lower left) and individual scale (lower right).

Note the performance increase for individual descriptors for

5 training images (compare upper right graph of a and b).

Stippled horizontal line: total performance; dotted lines and

error bars: standard deviation of crossfolds. c: contours; a:

areas; p: polygons; t: textures; r1: regions.

instance a grouping by geometrical similarity; b) fur-

ther distance measurements and parameterization, for

instance clusters of intersections; c) a probabilistic

formulation of the presence of descriptors for a cat-

egory representation; d) a better learning process.

THE POTENTIAL OF CONTOUR GROUPING FOR IMAGE CLASSIFICATION

485

Similar Categories...

___________________________________

BEST

WORST

Selected

Figure 5: Best and worst performing categories and their

most similar categories, top ten and bottom ten rows respec-

tively (Caltech 101). The ﬁrst image in each row, represents

a randomly selected image of that category; the remaining

7 images represent the most similar categories with increas-

ing distance.

REFERENCES

Bileschi, S. and Wolf, L. (2007). Image representations be-

yond histograms of gradients: The role of gestalt de-

scriptors. In IEEE Computer Society Conference on

Computer Vision and Pattern Recognition, pages 1–8.

Blum, H. (1973). Biological shape and visual science (part

I). Journal of Theoretical Biology, 38:205–287.

Dollr, P., Tu, Z., and Belongie, S. (2006). Supervised learn-

ing of edges and object boundaries. In IEEE Computer

Society Conference on Computer Vision and Pattern

Recognition, pages 1964–1971.

Felzenszwalb, P. and McAllester, D. (2006). A min-cover

approach for ﬁnding salient curves. In IEEE Con-

ference on Computer Vision and Pattern Recognition,

page 185.

Fergus, R., Perona, P., and Zisserman, A. (2007). Weakly

supervised scale-invariant learning of models for vi-

sual recognition. International Journal Of Computer

Vision, 71(3):273–303.

Lazebnik, S., Schmid, C., and Ponce, J. (2006). Beyond

bags of features: Spatial pyramid matching for rec-

ognizing natural scene categories. In IEEE Computer

Society Conference on Computer Vision and Pattern

Recognition.

Lowe, D. G. (1985). Perceptual Organization and Visual

Recognition. Kluwer Academic Publishers, Boston.

Lowe, D. G. (1989). Organization of smooth image curves

at multiple scales. International Journal of Computer

Vision, 3:119–130.

Lowe, D. G. (2004). Distinctive image features from scale-

invariant keypoints. International Journal of Com-

puter Vision, 60(2):91–110.

Maire, M., Arbelez, P., Fowlkes, C., and Malik, J. (2008).

Using contours to detect and localize junctions in nat-

ural images. In IEEE Conference on Computer Vision

and Pattern Recognition, pages 1–8.

Nelson, R. C. and Selinger, A. (1998). A cubist approach to

object recognition. In Sixth International Conference

on Computer Vision.

Oliva, A. and Torralba, A. (2001). Modeling the shape of

the scene: A holistic representation of the spatial en-

velope. Int. J. Comput. Vision, 42(3):145–175.

Opelt, A. and Prinz, A. (2006). Fusing shape and appear-

ance information for object category detection. In

British Machine Vision Conference (BMVC).

Perronnin, F., Dance, C., Csurka, G., and Bressan, M.

(2006). Adapted vocabularies for generic visual cate-

gorization. In European Conference on Computer Vi-

sion, Graz, Austria, volume 4, pages 464–475.

Rasche, C. (2009). An approach to the parameterization

of structure for fast categorization. Int. J. Comput.

Vision.

Shotton, J., Blake, A., and Cipolla, R. (2008). Multi-

scale categorical object recognition using contour

fragments. IEEE Transactions of Pattern Analysis and

Machine Intelligence, 30(7):1270–1281.

Torralba, A., Castelhano, M. S., Oliva, A., and Hender-

son, J. M. (2006). Contextual guidance of eye move-

ments and attention in real-world scenes: the role of

global features in object search. Psychological Re-

view, 113:766–786.

Zheng, S., Tu, Z., and Yuille, A. L. (2007). Detecting object

boundaries using low-, mid, - and high-level informa-

tion. In IEEE Conference on Computer Vision and

Pattern Recognition, Minneapolis, USA, pages 1–8.

VISAPP 2010 - International Conference on Computer Vision Theory and Applications

486