Contour Localization based on Matching Dense HexHoG Descriptors

Yuan Liu and J. Paul Siebert

School of Computing Science, University of Glasgow, Glasgow, U.K.

Keywords:

Feature Extraction, Local Matching, Object Detection, Edge Detection, Edge Contour Labelling, Segmentation Features, HexHoG Descriptors.

Abstract:

The ability to detect and localize an object of interest in a captured image containing a cluttered background is an essential function for an autonomous robot operating in an unconstrained environment. In this paper, we present a novel approach to refining the pose estimate of an object and directly labelling its contours by dense local feature matching. We perform this task using a new image descriptor we have developed, called the HexHoG. Our key novel contribution is the formulation of HexHoG descriptors comprising hierarchical groupings of rotationally invariant (S)HoG fields, sampled on a hexagonal grid. These HexHoG groups are centred on detected edges and therefore sample the image relatively densely. This formulation allows arbitrary levels of rotation-invariant HexHoG grouped descriptors to be implemented efficiently by recursion. We present the results of an evaluation based on the ALOI image dataset, which demonstrate that our proposed approach can significantly improve an initial pose estimate based on image matching using standard SIFT descriptors. In addition, this investigation presents promising contour labelling results based on processing 2892 images derived from the 1000-image ALOI dataset.

1 INTRODUCTION

This paper addresses the issue of accurate object edge contour localisation, given an initial estimate of an object's pose relative to its pose captured within a reference image. Appearance-based methods (Dalal and Triggs, 2005; Lazebnik et al., 2006; Murphy et al., 2006; Felzenszwalb et al., 2010; Borji and Itti, 2012) and contour-based methods (Kontschieder et al., 2011; Schlecht and Ommer, 2011; Shotton et al., 2005; Xu et al., 2012) for object detection have been extensively studied in recent years. Appearance-based methods represent the dominant approach to object detection and are typically based on a pipeline that first extracts local patch features and then employs a sliding window to scan across the whole image to detect a target. Alternatively, the pipeline can be structured to employ local features to detect object parts, which can then be associated to detect the whole target. Since an object's edge contours afford crucial information for visual perception, edge contour-based approaches have also been extensively developed. Edge contours can be represented by local curvature information, or by the spatial structural relationships between edge fragments. Such edge contour representations can be employed individually for part matching, or combined to generate a shape model suitable for whole object detection. It is inherently difficult to extract the edge contours of an object directly, particularly when the object appears within a cluttered background, since background structures that intersect an object's boundary tend to corrupt, or distort, the extracted bounding edge contour. Therefore, appearance-based methods are predominantly used for object detection. However, the ability to localise an object's boundaries would allow the pixels representing the object to be specified, as opposed to merely knowing the approximate position of a bounding box containing the object, as currently afforded by sparse local feature-based methods. Accurately extracted edge contours could therefore serve both to segment an object from the scene and to provide a shape-based representation of the segmented object. Accordingly, the combination of appearance-based and edge contour-based methods (Schlecht and Ommer, 2011) has the potential to provide accurate object localisation and additional information describing an object's semantics.

The principal contribution of this paper is a new method for combining appearance and edge information to detect and localise an object's edge contours within a cluttered background. We introduce a new feature descriptor, the HexHoG, whose hexagonal, hierarchical grouping mechanism confers sufficient reliability and distinctiveness for it to be used to sample the image at all detected edgel positions (as opposed to only corner locations). An initial pose estimate is first obtained by sparse local feature matching using a standard SIFT implementation. Based on this estimate, a dense local edge matching process using our new HexHoG feature is then applied to refine the initial pose estimate, and the refined pose estimate is in turn used to constrain local dense edge matching to obtain object edge contour labelling (and correspondences between the contour edgels detected in the test and reference images). Therefore, in this work we do not employ HexHoG descriptors for object detection, but instead utilise them for edge contour matching, edge labelling and pose estimation refinement as a post-detection and classification process.

Liu Y. and Siebert P..
Contour Localization based on Matching Dense HexHoG Descriptors.
DOI: 10.5220/0004744006560666
In Proceedings of the 9th International Conference on Computer Vision Theory and Applications (VISAPP-2014), pages 656-666
ISBN: 978-989-758-003-1
© 2014 SCITEPRESS (Science and Technology Publications, Lda.)

The proposed method is validated using the ALOI dataset (Geusebroek et al., 2005). Our results show that the proposed method significantly improves pose estimation and exhibits promising results for edge contour labelling. The remainder of this paper is organized as follows: Section 2 presents a brief review of related work. Section 3 introduces our complete system for object pose estimation and edge contour labelling. Our experimental results are presented in Section 4, followed by the paper's conclusions in Section 5.

2 RELATED WORK

Many object detection methods are able to achieve approximate localization of an object within a cluttered background. Borenstein & Ullman (Borenstein and Ullman, 2002) propose a top-down, class-specific segmentation protocol that identifies the structure of an object by means of high-level information, instead of traditional image-based criteria. Their method can detect an object labelled by means of previously learned 'building blocks', which do not precisely delineate the pixels comprising the detected object. Yu & Shi (Yu and Shi, 2003) present an integration model incorporating low-level edge detection and high-level patch detection to label an object of interest and segregate it from the background. However, no statistical evaluation of this method is presented in (Yu and Shi, 2003). Leibe et al. (Leibe et al., 2008) contribute an Implicit Shape Model, which affords their system greater flexibility by enabling it to learn different object shapes and use these shape models to categorize objects in novel images whilst inferring a probabilistic segmentation, which in turn improves the robustness of the categorization and detection processes. Schlecht & Ommer (Schlecht and Ommer, 2011) propose a method for complementing appearance information with contour information in order to detect an object within a bounding box. Neither of these two methods provides precise object boundaries that would allow the shape of segmented objects to be represented and recovered. Ferrari et al. (Ferrari et al., 2010) propose a detection method that learns an object shape model represented using local contour features. Novel object instances can be localized in new images, and the object boundaries are labelled rather than merely contained within a bounding box. A significant limitation of this system, however, is the computational cost of its learning process.

Feature extraction has been explored extensively in the context of object detection and localization. Gradient histogram-based descriptors have been researched intensively and applied widely for this purpose. Densely sampled local descriptors have been reported to give promising results in human detection (Dalal and Triggs, 2005) and wide-baseline matching (Tola et al., 2010), although such descriptors do not usually possess the property of rotation invariance. Sparse, distinctive features (Lowe, 2004; Mikolajczyk and Schmid, 2005; Alahi et al., 2012) achieve rotation invariance by rotating the local sampling coordinate frame according to the local dominant gradient orientation before computing an oriented gradient histogram. This rotation normalization process is, however, expensive to compute and therefore inherently unsuitable when dense feature extraction is required. Furthermore, such features do not extract object edge information, which affords a crucial cue for visual perception.

3 APPROACH

In this section, we give the details of our proposed methods based on HexHoG feature extraction and dense local edge matching. An overview of our system is shown in Fig. 1.

Figure 1: The overview of our system. (Diagram blocks: images inputted; edge detection; detection and initial pose estimation; dense local edge matching for pose estimation refinement; dense local edge matching for edge contour labelling.)

ContourLocalizationbasedonMatchingDenseHexHoGDescriptors

657

3.1 Feature Extraction

3.1.1 SHoG Feature Extraction

Local image features based on the histogram of oriented gradients (HoG) representation have been widely adopted (Mikolajczyk and Schmid, 2005; Dalal and Triggs, 2005; Brown et al., 2011). Rotating the sampling coordinate frame according to the dominant local image gradient orientation provides a general way to achieve rotation invariance for local image features. In this work we adopt an alternative, well-established but simpler method that affords a substantial degree of rotation invariance within standard HoG. A single patch is first weighted by a Gaussian function and represented by a histogram of gradient orientations. The highest bin of this histogram, i.e. the bin of the dominant gradient orientation, is then barrel-shifted to the head of the histogram, so that the histogram starts with the frequency value of the dominant orientation (Fig. 2). We therefore achieve rotation invariance by simply shifting the histogram rather than rotating and resampling the image coordinate frame; the resulting rotation invariance is evaluated in Fig. 6. We term this orientation-normalised HoG the SHoG; the pseudocode for its construction is given in Algorithm 1.
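A minimal sketch of the Gaussian-weighted orientation histogram and the peak-bin selection described above, in plain Python. The gradient operator (central differences), magnitude-weighted voting, hard assignment to one of 8 equal-width bins, the default Gaussian width and the function names are all our assumptions; the text does not specify them.

```python
import math

def gaussian_weighted_hog(patch, nbins=8, sigma=None):
    """Gaussian-weighted histogram of gradient orientations for a square patch.

    patch: list of equal-length rows of grey values (e.g. 7x7).
    Gradients use central differences, so the one-pixel border is skipped.
    Each gradient votes its magnitude, scaled by a Gaussian centred on the
    patch, into one of nbins equal-width orientation bins (hard binning).
    """
    size = len(patch)
    c = (size - 1) / 2.0                  # patch centre coordinate
    if sigma is None:
        sigma = size / 2.0                # hypothetical default envelope width
    hist = [0.0] * nbins
    bin_width = 2 * math.pi / nbins
    for y in range(1, size - 1):
        for x in range(1, size - 1):
            gx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            gy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)   # orientation in [0, 2*pi)
            b = int(angle / bin_width) % nbins
            w = math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma ** 2))
            hist[b] += w * mag
    return hist


def dominant_orientation(hist):
    """Centre angle (radians) of the peak bin; SIFT's sub-bin
    interpolation is omitted for brevity."""
    nbins = len(hist)
    k = max(range(nbins), key=hist.__getitem__)
    return (k + 0.5) * 2 * math.pi / nbins


# A horizontal intensity ramp has every gradient pointing along +x (bin 0).
ramp = [[float(x) for x in range(7)] for _ in range(7)]
h = gaussian_weighted_hog(ramp)
print([round(v, 3) for v in h])
print(dominant_orientation(h))   # pi/8, the centre of bin 0
```

Hard binning keeps the sketch short; soft (interpolated) voting between neighbouring bins would reduce the quantisation effects at bin boundaries.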

Algorithm 1: SHoG Construction.
HoG: Histogram of Oriented Gradients
NumBin: Number of bins in HoG
Max: Maximum HoG bin value
Index: Index of the maximum HoG bin
e ← 0
for i ← Index : NumBin do
  e ← e + 1
  SHoG(e) ← HoG(i)
end for
r ← NumBin − Index + 1
for i ← 1 : (Index − 1) do
  r ← r + 1
  SHoG(r) ← HoG(i)
end for
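Algorithm 1 amounts to a cyclic shift of the histogram. A minimal Python sketch (the function name `shog` is ours; ties between equal maximal bins go to the first one, which the pseudocode leaves open), with a check that a one-bin cyclic shift, which is what an exact 45° rotation produces with 8 bins, leaves the SHoG unchanged:

```python
def shog(hog):
    """Barrel-shift a HoG so that it starts at its dominant (maximum) bin.

    Mirrors Algorithm 1 with 0-based indices: bins Index..NumBin-1 are
    copied first, followed by bins 0..Index-1.
    """
    index = max(range(len(hog)), key=hog.__getitem__)
    return hog[index:] + hog[:index]


hog = [0.1, 0.7, 0.2, 0.0, 0.3, 0.1, 0.05, 0.4]
rotated = hog[-1:] + hog[:-1]        # every orientation moved up by one bin
print(shog(hog) == shog(rotated))    # True
```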

Figure 2: Local patch represented by HoG and SHoG. (Panels: HoG and SHoG histograms of frequency versus bin; in the SHoG the peak bin is moved to the head.)

3.1.2 HexHoG Feature Extraction

Based on SHoG, we investigate a hexagonal grouping mechanism similar to DAISY (Tola et al., 2010), with the difference that this hexagonally grouped local descriptor, HexHoG, can be constructed recursively to generate hierarchical descriptors. Moreover, unlike DAISY, HexHoG is substantially rotationally invariant.

A hexagon has inherent rotational symmetry, which contributes rotation invariance over a certain angular range. The hexagonally grouped local regions comprising HexHoG are constructed as shown in Fig. 3. Each black circle represents a locally sampled region described by an SHoG descriptor. Each black circle centre is a sampling point located on a hexagon vertex, and the centre point marks the sampling point at the centre of the hexagon about which each HexHoG group is constructed. Since we sample SHoG fields not only at the hexagon vertices but also at the centre of the hexagon, 7 rather than 6 SHoG fields are grouped together. Strictly, therefore, we are computing a septimal, i.e. 7-element, grouping based on hexagonal geometry.

Figure 3: The first level HexHoG structure (annotation: dominant orientation of the covered region).

We can freely set both the radius of the circular regions denoting each SHoG field and the distance between neighbouring sampling points. These parameters control the overlap between the SHoG fields of each grouping, which influences both the degree of rotation invariance of the final HexHoG descriptor and the distinctiveness of this representation. We compute the dominant orientation of the region covered by the red dashed circle in Fig. 3 by computing a HoG field spatially weighted by a Gaussian envelope and then selecting the peak HoG orientation bin, as performed in SIFT.

The above protocol determines where to sample the 6 vertices of the hexagon once the hexagon centre has been fixed. Three sampling points, including the centre point, are co-aligned in the direction of the dominant orientation. We then generate this hexagonally grouped feature by concatenating SHoG_i (i = 1, 2, ..., 7), first assigning the central SHoG descriptor to the head of the grouped descriptor, followed by the SHoG descriptor aligned to the dominant orientation. All of the remaining SHoG descriptors are subsequently concatenated in anticlockwise order. The complete Level 1 HexHoG descriptor is constructed about its centre point as follows:

$$\text{L1 HexHoG} = \left(\text{SHoG}_1, \text{SHoG}_2, \ldots, \text{SHoG}_7\right) \qquad (1)$$

The feature is then normalized by its magnitude to achieve robustness to illumination variations. This process can be applied recursively to generate higher level hexagonal descriptors using the same concatenation mechanism. Accordingly, an L2 HexHoG is generated from the seven L1 HexHoGs centred on the red points in Fig. 4. For clarity, we have enlarged the first level hexagon edge length in this illustration. The ordering mechanism used to concatenate the SHoGs for each L1 HexHoG is consistent with the above description. However, the dominant orientation for each region covered by an L1 HexHoG group is defined differently here, except for the central L1 HexHoG group, which retains the dominant orientation computed when it was originally extracted, as described above. The pseudocode to generate L1 HexHoG and L2 HexHoG is given in Algorithms 2 and 3 respectively.

The blue arrow in Fig. 4 shows the dominant orientation of the central region covered by a HexHoG. The dominant orientations of the other 6 L1 HexHoGs are defined by the red arrows, each of which indicates the direction from the whole group centre to the vertex of the corresponding hexagon. The right panel of Fig. 4 illustrates how we generate an L1 HexHoG feature for the red dashed region. Finally, the second level hexagonal feature is constructed by:

$$\text{L2 HexHoG} = \left(\text{L1 HexHoG}_1, \text{L1 HexHoG}_2, \ldots, \text{L1 HexHoG}_7\right) \qquad (2)$$

Figure 4: The second level HexHoG structure.

Algorithm 2: L1 HexHoG Construction.
<Px_1, Py_1>: HexHoG centre, i.e. sample point location
r: hexagon side length
θ: dominant orientation of the sampled point <Px_1, Py_1>
<Px_i, Py_i> (i ← 2...7): six vertex positions of the hexagon centred on <Px_1, Py_1>
ts ← 2π/6
for i ← 1 : 6 do
  tv ← (i − 1)·ts + θ
  Py_{i+1} ← Py_1 + r·sin(tv)
  Px_{i+1} ← Px_1 + r·cos(tv)
end for
for i ← 1 : 7 do
  Construct SHoG_i at point <Px_i, Py_i>
end for
HexHoG ← Normalize(SHoG_1, SHoG_2, ..., SHoG_7)
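A sketch of Algorithm 2 in plain Python, assuming pixel coordinates and a hypothetical callback `shog_at(x, y)` that returns the SHoG sampled at a (possibly sub-pixel) position; how SHoGs are sampled off the pixel grid is not specified in the text. Normalisation is taken to be division by the Euclidean norm. The example checks that the centre, the first vertex and the opposite vertex lie on one line along the dominant orientation.

```python
import math

def hexagon_points(px, py, r, theta):
    """Centre followed by the six hexagon vertices, as in Algorithm 2.

    The first vertex lies along the dominant orientation theta; the others
    follow at increasing angles in steps of 2*pi/6. Returns (x, y) pairs.
    """
    ts = 2 * math.pi / 6
    pts = [(px, py)]
    for i in range(1, 7):
        tv = (i - 1) * ts + theta
        pts.append((px + r * math.cos(tv), py + r * math.sin(tv)))
    return pts


def l1_hexhog(px, py, r, theta, shog_at):
    """Concatenate the seven SHoGs in Algorithm 2 order and divide by the
    Euclidean norm. shog_at(x, y) is a hypothetical SHoG sampler."""
    desc = []
    for x, y in hexagon_points(px, py, r, theta):
        desc.extend(shog_at(x, y))
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc


# Centre, first vertex and opposite vertex are collinear along theta.
pts = hexagon_points(10.0, 10.0, 3.0, 0.0)
print(pts[0], pts[1], pts[4])   # centre, vertex at theta, vertex at theta + pi
```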

Algorithm 3: L2 HexHoG Construction.
<Px_1, Py_1>: HexHoG centre, i.e. sample point location
r: hexagon side length
θ_1: defined dominant orientation for the sampled point <Px_1, Py_1>
<Px_i, Py_i> (i ← 2...7): the six vertex positions of the hexagon centred on <Px_1, Py_1>
θ_i (i ← 2...7): defined dominant orientation for the sampled point <Px_i, Py_i>
ts ← 2π/6
for i ← 1 : 6 do
  tv ← (i − 1)·ts + θ_1
  Py_{i+1} ← Py_1 + r·sin(tv)
  Px_{i+1} ← Px_1 + r·cos(tv)
  θ_{i+1} ← tv
end for
for i ← 1 : 7 do
  Construct L1 HexHoG_i at point <Px_i, Py_i> with orientation θ_i
end for
L2 HexHoG ← Normalize(L1 HexHoG_1, L1 HexHoG_2, ..., L1 HexHoG_7)
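The recursion in Algorithms 2 and 3 can be sketched as a single function. Assumptions, all hypothetical: `sides[n-1]` holds the hexagon side length at level n, `shog_at(x, y)` returns an SHoG, `orientation_at(x, y)` returns the Gaussian-weighted peak-bin orientation of a region, and levels above 2 reuse the Level 2 rule (vertex children take the centre-to-vertex direction, the central child keeps its own computed orientation), which the text does not spell out for Level 3.

```python
import math

def hexhog(level, px, py, theta, sides, shog_at, orientation_at):
    """Recursive HexHoG sketch generalising Algorithms 2 and 3.

    level 1 groups seven SHoGs; level n > 1 groups seven level-(n-1) HexHoGs.
    theta is the dominant orientation assigned to this group.
    """
    r = sides[level - 1]                      # hexagon side length at this level
    ts = 2 * math.pi / 6
    # Central child first; it keeps its own computed dominant orientation.
    children = [(px, py, orientation_at(px, py))]
    for i in range(1, 7):
        tv = (i - 1) * ts + theta             # centre-to-vertex direction
        children.append((px + r * math.cos(tv), py + r * math.sin(tv), tv))
    desc = []
    for x, y, t in children:
        if level == 1:
            desc.extend(shog_at(x, y))        # leaves are SHoG fields
        else:
            desc.extend(hexhog(level - 1, x, y, t, sides, shog_at, orientation_at))
    norm = math.sqrt(sum(v * v for v in desc))   # magnitude normalisation
    return [v / norm for v in desc] if norm > 0 else desc


# Toy callbacks: a constant 8-bin SHoG and a constant orientation.
d2 = hexhog(2, 50.0, 50.0, 0.3, [3.0, 9.0],
            lambda x, y: [1.0] * 8, lambda x, y: 0.0)
print(len(d2))   # 8 bins * 7 * 7 fields = 392
```

Descriptor length grows as 8 × 7^n, so each extra level multiplies both memory and matching cost by 7.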

3.2 Detection and Edge Contour Labelling

3.2.1 Detection with Pose Estimation

The objective of this paper is to localize the edge contours of an object within a cluttered background by a dense local edge matching process, where the object has already been detected by conventional sparse feature matching. Accordingly, this edge segmentation process relies on a correct prior object detection and classification result, and on the quality of the pose estimate obtained during that prior detection and classification process. Since SIFT (Lowe, 2004) is an established benchmark for state-of-the-art performance in object detection, we adopt SIFT in our experiments for object detection and initial pose estimation. We directly match sparse SIFT descriptors extracted from a test image to the corresponding SIFT descriptors extracted from a reference image; the matches are grouped using the GHT (generalised Hough transform) and filtered using RANSAC. To obtain a more accurate pose estimate, we perform a further refinement step by means of dense local HexHoG matching, as follows:

1. Compute edge element (edgel) maps for both the test image and the corresponding reference image using the Canny edge detector;
2. Project the edgels of the reference image into the test image edgel map according to the initial pose estimate;
3. For each projected edgel, find the set of test image edgels that neighbour it within a constrained search area;
4. From this set of neighbouring test image edgels, find the best-matching test image edgel for each projected edge point by comparing their HexHoG features, computed from the input images;
5. Re-estimate the pose transformation from the reference image to the test image, based on all the matched edgel-pair correspondences obtained above.
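Steps 3 and 4 can be sketched as a brute-force windowed search. This is an illustration under stated assumptions: descriptors are unit-normalised lists, the search area is the square ±`search` pixel window varied in Section 4.2, a candidate must exceed the 0.8 dot-product threshold reported in Section 4.1, and all names are hypothetical. A practical implementation would index test edgels by grid cell rather than scanning all of them for every projected edgel.

```python
def dot(a, b):
    """Dot product of two equal-length descriptors."""
    return sum(x * y for x, y in zip(a, b))


def match_projected_edgels(projected, test_edgels, ref_desc, test_desc,
                           search=6, threshold=0.8):
    """For each projected reference edgel, return the test edgel inside a
    +/-search pixel window whose descriptor best matches it (highest dot
    product), provided the score exceeds threshold.

    projected:   {ref_id: (x, y)} reference edgels mapped by the current pose
    test_edgels: list of (x, y) test-image edgel positions
    ref_desc:    {ref_id: descriptor}; test_desc: {(x, y): descriptor}
    Returns a list of (ref_id, (x, y)) correspondences.
    """
    matches = []
    for rid, (qx, qy) in projected.items():
        best_score, best_pt = threshold, None
        for tx, ty in test_edgels:
            if abs(tx - qx) <= search and abs(ty - qy) <= search:
                score = dot(ref_desc[rid], test_desc[(tx, ty)])
                if score > best_score:
                    best_score, best_pt = score, (tx, ty)
        if best_pt is not None:
            matches.append((rid, best_pt))
    return matches


# Toy data: (30, 30) lies outside the window; (10, 10) scores only 0.6.
projected = {0: (10.2, 10.1)}
test_edgels = [(10, 10), (11, 10), (30, 30)]
ref_desc = {0: [1.0, 0.0]}
test_desc = {(10, 10): [0.6, 0.8], (11, 10): [1.0, 0.0], (30, 30): [1.0, 0.0]}
print(match_projected_edgels(projected, test_edgels, ref_desc, test_desc))
# [(0, (11, 10))]
```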

The constrained search area reduces false-positive matches between background clutter edgels and the reference object's edgels, while matching features located at edgels provides many more correspondences than corner-based features alone, especially when the reference object inherently lacks corners, i.e. contains mainly smooth edge contours.
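Step 5 re-estimates the pose from the edgel correspondences, but the text states neither the transformation model nor the estimator. Since the test images are generated by in-plane rotation plus translation, one plausible choice, assumed here, is a closed-form least-squares rigid fit (the 2D form of the Kabsch/Procrustes solution); a robust variant would wrap it in RANSAC. The function name is hypothetical.

```python
import math

def fit_rigid_2d(src, dst):
    """Least-squares rotation + translation mapping src points onto dst.

    Returns (theta, tx, ty) such that dst ~= R(theta) @ src + t.
    """
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    cos_sum = sin_sum = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy          # centred source point
        bx, by = qx - dx, qy - dy          # centred destination point
        cos_sum += ax * bx + ay * by       # accumulates |a||b| cos(angle)
        sin_sum += ax * by - ay * bx       # accumulates |a||b| sin(angle)
    theta = math.atan2(sin_sum, cos_sum)
    c, s = math.cos(theta), math.sin(theta)
    tx = dx - (c * sx - s * sy)            # t = mean(dst) - R @ mean(src)
    ty = dy - (s * sx + c * sy)
    return theta, tx, ty


# Recover a known rotation of 0.5 rad and translation (2, -1).
src = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
c0, s0 = math.cos(0.5), math.sin(0.5)
dst = [(c0 * x - s0 * y + 2.0, s0 * x + c0 * y - 1.0) for x, y in src]
print(fit_rigid_2d(src, dst))
```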

Validation. A validation method is required to evaluate the performance of the proposed pose estimation refinement. For each test image, we record ground-truth information specifying the rotation and translation used to embed the reference object pixels into a background image, so we know the precise location of the reference object's edge contours in the test image. Using the pose estimate provided by the image matching process (either SIFT or dense HexHoG), the estimated object edgel positions are obtained by projecting the reference edgels into the test image. For each reference edgel, the distance between its estimated position and its ground-truth position gives its pose estimation error. The mean and standard deviation of this displacement error over the test set are used to evaluate pose estimation performance.
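The validation measure can be sketched as follows, assuming poses are represented as (rotation angle, tx, ty) triples, matching the rotation and translation recorded as ground truth. The helper names are hypothetical, and the sketch reports the population standard deviation; the text does not say which estimator was used. Pooling the per-edgel distances over all images would give test-set statistics.

```python
import math
import statistics

def apply_rigid(theta, tx, ty, pts):
    """Rotate points by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in pts]


def pose_error(ref_edgels, estimated_pose, true_pose):
    """Mean and population standard deviation of the distance between each
    reference edgel projected by the estimated pose and by the ground truth."""
    est = apply_rigid(*estimated_pose, ref_edgels)
    gt = apply_rigid(*true_pose, ref_edgels)
    d = [math.dist(a, b) for a, b in zip(est, gt)]
    return statistics.mean(d), statistics.pstdev(d)


# A pure one-pixel translation error displaces every edgel by exactly 1.
print(pose_error([(0.0, 0.0), (1.0, 0.0)], (0.0, 1.0, 0.0), (0.0, 0.0, 0.0)))
```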

3.2.2 Edge Contour Labelling

Object edge contour labelling is performed after pose estimation refinement. Estimated edgel positions in the test image are found by projecting the reference edgels using the refined pose transformation. The search process, constrained to a limited range in X and Y, is then repeated to match the projected edgel positions with the edgels in the test edgel map. The test image edgels that match the projected reference edgels are then labelled as contour edgels. An edge connectivity post-process is then executed as follows: if an edgel in the test image is labelled as a contour edgel, all connected edgels (among its 8 nearest neighbours) are likewise labelled. This process is repeated for each newly labelled contour edgel. We perform 6 iterations in our experiments to label the edgels comprising the object's edge contours, thereby potentially capturing the shape of the detected object in terms of observed edgels.
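The connectivity post-process can be read as a breadth-first growth over 8-connected edgels, one ring per iteration. A minimal sketch, assuming edgels are stored as a set of integer (x, y) positions; the function and argument names are hypothetical.

```python
def grow_contour_labels(edge_map, seeds, iterations=6):
    """Propagate contour labels through 8-connected edgels.

    edge_map: set of (x, y) edgel positions from the edge detector.
    seeds:    edgels labelled as contour by HexHoG matching.
    Each iteration labels every edgel that is an 8-neighbour of an edgel
    newly labelled in the previous iteration.
    """
    labelled = set(seeds) & edge_map
    frontier = set(labelled)
    for _ in range(iterations):
        new = set()
        for x, y in frontier:
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    p = (x + dx, y + dy)
                    if p in edge_map and p not in labelled:
                        new.add(p)
        labelled |= new
        frontier = new
        if not frontier:          # nothing left to grow into
            break
    return labelled


# A straight 20-edgel line seeded at one end grows by one edgel per iteration.
line = {(i, 0) for i in range(20)}
print(sorted(grow_contour_labels(line, {(0, 0)})))
```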

4 EXPERIMENTAL RESULTS

The data employed in our validation experiments were obtained from the Amsterdam Library of Object Images (ALOI) (Geusebroek et al., 2005). A selection of test object images is shown in Fig. 5. The top row comprises objects randomly selected from ALOI; the middle row shows in-plane rotated versions; the bottom row shows rotated objects embedded into a background. We fix the Gaussian-weighted patch size for SHoG at 7 pixels wide and the sampling hexagon edge length at 3 pixels, which results in the HexHoG grouping structure shown in Fig. 3.

Figure 5: Examples of the data used in our experiments.

VISAPP2014-InternationalConferenceonComputerVisionTheoryandApplications

660

4.1 Rotation Invariance Performance

The performance of local feature matching in terms of rotation invariance is evaluated for both HoG and our proposed features. We randomly select 20 different images from ALOI as a reference set and rotate each image in steps of 1° over the range [0°, 90°] to generate a set of test images. For each rotation, a set of keypoints is detected using the FAST corner detector (Rosten and Drummond, 2006). The descriptor for each keypoint in each reference image is computed and compared to the descriptor of the corresponding point in each test image. We record the dot product of the corresponding descriptors and compute the average dot product over the 20 test images as a function of the degree of in-plane rotation. The matching performance obtained using HoG and our proposed features is illustrated in Fig. 6.

In our system, 8 histogram bins are used to record the relative frequency of 8 local gradient orientation directions. This explains the periodic performance observed every 45° for all our proposed features in Fig. 6. Although rotation invariance weakens as the grouping level increases, HexHoG still gives a matching dot product greater than 0.8, which is the matching threshold applied throughout our system. By contrast, the performance of HoG declines monotonically with object rotation, falling below an average dot product of 0.8 at around 25° of in-plane rotation.
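The 45° period follows directly from the bin width. Assuming each orientation is assigned to one of N_bin equal-width bins, ignoring resampling effects and assuming a unique peak bin, a rotation by a whole number k of bin widths only relabels the bins cyclically, which the SHoG barrel shift undoes:

```latex
\Delta = \frac{360^\circ}{N_{\mathrm{bin}}} = \frac{360^\circ}{8} = 45^\circ,
\qquad
h'_b = h_{(b-k) \bmod N_{\mathrm{bin}}} \;\text{ for a rotation by } k\Delta
\;\Longrightarrow\;
\mathrm{SHoG}(h') = \mathrm{SHoG}(h).
```

Rotations between these multiples move some gradient orientations across bin boundaries and change the histogram shape, which a cyclic shift cannot undo.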

Figure 6: Local feature matching performance (mean dot product versus rotation degree, 0 to 90, for HoG, SHoG, HexHoG1, HexHoG2 and HexHoG3).

4.2 Pose Refinement Performance

Before the pose estimation refinement process can be implemented, we must first decide which HexHoG level to adopt for local edgel matching. We devised the following experiment to determine the displacement error resulting from local edge matching. 20 different images from ALOI are randomly selected as a reference set and then rotated incrementally to form a test set, so for each edgel in each generated test image we know the corresponding edgel in the original reference image. The HexHoG feature for each reference image edgel is then computed and compared to the features computed within a local neighbourhood of 2-pixel radius, centred on the corresponding test image edgel. We find the best dot product match and record its position. The spatial distance between the matched position and the true feature position is computed for each reference edgel as the displacement error for local matching. The average error is then computed over the 20 reference images, giving the displacement error as a function of rotation for 3 levels of feature grouping, as shown in Fig. 7. The Level 3 HexHoG feature gives the smallest displacement error for all applied rotations, which suggests that L3 HexHoG will give better localisation performance than our other, less grouped, features for pose estimation refinement.

Figure 7: Displacement error for local HexHoG matching (mean displacement error in pixels versus rotation degrees, 0 to 100, for HexHoG1, HexHoG2 and HexHoG3).

We investigated pose refinement performance with respect to the constrained search bounds by varying the X, Y search range from ±1 to ±10 pixels and computing the refined pose estimation error accordingly. By comparing the refined pose estimation error to the initial pose estimation error for each test image, we can determine the number of test images whose pose estimate is improved by the refinement process. The average pixel error and standard deviation over the entire test set are also computed for both the initial and the refined estimates. We employed all 1000 different objects from the ALOI database as reference images to validate our local matching approach to contour edgel labelling. Each of these reference images is randomly rotated in-plane and embedded into 5 different backgrounds to generate a test dataset comprising 5000 images. Fig. 5 illustrates examples of these image sets.

In Table 1, we present (in pixel units) the mean error and standard deviation of the refined pose estimate for the test dataset matched using different search bounds, together with the initial pose estimation error obtained using SIFT. The results in Table 1 confirm that the pose estimation refinement process improves the mean pose estimation error for the whole test dataset by approximately a factor of 2 (from 2.20 pixels initially to between 0.83 and 1.35 pixels after refinement). All test images were first classified by means of SIFT matching, employing the GHT and RANSAC for pose estimation. When the object of interest has few distinctive corners and is not sufficiently distinguishable from the background, SIFT fails to detect it. In this experiment, 2892 of the 5000 images were successfully detected. A selection of failed examples is shown in Fig. 8. Consequently, we only apply our pose estimation refinement and edge labelling process to test images containing successfully detected object instances. The number of improved object pose estimates, and the corresponding percentage of the 2892 detected images, are also presented in Table 1. When the search range for edgel matching was constrained to less than ±10 pixels, the HexHoG-based pose estimator achieved an improvement in over 90% of the initially successful object detections. Table 1 also shows that the mean pose estimation error is lowest for a search range of about ±5 or ±6 pixels (as a reference point for comparison, the HexHoG used for matching is 28 pixels in diameter). However, the number of pose estimates that improve declines monotonically with search range. There is therefore a trade-off between the degree of pose refinement and the number of object detections that are improved. For the subsequent edge contour labelling experiments reported below, we choose a search range of ±6 pixels. A selection of examples of pose estimation refinement is illustrated in Fig. 9.

Figure 8: Failed examples of detection by SIFT: the first column shows the reference objects; the remaining columns show the test objects with backgrounds.

Table 1: Pose estimation refinement performance (mean and standard deviation of the error in pixels; improvement ratio as a percentage of the 2892 detected test images).

Search range (± px)     Mean   StdDev   No. improved pose est.   Improv. ratio (%)
±1                      1.35   2.68     2807                     97.06
±2                      0.94   2.93     2771                     95.82
±3                      0.84   2.89     2757                     95.33
±4                      0.91   6.42     2738                     94.67
±5                      0.83   2.84     2720                     94.05
±6                      0.84   2.59     2696                     93.22
±7                      0.86   2.56     2669                     92.29
±8                      0.89   2.57     2637                     91.18
±9                      0.92   2.58     2607                     90.15
±10                     0.99   2.73     2560                     88.52
Initial pose estimate   2.20   2.69     0                        0
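The improvement ratio column in Table 1 can be reproduced from the counts: it is the number of improved estimates divided by the 2892 images that SIFT detected, expressed as a percentage. A quick arithmetic check:

```python
# Counts of improved pose estimates from Table 1, keyed by search range (+/- px).
improved = {1: 2807, 2: 2771, 3: 2757, 4: 2738, 5: 2720,
            6: 2696, 7: 2669, 8: 2637, 9: 2607, 10: 2560}
detected = 2892   # test images successfully detected by SIFT (out of 5000)

ratios = {s: round(100 * n / detected, 2) for s, n in improved.items()}
print(ratios[1], ratios[10])             # 97.06 88.52
print(round(100 * detected / 5000, 2))   # 57.84 (SIFT detection rate, %)
```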

4.3 Edge Labelling Performance

Finally, we re-applied dense local edge matching in order to label directly the edgels detected within the test image that comprise the contour edgels of the object of interest, rather than projecting edgels from the reference image into the test image according to the recovered pose estimate (using a ±6 pixel search range). Fig. 10 shows examples of the labelling results we obtained by matching three different grouping levels of HexHoG descriptor. When the image background is very cluttered, or the object's outer boundary is not easily distinguished from the background, object boundary detections can be missed and background edgels close to the object can be mislabelled as belonging to the object. Fig. 10 shows that each level of HexHoG descriptor produces slightly different labellings, making it difficult to conclude which HexHoG grouping level gives the best results. It would appear that background distraction is greater for the larger, higher level descriptors (which straddle both the object boundary and the background to a greater degree), while the lower level descriptors are less reliable. Therefore, in our future research we propose to combine features of different levels, perhaps within a coarse-to-fine search framework, in order to optimize labelling performance. In this case the largest grouping would be matched first, and the search process then repeated using successively lower level groupings, matched within increasingly constrained search bounds.

Figure 9: Edge projection from the reference objects into the test images according to the initial and refined pose estimation (each panel pair shows the initial projection and the refined projection): the first two rows show examples improved by the refined projection; the last two rows show examples which failed to improve with the refined projection.

Figure 10: Object edge contour labelling results: from the first column to the third column, edge labelling results using L1 HexHoG, L2 HexHoG and L3 HexHoG are shown respectively (panels (a) to (o): five examples, each at Level 1, Level 2 and Level 3).

5 CONCLUSIONS

In this paper, we have presented a new hexagonally grouped and rotationally invariant image descriptor, the HexHoG, that can be computed recursively to generate hierarchical features. Hierarchical grouping affords sufficient discriminability to allow HexHoG descriptors to be sampled at all detected edgel positions (as opposed to only at corner locations) in order to match edge contours between a reference and a test image. Given an initial class and pose for a detected object, we are then able to apply dense local HexHoG matching both to improve the pose estimate of the detected object and to directly label the edge contours of the object as they appear in the test image. Our proposed methodology therefore supports segmentation-through-matching.
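The recursive construction summarised above can be illustrated with the following minimal sketch. It is not the authors' HexHoG implementation: the function names, the toy orientation histogram standing in for the (S)HoG field, the 8 orientation bins, the factor-of-3 shrink of the hexagon between levels and the per-level L2 normalisation are all assumptions. Rotation is handled only by rotating the hexagonal sampling pattern and the histogram angles by a supplied orientation `theta`; the toy histogram window itself stays axis-aligned, so any invariance is approximate.

```python
import numpy as np

def hex_offsets(radius, theta=0.0):
    """Centre plus six neighbours on a regular hexagon, rotated by theta radians."""
    angles = theta + np.arange(6) * np.pi / 3
    ring = np.stack([radius * np.cos(angles), radius * np.sin(angles)], axis=1)
    return np.vstack([np.zeros((1, 2)), ring])  # shape (7, 2)

def orientation_histogram(gx, gy, x, y, theta, half=2, bins=8):
    """Toy stand-in for an (S)HoG field: gradient-magnitude-weighted orientation
    histogram in an axis-aligned (2*half+1)^2 window, angles relative to theta."""
    h, w = gx.shape
    xi, yi = int(round(x)), int(round(y))
    x0, x1 = max(xi - half, 0), min(xi + half + 1, w)
    y0, y1 = max(yi - half, 0), min(yi + half + 1, h)
    hist = np.zeros(bins)
    if x0 >= x1 or y0 >= y1:
        return hist  # window falls entirely outside the image
    px, py = gx[y0:y1, x0:x1], gy[y0:y1, x0:x1]
    ang = (np.arctan2(py, px) - theta) % (2 * np.pi)
    idx = np.minimum((ang * bins / (2 * np.pi)).astype(int), bins - 1)
    np.add.at(hist, idx.ravel(), np.hypot(px, py).ravel())
    return hist

def grouped_descriptor(base_fn, x, y, level, radius, theta=0.0, shrink=3.0):
    """Hypothetical recursive grouping: a level-n descriptor concatenates the
    level-(n-1) descriptors at the centre and six hexagonal neighbours.
    Level 0 is the base field; children use a hexagon `shrink` times smaller."""
    if level == 0:
        return base_fn(x, y, theta)
    parts = [grouped_descriptor(base_fn, x + dx, y + dy, level - 1,
                                radius / shrink, theta, shrink)
             for dx, dy in hex_offsets(radius, theta)]
    v = np.concatenate(parts)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v  # L2-normalise each grouping level


img = np.random.default_rng(0).random((64, 64))
gy, gx = np.gradient(img)  # np.gradient returns d/drow (y) first, then d/dcol (x)
desc = grouped_descriptor(lambda x, y, t: orientation_histogram(gx, gy, x, y, t),
                          32.0, 32.0, level=2, radius=9.0)
print(desc.shape)  # (392,) = 8 bins * 7**2 base fields
```

In this sketch each added level multiplies the descriptor length by seven (8, 56, 392, ...), while the recursion reuses the same function at every level.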

Our validation experiments show that matching HexHoG features, which are based only on appearance information computed at edgel locations, has the potential to improve the performance of object pose estimation by approximately a factor of 2. By improving the accuracy of the pose estimation process, it is then possible to project contours from the reference image into the test image and annotate the location of a detected object with sufficient accuracy for many practical tasks, such as grasping in robotics. Moreover, improved pose estimation tightens the search constraints required to match test image edge contours directly, allowing HexHoG matching to offer the possibility of recovering the labels of the actual edgels detected in the test image that correspond to contour edgels in the reference image, as described above.
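One way a refined pose could be used to project reference contours is sketched below, assuming the pose is modelled as a 2D affine transform estimated by least squares from matched edgel positions. The function names and the plain least-squares fit are illustrative assumptions; the text does not specify the authors' estimator, and with real matches containing outliers a robust estimator (e.g. RANSAC) would normally wrap such a fit.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine matrix A such that dst ≈ src @ A[:, :2].T + A[:, 2].
    src, dst: (N, 2) arrays of matched positions, N >= 3, not all collinear."""
    X = np.hstack([src, np.ones((len(src), 1))])   # homogeneous coordinates, (N, 3)
    A_t, *_ = np.linalg.lstsq(X, dst, rcond=None)  # solves X @ A_t ≈ dst, A_t is (3, 2)
    return A_t.T

def project_edgels(A, ref_xy):
    """Map reference edgel positions (N, 2) into the test image using affine A."""
    return ref_xy @ A[:, :2].T + A[:, 2]


A_true = np.array([[0.9, -0.2, 5.0],
                   [0.2,  0.9, -3.0]])
ref_pts = np.random.default_rng(1).uniform(0, 100, size=(20, 2))
test_pts = project_edgels(A_true, ref_pts)  # noise-free synthetic matches
A_est = fit_affine(ref_pts, test_pts)
print(np.allclose(A_est, A_true))  # True
```

Because the synthetic matches are noise-free, the fit recovers the generating transform exactly; with noisy dense matches the residuals of this fit would indicate how well an affine model explains the correspondences.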

Our results indicate that, for purely affine pose transformations, the proposed scheme can recover a significant fraction of edgel labellings in the test image. In situations where the pose relationship between the target object in the reference and test images is non-affine, for example under out-of-plane rotation or projective distortion, dense HexHoG feature matching has the potential to maintain pixel-accurate correspondences between the edge contours detected within the test and reference object images.
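To make the notion of recovered edgel labellings concrete, the sketch below computes one possible measure under a known, purely affine ground-truth pose. The metric, its 2-pixel tolerance and the function name `labelling_recall` are assumptions for illustration; the text above does not state how the recovered fraction was measured.

```python
import numpy as np

def labelling_recall(ref_xy, labels, test_xy, A_gt, tol=2.0):
    """Hypothetical metric: fraction of reference edgels whose assigned test edgel
    lies within `tol` pixels of the location given by a known affine A_gt (2x3).
    labels[i] is the index of the test edgel assigned to reference edgel i,
    or -1 if reference edgel i was left unlabelled (counted as a miss)."""
    labels = np.asarray(labels)
    expected = ref_xy @ A_gt[:, :2].T + A_gt[:, 2]
    err = np.full(len(labels), np.inf)
    ok = labels >= 0
    err[ok] = np.linalg.norm(test_xy[labels[ok]] - expected[ok], axis=1)
    return float(np.mean(err <= tol))


ref_xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
A_gt = np.array([[1.0, 0.0, 10.0],   # pure translation by (10, 0)
                 [0.0, 1.0, 0.0]])
test_xy = np.array([[10.0, 0.0], [11.5, 0.0], [20.0, 20.0], [11.0, 1.0]])
recall = labelling_recall(ref_xy, [0, 1, 2, -1], test_xy, A_gt)
print(recall)  # 0.5: edgels 0 and 1 within 2 px, edgel 2 mislabelled, edgel 3 unlabelled
```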

Our future work will focus on incorporating an improved edge detector, hierarchical approaches to matching the HexHoG features, and improved post-labelling processing for determining edgel connectivity and representing edgel contour shape.

ACKNOWLEDGEMENTS

The authors acknowledge financial support from the Chinese Scholarship Council, China, and the European Union within the Strategic Research Project Clopema, Project No. FP7-288553.

REFERENCES

Alahi, A., Ortiz, R., and Vandergheynst, P. (2012). FREAK: Fast retina keypoint. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 510–517. IEEE.

Borenstein, E. and Ullman, S. (2002). Class-specific, top-down segmentation. In Computer Vision–ECCV 2002, pages 109–122. Springer.

Borji, A. and Itti, L. (2012). Exploiting local and global patch rarities for saliency detection. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 478–485. IEEE.

Brown, M., Hua, G., and Winder, S. (2011). Discriminative learning of local image descriptors. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 33(1):43–57.

Dalal, N. and Triggs, B. (2005). Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 886–893. IEEE.

Felzenszwalb, P. F., Girshick, R. B., McAllester, D., and Ramanan, D. (2010). Object detection with discriminatively trained part-based models. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 32(9):1627–1645.

Ferrari, V., Jurie, F., and Schmid, C. (2010). From images to shape models for object detection. International Journal of Computer Vision, 87(3):284–303.

Geusebroek, J.-M., Burghouts, G. J., and Smeulders, A. W. (2005). The Amsterdam library of object images. International Journal of Computer Vision, 61(1):103–112.

Kontschieder, P., Riemenschneider, H., Donoser, M., and Bischof, H. (2011). Discriminative learning of contour fragments for object detection. In BMVC, pages 1–12.

Lazebnik, S., Schmid, C., and Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, volume 2, pages 2169–2178. IEEE.

Leibe, B., Leonardis, A., and Schiele, B. (2008). Robust object detection with interleaved categorization and segmentation. International Journal of Computer Vision, 77(1-3):259–289.

Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110.

Mikolajczyk, K. and Schmid, C. (2005). A performance evaluation of local descriptors. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 27(10):1615–1630.

Murphy, K., Torralba, A., Eaton, D., and Freeman, W. (2006). Object detection and localization using local and global features. In Toward Category-Level Object Recognition, pages 382–400. Springer.

Rosten, E. and Drummond, T. (2006). Machine learning for high-speed corner detection. In Computer Vision–ECCV 2006, pages 430–443. Springer.

Schlecht, J. and Ommer, B. (2011). Contour-based object detection. In Proceedings of the British Machine Vision Conference. BMVA Press.

Shotton, J., Blake, A., and Cipolla, R. (2005). Contour-based learning for object detection. In Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, volume 1, pages 503–510. IEEE.

Tola, E., Lepetit, V., and Fua, P. (2010). DAISY: An efficient dense descriptor applied to wide-baseline stereo. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 32(5):815–830.

Xu, Y., Quan, Y., Zhang, Z., Ji, H., Fermuller, C., Nishigaki, M., and Dementhon, D. (2012). Contour-based recognition. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 3402–3409. IEEE.

Yu, S. and Shi, J. (2003). Object-specific figure-ground segregation. In Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, volume 2, pages II–39. IEEE.

VISAPP2014-InternationalConferenceonComputerVisionTheoryandApplications

666