Contour Localization based on Matching Dense HexHoG Descriptors
Yuan Liu and J. Paul Siebert
School of Computing Science, University of Glasgow, Glasgow, U.K.
Keywords:
Feature Extraction, Local Matching, Object Detection, Edge Detection, Edge Contour Labelling, Segmentation Features, HexHoG Descriptors.
Abstract:
The ability to detect and localize an object of interest from a captured image containing a cluttered background
is an essential function for an autonomous robot operating in an unconstrained environment. In this paper, we
present a novel approach to refining the pose estimate of an object and directly labelling its contours by dense
local feature matching. We perform this task using a new image descriptor we have developed called the Hex-
HoG. Our key novel contribution is the formulation of HexHoG descriptors comprising hierarchical groupings
of rotationally invariant (S)HoG fields, sampled on a hexagonal grid. These HexHoG groups are centred on
detected edges and therefore sample the image relatively densely. This formulation allows arbitrary levels of
rotation-invariant HexHoG grouped descriptors to be implemented efficiently by recursion. We present the
results of an evaluation based on the ALOI image dataset which demonstrates that our proposed approach can
significantly improve an initial pose estimation based on image matching using standard SIFT descriptors.
In addition, this investigation presents promising contour labelling results based on processing 2892 images
derived from the 1000 image ALOI dataset.
1 INTRODUCTION
This paper addresses the issue of accurate object edge
contour localisation given an initial estimate of an ob-
ject’s pose with respect to its pose captured within a
reference image. Appearance-based methods (Dalal
and Triggs, 2005; Lazebnik et al., 2006; Murphy
et al., 2006; Felzenszwalb et al., 2010; Borji and
Itti, 2012) and contour-based methods (Kontschieder
et al., 2011; Schlecht and Ommer, 2011; Shotton
et al., 2005; Xu et al., 2012) for object detection have
been extensively studied in recent years. Appearance-
based methods represent the dominant approach to
object detection, and typically are based on a pipeline
that first extracts local patch features, and then em-
ploys a sliding window to scan across the whole im-
age to detect a target. Alternatively, the pipeline can
be structured to employ local features in order to de-
tect object parts, which can then be associated to-
gether to detect the whole target. Since an object’s
edge contours afford crucial information for visual
perception, edge contour-based approaches have also
been extensively developed. Edge contours can be represented by local curvature information, or by the spatial structural relationships between edge fragments. Such representations can be employed individually for part matching, or combined to generate a shape model
suitable for whole object detection. It is inherently
difficult to extract the edge contours of an object di-
rectly, particularly when the object appears within
a cluttered background, since background structures
that intersect an object’s boundary tend to corrupt, or
distort, the extracted bounding edge contour. There-
fore, appearance-based methods are predominantly
used for object detection. However, the ability to lo-
calise an object’s boundaries would allow the pixels
representing the object to be specified, as opposed to
merely knowing the approximate position of a bound-
ing box containing the object, as currently afforded
by sparse local feature-based methods. Therefore, ac-
curately extracted edge contours could serve both to
segment an object from the scene and also to provide
a shape-based representation of the segmented object.
Accordingly, the combination of appearance-based
and edge contour-based methods (Schlecht and Om-
mer, 2011) has the potential to provide accurate object
localisation and additional information describing an
object’s semantics.
The principal contribution of this paper is a new
method for combining appearance and edge informa-
tion to detect and localise an object’s edge contours
within a cluttered background. A new feature descriptor, the HexHoG, is based on a hexagonal, hierarchical grouping mechanism that confers sufficient reliability and distinctiveness to enable it to sample the image at all detected edgel positions (as opposed to only corner locations). An initial pose
estimation is first obtained by means of sparse local
feature matching using a standard SIFT implementa-
tion. Based on this estimation result, a dense local
edge matching process is then applied using our new
HexHoG feature to refine the initial pose estimation,
and this refined pose estimation is then used to con-
strain local dense edge matching to obtain object edge
contour labelling (and correspondences between the
contour edgels detected in the test and reference im-
ages). Therefore, in this work we are not employing
HexHoG descriptors for object detection, but instead
utilising HexHoG descriptors for edge contour match-
ing, edge labelling and pose estimation refinement as
a post detection & classification process.
The proposed method is validated using the
dataset ALOI (Geusebroek et al., 2005). Our re-
sults show that our proposed method significantly
improves the initial pose estimation and exhibits
promising results for edge contour labelling. The re-
mainder of this paper is organized as follows: Section
2 presents a brief review of related work. Section 3
introduces our complete system for object pose esti-
mation and edge contour labelling. Our experimental
results are presented in Section 4, followed by the pa-
per’s conclusions.
2 RELATED WORK
Many object detection methods are able to achieve
approximate localization of an object within a clut-
tered background. Borenstein & Ullman (Boren-
stein and Ullman, 2002) propose a Top-Down class-
specific segmentation protocol to identify the struc-
ture of an object by means of high-level information,
instead of using the traditional image-based criteria.
Their method detects an object that is labelled by means of previously learned 'building blocks', although these blocks do not precisely delineate the pixels comprising the detected object. Yu & Shi (Yu and Shi, 2003) present
an integration model incorporating low-level edge de-
tection and high-level patch detection to label an ob-
ject of interest segregated from the background. How-
ever, no statistical evaluation of this method is pre-
sented in (Yu and Shi, 2003). Leibe et al. (Leibe et al.,
2008) contribute an Implicit Shape Model which af-
fords their system a greater degree of flexibility by en-
abling it to learn different object shapes and use these
shape models to categorize objects in novel images
whilst inferring a probabilistic segmentation, which
then in turn improves the robustness of the catego-
rization and detection processes. Schlecht & Om-
mer (Schlecht and Ommer, 2011) propose a method
for complementing appearance information with con-
tour information in order to detect an object within a
bounding box. Neither of these two methods provides precise object boundaries which would allow
the shape of segmented objects to be represented and
recovered. Ferrari et al. (Ferrari et al., 2010) provide a detection method by learning an object shape model represented using local contour features. Novel object
instances could be localized in new images and the
object boundaries were labelled rather than just being
contained within a bounding box. A significant lim-
itation of this system, however, is the computational
cost of its learning process.
Feature extraction has been explored extensively
in the context of object detection and localization.
Gradient histogram-based descriptors have been re-
searched intensively and applied widely for this pur-
pose. Local densely sampled descriptors have been
reported to give promising results in human detection
(Dalal and Triggs, 2005) and wide-baseline match-
ing (Tola et al., 2010), although such descriptors do not usually possess the property of rotation invariance. Sparse, distinctive features (Lowe, 2004; Mikolajczyk and Schmid, 2005; Alahi et al., 2012) achieve rotation invariance by rotating the local sampling coordinate frame according to the local dominant gradient orientation prior to computing an orientated gradient histogram distribution. However, this rotation normalization process is expensive to compute and is therefore inherently unsuitable when
dense feature extraction is required. Furthermore,
such features do not extract object edge information,
which affords a crucial cue for visual perception.
3 APPROACH
In this section, we give the details of our proposed
methods based on HexHoG feature extraction and
dense local edge matching. The overview of our sys-
tem is summarized in Fig.1.
Figure 1: The overview of our system: input images; edge detection; detection and initial pose estimation; dense local edge matching for pose estimation refinement; dense local edge matching for edge contour labelling.
ContourLocalizationbasedonMatchingDenseHexHoGDescriptors
657
3.1 Feature Extraction
3.1.1 SHoG Feature Extraction
Local image features based on the histogram of
oriented gradients (HoG) representation have been
widely adopted (Mikolajczyk and Schmid, 2005;
Dalal and Triggs, 2005; Brown et al., 2011). Rotating
the sampling coordinate frame according to the dom-
inant local image gradient orientation provides a gen-
eral way to achieve rotation invariance for local image
features. In this work we adopt an alternative well es-
tablished, but simpler, method to afford a substantial
degree of rotation invariance within standard HoG. A
single patch is first weighted by a Gaussian function
and represented by a gradient orientation distribution
histogram. In the histogram, the location of the highest bin, i.e. the bin exhibiting the dominant gradient orientation, is barrel-shifted to the head of the histogram, so that the histogram starts with the frequency value of the dominant orientation (Fig. 2). Therefore, we achieve rotation invariance by simply shifting the histogram rather than rotating and resampling the image coordinate frame; the rotation invariance this affords is evaluated in Fig. 6. We term this orientation-normalised HoG the SHoG, and the pseudocode for its construction is given in Algorithm 1.
Algorithm 1: SHoG Construction.
HoG: Histogram of Oriented Gradients
Num_Bin: Number of Bins in HoG
Max: Max HoG Bin Value
Index: Index of the Max HoG Bin
e ← 0
for i ← Index : Num_Bin do
    e ← e + 1
    SHoG(e) ← HoG(i)
end for
r ← Num_Bin − Index + 1
for i ← 1 : (Index − 1) do
    r ← r + 1
    SHoG(r) ← HoG(i)
end for
Figure 2: Local patch represented by HoG and SHoG.
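In implementation terms, Algorithm 1 is simply a circular shift of the histogram so that the peak bin comes first. A minimal NumPy sketch (assuming the HoG for a Gaussian-weighted patch has already been computed; the function name is ours, for illustration only):

import numpy as np

def shog_from_hog(hog):
    # Barrel-shift a HoG histogram so that it starts at its dominant
    # (peak) orientation bin, as in Algorithm 1.
    hog = np.asarray(hog, dtype=float)
    peak = int(np.argmax(hog))      # index of the dominant orientation bin
    return np.roll(hog, -peak)      # circularly shift so the peak bin comes first

# Example: an 8-bin HoG whose peak is in bin 3 (0-based) is shifted so
# that the returned histogram starts with that peak value (0.9).
print(shog_from_hog([0.10, 0.20, 0.05, 0.90, 0.30, 0.10, 0.00, 0.15]))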
3.1.2 HexHoG Feature Extraction
Based on SHoG, we investigate a hexagon grouping
mechanism which is similar to DAISY (Tola et al.,
2010) with the difference that this hexagon grouped
local descriptor HexHoG can be recursively con-
structed to generate hierarchical descriptors. More-
over, unlike DAISY, HexHoG is substantially rota-
tionally invariant.
A hexagon is inherently rotationally symmetric in geometry, which contributes to rotation invariance over a certain angular range. The hexagonally
grouped local regions comprising HexHoG are con-
structed as shown in Fig.3. Each black circle rep-
resents a locally sampled region represented by an
SHoG descriptor. Each black circle centre is a sam-
pling point located on a hexagon vertex, and the cen-
tre point marks the sampling point at the centre of
the hexagon on which each HexHoG group is con-
structed. Since we sample SHoG fields at not only the
hexagon vertices but also the centre of the hexagon
group, 7 rather than 6 SHoG fields are grouped to-
gether. Therefore, strictly we are computing a sep-
timal, i.e. 7 element, grouping based on hexagonal
geometry.
Figure 3: The first level HexHoG structure, oriented along the dominant orientation of the covered region.
We can freely set both the radius of the circular re-
gions denoting each SHoG field and the distance be-
tween neighbouring sampling points. These parame-
ters control the overlap between the SHoG fields of
each grouping, which influences the degree of rota-
tion invariance of the final HexHoG descriptor and
also the distinctiveness of this representation. We
compute the dominant orientation of the region cov-
ered by the red dashed circle by computing a HoG field
spatially weighted by a Gaussian envelope, and there-
after selecting the peak HoG orientation bin, as per-
formed in SIFT.
The above protocol determines where to sample the six vertices of the hexagon once the hexagon centre has been fixed. Three sampling points, including the centre point, are co-aligned in the direction of the dominant orientation. Then we can generate
VISAPP2014-InternationalConferenceonComputerVisionTheoryandApplications
658
this hexagonally grouped feature by concatenating SHoG_i (i = 1, 2, ..., 7): the central SHoG descriptor is assigned to the head of the grouped descriptor, followed by the SHoG descriptor aligned to the dominant orientation; all of the remaining SHoG_i descriptors are subsequently concatenated in anti-clockwise order. The complete Level 1 HexHoG descriptor is constructed about its centre point as follows:

L1 HexHoG = (SHoG_1, SHoG_2, SHoG_3, ..., SHoG_7)    (1)
The feature is then normalized by its magnitude to achieve robustness to illumination variations. This process can be applied recursively to generate higher level hexagonal descriptors using the same concatenating mechanism. Accordingly, L2 HexHoG is generated from the seven L1 HexHoGs centred on the red points in Fig.4. For clarity, we have enlarged the first level hexagon edge length to make it easier to illustrate. The ordering mechanism used to concatenate the SHoG for L1 HexHoG is consistent with the above description. However, the dominant orientation for each region covered by an L1 HexHoG group is defined differently here, except for the central L1 HexHoG group, which retains its original dominant orientation, computed when it was originally extracted as described above. The pseudocode to generate L1 HexHoG and L2 HexHoG is given in Algorithms 2 and 3, respectively.
The blue arrow in Fig.4 shows the dominant orientation of the central region covered by an L1 HexHoG. The dominant orientations of the other six L1 HexHoG_i are defined by the red arrows, each of which illustrates the direction from the whole group centre to the vertex of the corresponding hexagon. The right-hand part of Fig.4 illustrates how we generate an L1 HexHoG feature for the red dashed region. Finally, the second level hexagonal feature is constructed by:
L2 HexHoG = (L1 HexHoG_1, L1 HexHoG_2, L1 HexHoG_3, ..., L1 HexHoG_7)    (2)
Figure 4: The second level HexHoG structure.
Algorithm 2: L1 HexHoG Construction.
<Px_1, Py_1>: HexHoG Centre, i.e. Sample Point Location
r: Hexagon Side Length
θ: Dominant Orientation of the Sampled Point <Px_1, Py_1>
<Px_i, Py_i> (i = 2...7): Six Vertex Positions of the Hexagon Centred on <Px_1, Py_1>
ts ← 2π/6
for i ← 1 : 6 do
    tv ← (i − 1)·ts + θ
    Py_{i+1} ← Py_1 + r·sin(tv)
    Px_{i+1} ← Px_1 + r·cos(tv)
end for
for i ← 1 : 7 do
    Construct SHoG_i at Point <Px_i, Py_i>
end for
L1 HexHoG ← Normalize(SHoG_1, SHoG_2, ..., SHoG_7)
Algorithm 3: L2 HexHoG Construction.
<Px_1, Py_1>: HexHoG Centre, i.e. Sample Point Location
r: Hexagon Side Length
θ_1: Defined Dominant Orientation for the Sampled Point <Px_1, Py_1>
<Px_i, Py_i> (i = 2...7): the Six Vertex Positions of the Hexagon Centred on <Px_1, Py_1>
θ_i: Defined Dominant Orientation for the Sampled Point <Px_i, Py_i>
ts ← 2π/6
for i ← 1 : 6 do
    tv ← (i − 1)·ts + θ_1
    Py_{i+1} ← Py_1 + r·sin(tv)
    Px_{i+1} ← Px_1 + r·cos(tv)
    θ_{i+1} ← tv
end for
for i ← 1 : 7 do
    Construct L1 HexHoG_i at Point <Px_i, Py_i>
end for
L2 HexHoG ← Normalize(L1 HexHoG_1, L1 HexHoG_2, ..., L1 HexHoG_7)
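Algorithms 2 and 3 share the same vertex geometry and differ only in what is sampled at each point and how each sub-group's dominant orientation is assigned, so they can be folded into one recursive routine. The following Python sketch illustrates this recursion under simplifying assumptions: compute_shog(image, x, y) is a hypothetical helper returning the SHoG vector for a patch, the top-level theta is the dominant orientation of the covered region (measured as in SIFT), and the same hexagon side length r is reused at every level.

import numpy as np

def hexhog(image, x, y, level, r, theta, compute_shog):
    # Recursively build a level-`level` HexHoG descriptor centred on (x, y).
    # theta is the dominant orientation assigned to this centre: the six
    # vertex sub-groups inherit the direction from the parent centre to the
    # vertex (Algorithm 3), while the central sub-group keeps its parent's
    # orientation. compute_shog is an assumed helper, not the paper's code.
    step = 2.0 * np.pi / 6.0
    points = [(x, y, theta)]                      # centre first
    for i in range(6):                            # six hexagon vertices,
        tv = i * step + theta                     # first one aligned with theta,
        points.append((x + r * np.cos(tv),        # then in anti-clockwise order
                       y + r * np.sin(tv), tv))

    parts = []
    for px, py, sub_theta in points:
        if level == 1:
            parts.append(compute_shog(image, px, py))
        else:
            parts.append(hexhog(image, px, py, level - 1, r, sub_theta,
                                compute_shog))
    v = np.concatenate(parts)
    return v / (np.linalg.norm(v) + 1e-12)        # magnitude normalisation

# Usage (with the hypothetical helpers): theta0 = dominant orientation of
# the covered region, then
# descriptor = hexhog(image, x, y, level=2, r=3, theta=theta0,
#                     compute_shog=compute_shog)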
3.2 Detection and Edge Contour
Labelling
3.2.1 Detection with Pose Estimation
The objective of this paper is to localize the edge
contours of an object within a cluttered background
based on a dense local edge matching process, where
this object has already been detected by conventional
sparse feature matching. Accordingly, this edge seg-
mentation process relies on a correct prior object de-
tection and classification result and the quality of the
ContourLocalizationbasedonMatchingDenseHexHoGDescriptors
659
pose estimation obtained during the prior detection
and classification process. Since SIFT (Lowe, 2004)
is an established benchmark for state-of-the-art per-
formance in object detection, in this paper, we adopt
SIFT in our experiments for object detection and ini-
tial pose estimation purposes. We directly match
sparse SIFT descriptors extracted from a test image to
the corresponding SIFT descriptors extracted from a
reference image, grouped and filtered using the Generalised Hough Transform (GHT) and RANSAC, respectively. In order to obtain a more
accurate pose estimation, we perform a further refine-
ment step by means of dense local HexHoG matching,
described as follows:
1. Compute edgel (edge element) maps for both the test image and the corresponding reference image using the Canny Edge Detector;
2. Project the edgels of the reference image into
the test image edgel map according to the initial
pose estimation;
3. Find the set of the test image edgels that neigh-
bour each projected edgel from the reference im-
age edgel map, within a constrained search area
for each projected edgel;
4. From the set of neighbouring test-image edgels, find the best matching test-image edgel for each projected edgel by comparing their HexHoG features, computed from the input images;
5. Re-estimate the pose transformation from the
reference image to the test image, based on all
the matched edgel-pair correspondences obtained
above.
The constrained search area reduces false-positive
matches between background clutter edgels and the
reference object’s edgels, while the use of edgel-
located feature matching provides many more feature
correspondences than corner-based features alone, es-
pecially when the reference object inherently lacks
corners, i.e. contains mainly smooth edge contours.
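A compact sketch of steps 2–4 above, under stated assumptions: ref_desc and test_desc are hypothetical lookups from an edgel position to its HexHoG vector, descriptors are unit-normalised so that matching reduces to a dot product, and the ±6 pixel search range and 0.8 acceptance threshold reflect the values used in our experiments.

import numpy as np

def match_projected_edgels(projected, test_edgels, ref_desc, test_desc,
                           search_range=6, min_dot=0.8):
    # projected   : list of ((rx, ry), (px, py)) pairs, i.e. each reference
    #               edgel together with its position projected into the test
    #               image by the current pose estimate (step 2)
    # test_edgels : set of (x, y) edgel positions from the test edgel map
    # ref_desc / test_desc : hypothetical lookups, (x, y) -> HexHoG vector
    matches = []
    for (rx, ry), (px, py) in projected:
        best, best_dot = None, min_dot
        # step 3: candidate test edgels inside the constrained search area
        for tx in range(int(px) - search_range, int(px) + search_range + 1):
            for ty in range(int(py) - search_range, int(py) + search_range + 1):
                if (tx, ty) not in test_edgels:
                    continue
                # step 4: keep the candidate with the highest dot product
                d = float(np.dot(ref_desc[(rx, ry)], test_desc[(tx, ty)]))
                if d > best_dot:
                    best, best_dot = (tx, ty), d
        if best is not None:
            matches.append(((rx, ry), best))
    return matches   # step 5 re-estimates the pose from these correspondences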
Validation. A validation method is required to evalu-
ate how well the proposed pose estimation refinement
method performs. For each test image, we record
ground-truth information specifying the rotation and
translation used to embed the reference object pix-
els into a background image. Therefore, we know
the precise location of edge contours of the reference
object in the test image. According to the pose esti-
mation provided by the image matching process (ei-
ther SIFT or dense HexHoG), the estimated object
edgel positions are obtained by projecting the refer-
ence edgels into the test image. For each reference
edgel, the distance between its estimated position and
its ground-truth position is then computed to give its
pose estimation error. The mean and standard devia-
tion of matched point displacement error for the test
set is used to evaluate pose estimation performance.
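The error measure itself is straightforward; a minimal sketch, assuming the projected and ground-truth edgel positions are supplied as N×2 arrays in corresponding order:

import numpy as np

def pose_error(projected, ground_truth):
    # Per-edgel displacement (in pixels) between the positions projected by
    # the estimated pose and the ground-truth positions; the mean and
    # standard deviation over the test set summarise pose accuracy.
    d = np.linalg.norm(np.asarray(projected, float) -
                       np.asarray(ground_truth, float), axis=1)
    return float(d.mean()), float(d.std())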
3.2.2 Edge Contour Labelling
Object edge contour labelling is implemented follow-
ing pose estimation refinement. Estimated edgel po-
sitions in the test image are found by projecting the
reference edgels using the refined pose estimation
transformation. The search process, constrained to a limited range in X and Y, is then repeated to match between these estimated edgel positions and the edgels in the test edgel map. The
edgels within the test image which match to the pro-
jected reference image edgels are then labelled in the
test image as being contour edgels. An edge connectivity post-process is then executed as follows: if an edgel in the test image is labelled as a contour edgel, all connected edgels (comprising its 8 nearest neighbours) are likewise labelled. This process is then repeated for each newly labelled contour edgel. We
perform 6 iterations in our experiment in order to
label those edgels which comprise the object’s edge
contours and thereby potentially capture the shape of
the detected object in terms of observed edgels.
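A minimal sketch of this connectivity post-process, assuming the Canny edgel map and the initial contour labels are supplied as Boolean arrays of the same size:

import numpy as np

def propagate_contour_labels(edgel_map, contour_labels, iterations=6):
    # edgel_map      : HxW bool array, True where the Canny detector fired
    # contour_labels : HxW bool array, True for edgels matched to the reference
    # Spread contour labels to 8-connected neighbouring edgels, repeated for
    # a fixed number of iterations (6 in our experiments).
    labels = contour_labels.copy()
    h, w = edgel_map.shape
    for _ in range(iterations):
        grown = labels.copy()
        ys, xs = np.nonzero(labels)
        for y, x in zip(ys, xs):
            y0, y1 = max(0, y - 1), min(h, y + 2)
            x0, x1 = max(0, x - 1), min(w, x + 2)
            # label any detected edgel in the 8-neighbourhood of a contour edgel
            grown[y0:y1, x0:x1] |= edgel_map[y0:y1, x0:x1]
        labels = grown
    return labels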
4 EXPERIMENTAL RESULTS
The data employed in our validation experiments has
been obtained from the Amsterdam Library of Object
Images (ALOI) (Geusebroek et al., 2005). A selection
of test object images is shown in Fig.5. The top row
comprises objects randomly selected from ALOI; the
middle row shows in-plane rotated versions; the bot-
tom row shows rotated objects embedded into a back-
ground. We fix the Gaussian weighted patch size to
be 7 pixels wide for SHoG, and the sampling hexagon
edge length to 3 pixels, which results in the HexHoG
grouping structure shown in Fig.3.
Figure 5: Examples of the data used in our experiments.
VISAPP2014-InternationalConferenceonComputerVisionTheoryandApplications
660
4.1 Rotation Invariance Performance
The performance of local feature matching in terms
of rotation invariance is evaluated for both HoG and
our proposed features. We randomly select 20 differ-
ent images from ALOI as a reference set, and rotate each image in 1° steps over the range [0°, 90°] to generate a set of test images. For each rota-
ate a set of test images, respectively. For each rota-
tion, a set of keypoints is detected using the Fast Cor-
ner Detector (Rosten and Drummond, 2006). The de-
scriptor for each keypoint in each reference image is
computed and compared to the descriptor of the cor-
responding point in each test image. We record the
dot product of the corresponding descriptors and com-
pute the average dot product over 20 different test im-
ages as a function of degree of in-plane rotation. The
performance obtained using HoG and our proposed
features to match local features is illustrated in Fig.6.
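The evaluation metric is simply the mean descriptor dot product at corresponding keypoints as a function of rotation angle; a brief sketch, assuming unit-normalised descriptors stored per keypoint for the reference image and for each rotated copy:

import numpy as np

def rotation_invariance_curve(ref_desc, rotated_desc):
    # ref_desc     : {keypoint_id: descriptor} for the reference image
    # rotated_desc : {angle_deg: {keypoint_id: descriptor}} for rotated copies
    # Returns {angle_deg: mean dot product over all common keypoints}.
    curve = {}
    for angle, descs in sorted(rotated_desc.items()):
        dots = [float(np.dot(ref_desc[k], d))
                for k, d in descs.items() if k in ref_desc]
        curve[angle] = float(np.mean(dots))
    return curve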
In our system, 8 histogram bins are used to record the relative frequency of 8 local gradient orientation directions. This explains the periodic behaviour observed every 45° for all our proposed features in Fig. 6. Although the rotation invariance of the feature weakens as the grouping level increases, L3 HexHoG still gives a matching dot product greater than 0.8, which is the matching threshold applied throughout our system. On the other hand, the performance of HoG declines monotonically with object rotation, falling below an average dot product of 0.8 at around 25° of in-plane rotation.
Figure 6: Local feature matching performance: mean descriptor dot product versus rotation (degrees) for HoG, SHoG, HexHoG1, HexHoG2 and HexHoG3.
4.2 Pose Refinement Performance
Before the pose estimation refinement process can be
implemented, we must first decide which level Hex-
HoG feature to adopt for local edgel matching. We
devised the following experiment to determine the
displacement error resulting from local edge match-
ing: 20 different images from ALOI are randomly se-
lected as a reference set and then rotated incremen-
tally to form a test set. Therefore, for each edgel in each generated test image, we know the corresponding edgel in the original reference image. The
HexHoG feature for each reference image edgel is
then computed and compared to the features com-
puted within a local neighbourhood of 2 pixels in ra-
dius, centered on the corresponding test image edgel.
We find the best dot product match and record its po-
sition. The spatial distance between the matched po-
sition and the corresponding true feature position is
computed for each reference edgel as the displace-
ment error for local matching. Thereafter the average
error is computed over 20 reference images and we
obtain the displacement error distribution as a func-
tion of rotation for 3 levels of feature grouping, as
shown in Fig.7. The level-3 HexHoG feature gives the smallest displacement error for all applied rotations, which suggests that L3 HexHoG will give better localisation performance than our other, less grouped, features for the purpose of pose estimation refinement.
Figure 7: Displacement error for local HexHoG matching: mean displacement error (pixels) versus rotation (degrees) for HexHoG1, HexHoG2 and HexHoG3.
We investigated pose refinement performance
with respect to the constrained search bounds, by
varying the X,Y search range from ± 1 to ± 10 pixels,
and computing the refined pose estimation error ac-
cordingly. By comparing the refined pose estimation
error to the initial pose estimation error for each test
image, we can determine the number of test images
which exhibit an improvement in pose estimation due
to the refinement process. Both the average pixel er-
ror and the standard deviation for the entire test set of
initial estimations, and refined estimations, are also
computed. We employed all 1000 different objects
from the ALOI database as reference images to val-
idate our local matching approach to contour edgel
labelling. Each of these reference images is randomly
rotated in-plane and embedded into 5 different back-
grounds respectively to generate a test dataset com-
prising 5000 images. Fig.5 illustrates examples of the
image sets described above.
In Table 1, we present (in pixel units) the mean error and standard deviation of the refined pose estima-
ContourLocalizationbasedonMatchingDenseHexHoGDescriptors
661
Figure 8: Failed examples of detection by SIFT: the first column shows the reference objects; the remaining columns show the test objects with backgrounds.
Table 1: Pose estimation refinement performance.

Search range (± pixels) | Mean (pixels) | Std. dev. (pixels) | No. improved pose est. | Improv. ratio (%)
1                       | 1.35          | 2.68               | 2807                   | 97.06
2                       | 0.94          | 2.93               | 2771                   | 95.82
3                       | 0.84          | 2.89               | 2757                   | 95.33
4                       | 0.91          | 6.42               | 2738                   | 94.67
5                       | 0.83          | 2.84               | 2720                   | 94.05
6                       | 0.84          | 2.59               | 2696                   | 93.22
7                       | 0.86          | 2.56               | 2669                   | 92.29
8                       | 0.89          | 2.57               | 2637                   | 91.18
9                       | 0.92          | 2.58               | 2607                   | 90.15
10                      | 0.99          | 2.73               | 2560                   | 88.52
Initial pose estimate   | 2.20          | 2.69               | 0                      | 0
tion for the test dataset matched using different search
bounds and also the initial error in pose estimation ob-
tained using SIFT. The results in Table 1 confirm that
the pose estimation refinement process improves the
mean pose estimation error for the whole test dataset
by approximately a factor of 2. All test images were
first classified by means of SIFT matching, employing
the GHT and RANSAC for pose estimation. When
the object of interest has less distinctive corners and is
not sufficiently distinguishable from the background,
SIFT will fail to detect such an object. In this ex-
periment, 2892 images were successfully detected out
of 5000 images in total. A selection of failed exam-
ples is shown in Fig.8. Consequently, we only ap-
ply our pose estimation refinement and edge labelling
process to test image examples containing success-
fully detected object instances. The number of im-
proved object pose estimations and their correspond-
ing fraction of the test set is also presented in Table 1.
When the search range for edgel matching was con-
strained to less than 10 pixels, the HexHoG based
pose estimator achieved an improvement in over 90%
of the initially successful object detections. We can
observe in Table 1 that the mean pose estimation error is least for a search range in the region of ±5 or ±6 pixels (as a reference point for comparison, the L3 HexHoG used for matching is 28 pixels in diam-
eter). However, the number of pose estimations that
exhibit an improvement declines monotonically with
search range. Therefore, there is a tradeoff between
the degree of pose refinement and the number of ob-
ject detections that are improved. For subsequent
edge contour labelling experiments, reported below,
we choose a search range of ±6 pixels. A selection of
examples of post pose estimation refinement is illus-
trated in Fig.9.
4.3 Edge Labelling Performance
Finally, we re-applied dense local edge matching in order to directly label the edgels detected within the test image that comprise the contour edgels of the object of interest, rather than projecting edgels from the reference image into the test image, according to the recovered pose estimation (using a ±6 pixel search range). Fig. 10 shows examples of the labelling re-
sults we obtained by matching three different group-
ing levels of HexHoG descriptor. When the im-
age background is very cluttered, or the object outer
boundary is not easily distinguished from the back-
ground, missed object boundary detections can re-
sult and background edgels close to the object can
be mis-labelled as belonging to the object. We can observe in Fig. 10 that each level of HexHoG descriptor produces slightly different labellings, making it difficult to conclude which level of HexHoG feature grouping gives the best results. It would appear that the distraction from the background is greater for the larger, higher-level descriptors (which straddle both the object boundary and the background to a greater degree)
VISAPP2014-InternationalConferenceonComputerVisionTheoryandApplications
662
Figure 9: Edge projection from the reference objects into the test images according to the initial and refined pose estimation: the first two rows show examples with improvement from the refined projection; the last two rows show examples which failed to achieve improvement from the refined projection.
ContourLocalizationbasedonMatchingDenseHexHoGDescriptors
663
Figure 10: Object edge contour labelling results: from the first column to the third column, edge labelling results obtained using L1 HexHoG, L2 HexHoG and L3 HexHoG are shown, respectively.
VISAPP2014-InternationalConferenceonComputerVisionTheoryandApplications
664
while the lower-level descriptors have less reliability. Therefore, in our future research we propose to combine the different levels of features, perhaps within a coarse-to-fine search framework, in order to optimize the labelling performance. In this case the largest grouping would be matched first, and then the search process repeated using successively lower-level groupings matched within increasingly constrained search bounds.
5 CONCLUSIONS
In this paper, we present a new hexagonally grouped
and rotationally invariant image descriptor, the Hex-
HoG, that can be computed recursively to generate hi-
erarchical features. Hierarchical grouping affords sufficient discriminability to allow HexHoG descriptors to be sampled at all detected edgel positions (as opposed to only corner locations) in order to match edge
contours between a reference and test image. Given
an initial class and pose for a detected object, we are
then able to apply dense local HexHoG matching to both improve the detected object's pose estimation
and also directly label the edge contours of the object
as they appear in a test image. Therefore our pro-
posed methodology supports segmentation-through-
matching.
Our validation experiments show that matching
HexHoG features, which are based only on appear-
ance information computed at edgel locations, has the
potential to improve the performance of object pose
estimation by approximately a factor of 2. By im-
proving the accuracy of the pose estimation process,
it is then possible to project contours from the refer-
ence image into the test image and annotate the lo-
cation of a detected object with sufficient accuracy
for many practical tasks such as grasping in robotics.
Moreover, improved pose estimation also improves the search constraints required to match test image edge contours directly, allowing HexHoG matching to offer the possibility of recovering the actual edgel labels detected in the test image that correspond to contour edgels in the reference image, as described above.
Our results indicate that for purely affine pose
transformations, the proposed scheme can recover a
significant fraction of edgel labellings in the test im-
age. In many situations, where for example the pose
relationship between the target object contained in the
reference and test images is non-affine, e.g. for out-
of-plane rotation or under projective distortion, dense
HexHoG feature matching has the potential to main-
tain pixel-accurate correspondences between the edge
contours detected within the test and reference object
images.
Our future work will focus on incorporating an
improved edge detector, hierarchical approaches to
matching the HexHoG features, and improved post-labelling processing for determining edgel connectivity and edgel contour shape representation.
ACKNOWLEDGEMENTS
The authors acknowledge financial support from the
Chinese Scholarship Council, China, and the Eu-
ropean Union within the Strategic Research Project
Clopema, Project No. FP7-288553.
REFERENCES
Alahi, A., Ortiz, R., and Vandergheynst, P. (2012). Freak:
Fast retina keypoint. In Computer Vision and Pat-
tern Recognition (CVPR), 2012 IEEE Conference on,
pages 510–517. IEEE.
Borenstein, E. and Ullman, S. (2002). Class-specific, top-
down segmentation. In Computer Vision – ECCV 2002,
pages 109–122. Springer.
Borji, A. and Itti, L. (2012). Exploiting local and global
patch rarities for saliency detection. In Computer
Vision and Pattern Recognition (CVPR), 2012 IEEE
Conference on, pages 478–485. IEEE.
Brown, M., Hua, G., and Winder, S. (2011). Discriminative
learning of local image descriptors. Pattern Analy-
sis and Machine Intelligence, IEEE Transactions on,
33(1):43–57.
Dalal, N. and Triggs, B. (2005). Histograms of oriented gra-
dients for human detection. In Computer Vision and
Pattern Recognition, 2005. CVPR 2005. IEEE Com-
puter Society Conference on, volume 1, pages 886–
893. IEEE.
Felzenszwalb, P. F., Girshick, R. B., McAllester, D., and
Ramanan, D. (2010). Object detection with discrim-
inatively trained part-based models. Pattern Analy-
sis and Machine Intelligence, IEEE Transactions on,
32(9):1627–1645.
Ferrari, V., Jurie, F., and Schmid, C. (2010). From images
to shape models for object detection. International
Journal of Computer Vision, 87(3):284–303.
Geusebroek, J.-M., Burghouts, G. J., and Smeulders, A. W.
(2005). The amsterdam library of object images. In-
ternational Journal of Computer Vision, 61(1):103–
112.
Kontschieder, P., Riemenschneider, H., Donoser, M., and
Bischof, H. (2011). Discriminative learning of con-
tour fragments for object detection. In BMVC, pages
1–12.
Lazebnik, S., Schmid, C., and Ponce, J. (2006). Beyond
bags of features: Spatial pyramid matching for rec-
ognizing natural scene categories. In Computer Vi-
sion and Pattern Recognition, 2006 IEEE Computer
ContourLocalizationbasedonMatchingDenseHexHoGDescriptors
665
Society Conference on, volume 2, pages 2169–2178.
IEEE.
Leibe, B., Leonardis, A., and Schiele, B. (2008). Robust ob-
ject detection with interleaved categorization and seg-
mentation. International journal of computer vision,
77(1-3):259–289.
Lowe, D. G. (2004). Distinctive image features from scale-
invariant keypoints. International journal of computer
vision, 60(2):91–110.
Mikolajczyk, K. and Schmid, C. (2005). A perfor-
mance evaluation of local descriptors. Pattern Analy-
sis and Machine Intelligence, IEEE Transactions on,
27(10):1615–1630.
Murphy, K., Torralba, A., Eaton, D., and Freeman, W.
(2006). Object detection and localization using local
and global features. In Toward Category-Level Object
Recognition, pages 382–400. Springer.
Rosten, E. and Drummond, T. (2006). Machine learning
for high-speed corner detection. In Computer Vision–
ECCV 2006, pages 430–443. Springer.
Schlecht, J. and Ommer, B. (2011). Contour-based object
detection. In Proceedings of the British Machine Vi-
sion Conference. BMVA Press.
Shotton, J., Blake, A., and Cipolla, R. (2005). Contour-
based learning for object detection. In Computer
Vision, 2005. ICCV 2005. Tenth IEEE International
Conference on, volume 1, pages 503–510. IEEE.
Tola, E., Lepetit, V., and Fua, P. (2010). Daisy: An efficient
dense descriptor applied to wide-baseline stereo. Pat-
tern Analysis and Machine Intelligence, IEEE Trans-
actions on, 32(5):815–830.
Xu, Y., Quan, Y., Zhang, Z., Ji, H., Fermuller, C., Nishi-
gaki, M., and Dementhon, D. (2012). Contour-based
recognition. In Computer Vision and Pattern Recogni-
tion (CVPR), 2012 IEEE Conference on, pages 3402–3409. IEEE.
Yu, S. and Shi, J. (2003). Object-specific figure-ground
segregation. In Computer Vision and Pattern Recogni-
tion, 2003. Proceedings. 2003 IEEE Computer Society
Conference on, volume 2, pages II–39. IEEE.
VISAPP2014-InternationalConferenceonComputerVisionTheoryandApplications
666