The patch in the scene is corrected by a transformation into a frontal view, after which the Gabor filters are applied and the resulting descriptors, or jets, can be matched. The following sections describe how our depth compensation method works (Sect. 3.1), the details of our descriptor (Sect. 3.2) and how we use this for matching local image regions (Sect. 3.3). For an overview of all parameters mentioned in the following sections, we refer the reader to Tab. 1.
3.1 Depth Compensation
The depth compensation method removes the effects of depth rotations, which cause perspective distortions. After this correction, an unknown in-plane rotation and quantization artifacts still remain. Many other methods achieve rotation invariance by estimating the dominant orientation in the patch (Lowe, 2004; Bay et al., 2008; Rublee et al., 2011) before computing the descriptor. We have tested this approach, but found better results when exploiting the rotation information in the Gabor jet during matching. We elaborate on this particular aspect in Sect. 3.3.
Similar to earlier works (e.g. Gossow et al., 2012), our method utilizes depth information to compensate for depth rotations. Our method is tested on a dataset captured by a commodity Kinect RGB-D sensor, which provides an aligned depth map along with the captured image. This allows us to apply the algorithm of Holzer et al. (2012) for fast extraction of local surface normals at each surface point; this is the first step of our algorithm. We include all surface points in a neighborhood of radius r to estimate the normal. We denote such an oriented surface point (p, n), where p is the 3D point position and n the surface normal. As mentioned previously, we only consider interest points identified by an external detector, which constitute a small subset of the full RGB-D image.
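As an illustration of this first step, the interface of the normal estimation (an oriented point (p, n) computed from all surface points within radius r) can be sketched with a simple PCA-based estimate. Note that this is a generic stand-in, not the integral-image algorithm of (Holzer et al., 2012), and the function name is ours:

    import numpy as np

    def estimate_normal(points, p, r):
        # Illustrative stand-in for the integral-image normals of
        # (Holzer et al., 2012): estimate the normal at p by PCA over
        # all surface points within radius r.
        nbrs = points[np.linalg.norm(points - p, axis=1) < r]
        # The normal is the eigenvector of the neighborhood covariance
        # with the smallest eigenvalue (direction of least variance).
        cov = np.cov((nbrs - nbrs.mean(axis=0)).T)
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues ascending
        n = eigvecs[:, 0]
        # Orient the normal towards the camera (which looks along +z).
        if n[2] > 0:
            n = -n
        return p, n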
For each interest point for which we also have a precomputed normal orientation, we now project the camera axes x = [1 0 0]^T and y = [0 1 0]^T onto the plane spanned by (p, n). Denote these projections x_n and y_n, respectively. These vectors, along with the vector n, are then concatenated to provide the full rotation frame R ∈ SO(3) between the actual view and a virtual frontal view of the local image patch:

R = [x_n  y_n  n]    (1)
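A minimal sketch of Eq. (1) follows. The final cross-product step, which guarantees that R lies exactly in SO(3), is an assumption on our part and is not spelled out above:

    import numpy as np

    def rotation_frame(n):
        # Project the camera axes onto the tangent plane of the
        # oriented point (p, n) and stack them with n as in Eq. (1).
        x_n = np.array([1.0, 0.0, 0.0]) - n[0] * n
        x_n /= np.linalg.norm(x_n)
        y_n = np.array([0.0, 1.0, 0.0]) - n[1] * n
        y_n /= np.linalg.norm(y_n)
        # Assumption on our part: recompute y_n as n x x_n so that the
        # columns are exactly orthonormal.
        y_n = np.cross(n, x_n)
        return np.column_stack((x_n, y_n, n))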
We now wish to normalize the local depth-rotated patch to a frontal view. We start by defining a fronto-parallel 3D planar patch by four anchor points. These are placed at ±r_anchor in the x and y directions and at c_depth · d_avg in the z direction, where d_avg is computed as the average depth of all interest points in a scene (see also Tab. 1).² Denote these four points P_planar. Using the rotation matrix R and the interest point's position p, each of these points can be placed around the interest point in the current scene:

P_scene,i = R · P_planar,i + p,    i ∈ {1, ..., 4}    (2)
We now use the camera matrix K to project both the frontal anchor points and the anchor points around the interest point into the image:

p_planar,i = K · P_planar,i    (3)

p_scene,i = K · P_scene,i    (4)
The final step of the normalization procedure now amounts to simply estimating the homography between the 2D point sets p_planar and p_scene. This homography is applied to the full patch around the interest point in the scene and provides a frontal normalization, which is suitable for description. An example is shown in the green frames in Fig. 1.
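Combining Eqs. (3)-(4) with this final step, the full frontal normalization can be sketched as follows, using the anchor points from the sketch above; the use of OpenCV for the homography estimation and the warp is our illustrative choice:

    import numpy as np
    import cv2

    def frontalize(img, K, P_planar, P_scene):
        def project(P):
            # Eqs. (3)-(4): pinhole projection and dehomogenization.
            q = (K @ P.T).T
            return (q[:, :2] / q[:, 2:]).astype(np.float32)

        p_planar = project(P_planar)
        p_scene = project(P_scene)
        # Homography taking the scene anchor points to the frontal
        # layout; four correspondences determine it exactly.
        H = cv2.getPerspectiveTransform(p_scene, p_planar)
        # Warp the image; the frontalized patch can then be cropped
        # around the projected frontal anchor points p_planar.
        return cv2.warpPerspective(img, H, (img.shape[1], img.shape[0]))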
3.2 Descriptor
The descriptor employed in this work is based on Gabor filter responses. Gabor filters (Granlund, 1978; Daugman, 1985) have a long history in computer vision, where they have been used for e.g. face recognition tasks (Wiskott et al., 1997). By modulating the filter parameters, a bank of several Gabor filters can be realized, providing a "complete" coverage of the frequency content of an image. We believe that this makes Gabor filters very attractive for local feature matching, where the task is to capture the local content of a patch in a discriminative manner. The concrete Gabor filter bank used in this work is inspired by (Ilonen et al., 2007), giving the function for the 2D Gabor filter as follows:
G(x, y; θ) = (f_0² / (πσ²)) exp(−(f_0²/σ²)(x′² + y′²)) exp(i2πf_0x′)    (5)

x′ = x cos θ − y sin θ    y′ = x sin θ + y cos θ    (6)
where x′, y′ are the x and y coordinates rotated by an angle of θ, f_0 is the normalized base frequency and σ represents the standard deviation of the Gaussian envelope. The parameter values were chosen with (Ilonen et al., 2007) as a basis, with further adjustments.
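For illustration, the filter of Eqs. (5)-(6) and a small bank over orientations and frequencies can be sketched as follows; the kernel size and the concrete f_0, σ and θ values are assumptions for the example, not the parameters of Tab. 1:

    import numpy as np

    def gabor_kernel(size, f0, sigma, theta):
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
        # Eq. (6): rotate the coordinates by theta.
        xp = x * np.cos(theta) - y * np.sin(theta)
        yp = x * np.sin(theta) + y * np.cos(theta)
        # Eq. (5): Gaussian envelope times a complex sinusoid.
        envelope = (f0**2 / (np.pi * sigma**2)) \
            * np.exp(-(f0**2 / sigma**2) * (xp**2 + yp**2))
        return envelope * np.exp(1j * 2 * np.pi * f0 * xp)

    # A small bank over 3 frequencies and 8 orientations (illustrative
    # values only).
    bank = [gabor_kernel(31, f0, sigma=1.0, theta=t)
            for f0 in (0.1, 0.2, 0.4)
            for t in np.linspace(0.0, np.pi, 8, endpoint=False)]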
² The dimensions and distance of this 3D patch are chosen by experimentation, but varying them has little impact on the performance of our descriptor.