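$$
\begin{bmatrix}
D_{xx} & D_{xy} & D_{x\sigma} \\
D_{xy} & D_{yy} & D_{y\sigma} \\
D_{x\sigma} & D_{y\sigma} & D_{\sigma\sigma}
\end{bmatrix}
\begin{bmatrix} x \\ y \\ \sigma \end{bmatrix}
=
\begin{bmatrix} -D_x \\ -D_y \\ -D_\sigma \end{bmatrix}
\tag{6}
$$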
The derivatives are approximated by differences of the
difference-of-Gaussian (DoG) values in the 4-neighborhood of the
keypoint. When a component of the offset x̂ is greater than 0.5 in
any of the three dimensions, the extremum lies closer to a neighbor
of C than to C itself. In this case C is replaced by this neighboring
point and the computation is performed again from its coordinates.
The process is iterated at most five times, and the offset obtained
is taken as the most accurate localization of the candidate keypoint.
It is then possible to compute the value of the DoG at the
extremum x̂.
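The refinement loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `dog[scale, row, col]` array layout, and the central-difference stencils (which use diagonal neighbors for the mixed derivatives) are assumptions made for the example.

```python
import numpy as np

def gradient_and_hessian(dog, s, y, x):
    """Finite-difference gradient and Hessian of a DoG stack dog[scale, row, col]."""
    c = dog[s, y, x]
    dx = 0.5 * (dog[s, y, x + 1] - dog[s, y, x - 1])
    dy = 0.5 * (dog[s, y + 1, x] - dog[s, y - 1, x])
    ds = 0.5 * (dog[s + 1, y, x] - dog[s - 1, y, x])
    dxx = dog[s, y, x + 1] - 2.0 * c + dog[s, y, x - 1]
    dyy = dog[s, y + 1, x] - 2.0 * c + dog[s, y - 1, x]
    dss = dog[s + 1, y, x] - 2.0 * c + dog[s - 1, y, x]
    dxy = 0.25 * (dog[s, y + 1, x + 1] - dog[s, y + 1, x - 1]
                  - dog[s, y - 1, x + 1] + dog[s, y - 1, x - 1])
    dxs = 0.25 * (dog[s + 1, y, x + 1] - dog[s + 1, y, x - 1]
                  - dog[s - 1, y, x + 1] + dog[s - 1, y, x - 1])
    dys = 0.25 * (dog[s + 1, y + 1, x] - dog[s + 1, y - 1, x]
                  - dog[s - 1, y + 1, x] + dog[s - 1, y - 1, x])
    grad = np.array([dx, dy, ds])
    hess = np.array([[dxx, dxy, dxs],
                     [dxy, dyy, dys],
                     [dxs, dys, dss]])
    return grad, hess

def refine_keypoint(dog, s, y, x, max_iter=5):
    """Solve system (6) for the offset, moving to a neighbor when a component exceeds 0.5.

    Returns ((s, y, x), offset) with offset ordered as (x, y, sigma),
    or None if the system is singular or the candidate leaves the stack.
    """
    for it in range(max_iter):
        grad, hess = gradient_and_hessian(dog, s, y, x)
        try:
            offset = np.linalg.solve(hess, -grad)  # H * offset = -grad
        except np.linalg.LinAlgError:
            return None
        if np.all(np.abs(offset) <= 0.5) or it == max_iter - 1:
            break
        # The extremum is closer to a neighboring sample: restart from there.
        x += int(np.round(offset[0]))
        y += int(np.round(offset[1]))
        s += int(np.round(offset[2]))
        inside = (1 <= s < dog.shape[0] - 1 and 1 <= y < dog.shape[1] - 1
                  and 1 <= x < dog.shape[2] - 1)
        if not inside:
            return None
    return (s, y, x), offset
```

After the last allowed iteration the sketch keeps whatever offset was obtained, following the text; discarding candidates that have not converged would be an equally plausible design choice.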
To reject low-contrast points, a candidate is discarded if
|D(x̂)| is less than a threshold of 0.03. Throughout the process,
pixel values are normalized to the range [0, 1].
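A small sketch of the contrast test follows. The text does not state how D(x̂) is evaluated; the example assumes the second-order Taylor value used in Lowe's original formulation, D(x̂) = D + ½ ∇Dᵀ x̂. The function names are hypothetical.

```python
import numpy as np

CONTRAST_THRESHOLD = 0.03  # from the text; pixel values assumed normalized to [0, 1]

def interpolated_dog_value(d_center, grad, offset):
    """Assumed Taylor estimate of the DoG at the refined extremum: D + 0.5 * grad . offset."""
    return d_center + 0.5 * float(np.dot(grad, offset))

def has_enough_contrast(d_center, grad, offset, threshold=CONTRAST_THRESHOLD):
    """Keep the candidate only if |D(x_hat)| reaches the threshold."""
    return abs(interpolated_dog_value(d_center, grad, offset)) >= threshold
```

Here `d_center` is the DoG value at the sample C, while `grad` and `offset` are the gradient and the solution of system (6) at that sample.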
In the following step, candidate keypoints located on edges of small
curvature are rejected, as their position along the edge is difficult
to estimate precisely. Hence only corners and points on highly curved
edges are kept as interest points. To this end, the principal
curvatures are estimated. These are proportional to the eigenvalues
α and β of the Hessian matrix.
Let r be the ratio between the largest and the smallest eigenvalue,
so that α = rβ. Using the determinant and the trace of the Hessian
matrix avoids the explicit computation of the eigenvalues. Lowe
suggests rejecting candidate keypoints for which

$$
\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} \geq \frac{(r_s + 1)^2}{r_s},
\quad \text{where } r_s = 10. \tag{7}
$$
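The reason the eigenvalues are not needed is that Tr(H) = α + β and Det(H) = αβ, so with α = rβ the ratio Tr(H)²/Det(H) equals (r + 1)²/r, which increases with r for r ≥ 1. The test can be sketched as below, assuming H is the 2×2 spatial Hessian of the DoG at the keypoint scale (as in Lowe's formulation); the function name and the handling of a non-positive determinant are assumptions of this example.

```python
def passes_edge_test(dxx, dyy, dxy, r_s=10.0):
    """Return True when the candidate is kept, i.e. Tr(H)^2 / Det(H) < (r_s + 1)^2 / r_s."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0.0:
        # Curvatures of opposite sign (or a degenerate Hessian): rejected here by assumption.
        return False
    return trace * trace / det < (r_s + 1.0) ** 2 / r_s
```

With r_s = 10 the bound is 12.1, so a blob-like point with equal curvatures (ratio 4) is kept, while a point whose curvatures differ by a factor of 20 (ratio 22.05) is rejected.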
2.3 Orientation Assignment
The detected keypoints are characterized by their coordinates and
the scale at which they were extracted. A consistent orientation
must be assigned to each detected point to obtain invariance to
rotation. For each keypoint (x, y), the closest scale factor σ is
chosen, and the associated image L(x, y, σ) at this scale is used to
compute the magnitude m(x, y) and the orientation θ(x, y) of the
gradient. A histogram of orientations is built from the image L,
computed over a window centered at (x, y, σ), of diameter c·σ,
where c is a constant. Each point of this window contributes to the
histogram bin corresponding to the orientation of its gradient, by
adding a value of wm(x, y). This quantity is the magnitude of the
gradient weighted by a Gaussian function of the distance to the
keypoint, whose standard deviation is one and a half times the scale
factor of the keypoint:
$$
\mathit{wm}(x, y) = m(x, y)\,\frac{1}{2\pi(1.5\sigma)^2}\,
e^{-\frac{dx^2 + dy^2}{2(1.5\sigma)^2}} \tag{8}
$$
Here m(x, y) is the magnitude of the gradient at the location
(x, y), dx and dy are the distances in the x and y directions to the
keypoint, and σ is the scale factor of the keypoint. The histogram
of orientations is divided into 36 bins, each covering an interval
of 10 degrees. The bin of maximal value gives the main orientation
of the interest point. If other bins have a value greater than 80%
of the maximal value, new interest points are created with these
orientations. The main orientation is refined from the peak bin of
the histogram by locating the maximum of a parabola fitted to the
peak bin and its two adjacent bins. This maximum is the angle at
which the derivative of the parabola is zero. The histogram is
treated as circular, so that the successor of the last orientation
is the first one. Each keypoint is thus represented by four values,
(x, y, σ, θ), which denote respectively its position, scale and
orientation, and which grant its invariance to these parameters.
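The orientation assignment of this section can be sketched as follows. Several details are assumptions of the example rather than statements of the text: the gradient is taken by central differences on L, the keypoint sits at integer pixel coordinates `(kx, ky)`, the window constant `c = 6.0` is a placeholder value, and secondary orientations are restricted to local maxima of the histogram.

```python
import numpy as np

def orientation_histogram(L, kx, ky, sigma, c=6.0, n_bins=36):
    """Weighted histogram of gradient orientations around (kx, ky) in the smoothed image L."""
    radius = int(round(0.5 * c * sigma))  # window of diameter c * sigma
    s = 1.5 * sigma                       # standard deviation of the Gaussian weight
    hist = np.zeros(n_bins)
    h, w = L.shape
    for y in range(max(1, ky - radius), min(h - 1, ky + radius + 1)):
        for x in range(max(1, kx - radius), min(w - 1, kx + radius + 1)):
            dx, dy = x - kx, y - ky
            if dx * dx + dy * dy > radius * radius:
                continue  # outside the circular window
            gx = L[y, x + 1] - L[y, x - 1]
            gy = L[y + 1, x] - L[y - 1, x]
            m = np.hypot(gx, gy)
            theta = np.degrees(np.arctan2(gy, gx)) % 360.0
            # Equation (8): magnitude weighted by a Gaussian of the distance to the keypoint.
            wm = m * np.exp(-(dx * dx + dy * dy) / (2.0 * s * s)) / (2.0 * np.pi * s * s)
            hist[int(theta // (360.0 / n_bins)) % n_bins] += wm
    return hist

def dominant_orientations(hist, peak_ratio=0.8):
    """Angles (degrees) of the peaks reaching peak_ratio of the maximum, refined by a parabola."""
    n = len(hist)
    bin_width = 360.0 / n
    top = hist.max()
    angles = []
    for i in range(n):
        left, center, right = hist[(i - 1) % n], hist[i], hist[(i + 1) % n]  # circular order
        if center < peak_ratio * top or center <= left or center <= right:
            continue
        # Vertex of the parabola through (-1, left), (0, center), (1, right).
        shift = 0.5 * (left - right) / (left - 2.0 * center + right)
        angles.append(((i + 0.5 + shift) * bin_width) % 360.0)
    return angles
```

Each returned angle would give rise to one keypoint (x, y, σ, θ); the bin index is converted to an angle through the bin center, which is also a convention chosen for this sketch.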
2.4 Descriptor Computation
The computation of a numerical descriptor for every keypoint is the
final step of the SIFT algorithm. A descriptor is a vector built
from the magnitudes and orientations of the gradients in the
neighborhood of the point. It is computed from the image L(x, y, σ)
at the scale factor at which the point was detected. As in section
2.3, the gradient magnitudes in the studied region are weighted
(equation 8) by a Gaussian function of standard deviation 1.5σ. This
gives less emphasis to gradients far from the keypoint and hence
yields a certain tolerance to small shifts in the window position.
To grant invariance to rotation, all the gradient orientations
inside the descriptor window are rotated relative to the dominant
orientation. In practice, the keypoint orientation is subtracted
from every gradient orientation. Furthermore, the descriptor window
itself is rotated to the direction of the keypoint orientation
(Figure 2). This window has the same size as in section 2.3.
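The two rotations mentioned above (of the gradient orientations and of the window itself) can be illustrated with a short sketch. The function name, the use of degrees, and the convention that the keypoint orientation becomes the first axis of the rotated window are assumptions of this example.

```python
import numpy as np

def to_keypoint_frame(dx, dy, grad_theta_deg, keypoint_theta_deg):
    """Express sample offsets and gradient orientations relative to the keypoint orientation."""
    t = np.radians(keypoint_theta_deg)
    cos_t, sin_t = np.cos(t), np.sin(t)
    # Rotate the offsets by -theta so that the keypoint orientation becomes the window's first axis.
    u = cos_t * dx + sin_t * dy
    v = -sin_t * dx + cos_t * dy
    # Subtract the keypoint orientation from every gradient orientation (kept in [0, 360)).
    rel_theta = (np.asarray(grad_theta_deg, dtype=float) - keypoint_theta_deg) % 360.0
    return u, v, rel_theta
```

For a keypoint oriented at 90 degrees, a sample one pixel away along the y direction whose gradient points at 100 degrees maps to window coordinates (1, 0) with a relative orientation of 10 degrees.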
The descriptor window is then subdivided into 16 regions (Figure 2),
and an eight-bin histogram of orientations is computed for each.
Each point contributes to the bin of the histogram corresponding to
the orientation of its gradient. Its contribution is the product of
its weighted magnitude wm(x, y) (equation 8) and an additional
coefficient (1 − d),