significant advances in computer vision. The goal of our project is essentially to clean
up, simplify and improve Lowe’s SIFT algorithm. We intend first to implement the al-
gorithm roughly as Lowe has defined it and then to make changes to it, gauging their
effectiveness in object recognition. Specifically, we intend to improve SIFT’s robust-
ness to illumination changes, which will be judged by recognition accuracy in various
outdoor scenes. We hope in general to improve the effectiveness of SIFT recognition
keys by experimenting with different keypoint-descriptor generation methods, trying to
maximize recognition scores with varying cameras and illuminations. Along the way, we
are building an image database for testing our implementation and future changes to
the algorithm.
One of the major problems with SIFT is that the algorithm is not crisply defined and has
many free parameters; the information provided in Lowe's papers is sometimes vague,
and thus leaves many implementation details to be filled in.
SIFT is invariant to image translation, scaling and rotation. SIFT features are also
partially invariant to illumination changes and 3D projection. These features
have been widely used in the robot localization field as well as in many other computer
vision fields. The SIFT algorithm has four major stages.
1. Scale-space extrema detection: the first stage searches over scale space using a Dif-
ference of Gaussian (DoG) function to identify potential interest points.
2. Key point localization: the location and scale of each candidate point are determined
and key points are selected based on measures of stability.
3. Orientation assignment: one or more orientations are assigned to each key point based
on local image gradients.
4. Key point descriptor: a descriptor is generated for each key point from information
on local image gradients at the scale found in stage 2.
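As a rough illustration of stage 3, the following numpy sketch builds a gradient-orientation histogram over a small patch and takes its peak as the dominant orientation. The 36-bin choice follows Lowe's paper, but the Gaussian weighting and histogram peak interpolation he uses are omitted for brevity; the function name and test patch are our own.

```python
import numpy as np

def orientation_histogram(patch, n_bins=36):
    """Histogram of gradient orientations over a grayscale patch.

    Lowe uses 36 bins (10 degrees each) with Gaussian weighting;
    this sketch omits the weighting for brevity.
    """
    # Image gradients via finite differences (row axis first)
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)                           # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)      # orientation in [0, 2*pi)
    bins = (ang / (2 * np.pi) * n_bins).astype(int) % n_bins
    # Each pixel votes for its orientation bin, weighted by magnitude
    return np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)

# A vertical intensity ramp: every gradient points straight down the rows,
# so a single bin should dominate the histogram.
patch = np.outer(np.arange(16), np.ones(16))
hist = orientation_histogram(patch)
dominant = hist.argmax()
```

The dominant orientation of a real keypoint would be read off the same way, after the Gaussian weighting and parabolic peak interpolation that this sketch leaves out.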
In more detail, the first stage works as follows. For each octave in the scale space, the initial image
is repeatedly convolved with Gaussians to produce the set of scale space images. Ad-
jacent Gaussian images are subtracted to produce the DoG images. After each octave,
the Gaussian image is down-sampled by a factor of 2 and the process is repeated. For a
more detailed discussion of the key point generation and factors involved see [6].
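A minimal sketch of that pyramid construction, using scipy's Gaussian filter; the octave count, scales per octave and base sigma here are illustrative defaults, not necessarily the values Lowe fixes in [6].

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(image, n_octaves=3, scales_per_octave=4, sigma0=1.6):
    """Build a difference-of-Gaussian pyramid.

    For each octave: blur the base image at successively larger scales,
    subtract adjacent blurred images to form DoG images, then
    downsample by a factor of 2 for the next octave.
    """
    k = 2 ** (1.0 / scales_per_octave)          # multiplicative scale step
    pyramid = []
    base = image.astype(float)
    for _ in range(n_octaves):
        blurred = [gaussian_filter(base, sigma0 * k ** i)
                   for i in range(scales_per_octave + 1)]
        dogs = [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]
        pyramid.append(dogs)
        base = blurred[-1][::2, ::2]            # downsample for next octave
    return pyramid

img = np.random.rand(64, 64)
pyr = dog_pyramid(img)   # 3 octaves, 4 DoG images each
```

Candidate interest points are then the local extrema of these DoG images across both space and scale.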
In a nutshell, Lowe's algorithm finds stable features over scale space by repeatedly
smoothing and down-sampling an input image and subtracting adjacent levels to create
a pyramid of difference-of-Gaussian images. The features the SIFT algorithm detects
represent minima and maxima in scale space of these difference-of-Gaussian images.
At each of these minima and maxima, a detailed model is fit to determine location, scale
and contrast, during which some features are discarded based on measures of their in-
stability. Once a stable feature has been detected, its dominant gradient orientation is
obtained, and a key point descriptor vector is formed from a grid of gradient histograms
constructed from the gradients in the neighborhood of the feature. Key point matching
between images is performed using a nearest-neighbor indexing method.
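Lowe's implementation accelerates that nearest-neighbor search with a best-bin-first index; a brute-force numpy sketch of the underlying matching rule, his distance-ratio test, looks like this (the 0.8 threshold follows Lowe's paper; the random descriptors are placeholders for real 128-dimensional SIFT keys):

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbor in desc_b.

    A match is kept only when the nearest distance is well below the
    second-nearest (Lowe's distance-ratio test), which rejects
    ambiguous matches against cluttered backgrounds.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]          # nearest and second nearest
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, j1))
    return matches

rng = np.random.default_rng(0)
desc_b = rng.random((50, 128))                  # "database" descriptors
desc_a = desc_b[:5] + 0.001 * rng.random((5, 128))   # near-duplicate queries
matches = match_descriptors(desc_a, desc_b)
```

Because each perturbed query is far closer to its source descriptor than to any other, all five pass the ratio test; in practice the same rule discards most false matches at a small cost in true ones.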
There are many points along the course of this algorithm where simplifications and po-
tential improvements can be made. Our current goals, beyond implementing and testing
Lowe’s algorithm, are:
(1) simplify and clean up the algorithm as much as possible,
(2) improve lighting invariance by normalizing potential SIFT difference-of-Gaussian