are captured by cameras that have a common center
of projection. This means that the cameras are related
by pure rotation around the optical center (no transla-
tion). The second scenario is when all image points
lie on the same plane in the scene. Possible applica-
tions are, for instance, generating panorama images
and optical text recognition.
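
The first scenario can be checked numerically. The following is a minimal sketch, assuming an ideal pinhole camera: when two views share the center of projection and differ only by a rotation R, image points are related by the homography H = K R K^-1. The intrinsic matrix K, the 10-degree rotation, and the random scene points are illustrative assumptions, not values from this paper.

```python
import numpy as np

def rotation_y(angle_rad):
    """Rotation matrix about the camera's y-axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def project(K, R, points_3d):
    """Project Nx3 scene points with a camera at the origin rotated by R."""
    cam = (R @ points_3d.T).T          # rotate points into the camera frame
    pix = (K @ cam.T).T                # apply the intrinsics
    return pix[:, :2] / pix[:, 2:3]    # dehomogenize to pixel coordinates

# Hypothetical intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

R1 = np.eye(3)                         # first view: reference orientation
R2 = rotation_y(np.deg2rad(10.0))      # second view: pure rotation, no translation

# Homography induced by pure rotation between the two views.
H = K @ R2 @ R1.T @ np.linalg.inv(K)

# Random scene points at different depths (not on a common plane).
rng = np.random.default_rng(0)
points = rng.uniform([-1, -1, 4], [1, 1, 8], size=(20, 3))

x1 = project(K, R1, points)
x2 = project(K, R2, points)

# Map the first view's pixels through H and compare with the second view.
x1_h = np.hstack([x1, np.ones((len(x1), 1))])
mapped = (H @ x1_h.T).T
mapped = mapped[:, :2] / mapped[:, 2:3]

max_error = np.abs(mapped - x2).max()
```

Although the scene points lie at different depths, the mapped points coincide with the second view up to floating-point error, because no translation is involved.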
The remainder of this paper is organized as fol-
lows. Section 2 gives an overview of some of the
popular existing methods. Section 3 briefly reviews
the SIFT method. Section 4 describes the proposed
method. Section 5 evaluates the proposed method
by presenting the experimental results. Finally,
Section 6 concludes this paper.
2 RELATED WORK
There exists a wide variety of approaches for finding
correspondences between digital images. Some ap-
proaches provide a framework for the whole process
(extraction, description, and matching), while others
introduce novel methods for specific steps and use ex-
isting methods for the others. In this section, some of
the most popular methods are introduced.
The Harris corner detector (Harris and Stephens, 1988)
is a relatively simple yet widely used feature detector.
This method searches for points with significant
signal changes in two orthogonal directions.
Such points correspond mostly to physical corners
in the scene. The detection is done by observing a
self-similarity measure while shifting a small window
around a point. The biggest weakness of the Harris
method is the lack of scale invariance.
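
The corner measure described above can be sketched in a few lines. This is a simplified illustration, not the reference implementation: it uses the common structure-tensor form of the Harris response, a plain box window instead of a Gaussian one, and an assumed constant k = 0.05. The function names and the synthetic test image are hypothetical.

```python
import numpy as np

def box_filter(img, radius):
    """Sum values over a (2*radius+1)^2 window, using edge padding."""
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img, dtype=float)
    size = 2 * radius + 1
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def harris_response(img, radius=1, k=0.05):
    """Harris response det(M) - k * trace(M)^2 for every pixel."""
    img = img.astype(float)
    iy, ix = np.gradient(img)            # derivatives along rows and columns
    sxx = box_filter(ix * ix, radius)    # windowed structure tensor entries
    syy = box_filter(iy * iy, radius)
    sxy = box_filter(ix * iy, radius)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2

# Synthetic image: a bright square on a dark background.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
response = harris_response(img)

corner_value = response[5, 5]    # at a corner of the square
edge_value = response[10, 5]     # in the middle of an edge
flat_value = response[10, 10]    # inside the uniform region
```

The response is positive at the corner, negative along the edge, and zero in the flat region, which matches the requirement of significant signal change in two orthogonal directions. The window size is fixed, which illustrates why the measure is not scale invariant.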
SIFT (Lowe, 2004) is one of the most prominent
approaches. SIFT features provide scale and rota-
tion invariance in addition to partial illumination and
affine invariance. These strengths come at the price
of high computational complexity, mainly caused by
scale-space processing and the high dimensionality of the
description vectors. Furthermore, the matching accuracy
drops drastically for viewpoint changes larger than
about 30 degrees (affine transformation). Nevertheless,
due to its solid performance, SIFT has become a de facto
standard for finding image correspondences.
Due to the strengths of SIFT, numerous variations
have been proposed over the past decade to overcome
its shortcomings. ASIFT (Yu and Morel, 2011),
for instance, extends SIFT with full affine-invariance
by applying various tilts and rotations to the image
to simulate different camera orientations. After the
viewpoint simulation, ASIFT follows the standard
SIFT method. Although ASIFT outperforms SIFT in
scenarios with high viewpoint changes, the complex-
ity caused by the preprocessing increases the compu-
tation time considerably (Wu et al., 2013). PCA-SIFT
(Ke and Sukthankar, 2004) is another SIFT-variant,
which aims at reducing the computational complexity.
This method uses Principal Component Analysis
(PCA) to reduce the descriptor dimension. The
compact descriptor reduces the matching time, but
the PCA processing introduces further costs in the
description step. The overall processing time is reduced
slightly, but the performance is compromised in some
cases (Mikolajczyk and Schmid, 2005), (Wu et al.,
2013). SURF (Bay et al., 2008) is a further approach
that reduces the complexity of SIFT. The lower com-
plexity is due to rough approximations and reduced
descriptor size. SURF has been shown to improve the
computational efficiency of SIFT significantly while achieving
comparable accuracy (Bay et al., 2008), (Grauman
and Leibe, 2011), (Wu et al., 2013).
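
The dimension reduction idea behind PCA-SIFT can be illustrated as follows. This is a generic PCA sketch on placeholder vectors, not the procedure of (Ke and Sukthankar, 2004): the input size of 128, the target size of 36, and the random training data are assumptions chosen only for illustration.

```python
import numpy as np

def fit_pca(descriptors, n_components):
    """Learn a mean vector and the leading principal directions."""
    mean = descriptors.mean(axis=0)
    centered = descriptors - mean
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(descriptors, mean, basis):
    """Project descriptors onto the learned low-dimensional basis."""
    return (descriptors - mean) @ basis.T

rng = np.random.default_rng(0)
train = rng.random((500, 128))     # placeholder for real descriptor vectors
mean, basis = fit_pca(train, n_components=36)
compact = project(train, mean, basis)
```

Matching then compares 36 values per feature instead of 128, which shortens the matching step, while every new descriptor requires an extra projection in the description step.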
Affine invariant region detectors (Mikolajczyk
and Schmid, 2002), (Mikolajczyk and Schmid, 2004)
achieve limited affine-invariance by iteratively esti-
mating and normalizing the local affine shape of the
features. However, due to the fact that the features are
extracted in a non-affine manner, full affine invariance
cannot be achieved (Lowe, 2004).
Lepetit and Fua (Lepetit and Fua, 2006) redefine
the feature matching problem as a classification problem,
where the features of the reference image are
considered as classes and the features of the test image
are classified based on their appearance. The classifier
is trained by applying random affine transformations
to the reference image to simulate different views of
each feature. The features are matched (classified) in
real-time using randomized trees. With this scheme,
the computational complexity is moved to the extraction
(training) step to enable a fast matching phase.
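
Generating such training views can be sketched as follows. This is a minimal illustration under assumptions, not the procedure of (Lepetit and Fua, 2006): the affine map is composed from two rotations and an anisotropic scaling, the scale range is an arbitrary choice, and only the corner coordinates of a hypothetical 32x32 patch are warped instead of pixel data.

```python
import numpy as np

def rot(angle):
    """2x2 rotation matrix."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def random_affine(rng, scale_range=(0.6, 1.5)):
    """Random 2x2 affine map: rotation combined with anisotropic scaling."""
    theta = rng.uniform(-np.pi, np.pi)
    phi = rng.uniform(-np.pi, np.pi)
    s1, s2 = rng.uniform(*scale_range, size=2)
    return rot(theta) @ rot(-phi) @ np.diag([s1, s2]) @ rot(phi)

def warp_points(A, points, center):
    """Apply the affine map A to Nx2 points about a fixed center."""
    return (points - center) @ A.T + center

rng = np.random.default_rng(0)
center = np.array([16.0, 16.0])
corners = np.array([[0, 0], [32, 0], [32, 32], [0, 32]], dtype=float)

# Five simulated views of the same patch outline.
views = [warp_points(random_affine(rng), corners, center) for _ in range(5)]
```

Each simulated view would serve as one training sample for the class of the corresponding reference feature.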
3 REVIEW OF SIFT
As mentioned before, the proposed method is based
on the SIFT approach. Therefore, this section
presents a short review of the different steps of this
method based on (Lowe, 2004).
3.1 Feature Extraction
In order to achieve scale invariance, SIFT exploits
the concept of the scale space, which builds a
3-dimensional space by extending the image space
with a scale dimension. For this purpose, the image is smoothed
successively with the scale-normalized Gaussian ker-
nel. Each blurred image represents one instance of the