the idea to detect and describe 2D features at non-linear scale-space extrema to obtain better localization accuracy and distinctiveness. The Gaussian blurring used in other object recognition algorithms (e.g. SIFT) does not respect the natural boundaries of objects, since image details and noise are smoothed to the same degree at all scale levels. To make blurring adaptive to image features, KAZE makes use of non-linear diffusion filtering together with the AOS (Additive Operator Splitting) scheme. With this filtering, image noise is reduced while object boundaries are preserved (Andersson and Marquez, 2016).
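As an illustration, one explicit time step of Perona-Malik-type non-linear diffusion (the PDE family underlying KAZE's scale space) can be sketched in Python; the conductivity function and the `k` and `tau` values here are illustrative choices, not KAZE's actual parameters:

```python
import numpy as np

def perona_malik_step(L, k=0.05, tau=0.2):
    """One explicit step of Perona-Malik non-linear diffusion.

    Conductivity g = 1 / (1 + |grad L|^2 / k^2) slows diffusion at
    strong edges, so noise is smoothed while boundaries are kept.
    (Illustrative sketch; KAZE/AKAZE solve this type of PDE with the
    faster AOS and FED schemes.)
    """
    gx = np.gradient(L, axis=1)
    gy = np.gradient(L, axis=0)
    g = 1.0 / (1.0 + (gx**2 + gy**2) / k**2)
    # Divergence of g * grad(L), again via central differences.
    div = np.gradient(g * gx, axis=1) + np.gradient(g * gy, axis=0)
    return L + tau * div

# Noisy step edge: diffusion should reduce noise in the flat regions
# while the edge itself stays sharp.
rng = np.random.default_rng(0)
img = np.hstack([np.zeros((32, 16)), np.ones((32, 16))])
noisy = img + rng.normal(0, 0.05, img.shape)
L = noisy.copy()
for _ in range(20):
    L = perona_malik_step(L)
```

At the edge the gradient is large, so g is close to zero and almost no smoothing crosses it; in the flat regions g stays near its maximum and the noise diffuses away.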
Because solving the series of PDEs required by non-linear diffusion filtering is computationally costly, an accelerated version of KAZE was created, called Accelerated-KAZE or AKAZE (Alcantarilla and Solutions, 2011). The algorithm works in the same way as KAZE, with some differences (Andersson and Marquez, 2016):
• it uses a faster method, called Fast Explicit Diffusion (FED), to build the non-linear scale space
• it uses a binary descriptor (a modified version of the Local Difference Binary (LDB) descriptor) to further increase speed
2.3 Feature Matching
Feature matching is the process of finding corresponding points in different images. It depends heavily on the feature extraction step: if the extracted features are not as distinctive as possible, some features may be matched even though they do not represent the same segment or part of an image. It is also important to balance the number of features extracted, because the time complexity of matching grows with the number of features in the images.
2.3.1 FLANN
FLANN (https://www.cs.ubc.ca/research/flann/) is a library for performing fast approximate nearest neighbor searches in high-dimensional spaces. It contains a collection of algorithms that work best for nearest neighbor search and a system for automatically choosing the best algorithm and optimum parameters depending on the dataset (Muja and Lowe, 2009). This library was used in our system.
2.3.2 Matching Techniques
As the authors argue in (Tareen and Saleem, 2018), the choice of the feature detector-descriptor is a critical decision in feature-matching applications. They
present a comprehensive comparison and analysis of SIFT, SURF, KAZE, AKAZE, ORB and BRISK, which are among the fundamental scale-, rotation- and affine-invariant feature detectors, each having a designated feature descriptor with its own advantages and disadvantages.
The performance of feature detector-descriptors
on matching was evaluated on the following transfor-
mations: scaled versions (5% to 500%), rotated ver-
sions (0 to 360 degrees), viewpoint changes and affine
invariance.
Regarding the accuracy of image matching, SIFT was found to be the most accurate overall, with AKAZE and BRISK as runners-up.
The authors of (Pusztai and Hajder, 2016) quantitatively compared the well-known feature detector-descriptors implemented in OpenCV 3. Based on their analysis, the most accurate feature extraction algorithm is SURF, which outperforms the other methods in all test cases; KAZE and AKAZE are the runners-up and are also very accurate.
2.3.3 Lowe’s Ratio Test
Lowe proposed in (Lowe, 1999) to use a distance ratio
test to eliminate false matches.
The author explains that the best candidate match
for each keypoint is found by identifying its nearest
neighbor in the database of keypoints from training
images. It is possible that some features from an im-
age will not have any correct match in the training
database, leading to invalid or incorrect matches. This
could happen when they come from background clut-
ter or were not detected in the training images.
An effective measure is obtained by comparing
the distance of the closest neighbor to that of the
second-closest neighbor (Lowe, 1999). This mea-
sure achieves reliable matching because the correct
matches should have the closest neighbor much closer
than the closest incorrect match.
Finally, matches in which the distance ratio is greater than 0.7-0.8 are rejected, eliminating 90% of the false matches while discarding less than 5% of the correct matches for the dataset presented in (Lowe, 1999).
3 PROPOSED SOLUTION
3.1 Overview
We used 360° panorama images from Google Street View™ to create a dataset. The dataset and the user-
VISAPP 2020 - 15th International Conference on Computer Vision Theory and Applications