LOCMAX SIFT
Non-Statistical Dimension Reduction on Invariant Descriptors
Dávid Losteiner
Péter Pázmány Catholic University, H-1083 Budapest, Práter u. 50/a, Hungary
László Havasi and Tamás Szirányi
Distributed Event Analysis Research Group, Hungarian Academy of Sciences, H-1111 Budapest, Kende u. 13-17, Hungary
Keywords: SIFT, Dimension reduction, DTW, Image descriptors.
Abstract: The descriptors used for image indexing, e.g. the Scale Invariant Feature Transform (SIFT), are generally parameterized in very high dimensional spaces, which guarantees invariance to different lighting conditions, orientation and scale. The number of dimensions limits the computational speed of search techniques. That is why dimension reduction of descriptors plays an important role in real-life applications. In this paper we present a modified version of the most popular algorithm, SIFT. The motivation was to speed up searching in large feature databases in video surveillance systems. Our method is based on the standard SIFT algorithm and uses a structural property: the local maxima of these high dimensional descriptors. The weighted local positions are aligned with a dynamic programming algorithm (DTW), and the alignment error is used as a new kind of measure between descriptors. Our approach uses no training set, no pre-computed statistics and no parameters when finding the matches, which is very important for an online video indexing application.
1 INTRODUCTION
Image descriptors are basic features in video processing applications. In several real-life areas, such as object recognition, video indexing or searching in image databases, stable image features play the most important role (Lowe, 1999). An early concept was only to find specific keypoints of digital images (Mikolajczyk and Schmid, 2005), but nowadays much more stable features, so-called descriptors, are used to exploit a considerable amount of usable information from an image area.
Building a huge database from video frames is very time consuming if the dimensionality is high. In our case, the Scale Invariant Feature Transform (SIFT), one of the most preferred methods, produces a long and accordingly descriptive vector about a point and its neighborhood. However, when these points are matched with a simple method, e.g. a Euclidean distance between vectors of the same size, serious problems can emerge for large amounts of data. This cost can be reduced by using e.g. PCA to decrease the dimension of the vectors, but this requires extra pre-computation steps on a given patch set to obtain the eigenspace (Ke and Sukthankar, 2004). Although this results in a significantly lower dimension than the standard 128-element vectors, the pre-computation must be executed offline.
We discuss here other ways to decrease the dimension and to considerably lower the computational effort of searching.
The motivation was to find an alternative to the conventional matching method, which requires determining the two smallest distances in the dataset and uses their ratio. For a small number of descriptors it works well, but in larger databases this quotient can lead to false matches.
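For reference, the conventional scheme discussed here can be sketched as a brute-force scan: Euclidean distances to every database descriptor are computed, and the nearest one is accepted only if it is clearly closer than the second nearest (the ratio test of Lowe, 2004). This is a minimal illustration, not the authors' code; the function names are hypothetical, and the default ratio of 0.8 is the value commonly used following Lowe (2004).

```python
import math
from typing import List, Optional, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain Euclidean distance between two equal-length descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_test_match(query: Sequence[float],
                     database: List[Sequence[float]],
                     ratio: float = 0.8) -> Optional[int]:
    """Return the index of the nearest database descriptor if it passes the
    nearest/second-nearest ratio test, otherwise None (hypothetical helper)."""
    if len(database) < 2:
        return None  # the test needs at least two candidates
    # Linear scan: every query touches every database entry.
    dists = sorted((euclidean(query, d), i) for i, d in enumerate(database))
    (d1, best), (d2, _) = dists[0], dists[1]
    # Reject ambiguous matches whose best distance is not clearly smaller.
    if d2 == 0 or d1 / d2 >= ratio:
        return None
    return best
```

Each query needs the two best distances over the whole database, which is the cost the locmax approach aims to avoid.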
We present lower-dimensional descriptors, based on the structural properties of SIFT descriptors, that narrow the space to be searched. On the other hand, decreasing the descriptors' dimension loses part of the information, so the error distance computation needs to focus on really relevant SIFT properties. Another advantage over other solutions is that locmax (the local maxima of the SIFT descriptor) needs neither pre-computed statistics (Hua et al., 2007) nor a training dataset (Mikolajczyk and Schmid, 2001).
Losteiner D., Havasi L. and Szirányi T. (2009).
LOCMAX SIFT - Non-Statistical Dimension Reduction on Invariant Descriptors.
In Proceedings of the Fourth International Conference on Computer Vision Theory and Applications, pages 192-195
DOI: 10.5220/0001799401920195
Copyright © SciTePress
The paper is organized as follows. In Section 2 we introduce the locmax SIFT features. Then we propose a new error distance on locmax SIFT parameters, using the DTW method to match the most significant peaks. Finally, we demonstrate the efficiency of the proposed metric and methodology.
2 PROPOSED DESCRIPTOR
The idea came from the analysis of correctly matched SIFT descriptor vectors (Lowe, 2004). An important property of these is that the corresponding local peaks lie close to each other on the descriptor vectors. Matched descriptors never fit each other completely; they are just very similar. For example, the rotation of an object can produce some changes in the feature vector because of the discrete transformation, but it hardly affects the local maxima peaks (hereafter called locmax). We will show that slight changes in the maxima do not significantly affect the matching process.
In our implementation, only 3 neighboring values are checked when finding the local maxima, so the 128-long descriptor may contain no more than 128/3 ≈ 42 locmax positions. In practice this number ranged from 15 to 32 in our experiments: over 10000 SIFT descriptors we obtained about 20 peaks on average, with a nearly Gaussian distribution. This simple restriction decreases the vector dimension significantly.
Here we exploit the fact that the locmax statistics of the same point can change between instances, but their structure remains related in the modified version as well. Our experiments show that these locmaxes are stable enough for matching and reduce the dimension of the search. The other motivation was to avoid the standard distance calculation mentioned above, and thereby the use of a threshold value when finding pairs.
2.1 Extraction of LocMax Values
The only additional step on the standard SIFT descriptors is to extract the locmax positions and values. The extraction is based on a simple search for local maxima among neighboring descriptor values; the simplest case is to use a 3-element-wide window for this step.
As mentioned above, the locmax descriptors yield a much lower dimensional vector. From our perspective, the important information is the indexes and the values of these extrema.
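The extraction step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: a value counts as a local maximum only if it is strictly greater than both neighbors (so plateaus yield no maximum), and the missing neighbor of the first and last element is treated as minus infinity; the paper does not specify either detail, and the name `extract_locmax` is hypothetical.

```python
from typing import List, Sequence, Tuple


def extract_locmax(descriptor: Sequence[float]) -> Tuple[List[int], List[float]]:
    """Return (positions, values) of local maxima found with a 3-element window.

    Assumptions: strict comparison with both neighbours; out-of-range
    neighbours at the two ends are treated as -inf.
    """
    n = len(descriptor)
    positions: List[int] = []
    values: List[float] = []
    for i in range(n):
        left = descriptor[i - 1] if i > 0 else float("-inf")
        right = descriptor[i + 1] if i < n - 1 else float("-inf")
        if descriptor[i] > left and descriptor[i] > right:
            positions.append(i)          # index of the peak in the descriptor
            values.append(descriptor[i])  # its value, later used as a weight
    return positions, values
```

Applied to a 128-element SIFT vector, this returns the short lists of peak indexes and values that replace the full descriptor.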
As we will show, the distance between two locmax vectors is calculated from the retrieved positions. This distance is low in case of similar maxima positions, but the source SIFT descriptors also contain low maxima values: if a low value is just slightly higher than its neighbors, it is still a local maximum.
For this reason we use the values as weighting factors in the distance calculation, and the matching does not depend on them directly. The most dominant values of matching descriptor vectors occupy the same positions (Figure 1), hence if the difference between those positions is low, the weighted distance is also low. In case of a high difference, the position match is just a 'casual' correspondence, and we compensate the distance using the weights, as described later. In the optimal case, only really similar locmaxes have a low distance from their counterparts.
For a detailed description of the weighted distance calculation see Section 2.2.
Figure 1: Standard way of pairing descriptors. Local maxima take the same positions.
2.2 Comparison of LocMax Descriptors
The SIFT values of each descriptor are normalized to the range [0, 1] according to the global maximum. This is necessary because the weighting function should only use the structural similarity among the descriptors.
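A minimal sketch of this per-descriptor normalization, assuming non-negative SIFT values (as for gradient-orientation histograms). The helper name and the choice to return zeros for an all-zero descriptor are assumptions, not taken from the paper.

```python
from typing import List, Sequence


def normalize_by_max(values: Sequence[float]) -> List[float]:
    """Scale descriptor values into [0, 1] by dividing by the global maximum."""
    peak = max(values, default=0.0)
    if peak <= 0:
        # Assumed handling: an all-zero descriptor carries no structure.
        return [0.0 for _ in values]
    return [v / peak for v in values]
```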
Using only the position indexes is not enough to get the correct distance, because not just the position but also the weight determines the structural similarity (2). Because the two locmax vectors may differ in length, we use the DTW (Dynamic Time Warping) algorithm (Myers and Rabiner, 1981) to compare the positions and to obtain the error distance. The algorithm has been used successfully in signal processing tasks such as speech recognition and text processing.
Before running DTW, we calculate the distances between the position vectors, which yields a distance matrix D:
$$D(i,j) = \left|p_1(i) - p_2(j)\right| \, e^{w_1(i) + w_2(j)} \qquad (1)$$
where $p_1$ and $p_2$ are the position vectors and $w_1$ and $w_2$ are the normalized values from SIFT. Next, using the normalized weights, we correct the matrix and increase the distance where necessary:
$$D(i,j) \leftarrow D(i,j) \cdot \left(1 + \left|w_1(i) - w_2(j)\right|\right) \qquad (2)$$
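The matrix construction can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes Eq. (1) has the form D(i,j) = |p1(i) - p2(j)| * exp(w1(i) + w2(j)) and that the compensation of Eq. (2) multiplies each entry by (1 + |w1(i) - w2(j)|), so equal weights leave an entry unchanged and unequal weights increase it. Both forms and the function name are assumptions.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def locmax_distance_matrix(p1: Sequence[int], w1: Sequence[float],
                           p2: Sequence[int], w2: Sequence[float]) -> Matrix:
    """Weighted position-distance matrix between two locmax descriptors.

    ASSUMED forms of Eqs. (1)-(2):
      D(i, j)  = |p1(i) - p2(j)| * exp(w1(i) + w2(j))    # Eq. (1)
      D(i, j) <- D(i, j) * (1 + |w1(i) - w2(j)|)          # Eq. (2)
    w1 and w2 are weights already normalized to [0, 1].
    """
    D: Matrix = []
    for pi, wi in zip(p1, w1):
        row: List[float] = []
        for pj, wj in zip(p2, w2):
            d = abs(pi - pj) * math.exp(wi + wj)  # weighted position distance
            d *= 1.0 + abs(wi - wj)               # compensation for unequal weights
            row.append(d)
        D.append(row)
    return D
```

The result is an N-by-M matrix for descriptors with N and M locmax positions, ready to be passed to DTW.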
With this compensation, the DTW algorithm can compute distances on the weighted distance matrix D. If the weights differ, the compensation increases an otherwise low distance; if the weights are equal, it has practically no effect (Figure 2). The DTW follows the classical algorithm of Myers and Rabiner (1981); only the format of the input has changed: instead of two vectors, its input is the compensated matrix D. This method is used to compare two locmax descriptors and yields our own metric on positions and values:
$$dist = \frac{DTW(D)}{k} \qquad (3)$$
where k is the number of DTW steps.
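The two DTW passes and Eq. (3) can be sketched as a textbook DTW over a precomputed cost matrix: the first pass accumulates the cost field, the second backtracks the minimum route and counts its steps k. The step pattern used here (horizontal, vertical and diagonal moves, no slope constraint) is an assumption, as are the function names; the paper does not specify which variant of Myers and Rabiner (1981) it uses.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def dtw_on_matrix(D: Matrix) -> Tuple[float, int]:
    """Classic DTW over a precomputed cost matrix; returns (path cost, k)."""
    if not D or not D[0]:
        raise ValueError("cost matrix must be non-empty")
    n, m = len(D), len(D[0])
    INF = float("inf")
    # Pass 1: accumulated cost field.
    acc = [[INF] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                best_prev = 0.0
            else:
                best_prev = min(
                    acc[i - 1][j] if i > 0 else INF,
                    acc[i][j - 1] if j > 0 else INF,
                    acc[i - 1][j - 1] if i > 0 and j > 0 else INF,
                )
            acc[i][j] = D[i][j] + best_prev
    # Pass 2: backtrack the minimum route from the last cell, counting steps.
    i, j, k = n - 1, m - 1, 1
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        k += 1
    return acc[n - 1][m - 1], k


def locmax_dist(D: Matrix) -> float:
    """Eq. (3): DTW cost normalized by the number of DTW steps k."""
    cost, k = dtw_on_matrix(D)
    return cost / k
```

Dividing by k keeps the distance comparable between descriptor pairs whose locmax vectors have different lengths.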
Figure 2: Example of a weighted matrix created from a correct locmax match.
For an extra speed-up, we use a default high value where the positions are too far from each other, so that the DTW never runs through that area. With this locmax distance, there is no need to search for the best and second-best Euclidean distances (Lowe, 2004), and the descriptors can easily be ranked.
The algorithm ends with a simple nearest-neighbor search that needs no threshold value. The distances give a measure of similarity between SIFT vector structures. The paired locmaxes were, of course, taken from a ground-truth set. The test images and homography data in this paper come from the data set used for the performance evaluation of descriptors (Mikolajczyk and Schmid, 2005); however, our goal is not a comparative study of different descriptors (Figure 3).
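Because the locmax distance directly ranks the candidates, matching reduces to a nearest-neighbor search with no ratio test and no threshold. A generic sketch follows, with the distance passed in as a function; in the paper's setting it would be the DTW-based distance of Eq. (3). The function names are hypothetical, and the absolute-difference distance in the usage line is only a toy stand-in.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def rank_by_distance(query: T, database: Sequence[T],
                     dist: Callable[[T, T], float]) -> List[Tuple[float, int]]:
    """Return (distance, index) pairs sorted by increasing distance."""
    return sorted((dist(query, item), idx) for idx, item in enumerate(database))


def nearest_neighbor(query: T, database: Sequence[T],
                     dist: Callable[[T, T], float]) -> int:
    """Index of the closest database entry; no ratio test, no threshold."""
    if not database:
        raise ValueError("empty database")
    return rank_by_distance(query, database, dist)[0][1]


# Usage with a toy scalar distance (stand-in for the locmax distance):
print(nearest_neighbor(5, [1, 6, 10], lambda a, b: abs(a - b)))  # prints 1
```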
This distance calculation is computationally more demanding (Table 1). To reduce the number of computational steps, we use the default high value suggested above for positions that are too far from the main diagonal of D. This default can simply be set to a sufficiently high number. In this way, the computation is restricted to a band along the diagonal.
The DTW algorithm makes two passes over the weighted distance matrix D: first it computes the accumulated distance field, then it finds the minimum-cost route through it, which also yields the number of steps k in (3). Because the far positions are preset, the DTW does not need to deal with the uninteresting parts of D. Instead of the full N-by-M matrix, only the relevant entries are used; their number cannot be determined in advance, because it depends on the position distances and on the limit $L_d$ of the allowed distance on D.
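The pruning with the limit $L_d$ can be sketched as follows: entries above the limit are replaced by a fixed high value, and the number of remaining entries, N*M - #(L_d < D(i,j)), is counted. The sketch only marks the cells; a real implementation would presumably make the check before computing the weighted entry and let DTW skip marked cells. The function name, the value of `high` (1e6) and the example limit are hypothetical, since the paper gives no concrete values.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def prune_far_entries(D: Matrix, L_d: float,
                      high: float = 1e6) -> Tuple[Matrix, int]:
    """Replace entries with D(i, j) > L_d by a fixed high value.

    Returns the pruned matrix and the number of entries left active,
    i.e. N*M - #(L_d < D(i, j)). `high` is an arbitrary placeholder that
    only needs to exceed typical path costs.
    """
    pruned = [[high if d > L_d else d for d in row] for row in D]
    active = sum(1 for row in D for d in row if d <= L_d)
    return pruned, active
```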
Table 1: Computation costs of locmax distances.
Calculating D matrix:   $N \cdot M - \#\left(L_d < D(i,j)\right)$
DTW:                    $N \cdot M - \#\left(L_d < D(i,j)\right) + k$
3 EXPERIMENTAL RESULTS
We have presented a novel distance calculus for SIFT matching, aimed at more effective indexing and retrieval with reduced dimensionality. We compared the standard SIFT, descriptors reduced to 20 dimensions with the PCA algorithm (Joliffe, 1986), and the locmax approach.
The results in Table 2 show the precision as a percentage when two or more images are added, with the total number of found descriptor pairs in brackets. The correctness of the matching was tested geometrically with the given ground-truth homographies (Mikolajczyk and Schmid, 2005). A higher precision value means a higher rate of correct matches. The values show how the percentage of correctly found matches changes relative to the reference descriptor set (number in brackets) as new elements are added to the database. Our primary goal was not to outperform previous SIFT methods, but to create a lower-dimensional descriptor that is stable enough.
VISAPP 2009 - International Conference on Computer Vision Theory and Applications
Table 2: Precision and number of matches (in brackets) on SIFT, PCA-reduced SIFT and locmax-SIFT descriptors.
                    Graffiti     +Boat       +Bark       +Bikes
#descr. on image    900+1083     +844        +1255       +597
SIFT                94% (402)    87% (394)   87% (394)   83% (386)
SIFT (PCA)          81% (555)    76% (589)   60% (760)   57% (769)
locmax              89% (411)    83% (415)   78% (426)   74% (432)
Figure 3: Samples from the image data set used (Mikolajczyk and Schmid, 2005): Graffiti, Boat, Bark, Bikes.
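The geometric check behind the precision values can be sketched as follows: a match counts as correct when the reference point, mapped through the 3x3 ground-truth homography, lands within a pixel tolerance of its matched point, and precision is the percentage of correct matches. The 3-pixel default tolerance and the function names are hypothetical; the paper does not state its exact correctness criterion.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def project(H: Sequence[Sequence[float]], pt: Point) -> Point:
    """Map a point through a 3x3 homography using homogeneous coordinates."""
    x, y = pt
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def match_precision(matches: List[Tuple[Point, Point]],
                    H: Sequence[Sequence[float]],
                    tol: float = 3.0) -> float:
    """Percentage of matches whose projected point lies within `tol` pixels
    (hypothetical tolerance) of the matched point."""
    if not matches:
        return 0.0
    correct = 0
    for p, q in matches:
        u, v = project(H, p)
        if ((u - q[0]) ** 2 + (v - q[1]) ** 2) ** 0.5 <= tol:
            correct += 1
    return 100.0 * correct / len(matches)
```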
Most false matches came from textured regions (e.g. Bark, see Figure 3), because the locmax vectors contain only the most significant parts of the SIFT features. The dimension reduction inevitably causes a loss of information, so such areas produce similar local maxima positions. Another problem is the uncertainty in separating good and false matches.
In summary, higher precision leads to better detection rates in object retrieval (Schügerl et al., 2007), and in the reduced dimensions the proposed method achieves higher precision than the PCA-reduced descriptors (Table 2).
4 CONCLUSIONS
This paper introduced a new type of error-distance calculation on SIFT descriptors with decreased dimension. The method uses only a dynamic set of local maxima of the standard feature vectors: after calculating a weighted position distance, we use the DTW algorithm to compare locmax features, which yields a novel metric on descriptors. This enables unsupervised (non-linear) dimension reduction, which is a key step toward constructing an effective search tree in the future.
In future work we will focus on refining the weight function, based on other structural properties of SIFT descriptors, to improve the matching scores. We also plan to integrate the method into a lower-dimensional tree structure.
REFERENCES
Brown, M. and Lowe, D. G., 2003. Recognising Panoramas. In Proceedings of the Ninth IEEE International Conference on Computer Vision, Vol. 2, pp. 1218–1225.
Hua, G., Brown, M. and Winder, S., 2007. Discriminant Embedding for Local Image Descriptors. In IEEE International Conference on Computer Vision, Rio de Janeiro, Brazil, pp. 1–8.
Joliffe, I. T., 1986. Principal Component Analysis. Springer-Verlag.
Ke, Y. and Sukthankar, R., 2004. PCA-SIFT: A More Distinctive Representation for Local Image Descriptors. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), Vol. 2, pp. 506–513.
Lowe, D., 1999. Object recognition from local scale-invariant features. In Proceedings of the International Conference on Computer Vision, pp. 1150–1157.
Lowe, D., 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), pp. 91–110.
Mikolajczyk, K. and Schmid, C., 2001. Indexing based on scale invariant interest points. In Proceedings of the International Conference on Computer Vision, pp. 525–531.
Mikolajczyk, K. and Schmid, C., 2005. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10), pp. 1615–1630.
Myers, C. S. and Rabiner, L. R., 1981. A comparative study of several dynamic time-warping algorithms for connected word recognition. The Bell System Technical Journal, 60(7), pp. 1389–1409.
Schügerl, P., Sorschag, R., Bailer, W. and Thallinger, G., 2007. Object Re-detection Using SIFT and MPEG-7 Color Descriptors. In Multimedia Content Analysis and Mining, Springer-Verlag, pp. 305–314.