approach is that, since the majority of possible basis
choices are discarded, the Recognition stage is signif-
icantly faster than with the standard GH algorithm.
We outline here the details of the proposed
method. For the sake of clarity and conciseness,
we address the case of 2D object recognition under
similarities, though the method can be generalized
straightforwardly to deal with affinities as well as for
3D object recognition (Lamdan and Wolfson, 1988;
Grimson and Huttenlocher, 1990).
As sketched in Fig. 1, the offline stage is analo-
gous to that of the GH algorithm: for each model, feature points are detected; then, for each feature pair taken as a basis, the positions of the remaining features are transformed according to the current basis and stored in the Hash Table. In addi-
tion, though, we also compute a descriptor for each
feature point and store it for later use. Also, we do not
consider as possible bases those feature pairs whose distance is either too large or too small: in the former case the transformed feature coordinates would take on small values and become too sensitive to noise, while in the latter case the Hash Table would become too large and sparse, since the transformed coordinates would take on large values.
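As a purely illustrative sketch of this offline stage (not the authors' implementation), the following Python fragment builds the Hash Table using similarity-invariant coordinates. The helper name similarity_transform and the constants BIN_SIZE, D_MIN and D_MAX are assumptions chosen for clarity, and feature detection and description are supposed to be provided externally (e.g. by a SIFT- or SURF-like detector).

import numpy as np
from collections import defaultdict

BIN_SIZE = 0.1              # Hash Table quantization step (illustrative value)
D_MIN, D_MAX = 20.0, 200.0  # admissible basis lengths in pixels (illustrative values)

def similarity_transform(p, b1, b2):
    # Express point p in the basis mapping b1 -> (0, 0) and b2 -> (1, 0),
    # which makes the resulting coordinates invariant to similarities.
    v = b2 - b1
    scale = np.linalg.norm(v)
    c, s = v / scale
    rot = np.array([[c, s], [-s, c]])   # rotation by -angle(v)
    return rot @ (p - b1) / scale

def build_hash_table(models):
    # models: dict mapping model_id -> (keypoints as Nx2 array, descriptors as NxD array)
    table = defaultdict(list)
    descriptor_db = {}
    for model_id, (kps, descs) in models.items():
        descriptor_db[model_id] = descs          # stored for the online matching step
        n = len(kps)
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = np.linalg.norm(kps[j] - kps[i])
                if not (D_MIN <= d <= D_MAX):    # discard too short / too long bases
                    continue
                for k in range(n):
                    if k in (i, j):
                        continue
                    u = similarity_transform(kps[k], kps[i], kps[j])
                    key = tuple(np.floor(u / BIN_SIZE).astype(int))
                    table[key].append((model_id, (i, j)))
    return table, descriptor_db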
As for the online (Recognition) stage, given a model to be sought, features are first detected and described in the target image. Then, correspondences between features are established by matching each target image descriptor to the model descriptors, using the Euclidean distance as matching measure. In particular, for each target image descriptor the ratio between the distances to the most similar and to the second-most similar model descriptor is computed. This nearest-neighbor search can be implemented efficiently by means of indexing techniques such as Kd-trees (Beis and Lowe, 1997). Once this is done for all target image features, they are sorted in increasing order of this ratio: obviously, the smaller the ratio, the higher the probability that the current feature belongs to the model. Then, only the first τ features are selected to form possible bases, while all the others are discarded for this purpose. The size of this
subset of features, τ, is a parameter of the algorithm:
in our experiments we have empirically selected the
value of 10. Hence the number of features used to
generate bases, n_b, is given by:

n_b = min(τ, n)    (1)
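To make the matching and ranking step concrete, here is a minimal sketch (assumed names, not the authors' code) that uses a Kd-tree, e.g. scipy.spatial.cKDTree, to find the two nearest model descriptors of each target descriptor, computes the distance ratio and returns the indices of the n_b = min(τ, n) most confident features:

import numpy as np
from scipy.spatial import cKDTree

TAU = 10   # number of features allowed to generate bases (value used in our experiments)

def select_basis_features(target_descs, model_descs):
    # Rank target features by the ratio between the distances to their two
    # nearest model descriptors and keep the n_b = min(tau, n) most confident.
    tree = cKDTree(model_descs)
    dists, _ = tree.query(target_descs, k=2)     # Euclidean nearest-neighbor search
    ratios = dists[:, 0] / np.maximum(dists[:, 1], 1e-12)
    order = np.argsort(ratios)                   # smaller ratio = higher confidence
    n_b = min(TAU, len(target_descs))
    return order[:n_b]                           # indices of the features forming S_{n_b}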
Given this subset of features, S_{n_b}, each feature pair from S_{n_b} is in turn selected as the current basis. Also in this case, we do not consider feature pairs whose distance is either too large or too small. Once a basis is selected, all the extracted features (not just those belonging to S_{n_b}) are transformed according to the current basis and used for
casting votes as in the original GH algorithm. If the votes accumulated in one (or more) bin of the Hash Table exceed a given threshold, the current object is found; otherwise another basis is evaluated, until either the object is detected or all bases have been evaluated.
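The basis evaluation and vote casting loop could then be sketched as follows. This is again an assumed illustration rather than the authors' code: it reuses the hypothetical similarity_transform helper and the table built in the offline sketch, and VOTE_THRESHOLD is an illustrative parameter, not a value taken from the paper. Reporting every over-threshold (model, basis) bin also makes the handling of multiple object instances, discussed next, straightforward.

import numpy as np
from collections import Counter

VOTE_THRESHOLD = 8   # minimum number of votes for a detection (illustrative value)

def similarity_transform(p, b1, b2):
    # As in the offline sketch: express p in the basis mapping b1 -> (0,0), b2 -> (1,0).
    v = b2 - b1
    scale = np.linalg.norm(v)
    c, s = v / scale
    return np.array([[c, s], [-s, c]]) @ (p - b1) / scale

def recognize(target_kps, basis_idx, table, bin_size=0.1, d_min=20.0, d_max=200.0):
    # Try each ordered feature pair drawn from S_{n_b} as a basis; all target
    # features cast votes and every over-threshold (model, basis) bin is reported.
    detections = []
    for i in basis_idx:
        for j in basis_idx:
            if i == j:
                continue
            d = np.linalg.norm(target_kps[j] - target_kps[i])
            if not (d_min <= d <= d_max):        # same basis-length check as offline
                continue
            votes = Counter()
            for k in range(len(target_kps)):     # all features, not just those in S_{n_b}
                if k in (i, j):
                    continue
                u = similarity_transform(target_kps[k], target_kps[i], target_kps[j])
                key = tuple(np.floor(u / bin_size).astype(int))
                votes.update(table.get(key, []))
            for (model_id, model_basis), count in votes.items():
                if count >= VOTE_THRESHOLD:      # each over-threshold bin -> one instance
                    detections.append((model_id, model_basis, (i, j), count))
            if detections:
                return detections                # object found, stop evaluating bases
    return detections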
It is worth pointing out that the proposed approach
can easily deal with the presence of multiple object
instances in the target image by evaluating all over-
threshold bins in the Hash Table obtained with a par-
ticular basis. As for the computational burden, it is
important to note that although our method requires
additional computations in the Recognition stage in
order to describe and match interest points, efficient
algorithms do exist for both tasks (Lowe, 2004; Bay
et al., 2008). Moreover, our method notably speeds up the "vote casting" process, so that the complexity
of the Recognition stage, which is O(n^3) in the standard algorithm, is reduced to O(τ^2 n), where τ (i.e.
the number of features which are allowed to generate
bases) can easily be one order of magnitude smaller
than n. Hence, complexity is linear in the number
of features instead of cubic: this also allows for the use of a large number of features, which, as will be shown in the next section, helps improve the performance of the algorithm.
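As a purely illustrative order-of-magnitude example (the figures are assumptions, not measurements from the paper): with n = 1000 detected features and τ = 10, the standard Recognition stage performs on the order of n^3 = 10^9 vote-casting operations, whereas the proposed scheme performs on the order of τ^2 n = 10^5, i.e. roughly four orders of magnitude fewer.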
3 EXPERIMENTAL RESULTS
This section presents an experimental evaluation
where the proposed approach is compared to the stan-
dard GH algorithm in an object recognition scenario.
In particular, we propose two experiments based on two different datasets.
3.1 Experiment 1
In Experiment 1, an object has to be recognized
within a test dataset composed of 40 images. The test
dataset is characterized by object translations, rota-
tions and quite large scale changes. Moreover, there
is a strong presence of clutter and occlusions. In each
of the 40 test images the object to be recognized al-
ways appears once. The object model and a few test
images are shown in Fig. 2. Correct matches are de-
termined by evaluating the position error between the
ground-truth bounding box around the object and that
found by the algorithm. More specifically, we com-
pare the performance of the two algorithms by means
of Recall vs. Precision curves, by varying the thresh-
old applied to the peaks of the Hash Table. To com-
pute the Recall and Precision terms, a True Positive