mine the category of the current object. Time for NN
search was measured separately during recognition.
Two different codebooks were used: the first contains
inactive jets and can therefore only be handled by lin-
ear search and our method, the second comprises only
parquet graphs of which every jet is active. The latter
codebook is used to compare our method with origi-
nal LSH.
To evaluate the potential of both systems independently
of parameter determination, we ran the tests once with
hand-tuned parameters and once with parameters detected
by the respective methods. LSHKIT also offers online
adaptation of the number of scanned buckets; this was
tested, too. All tests were run 5 times and the average
search time was recorded. The largest standard
deviations were 14.5 sec for linear search and 14.34
sec for LSHKIT search with online parameter
determination on the second codebook. All others were
below 7.0 sec, and several were below 3.0 sec. The
following tables summarize all test results. The first
column gives the total search time for all searches, the
second the average time per search, the third the
percentage of scanned codebook vectors, and the last
the hit rate.
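The four reported quantities can be computed from per-query measurements as in the following minimal sketch. The record layout `(seconds, scanned, hit)` and the function name `summarize` are assumptions made for illustration and are not part of the evaluated systems. A "hit" is taken to mean that the approximate search returned the same vector as exhaustive search.

```python
from statistics import mean, stdev

def summarize(runs, codebook_size):
    """Summarize repeated test runs of a nearest neighbor search.

    runs: list of runs; each run is a list of per-query records
          (seconds, scanned, hit), a hypothetical layout.
    codebook_size: number of vectors in the codebook.
    """
    # total search time of each run
    totals = [sum(q[0] for q in run) for run in runs]
    n_queries = len(runs[0])
    # scanned fraction and hit rate are averaged over all queries of all runs
    queries = [q for run in runs for q in run]
    return {
        "total_time": mean(totals),
        "time_std": stdev(totals) if len(totals) > 1 else 0.0,
        "time_per_search": mean(totals) / n_queries,
        "scanned_percent": 100.0 * mean(q[1] for q in queries) / codebook_size,
        "hit_rate_percent": 100.0 * mean(1.0 if q[2] else 0.0 for q in queries),
    }
```

The standard deviation is taken over the total times of the repeated runs, which matches how the deviations are quoted in the text.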
Table 1 clearly shows the superiority of our method
over linear search. Our nearest neighbor search is
considerably faster than exhaustive search and needs
only 16.13% and 15.09% of the linear search time,
respectively, which agrees relatively well with the
corresponding percentages of scanned codebook vectors.
For the second codebook (Table 2) the percentages
of search time are (in the order of the table) 11.67%,
10.66%, 21.51%, 16.23% and 9.43%. For our method this
is again in relatively good agreement with the amount
of scanned vectors; for LSHKIT the difference is
bigger. It seems that on this codebook LSHKIT spends
relatively more time on hash value computation than
our approach. The second codebook also shows that our
method is potentially not as good as the original LSH
in the LSHKIT implementation, which only needs to look
at 5.47% of all vectors with manually tuned parameters.
With automatic parameter detection LSHKIT is slower
than our method but gives a higher hit rate (at the
cost of scanning more vectors than necessary for the
desired hit rate). With online determination the
scanned amount is lowered to 7.52%, but the search time
is worse than for offline determination, with an only
slightly smaller recognition rate. Thus online
determination seems to scan the codebook more
efficiently, but it does not lead to a better
approximation of the desired recognition rate and needs
additional time for parameter adaptation. This may be
due to non-gamma-distributed data in our tests. The
superiority of LSHKIT can most likely be ascribed to a
better partitioning of the search space than in our
method. In our search scheme the partitioning strongly
depends on the selected hash vectors. Testing more than
1000 vectors as the first hash vector (possibly all
codebook vectors) can improve it, but the time for
parameter determination then increases considerably.
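One way to make the comparison between relative search time and scanned fraction explicit is a simple cost model. It is an illustrative assumption, not a model fitted in the experiments: $T_{\mathrm{lin}}$ denotes the linear search time, $p$ the fraction of scanned codebook vectors, and $T_{\mathrm{hash}}$ the time spent on hash value computation and bucket lookup.

```latex
\[
T_{\mathrm{search}} \approx T_{\mathrm{hash}} + p\,T_{\mathrm{lin}}
\quad\Longrightarrow\quad
\frac{T_{\mathrm{search}}}{T_{\mathrm{lin}}}
\approx p + \frac{T_{\mathrm{hash}}}{T_{\mathrm{lin}}}
\]
```

Under this model, a relative search time close to $p$ indicates a small hashing overhead, while a relative time well above $p$, as observed for LSHKIT on the second codebook, points to a larger share of $T_{\mathrm{hash}}$.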
Additionally, we tried to increase search speed
by using Intel’s MKL (INTEL, 2011) (version
10.2.6.038) to speed up matrix multiplication.
This library is optimized for Intel CPUs to perform
algebraic calculations as fast as possible. To our
astonishment, using the library actually reduced the
performance of our system. Even with the special
functions for matrix-vector and vector-vector products,
the computation took more time on our test system than
with the boost ublas product functions (BOOST, 2011).
MKL needs both matrices to have certain sizes in order
to accelerate multiplication. If this condition is
fulfilled, it makes a substantial difference. We ran an
additional test with linear search on the second
codebook, processing all parquet graphs of each single
image in parallel.
This gave a search time of 1009.02 sec (standard
deviation 2.97 sec) for parallel linear search (4.7 msec
per search). This result is clearly slower than our
method when aiming at a hit rate of 90%. But compared
to LSHKIT with offline parameter determination and a
hit rate close to 100%, it is only slightly slower
while being completely exact. Is it therefore advisable
to do parallel linear search if very high hit rates are
needed? To test this we created a third codebook of
88951 parquet graph jets, almost twice as big as the
others. The results for this codebook are given in
Table 3. Parallel linear search (standard deviation
2.11 sec) was clearly slower than our method (standard
deviation 5.49 sec) and LSHKIT (standard deviation
2.51 sec). This suggests that even with optimized
matrix-matrix multiplication it makes sense to use
special search methods. It is possible that parallel
linear search would have been faster if more search
vectors had been processed at once, but in most
applications one cannot expect to have all search
vectors available from the beginning. A real-time
object recognition system cannot know the search
vectors of future images, and the matrix size up to
which parallel matrix-matrix multiplication gives a
speedup will most likely also depend on the cache size
of the CPU.
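The idea behind parallel linear search can be sketched as follows: all search vectors of one image are stacked into a matrix, so that their similarities to every codebook vector come from a single matrix-matrix product rather than one matrix-vector product per search vector. The sketch is written in plain Python for readability and therefore shows only the structure, not the speedup an optimized BLAS routine would provide. The plain dot product as similarity measure and the names `matmul_transposed` and `parallel_linear_search` are assumptions for illustration; the similarity function used for jets in the experiments is not reproduced here.

```python
def matmul_transposed(queries, codebook):
    """Return S with S[i][j] = dot(queries[i], codebook[j])."""
    return [[sum(q * c for q, c in zip(qrow, crow)) for crow in codebook]
            for qrow in queries]

def parallel_linear_search(queries, codebook):
    """Exact nearest neighbor (highest similarity) for each query.

    Assumption: similarity is the plain dot product; a real system
    would plug in its own similarity function here.
    """
    sims = matmul_transposed(queries, codebook)
    # one arg-max per row of the similarity matrix
    return [max(range(len(row)), key=row.__getitem__) for row in sims]
```

Because every codebook vector is compared, the result is exact. The price is that all search vectors of a batch must be available before the product is computed.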
6 CONCLUSIONS
We have presented the first search scheme that is
able to do fast nearest neighbor search on a set of