whether enumeration is used or not, but this is because
the ones selected by each method differ when several
have the same priority. In addition, it can be seen that
the number of 16-bit sketches enumerated at search
time is very small. With the speedup from enumeration,
retrieval is about 10 times faster than with the
conventional 32-bit sketches. Since score_1 cannot be
sped up by the enumeration method, it achieves only
about 4 times faster search, but the highest accuracy.
With the conventional method, it was not possible to
raise precision by increasing K while still searching
faster than other methods such as the R-tree. However,
we can expect the proposed method to keep its search
speed even when higher accuracy is required. Results
for larger K, with precision exceeding 90%, are shown
in Table 5 and Table 6. Search using 16-bit sketches
needs a larger K than search using 32-bit sketches, but
is about 8 times faster. Also, score_1 achieves high
precision without increasing K too much, and it is the
fastest even though it does not use the enumeration
method.
5 CONCLUDING REMARKS
By changing from 32-bit sketches to narrower 16-bit
sketches, search becomes about 10 times faster, thanks
to an efficient first-stage search and data management
by the bucket method. When Hamming distance or
score_∞ is used for prioritization with 16-bit sketches,
the first-stage search can be done in a very short time
by enumerating sketches in order of priority. As future
work, we should consider a similar enumeration
algorithm for score_1.
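As an illustration of enumerating sketches in order of
priority, the following minimal Python sketch visits all
w-bit sketches in order of increasing Hamming distance
from a query sketch and collects candidates from their
buckets until K are found. The function names, the
dictionary-based bucket table, and the order in which
sketches at the same Hamming distance are visited are
assumptions made for illustration, not the implementation
used in the experiments. Prioritization by score_∞ or
score_1 is not covered.

```python
from itertools import combinations


def enumerate_by_hamming(query_sketch, width=16):
    """Yield every width-bit sketch in order of increasing
    Hamming distance from query_sketch.

    Ties (equal Hamming distance) are visited in the order
    produced by itertools.combinations; this tie order is an
    arbitrary choice made for this example.
    """
    for distance in range(width + 1):
        # Flip exactly `distance` bit positions of the query sketch.
        for positions in combinations(range(width), distance):
            mask = 0
            for p in positions:
                mask |= 1 << p
            yield query_sketch ^ mask


def first_stage(query_sketch, buckets, k, width=16):
    """Collect up to k candidate ids from buckets, visiting
    sketches in Hamming-distance order.

    `buckets` is a hypothetical dict mapping a sketch value to
    the list of data ids that share that sketch.
    """
    candidates = []
    for sketch in enumerate_by_hamming(query_sketch, width):
        candidates.extend(buckets.get(sketch, ()))
        if len(candidates) >= k:
            break
    return candidates[:k]
```

Because enumeration stops as soon as K candidates have
been gathered, only a small part of the 2^w possible
sketches is visited when the buckets are well populated.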
With 16-bit sketches, maintaining the same degree of
precision as in the conventional 32-bit case requires
the number K of first-stage candidates to be
approximately three times as large. However, by sorting
the data with the sketch as the key, second-stage
retrieval becomes about three times faster.
Therefore, it can be expected that the superiority of
the proposed method can be preserved for data of
higher dimension than those used in the experiments.
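Sorting data with the sketch as a key can be sketched as
a counting sort that also yields one offset per sketch
value, so that all items sharing a sketch form one
contiguous bucket. The code below is a minimal
illustration under that reading; the names build_buckets
and bucket are hypothetical, and the code only builds a
permutation of data ids, whereas an actual layout would
presumably store the data themselves in this order to
benefit from contiguous memory access.

```python
def build_buckets(sketches, width=16):
    """Counting sort of data ids by sketch value.

    Returns (order, offsets), where the ids whose sketch equals s
    are order[offsets[s]:offsets[s + 1]].
    """
    num_sketches = 1 << width
    offsets = [0] * (num_sketches + 1)
    # Count how many items fall into each sketch value.
    for s in sketches:
        offsets[s + 1] += 1
    # Prefix sums turn the counts into bucket start positions.
    for i in range(num_sketches):
        offsets[i + 1] += offsets[i]
    # Place each id into its bucket, keeping the original order
    # within a bucket.
    order = [0] * len(sketches)
    cursor = offsets[:-1]  # slicing copies, so offsets is unchanged
    for idx, s in enumerate(sketches):
        order[cursor[s]] = idx
        cursor[s] += 1
    return order, offsets


def bucket(order, offsets, sketch):
    """Return the ids of all items whose sketch equals `sketch`."""
    return order[offsets[sketch]:offsets[sketch + 1]]
```

With this layout, looking up a bucket costs two array
reads, and the second stage scans each bucket as one
contiguous range.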
We need to further investigate the relationship between
the database size n and the optimal sketch width w. In
this paper we assumed n to be in the millions, but for
larger databases it may be better to make w greater
than 16.
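One simple quantity to look at when relating n and w is
the average number of data items per bucket, n / 2^w.
The small Python helper below computes it. It assumes,
purely for illustration, that sketches are spread
uniformly over all 2^w values, which real data need not
satisfy; the function name is hypothetical.

```python
def mean_bucket_size(n, w):
    """Average number of items per bucket for n items and
    w-bit sketches, assuming (hypothetically) that all 2**w
    sketch values are equally likely."""
    return n / (1 << w)


if __name__ == "__main__":
    # Compare a few sketch widths for a database of one million items.
    for w in (16, 20, 24, 32):
        print(w, mean_bucket_size(1_000_000, w))
```

Under this assumption, each additional bit of sketch
width halves the average bucket size for a fixed n.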
In the experiments in this study, we used the
heuristic method QBP (Higuchi et al., 2018), which
uses the collision probability as the evaluation index
to be minimized in sketch optimization. Using AIR, a
kind of simulated annealing, a pivot set for sketches
with smaller collision probability than that of QBP can
be obtained, but search accuracy is not improved
(Imamura et al., 2017). However, since our proposed
method manages data by the bucket method, sketches
with a smaller collision probability may still improve
speed by localizing memory access. In any case, sketch
optimization needs further investigation.
ACKNOWLEDGMENTS
This work was partially supported by JSPS KAKENHI
Grant Numbers 16H02870, 17H00762, 16H01743,
17H01788, and 18K11443.
REFERENCES
Ciaccia, P., Patella, M., and Zezula, P. (1997). M-tree: An
efficient access method for similarity search in metric
spaces. In Proc. VLDB’97, pages 426–435.
Dong, W., Charikar, M., and Li, K. (2008). Asymmetric
distance estimation with sketches for similarity search
in high-dimensional spaces. In Proc. ACM SIGIR’08,
pages 123–130.
Guttman, A. (1984). R-trees: A dynamic index structure
for spatial searching. In Yormark, B., editor, Proc.
SIGMOD’84, pages 47–57. ACM Press.
Higuchi, N., Imamura, Y., Kuboyama, T., Hirata, K., and
Shinohara, T. (2018). Nearest neighbor search using
sketches as quantized images of dimension reduction.
In Proc. ICPRAM 2018, pages 356–363.
Imamura, Y., Higuchi, N., Kuboyama, T., Hirata, K., and
Shinohara, T. (2017). Pivot selection for dimension
reduction using annealing by increasing resampling.
In Proc. LWDA 2017, pages 15–23.
Mic, V., Novak, D., and Zezula, P. (2015). Improving
sketches for similarity search. In Proc. MEMICS’15,
pages 45–57.
Mic, V., Novak, D., and Zezula, P. (2016). Speeding up
similarity search by sketches. In Proc. SISAP 2016,
pages 250–258.
Müller, A. and Shinohara, T. (2009). Efficient similarity
search by reducing I/O with compressed sketches. In
Proc. SISAP’09, pages 30–38.
Shinohara, T. and Ishizaka, H. (2002). On dimension
reduction mappings for approximate retrieval of
multi-dimensional data. In Progress of Discovery Science,
LNCS 2281, pages 89–94.
Wang, Z., Dong, W., Josephson, W., Lv, Q., Charikar, M.,
and Li, K. (2007). Sizing sketches: A rank-based analysis
for similarity search. In Proc. ACM SIGMETRICS’07,
pages 157–168.
Yianilos, P. (1993). Data structures and algorithms for
nearest neighbor search in general metric spaces. In Proc.
SODA 1993, pages 311–321. ACM Press.
Fast Nearest Neighbor Search with Narrow 16-bit Sketch