Nearest Neighbor Search using Sketches as Quantized Images of Dimension Reduction∗

Naoya Higuchi, Yasunobu Imamura, Tetsuji Kuboyama, Kouichi Hirata and Takeshi Shinohara

Kyushu Institute of Technology, Kawazu 680-4, Iizuka 820-8502, Japan

Gakushuin University, Mejiro 1-5-1, Toshima, Tokyo 171-8588, Japan

Keywords: Similarity Search, Sketches, Ball Partitioning, Hamming Distance, Dimension Reduction, Distance Lower Bound, Quantized Images.

Abstract: In this paper, we discuss sketches based on ball partitioning (BP), which are compact bit sequences representing multidimensional data. The conventional nearest neighbor search using sketches consists of two stages. The first stage selects candidates according to the Hamming distances between sketches. The second stage then selects the nearest neighbor from the candidates. Since the Hamming distance cannot completely reflect the original distance, more candidates are needed to achieve higher accuracy. On the other hand, we can regard BP sketches as quantized images of a dimension reduction. Although the quantization error is very large if we use only sketches to compute distances, we can partly recover distance information by using the query itself. That is, we can compute a lower bound of the distance between a query and a data using only the query and the sketch of the data. We propose candidate selection methods for the first stage based on these lower bounds. Experiments on multidimensional data such as images, music and colors show that the proposed methods achieve a higher accuracy of nearest neighbor search.

1 INTRODUCTION

Similarity search is one of the most important and best-known tasks of information retrieval in multidimensional data. Its purpose is to find data near to a query with respect to a given distance. In order to avoid “the curse of dimensionality,” dimension reduction techniques have been developed. One of the most important properties of a dimension reduction is that the distance between any two data is not expanded by the transformation. In other words, a dimension reduction is a Lipschitz continuous mapping. This property guarantees the safety of the pruning strategy that excludes any data far away from a query in the lower dimensional space from the search target without examining its distance in the original space. For example, the K-L transformation (or equivalently, principal component analysis (PCA)) (Fukunaga, 1990) and FastMap (Faloutsos and Lin, 1995) are known as dimension reductions for a Euclidean space. Also, H-Map (Shinohara et al., 1999) and Simple-Map (S-Map, for short) (Shinohara and Ishizaka, 2002) are dimension reductions applicable to any metric space, such as an $L_p$ metric space or strings with the edit distance (Wagner and Fischer, 1974).

∗ This work is partially supported by Grant-in-Aid for Scientific Research 17H00762, 16H02870, 16H01743 and 15K12102 from the Ministry of Education, Culture, Sports, Science and Technology, Japan.

As another technique for efficient similarity search in multidimensional spaces, sketches (Müller and Shinohara, 2009; Mic et al., 2016; Dong et al., 2008; Mic et al., 2015; Wang et al., 2007) have also been developed. A sketch is a compact bit sequence representing multidimensional data. Ball partitioning (Uhlmann, 1991) (BP, for short) is a method to make sketches by assigning a bit to each data: 0 if it is in a ball and 1 otherwise. BP is also used in the vantage point tree (Yianilos, 1993).

Conventionally, similarity search using sketches consists of two stages. The first stage selects candidates according to the Hamming distances between their sketches and the sketch of the query. Here, we should note that the computational cost of the Hamming distance is very small if we use efficient bit operations such as exclusive or and bit count. The second stage selects the nearest neighbor by comparing the candidates with the query using distances in the original space. As the Hamming distance cannot preserve the information of distances between data, similarity search using sketches is only an approximation method. One of the most important tasks for sketches is achieving high accuracy with a small number of candidates obtained at the first stage.

Higuchi, N., Imamura, Y., Kuboyama, T., Hirata, K. and Shinohara, T.
Nearest Neighbor Search using Sketches as Quantized Images of Dimension Reduction.
DOI: 10.5220/0006585003560363
In Proceedings of the 7th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2018), pages 356-363
ISBN: 978-989-758-276-9

When the width w of sketches is regarded as the dimensionality, the dimensionality may not be reduced. However, as the size of w bits is usually much smaller than that of the original data, we may consider the mapping to sketches as a quasi-dimension reduction. Nevertheless, the Lipschitz continuity of the mapping is not guaranteed as long as the Hamming distance is used.

On the other hand, since a sketch of w bits is defined by using w BPs, we can regard w-bit sketches as quantized images of S-Map (Ohno, 2011). That is, each axis value of S-Map is quantized to one bit depending on whether or not it is greater than a threshold. Note that the $L_\infty$ distance should be used to guarantee the Lipschitz continuity of S-Map. Any $L_\infty$ distance between sketches is 0 or 1, that is, the quantization error is very large. As for data in the database, we should use only sketches at the first stage because the original high dimensional data are too large. However, as for queries, we can use the original queries as well as their sketches. Hence, in this paper, we show that, for each sketch bit, a lower bound of the distance between a query q and data x can be calculated using q and the sketch of x without the original x. If we take $\mathrm{score}_\infty$, defined as the maximum of the distance lower bounds, as the aggregation in the manner of the $L_\infty$ distance, the distance estimated from the sketch is not expanded and a BP sketch mapping can be considered as a quasi-dimension reduction. A similar idea is also found in asymmetric distance estimation (Dong et al., 2008; Jain et al., 2011; Balu et al., 2014). There, sketches are constructed by generalized hyperplane partitioning (GHP), some of the methods assume the Euclidean distance or cosine distance, and their estimation may expand the distance.

For w-bit sketches, we have w distance lower bounds. To guarantee a distance lower bound, we have to use the aggregation by maximum, $\mathrm{score}_\infty$. For approximate nearest neighbor search using sketches, we also propose $\mathrm{score}_1$ and $\mathrm{score}_2$, the aggregations by sum and square sum, respectively, which are no longer distance lower bounds. By experimenting on image, music and color databases, we observe that $\mathrm{score}_1$ and $\mathrm{score}_2$ achieve a more accurate nearest neighbor search than the Hamming distance and $\mathrm{score}_\infty$. A naive implementation of the aggregation needs more computational cost than the Hamming distance using bit operations. For each query, however, we can precompute aggregations to construct a table function. The cost of aggregation using the table function is almost comparable with the cost of the Hamming distance using bit operations. We can ignore the cost of this preprocessing if we search a large database.

We believe that our contribution lies in the following three points. First, in any metric space, based on the observation that BP sketches are quantized images of the dimension reduction S-Map, we point out that the sketch mapping can be considered as a quasi-dimension reduction by aggregating, in the $L_\infty$ manner, the distance lower bounds between the query and a sketch for each bit of the sketch. Similar methods are used elsewhere (Charikar, 2002; Jain et al., 2011), where sketches are based on generalized hyperplane partitioning (GHP) in the Euclidean distance and cosine distance, and the aggregated value is just a distance estimate, not a lower bound. Here, we also point out that GHP sketches are quantized images of H-Map. Thus, our approach is easily extended to any GHP based sketches. Second, we propose a low cost method to compute aggregations using precomputed table functions. This contribution is modest but important for practical problems. In our setting, we assume that data are produced by a feature extraction function and are not of very high dimension. Therefore, we cannot ignore the computational cost of the first stage. Third, we propose sum or square sum aggregations of distance lower bounds as the priority at the first stage. Such aggregations are no longer distance lower bounds. Nevertheless, they are more useful than the maximum aggregation. Similar techniques are found elsewhere, where GHP based sketches are considered in the Euclidean distance or cosine distance. Our method is applicable to BP sketches in any metric space and is easily extended to GHP.

2 PRELIMINARIES

Here, we brieﬂy introduce some necessary concepts

for our discussion.

2.1 Dimension Reduction and Simple-Map

We assume two metric spaces $(U, D)$ and $(U', D')$, where $D$ and $D'$ are distance functions satisfying the triangle inequality. Let $\dim(x)$ denote the dimensionality of a data $x$. Then, we say that a mapping $\varphi : U \to U'$ is a dimension reduction if it satisfies the following conditions for every $x, y \in U$:

$$\dim(\varphi(x)) < \dim(x) \qquad (1)$$

$$D'(\varphi(x), \varphi(y)) \le D(x, y) \qquad (2)$$


Condition (1) means that $\varphi$ reduces the dimensionality of data. However, it is not handled very strictly in this paper. For example, the size required to represent data is regarded as the dimensionality. Condition (2) means that $D'$ provides a lower bound of the distance $D(x, y)$, which allows us to ignore a data without computing $D$ in similarity search. For example, if $D'(\varphi(q), \varphi(x))$ exceeds the current search radius of a query q, then x can be ignored without computing $D(q, x)$.

A Simple-Map (S-Map, for short) (Shinohara and Ishizaka, 2002) is a kind of Fréchet embedding (Matousek, 2002), by which any finite metric space of n points can be embedded isometrically into the n-dimensional $L_\infty$ normed space. A similar idea has also been proposed by Hjaltason and Samet (Hjaltason and Samet, 2003). For a point $p \in U$ called a pivot, we define an S-Map $\varphi_p$ of $x \in U$ with $p$ as follows.

$$\varphi_p(x) = D(p, x).$$

By the triangle inequality, the following inequality holds for every $x, y \in U$:

$$|\varphi_p(x) - \varphi_p(y)| \le D(x, y).$$

Furthermore, for a set $P = \{p_1, \ldots, p_m\}$ of pivots, we define an S-Map $\varphi_P$ with $P$ as follows.

$$\varphi_P(x) = (\varphi_{p_1}(x), \ldots, \varphi_{p_m}(x)).$$

Suppose that we give $D'$ as follows:

$$D'(\varphi_P(x), \varphi_P(y)) = \max_{i=1}^{m} |\varphi_{p_i}(x) - \varphi_{p_i}(y)|.$$

In other words, if the projected space $U'$ is considered as an $L_\infty$ metric space, then an S-Map is a dimension reduction.
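The contraction property of S-Map can be checked numerically. The following is a minimal sketch, assuming Euclidean data generated at random and the first three points used as pivots; the names `smap` and `linf` are illustrative and do not come from the paper.

```python
import math
import random

def smap(pivots, x, dist):
    """S-Map image of x: the vector of its distances to each pivot."""
    return [dist(p, x) for p in pivots]

def linf(u, v):
    """L-infinity distance in the projected space."""
    return max(abs(a - b) for a, b in zip(u, v))

random.seed(0)
points = [[random.random() for _ in range(8)] for _ in range(50)]
pivots = points[:3]          # hypothetical choice: first three points as pivots
images = [smap(pivots, x, math.dist) for x in points]

# Count pairs whose projected distance exceeds the original distance.
# A small tolerance absorbs floating-point rounding.
violations = sum(
    1
    for i in range(len(points))
    for j in range(i + 1, len(points))
    if linf(images[i], images[j]) > math.dist(points[i], points[j]) + 1e-12
)
print(violations)
```

Because each coordinate is bounded by the triangle inequality, no pair should violate condition (2), whatever pivots are chosen.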

In H-Map (Shinohara et al., 1999), using a pair of two points $p_1, p_2 \in U$ as a pivot, we define an H-Map $\varphi_{(p_1,p_2)}$ of $x \in U$ as follows.

$$\varphi_{(p_1,p_2)}(x) = \frac{D(p_1, x) - D(p_2, x)}{2}$$

Then, H-Map is also a dimension reduction.
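The claim that H-Map does not expand distances can be verified in a few lines. The derivation below is a sketch that assumes the H-Map is defined with the factor 1/2, that is, $\varphi_{(p_1,p_2)}(x) = (D(p_1,x) - D(p_2,x))/2$, and that $D$ satisfies the triangle inequality.

```latex
\begin{align*}
|\varphi_{(p_1,p_2)}(x) - \varphi_{(p_1,p_2)}(y)|
  &= \tfrac{1}{2}\,\bigl|(D(p_1,x) - D(p_1,y)) - (D(p_2,x) - D(p_2,y))\bigr| \\
  &\le \tfrac{1}{2}\,\bigl(|D(p_1,x) - D(p_1,y)| + |D(p_2,x) - D(p_2,y)|\bigr) \\
  &\le \tfrac{1}{2}\,\bigl(D(x,y) + D(x,y)\bigr) = D(x,y).
\end{align*}
```

Each of the two absolute differences is bounded by $D(x,y)$ for the same reason as in the S-Map case, which is why the factor 1/2 is needed.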

2.2 Nearest Neighbor Search using Sketches

We assume that data in the given database are indexed by natural numbers 1 to n. Thus, let $db = \{x_1, \ldots, x_n\}$ be the given database of n data. The dissimilarity between two data $x_i$ and $x_j$ is defined as the distance $D(x_i, x_j)$. The nearest neighbor search for a query q is to find $x \in db$ such that $D(q, x) \le D(q, y)$ for all $y \in db$. Let s be a function which maps a data to its sketch. We can realize the k-nearest neighbor search using sketches as follows, where $K > k$.

1. Preparation stage:
   Calculate all the sketches $s(x_1), \ldots, s(x_n)$.
2. First stage (filtering using the Hamming distances of sketches):
   Calculate the sketch $s(q)$ of the query q.
   Calculate all the Hamming distances of sketches from $s(q)$.
   Select the K sketches $s(x_{i_1}), \ldots, s(x_{i_K})$ closest to $s(q)$.
3. Second stage (nearest neighbor search using actual distances):
   Select the k nearest neighbor data from the candidates $x_{i_1}, \ldots, x_{i_K}$.
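The three stages can be sketched as follows. This is an illustrative outline rather than the authors' implementation: the function names, the toy 2-bit sketch function and the sample points are all assumptions made for the example.

```python
import math

def hamming(a: int, b: int) -> int:
    """Hamming distance between two sketches stored as integers (XOR + bit count)."""
    return bin(a ^ b).count("1")

def knn_with_sketches(query, db, sketch, dist, k, K):
    """Two-stage k-NN: filter by Hamming distance, then rank by actual distance."""
    # Preparation stage (normally done once, offline).
    sketches = [sketch(x) for x in db]
    # First stage: keep the K data whose sketches are closest to s(q).
    sq = sketch(query)
    order = sorted(range(len(db)), key=lambda i: hamming(sq, sketches[i]))
    candidates = order[:K]
    # Second stage: rank the candidates by the actual distance D.
    candidates.sort(key=lambda i: dist(query, db[i]))
    return candidates[:k]

def toy_sketch(x):
    # Hypothetical 2-bit sketch: one ball per bit, radius 1.5.
    pivots = [((0.0, 0.0), 1.5), ((4.0, 0.0), 1.5)]
    bits = 0
    for p, r in pivots:
        bits = (bits << 1) | (0 if math.dist(p, x) <= r else 1)
    return bits

db = [(0.0, 0.5), (4.0, 0.5), (2.0, 3.0), (1.0, 0.0)]
result = knn_with_sketches((0.2, 0.2), db, toy_sketch, math.dist, k=1, K=2)
print(result)
```

Only the K filtered candidates reach the second stage, so the number of actual distance computations is K instead of n.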

Sketches are relatively small structures compared with their original feature data. For example, we use sketches of 32 bits for image feature data of 64 bytes in our experiments. At the first stage of the search process, we use the Hamming distances, which can be calculated by bit operations more easily than the actual distances between features. However, sketches cannot preserve all the distance relations. Therefore, we use them as a filter. A larger number K of candidates at the first stage achieves a more accurate but slower search. Thus, one of the most important subjects on sketches is to achieve higher accuracy with smaller K, or equivalently, to speed up the search within an acceptable error.

2.3 Sketches based on Ball Partitioning

In this paper, we consider sketches based on ball partitioning (BP). A pair $(p, r)$ of a point and a radius is called a pivot. A ball partitioning $BP_{(p,r)}$ is defined as follows:

$$BP_{(p,r)}(x) = \begin{cases} 0 & \text{if } D(p, x) \le r \\ 1 & \text{otherwise} \end{cases}$$

A BP based sketch function $s_P$ of width w is defined by a set of w pivots $P = \{(p_1, r_1), \ldots, (p_w, r_w)\}$ as follows:

$$s_P(x) = BP_{(p_1,r_1)}(x) \cdots BP_{(p_w,r_w)}(x)$$
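A BP based sketch function can be written in a few lines. The following is a minimal sketch assuming Euclidean distance and sketches packed into an integer with the first pivot as the leftmost bit; the pivot layout and the four points are invented for illustration so that the points fall inside ball 1 only, inside both balls, inside ball 2 only, and outside both.

```python
import math

def bp_bit(p, r, x, dist=math.dist):
    """Ball partitioning: 0 if x lies in the ball of center p and radius r, else 1."""
    return 0 if dist(p, x) <= r else 1

def bp_sketch(pivots, x, dist=math.dist):
    """Concatenate the w BP bits, leftmost bit first, into one integer."""
    s = 0
    for p, r in pivots:
        s = (s << 1) | bp_bit(p, r, x, dist)
    return s

# Hypothetical layout: two overlapping balls in the plane.
pivots = [((0.0, 0.0), 2.0), ((3.0, 0.0), 2.0)]
A = (-1.0, 0.0)   # inside ball 1 only
B = (1.5, 0.0)    # inside both balls
C = (4.0, 0.0)    # inside ball 2 only
D = (1.5, 5.0)    # outside both balls
sketches = {name: format(bp_sketch(pivots, pt), "02b")
            for name, pt in [("A", A), ("B", B), ("C", C), ("D", D)]}
print(sketches)
```

Packing the bits into an integer is a design choice that lets the Hamming distance be computed with one exclusive or and one bit count.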

Consider the four points A, B, C, D on a Euclidean plane in Figure 1. Using a set of two pivots $P = \{(p_1, r_1), (p_2, r_2)\}$, their sketches are $s_P(A) = 01$, $s_P(B) = 00$, $s_P(C) = 10$ and $s_P(D) = 11$.

Figure 1: 2 bit sketches by two balls.

Let q be any query outside of both balls. Since $s_P(q) = 11$, the Hamming distances between the sketches of q and A, B, C, D are 1, 2, 1, 0, respectively. The order of conventional priority at the first stage is D < A = C < B. Note that A and C cannot be distinguished by their Hamming distances from q.

2.4 Pivot Selection for Sketches using Binary Quantization

Pivot selection for sketches is a very important problem but is outside of our scope. Here, we briefly introduce a heuristic algorithm SELECTPIVOTQBP, which is not the best search algorithm but can find a relatively good set of pivots within a small computation time.

Binary quantization (BQ) selects the center of a pivot from the $2^n$ corners of the n-dimensional feature space. Since the corners are too many to be examined exhaustively, BQ randomly chooses a data from the database and quantizes it to a corner by comparing each coordinate with the median of that axis.

In our experiments, we adopt the probability of collisions between sketches as the evaluation function of pivots. If two different data x and y have the same sketch, we say that a collision occurs. From experience, sketches with fewer collisions provide a more accurate nearest neighbor search. Algorithm 1 shows SELECTPIVOTQBP, the algorithm for selecting pivots of QBP, where D is the distance function and eval is the evaluation function for pivots.
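The greedy structure of SELECTPIVOTQBP can be sketched in Python as follows. This is an illustrative rendering under assumptions, not the authors' code: the evaluation function here counts colliding pairs of sketches as a stand-in for the collision probability, and the names `select_pivots_qbp`, `collisions`, `lo` and `hi` are hypothetical.

```python
import math
import random
import statistics

def collisions(sketches):
    """Number of data pairs sharing a sketch (assumed evaluation, to be minimized)."""
    counts = {}
    for s in sketches:
        counts[s] = counts.get(s, 0) + 1
    return sum(c * (c - 1) // 2 for c in counts.values())

def select_pivots_qbp(db, width, trials, lo, hi, rng, dist=math.dist):
    """Greedy pivot selection by binary quantization, one sketch bit at a time."""
    dim = len(db[0])
    med = [statistics.median(x[j] for x in db) for j in range(dim)]
    prefix = [0] * len(db)            # sketch bits fixed so far, per data
    pivots = []
    for _ in range(width):
        best = None
        for _ in range(trials):
            c = rng.choice(db)
            # Binary quantization: snap the sample to a corner of the space.
            p = [lo if c[j] <= med[j] else hi for j in range(dim)]
            d = [dist(p, x) for x in db]
            r = statistics.median(d)  # radius splits the data in half
            cand = [(s << 1) | (0 if di <= r else 1) for s, di in zip(prefix, d)]
            score = collisions(cand)
            if best is None or score < best[0]:
                best = (score, p, r, cand)
        _, p, r, prefix = best        # keep the best trial for this bit
        pivots.append((p, r))
    return pivots, prefix

rng = random.Random(1)
db = [[rng.random() for _ in range(6)] for _ in range(200)]
pivots, sketches = select_pivots_qbp(db, width=8, trials=20, lo=0.0, hi=1.0, rng=rng)
print(len(pivots), collisions(sketches))
```

Taking the median distance as the radius makes each bit split the database roughly in half, which is what keeps the bits informative.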

3 DISTANCE LOWER BOUNDS BETWEEN QUERIES AND SKETCHES

In this section, we introduce a method to compute distance lower bounds between queries and sketches. First, let us explain the idea using an example. Consider four points A, B, C, D and a query q in Figure 2.

Figure 2: Distance lower bound between query and sketches.

The sketch $s_P(q)$ has 1 at the leftmost bit while $s_P(A)$ has 0. Therefore,

$$D(p_1, A) \le r_1.$$

By the triangle inequality,

$$D(q, A) \ge D(p_1, q) - D(p_1, A).$$

From these two inequalities,

$$D(q, A) \ge D(p_1, q) - r_1.$$

The lower bound of the distance between q and any point whose leftmost sketch bit is 0 is given by (as $e_1$ in Figure 2)

$$e_1(q, 0{*}) = D(p_1, q) - r_1.$$

Similarly, the lower bound of the distance between q and C is

$$e_2(q, {*}0) = D(p_2, q) - r_2.$$

Here, note that the above lower bounds can be calculated only from $D(p_1, q)$, $D(p_2, q)$, $r_1$ and $r_2$, without $D(q, A)$ and $D(q, C)$.

More formally, we can compute the lower bounds as explained below. Let $p_i$ and $r_i$ be the center and radius of the i-th pivot. Assume that the i-th sketch bits for a query q and data x are different from each other, that is, $D(p_i, q) > r_i$ and $D(p_i, x) \le r_i$ (or $D(p_i, q) \le r_i$ and $D(p_i, x) > r_i$). This difference is coarsely quantized to 1 in their Hamming distance. Calculating the distances $D(p_i, x)$ for all x in the database would require a large computational cost. On the other hand, the distance $D(q, x)$ is partly recovered using $D(p_i, q)$ only.

Algorithm 1: SELECTPIVOTQBP.

    /* M : the dimension of data, N : the number of data */
    /* db[N][M] : database, med[M] : the median of feature values */
    /* T : the number of random trials for each axis */
    /* MIN : the minimum feature value, MAX : the maximum feature value */
    procedure SELECTPIVOTQBP(p[W][M], r[W], W)
        /* p[W][M], r[W] : the centers and radii of pivots */
        /* W : the width of sketches */
        /* eval : evaluation of pivots to be minimized */
        for i = 1 to W do
            best ← ∞;
            for t = 1 to T do
                c ← random();    /* index of a randomly chosen data */
                /* binary quantization */
                for j = 1 to M do
                    if db[c][j] ≤ med[j] then
                        p[i][j] ← MIN;
                    else
                        p[i][j] ← MAX;
                r[i] ← the median of D(p[i], db[1]), ..., D(p[i], db[N]);
                current ← eval(p[1], r[1], ..., p[i], r[i]);
                if current < best then
                    best ← current; temp ← p[i]; rtemp ← r[i];
            p[i] ← temp; r[i] ← rtemp;

When

$$D(p_i, q) > r_i \quad \text{and} \quad D(p_i, x) \le r_i,$$

by the triangle inequality, as shown in the left of Figure 3,

$$D(q, x) \ge D(p_i, q) - D(p_i, x) \ge D(p_i, q) - r_i.$$

Similarly, when

$$D(p_i, q) \le r_i \quad \text{and} \quad D(p_i, x) > r_i,$$

we have, as shown in the right of Figure 3,

$$D(q, x) \ge D(p_i, x) - D(p_i, q) \ge r_i - D(p_i, q).$$

Thus, we have the lower bound $e_i(q, s(x))$ of $D(q, x)$, where $s_i(\cdot)$ denotes the i-th bit of a sketch:

$$e_i(q, s(x)) = \begin{cases} 0 & \text{if } s_i(q) = s_i(x) \\ |D(p_i, q) - r_i| & \text{if } s_i(q) \ne s_i(x) \end{cases}$$
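The per-bit lower bound needs only the query, the pivots and the sketch of the data. The following minimal sketch assumes Euclidean distance and sketches stored as integers with the first pivot as the leftmost bit; the function name `lower_bounds` and the sample values are illustrative.

```python
import math

def lower_bounds(query, pivots, data_sketch, dist=math.dist):
    """Per-bit lower bounds e_i(q, s(x)) computed from q and the sketch of x only."""
    w = len(pivots)
    bounds = []
    for i, (p, r) in enumerate(pivots):
        dq = dist(p, query)
        q_bit = 0 if dq <= r else 1
        x_bit = (data_sketch >> (w - 1 - i)) & 1   # leftmost bit is pivot 1
        bounds.append(abs(dq - r) if q_bit != x_bit else 0.0)
    return bounds

pivots = [((0.0, 0.0), 2.0), ((3.0, 0.0), 2.0)]   # hypothetical pivots
q = (1.5, 5.0)                 # outside both balls, sketch 11
x = (-1.0, 0.0)                # inside ball 1 only, sketch 01
e = lower_bounds(q, pivots, 0b01)
print(e, math.dist(q, x))
```

In this example only the first bit differs, so the only non-zero bound is $D(p_1, q) - r_1$, and it does not exceed the actual distance $D(q, x)$.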

We propose priorities using the distance lower bounds $e_i(q, s(x))$ as the criteria to select candidates at the first stage. When we use

$$\mathrm{score}_\infty(q, s(x)) = \max_{i=1}^{w} e_i(q, s(x)),$$

the maximum lower bound, as the priority, we can safely prune some of the candidates because it is itself a distance lower bound.

On the other hand, the Hamming distance is the sum of the differences. We also examine the sum and the square sum of the distance lower bounds,

$$\mathrm{score}_1(q, s(x)) = \sum_{i=1}^{w} e_i(q, s(x)),$$

$$\mathrm{score}_2(q, s(x)) = \sum_{i=1}^{w} \bigl(e_i(q, s(x))\bigr)^2,$$

as the priorities. Although they are no longer theoretical lower bounds, experiments show that they bring a more accurate nearest neighbor search.
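The three aggregations differ only in how they combine the per-bit bounds. The sketch below uses invented bound vectors for two data to show that the priorities can rank candidates differently; the function names are illustrative.

```python
def score_inf(e):
    """Maximum of the per-bit lower bounds; still a lower bound of D(q, x)."""
    return max(e)

def score_1(e):
    """Sum of the per-bit lower bounds."""
    return sum(e)

def score_2(e):
    """Sum of squares of the per-bit lower bounds."""
    return sum(v * v for v in e)

def hamming_from_bounds(e):
    """Number of differing bits, assuming every differing bit has a positive bound."""
    return sum(1 for v in e if v > 0)

# Hypothetical per-bit bounds e_i(q, s(x)) for two data and a 4-bit sketch.
e_x = [0.0, 3.0, 0.0, 0.0]     # one bit differs, far from the ball boundary
e_y = [1.0, 1.0, 1.0, 0.0]     # three bits differ, each close to the boundary

for name, f in [("hamming", hamming_from_bounds), ("score_inf", score_inf),
                ("score_1", score_1), ("score_2", score_2)]:
    print(name, f(e_x), f(e_y))
```

Here the Hamming distance prefers x, $\mathrm{score}_\infty$ and $\mathrm{score}_2$ prefer y, and $\mathrm{score}_1$ ties, which illustrates how much information the Hamming distance discards.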

In the conventional sketch search, the Hamming distance is used as the priority at the first stage, with the advantage that it can be calculated at high speed by bit operations. In the proposed method, the aggregated distance lower bounds must be obtained for a large number of data for each query. The resulting cost increase can be reduced by preparing a table function for each query, so that it can almost be ignored.
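The text does not spell out how the table function is built, so the following is only one plausible realization, stated as an assumption: for each query, the per-bit costs are precomputed, the sketch is split into 8-bit chunks, and one table per chunk stores the sum of the costs of every bit pattern. The names `build_tables` and `score_from_tables` are hypothetical. The scheme covers additive aggregations such as $\mathrm{score}_1$ and $\mathrm{score}_2$; for $\mathrm{score}_\infty$ the sums would be replaced by maxima.

```python
def build_tables(weights, chunk=8):
    """Per-query tables: tables[c][v] is the sum of weights of the bits set in v.

    weights[i] is the cost of a mismatch on sketch bit i (bit 0 = least
    significant), e.g. |D(p_i, q) - r_i| for score_1 or its square for score_2.
    """
    tables = []
    for start in range(0, len(weights), chunk):
        part = weights[start:start + chunk]
        table = [0.0] * (1 << len(part))
        for v in range(1, len(table)):
            low = v & -v                       # lowest set bit of v
            table[v] = table[v ^ low] + part[low.bit_length() - 1]
        tables.append(table)
    return tables

def score_from_tables(tables, q_sketch, x_sketch, chunk=8):
    """Aggregate the mismatching bits with one XOR and one lookup per chunk."""
    diff = q_sketch ^ x_sketch
    mask = (1 << chunk) - 1
    total = 0.0
    for table in tables:
        total += table[diff & mask]
        diff >>= chunk
    return total

weights = [float(i + 1) for i in range(32)]     # hypothetical per-bit bounds
tables = build_tables(weights)
q_sketch = 0xF0F0F0F0
x_sketch = q_sketch ^ 0x80000001                # bits 0 and 31 differ
naive = sum(w for i, w in enumerate(weights) if ((q_sketch ^ x_sketch) >> i) & 1)
fast = score_from_tables(tables, q_sketch, x_sketch)
print(naive, fast)
```

With 32-bit sketches this costs four lookups per data, and the four tables of 256 entries are built once per query, which is negligible when the database is large.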

4 EXPERIMENTS

In this section, we report experiments using several datasets, namely images, music and colors, as follows:

• images: about 70 million 2D frequency spectra of 64 dimensions extracted from 2,900 videos.
• music: about 70 million mel-frequency spectra of 96 dimensions extracted from 1,400 music CDs.
• colors: about 110,000 data of 112 dimensions from the SISAP database.

Figure 3: Distance Lower Bound.

We adopted 32 bits as the width of sketches. The 32 pivots are selected by SELECTPIVOTQBP with T = 1000 random trials. For each database, we selected 10 different sets of pivots. The reported precisions are averages over these 10 sets of pivots.

Randomly generated data are not appropriate as queries in experiments on nearest neighbor search, because in higher dimensional spaces it is rare to find near data. Therefore, we prepare five types of queries, very-near, near, middle, far and very-far, which are generated from randomly selected pairs of data in the database by mixing them with noise ratios of 5−10%, 15−20%, 25−30%, 35−40% and 45−50%, respectively. For example, a very-near query q is a weighted sum of randomly selected data x and y from the database with weights 10% and 90%, respectively. For each noise level, we prepare 100 queries. The averages of the nearest neighbor distances for the queries are shown in Figure 4.
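The query generation can be sketched as follows. This is an assumed reading of the description above: the noise ratio is drawn uniformly from the range of the query type, and the query is the corresponding weighted sum of two distinct random data. The names `NOISE_RANGES` and `make_query` are hypothetical.

```python
import random

NOISE_RANGES = {            # mixing noise ratio per query type, from the text
    "very-near": (0.05, 0.10),
    "near": (0.15, 0.20),
    "middle": (0.25, 0.30),
    "far": (0.35, 0.40),
    "very-far": (0.45, 0.50),
}

def make_query(db, kind, rng):
    """Blend two random data: the noise share comes from x, the rest from y."""
    lo, hi = NOISE_RANGES[kind]
    ratio = rng.uniform(lo, hi)          # assumption: uniform within the range
    x, y = rng.sample(db, 2)
    query = [ratio * a + (1.0 - ratio) * b for a, b in zip(x, y)]
    return query, ratio

rng = random.Random(0)
db = [[rng.random() for _ in range(4)] for _ in range(10)]
q, ratio = make_query(db, "very-near", rng)
print(ratio, q)
```

A larger noise ratio moves the query further from y, so the five types cover a range of nearest neighbor distances.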













         





  

Figure 4: The average of nearest neighbor distances.

First, we compare the run times using the priorities of the Hamming distance, $\mathrm{score}_\infty$, $\mathrm{score}_1$ and $\mathrm{score}_2$. Table 1 shows the computer environment.

Table 1: The computer environment.

CPU: Intel(R) Xeon(R) CPU E5-2640, 2.5 GHz
memory: 64 GBytes

Table 2 shows the average run time per query in milliseconds. From this table, no significant difference between the methods is observed. Thus, the computational costs of the distance lower bounds and their aggregation are almost the same as that of the Hamming distance. In this table, only the results for the images and music databases are shown, because the run time for colors is very small.

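Table 2: Run time (average per query, in milliseconds).

| Priority | images, K = 0.10% | images, K = 1.0% | music, K = 0.10% | music, K = 1.0% |
|---|---|---|---|---|
| Hamming | 19.5 | 71.5 | 21.1 | 87.5 |
| $\mathrm{score}_1$ | 16.8 | 85.5 | 23.0 | 111 |
| $\mathrm{score}_2$ | 22.5 | 89.1 | 23.4 | 121 |
| $\mathrm{score}_\infty$ | 15.2 | 63.4 | 16.9 | 74.2 |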

The precision of the search is defined as the probability that the top K candidates of the first stage include the exact nearest neighbor. We report the results for K of three sizes, 0.01%, 0.1% and 1.0%, relative to the database size.
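The precision defined above can be estimated over a set of queries as follows. This is a minimal sketch with invented rankings; the function name `precision` and the sample data are assumptions.

```python
def precision(ranked_candidates, true_nn, K):
    """Fraction of queries whose exact nearest neighbor is among the top K candidates."""
    hits = sum(1 for ranked, nn in zip(ranked_candidates, true_nn) if nn in ranked[:K])
    return hits / len(true_nn)

# Hypothetical first-stage rankings (data indices, best first) for three queries.
ranked = [[4, 2, 9, 1], [7, 3, 0, 5], [6, 8, 2, 4]]
exact = [2, 5, 1]            # index of the exact nearest neighbor per query
p_at_2 = precision(ranked, exact, K=2)
p_at_4 = precision(ranked, exact, K=4)
print(p_at_2, p_at_4)
```

In the experiments K is given as a percentage of the database size, so it would first be converted to a number of candidates.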

Figures 5, 6 and 7 summarize the precisions of search for the databases images, music and colors, respectively. Of $\mathrm{score}_1$ and $\mathrm{score}_2$, only one is plotted, because their results are very similar.

Figure 5: Precisions for images.

Figure 6: Precisions for music.

Figure 7: Precisions for colors.

5 CONCLUDING REMARKS

From the experiments, we confirmed that the precision is improved by the proposed methods using the distance lower bounds, namely their maximum $\mathrm{score}_\infty$, their total sum $\mathrm{score}_1$ and their total square sum $\mathrm{score}_2$, compared to using the Hamming distance. One of the most important tasks for future work may be to examine why $\mathrm{score}_1$ and $\mathrm{score}_2$ give better priorities than $\mathrm{score}_\infty$ and the Hamming distance. Since all priorities in these experiments show a similar tendency on all three databases of images, music and colors, we think that the results are unlikely to depend on the database. We should consider the influence of the width w of sketches, because $\mathrm{score}_\infty$ will improve with larger w, but $\mathrm{score}_1$ and $\mathrm{score}_2$ may get worse. There is a possibility that the same phenomenon also occurs in S-Map, where $L_\infty$ is used in the projected space. This is also a future issue.

In the experiments we reported, we optimized sketches by random trials using binary quantization. We can also apply optimization techniques such as local search and simulated annealing. We ran annealing by increasing resampling (AIR), introduced by Imamura et al. (Imamura et al., 2017), on pivot selection for sketches to minimize collisions. Although sketches with smaller collision probability are found in this way, the search precision is not improved. Therefore, we have to consider evaluation functions for sketches other than the probability of collision. For example, it is conceivable to use the correlation coefficient of distances before and after projection between sample pairs, or the distance preservation ratio of the distance lower bounds, as the evaluation function.

In this paper, we consider sketches based on ball partitioning, which can be considered as quantized images of S-Map. Generalized hyperplane partitioning (Uhlmann, 1991) (GHP) can also be used to make sketches. GHP based sketches can be considered as quantized images of H-Map. We can calculate distance lower bounds between queries and sketches even for GHP based sketches. We should also investigate the effectiveness of the proposed methods for GHP.

We compared the aggregation methods $\mathrm{score}_1$ and $\mathrm{score}_2$ in addition to $\mathrm{score}_\infty$. In our experiments, their performances are better than that of the traditional method using the Hamming distance. $\mathrm{score}_\infty$ gives a theoretical distance lower bound. Therefore, for the K objects selected at the first stage based on $\mathrm{score}_1$ or $\mathrm{score}_2$, safe pruning based on $\mathrm{score}_\infty$ is applicable: at the second stage, any candidate with $\mathrm{score}_\infty$ larger than the actual distance of the provisional nearest neighbor can be removed without computing its actual distance. We also have to consider the $\mathrm{score}_p$ aggregation for p other than 1 and 2, such as 1.5.
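The safe pruning at the second stage can be sketched as follows. The candidate list, the function name `second_stage` and the sample values are invented for illustration, and the lower bounds are assumed to be valid, that is, not larger than the actual distances.

```python
import math

def second_stage(query, candidates, dist=math.dist):
    """Nearest neighbor among candidates, skipping those pruned by score_inf.

    candidates: list of (score_inf, data) pairs selected at the first stage.
    Returns (best distance, best data, number of actual distance computations).
    """
    best_d, best_x, computed = math.inf, None, 0
    for lb, x in candidates:
        if lb > best_d:
            continue                     # D(q, x) >= lb > best_d: x cannot win
        d = dist(query, x)
        computed += 1
        if d < best_d:
            best_d, best_x = d, x
    return best_d, best_x, computed

q = (0.0, 0.0)
# Hypothetical candidates with valid lower bounds (lb <= actual distance).
cands = [(0.5, (1.0, 0.0)), (2.5, (3.0, 0.0)), (0.0, (0.0, 0.5)), (1.2, (0.0, 2.0))]
best_d, best_x, computed = second_stage(q, cands)
print(best_d, best_x, computed)
```

Two of the four candidates are skipped here without any distance computation, and the result is the same as an exhaustive scan of the candidates because the pruning test relies on a true lower bound.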

REFERENCES

Balu, R., Furon, T., and Jégou, H. (2014). Beyond “project and sign” for cosine estimation with binary codes. In Proc. ICASSP 2014, pages 6884–6888. IEEE.

Charikar, M. (2002). Similarity estimation techniques from rounding algorithms. In Proc. STOC, pages 380–388. ACM.

Dong, W., Charikar, M., and Li, K. (2008). Asymmetric distance estimation with sketches for similarity search in high-dimensional spaces. In Proc. ACM SIGIR’08, pages 123–130.

Faloutsos, C. and Lin, K. (1995). Fastmap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In Proc. ACM SIGMOD’95, volume 24, pages 163–174.

Fukunaga, K. (1990). Statistical pattern recognition (second edition). Academic Press.

Hjaltason, G. and Samet, H. (2003). Properties of embedding methods for similarity searching in metric space. IEEE Transactions on Pat. Anal. Mach. Intel., 25(5):530–549.

Imamura, Y., Higuchi, N., Kuboyama, T., Hirata, K., and Shinohara, T. (2017). Pivot selection for dimension reduction using annealing by increasing resampling. In Proc. LWDA 2017.

Jain, M., Jégou, H., and Gros, P. (2011). Asymmetric hamming embedding: taking the best of our bits for large scale image search. In Proc. ACM Multimedia 2011, pages 1441–1444.

Matousek, J. (2002). Lectures on discrete geometry. Springer Verlag.

Mic, V., Novak, D., and Zezula, P. (2015). Improving sketches for similarity search. In Proc. MEMICS’15, pages 45–57.

Mic, V., Novak, D., and Zezula, P. (2016). Speeding up similarity search by sketches. In Proc. SISAP 2016, pages 250–258.

Müller, A. and Shinohara, T. (2009). Efficient similarity search by reducing I/O with compressed sketches. In Proc. SISAP’09, pages 30–38.

Ohno, S. (2011). A study on quantization dimension reduction mapping for similarity search in multidimensional databases. Master’s thesis, Kyushu Institute of Technology (in Japanese).

Shinohara, T., Chen, J., and Ishizaka, H. (1999). H-map: A dimension reduction mapping for approximate retrieval of multi-dimensional data. In Proc. DS’99, LNAI 1721, pages 299–305.

Shinohara, T. and Ishizaka, H. (2002). On dimension reduction mappings for approximate retrieval of multi-dimensional data. In Progress of Discovery Science, LNCS 2281, pages 89–94. Springer Berlin / Heidelberg.

Uhlmann, J. (1991). Satisfying general proximity/similarity queries with metric trees. Inform. Process. Let., 40(4):175–179.

Wagner, R. and Fischer, M. (1974). The string-to-string correction problem. J. ACM, 21:168–178.

Wang, Z., Dong, W., Josephson, W., Lv, Q., Charikar, M., and Li, K. (2007). Sizing sketches: A rank-based analysis for similarity search. In Proc. ACM SIGMETRICS’07, pages 157–168.

Yianilos, P. (1993). Data structures and algorithms for nearest neighbor search in general metric spaces. In Proc. SODA 1993, pages 311–321. ACM Press.
