is only an approximation method. One of the most important tasks for sketches is to achieve high accuracy with a small number of candidates obtained at the first stage.
When the width $w$ of sketches is regarded as the dimensionality, the dimensionality is not necessarily reduced. However, since a $w$-bit sketch is usually much smaller than the original data, we may regard the mapping to sketches as a quasi-dimension reduction. Nevertheless, the Lipschitz continuity of the mapping is not guaranteed as long as the Hamming distance is used.
On the other hand, since a sketch of $w$ bits is defined by using $w$ BPs, we can regard $w$-bit sketches as quantized images of S-Map (Ohno, 2011). That is, BP sketches are quantized images of S-Map, where each axis value is quantized to one bit depending on whether or not it is greater than a threshold. Note that the $L_\infty$ distance should be used to guarantee the Lipschitz continuity of S-Map. Any $L_\infty$ distance between sketches is 0 or 1, that is, the quantization error is very large. As for data in the database, we should use only sketches at the first stage because the original high-dimensional data are too large. However, as for queries, we can use the original queries as well as their sketches. Hence, in this paper, we show that, for each sketch bit, a lower bound of the distance between a query $q$ and a data point $x$ can be calculated using $q$ and the sketch of $x$, without the original $x$. If we take $\mathrm{score}_\infty$, defined as the maximum of the distance lower bounds, as the aggregation in the manner of the $L_\infty$ distance, the distance estimated from the sketch never exceeds the original distance, and a BP sketch mapping can be considered a quasi-dimension reduction. A similar idea is also found in asymmetric distance estimation (Dong et al., 2008; Jain et al., 2011; Balu et al., 2014), where sketches are constructed by generalized hyperplane partitioning (GHP). Some of these methods assume the Euclidean distance or cosine distance, and their estimation may expand the distance.
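The per-bit lower bound and its aggregation by maximum can be illustrated with a minimal Python sketch. It assumes that each BP is a ball given by a pivot and a radius, and that a bit is 1 when the point lies outside the ball; the function names, the Euclidean distance, and this bit convention are illustrative assumptions, not definitions taken from this paper. Under these assumptions, when the bit of $x$ differs from the bit of $q$, the triangle inequality gives $D(q, x) \ge |D(q, p_i) - r_i|$, and when the bits agree the bound is 0.

```python
import math


def euclid(a, b):
    """Euclidean distance, used here only as an example metric."""
    return math.dist(a, b)


def bp_bit(x, pivot, radius, dist=euclid):
    """One sketch bit: 1 if x lies outside the ball (assumed convention), else 0."""
    return 1 if dist(x, pivot) > radius else 0


def bit_lower_bounds(q, pivots, radii, dist=euclid):
    """For each bit i, the value |D(q, p_i) - r_i|.

    This is a lower bound of D(q, x) for every x whose i-th bit
    differs from the i-th bit of q (triangle inequality).
    """
    return [abs(dist(q, p) - r) for p, r in zip(pivots, radii)]


def score_inf(q_sketch, x_sketch, bounds):
    """Maximum of the per-bit lower bounds; bits that agree contribute 0."""
    return max(
        (b for qb, xb, b in zip(q_sketch, x_sketch, bounds) if qb != xb),
        default=0.0,
    )
```

Note that `score_inf` needs only the original query, through `bounds`, and the sketch of the data point; the original data point is never accessed.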
For $w$-bit sketches, we have $w$ distance lower bounds. To guarantee a distance lower bound, we have to aggregate them by the maximum, $\mathrm{score}_\infty$. For approximate nearest neighbor search using sketches, we propose $\mathrm{score}_1$ and $\mathrm{score}_2$, which aggregate by sum and by square sum, respectively, and which are no longer distance lower bounds. In experiments on image, music, and color databases, we observe that $\mathrm{score}_1$ and $\mathrm{score}_2$ achieve more accurate nearest neighbor search than the Hamming distance and $\mathrm{score}_\infty$. A naïve implementation of the aggregation has a higher computational cost than the Hamming distance computed with bit operations. For each query, however, we can precompute the aggregations to construct a table function. The cost of aggregation using the table function is almost comparable to the cost of the Hamming distance using bit operations. We can ignore the cost of this preprocessing when searching a large database.
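One possible way to realize such a table function is sketched below in Python. The sketch is split into chunks of 8 bits, and for each chunk the sum of the per-bit values is precomputed for all 256 bit patterns, so that scoring one data sketch costs one table lookup per chunk. The chunk size, the function names, and the `power` parameter (1 for the sum, 2 for the square sum) are assumptions made for illustration; the paper's actual implementation may differ.

```python
def pack_chunks(sketch, chunk=8):
    """Pack a list of bits into integers; bit k of a chunk goes to position k."""
    packed = []
    for start in range(0, len(sketch), chunk):
        value = 0
        for k, bit in enumerate(sketch[start:start + chunk]):
            value |= bit << k
        packed.append(value)
    return packed


def build_tables(q_sketch, bounds, power=1, chunk=8):
    """Per-query preprocessing.

    tables[c][pattern] is the sum of bounds[i] ** power over the bits i
    of chunk c at which `pattern` differs from the query sketch.
    """
    tables = []
    for start in range(0, len(bounds), chunk):
        idx = range(start, min(start + chunk, len(bounds)))
        table = []
        for pattern in range(1 << len(idx)):
            total = 0.0
            for k, i in enumerate(idx):
                if (pattern >> k) & 1 != q_sketch[i]:
                    total += bounds[i] ** power
            table.append(total)
        tables.append(table)
    return tables


def score_with_tables(x_chunks, tables):
    """Score one packed data sketch with one lookup per chunk."""
    return sum(table[c] for table, c in zip(tables, x_chunks))


def score_naive(q_sketch, x_sketch, bounds, power=1):
    """Reference implementation that visits every bit."""
    return sum(
        b ** power for qb, xb, b in zip(q_sketch, x_sketch, bounds) if qb != xb
    )
```

The tables are built once per query and reused for every data sketch, which is why the preprocessing cost becomes negligible as the database grows. For the square sum, the square root is omitted here because it does not change the ranking of candidates.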
We believe that our contribution lies in the following three points. First, in any metric space, based on the observation that BP sketches are quantized images of the dimension reduction S-Map, we point out that the sketch mapping can be considered a quasi-dimension reduction by aggregating, in the $L_\infty$ manner, the distance lower bounds between the query and a sketch for each bit of the sketch. Similar methods are used elsewhere (Charikar, 2002; Jain et al., 2011), where sketches are based on generalized hyperplane partitioning (GHP) in Euclidean distance and cosine distance, and the aggregated value is just a distance estimate, not a lower bound. Here, we also point out that GHP sketches are quantized images of H-Map. Thus, our approach is easily extended to any GHP-based sketches. Second, we propose a low-cost method to compute aggregations using precomputed table functions. This contribution is not very significant, but it is important for practical problems. In our setting, we assume that data are produced by a feature extraction function and are not very high-dimensional. Therefore, we cannot ignore the computational cost of the first stage. Third, we propose sum or square-sum aggregations of distance lower bounds as the priority at the first stage. Such aggregations are no longer distance lower bounds. Nevertheless, they are more useful than the maximum aggregation. Similar techniques are found elsewhere, where GHP-based sketches are considered in Euclidean distance or cosine distance. Our method is applicable to BP sketches in any metric space and is easily extended to GHP.
2 PRELIMINARIES
Here, we briefly introduce some necessary concepts
for our discussion.
2.1 Dimension Reduction and Simple-Map
We assume two metric spaces $(U, D)$ and $(U', D')$, where $D$ and $D'$ are distance functions satisfying the triangle inequality. Let $\dim(x)$ denote the dimensionality of a data point $x$. Then, we say that a mapping $\phi : U \to U'$ is a dimension reduction if it satisfies the following conditions for every $x, y \in U$:

$$\dim(\phi(x)) < \dim(x) \tag{1}$$

$$D'(\phi(x), \phi(y)) \le D(x, y) \tag{2}$$
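As a small illustration of conditions (1) and (2), the following Python sketch assumes the usual pivot-based form of Simple-Map, in which a point is mapped to its vector of distances to a few pivots and the projected space uses the $L_\infty$ distance. The function names and the choice of Euclidean distance as $D$ are assumptions for this example only. Condition (2) then follows from the triangle inequality, since $|D(x, p) - D(y, p)| \le D(x, y)$ for every pivot $p$.

```python
import math


def simple_map(x, pivots, dist=math.dist):
    """Project x to the vector of its distances to the pivots."""
    return tuple(dist(x, p) for p in pivots)


def l_inf(a, b):
    """L-infinity distance between two projected vectors."""
    return max(abs(s - t) for s, t in zip(a, b))


def is_contractive(points, pivots, dist=math.dist, eps=1e-12):
    """Check condition (2) on every pair of the given points.

    eps absorbs floating-point rounding; it is not part of the definition.
    """
    images = [simple_map(x, pivots, dist) for x in points]
    return all(
        l_inf(images[i], images[j]) <= dist(points[i], points[j]) + eps
        for i in range(len(points))
        for j in range(i + 1, len(points))
    )
```

With two pivots and three-dimensional points, the projected vectors have two coordinates, so condition (1) holds as well.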
Nearest Neighbor Search using Sketches as Quantized Images of Dimension Reduction