Figure 4: Characteristics of s = 1000: $d^{s}[U_0(DB)]_n$ = line 1, $d^{s}[U_1(DB)]_n$ = line 2, $d^{s}[U_2(DB)]_n$ = line 3, $d^{s}[U_3(DB)]_n$ = line 4 and $d^{s}[U_4(DB)]_n$ = line 5. Line 5 represents ε.
$12288 \cdot 20 + 3072 \cdot 95 + 768 \cdot 315 + 192 \cdot 835 + 48 \cdot 1000 + 4 = 987844$
which is about 12.4 times fewer operations than list matching,
which requires 12288 · 1000 = 12288000 operations. Further
optimization of our results could be achieved by better
quantization training (clustering algorithms).
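The operation count above can be checked with a few lines of Python. This is only an arithmetic sketch: reading each product as (subspace dimension) × (number of vectors compared at that level) is our interpretation of the terms, and the names `levels`, `extra`, `hierarchical_cost` and `list_matching_cost` are illustrative.

```python
# Each pair is one term of the sum in the text, read (as an assumption)
# as (subspace dimension, number of vectors compared at that level).
levels = [(12288, 20), (3072, 95), (768, 315), (192, 835), (48, 1000)]
extra = 4  # the additive constant that closes the sum in the text

# Cost of the hierarchical search: sum of all products plus the constant.
hierarchical_cost = sum(dim * count for dim, count in levels) + extra

# Cost of plain list matching: every one of the 1000 vectors is compared
# in the full 12288-dimensional space.
list_matching_cost = 12288 * 1000

speedup = list_matching_cost / hierarchical_cost
print(hierarchical_cost, list_matching_cost, round(speedup, 1))
# prints: 987844 12288000 12.4
```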
3 CONCLUSIONS
We propose hierarchical product quantization for exact
(error-free) vector retrieval in vector-based databases.
Quantization by hierarchical clustering makes it possible to
estimate the distribution of the points in the high-dimensional
vector space. Our method is exact rather than approximate: it is
guaranteed to find the most similar vector according to a
distance or similarity function. We demonstrated the working
principles of our model in an empirical experiment on one
thousand gray images, which correspond to 12288-dimensional
vectors.
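To illustrate how a search over a coarse representation can remain exact, the following Python sketch uses a generic lower-bound filter. It is not the quantizer described in this paper. It assumes, purely for illustration, that the coarse representation is the mean of each block of `f` consecutive components, for which sqrt(f) · ||U(x) − U(q)|| ≤ ||x − q|| holds for the Euclidean distance. The names `block_means`, `euclid` and `exact_nearest` are hypothetical, and the vector length is assumed to be divisible by `f`.

```python
import math
import random

def block_means(v, f):
    """Coarse representation (assumed): mean of each block of f components."""
    return [sum(v[i:i + f]) / f for i in range(0, len(v), f)]

def euclid(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_nearest(query, database, f):
    """Return (index, distance, number of full-dimensional comparisons)."""
    coarse_q = block_means(query, f)
    # Lower bound on the true distance, computed in the coarse space only.
    bounds = sorted(
        (math.sqrt(f) * euclid(block_means(x, f), coarse_q), i)
        for i, x in enumerate(database)
    )
    best_dist, best_idx, full_checks = float("inf"), -1, 0
    for bound, i in bounds:
        if bound >= best_dist:
            break  # no remaining vector can be closer than the current best
        full_checks += 1
        d = euclid(database[i], query)  # full-dimensional verification
        if d < best_dist:
            best_dist, best_idx = d, i
    return best_idx, best_dist, full_checks

# Small demonstration on random data (not the image data of the experiment).
random.seed(0)
database = [[random.random() for _ in range(16)] for _ in range(200)]
query = [random.random() for _ in range(16)]
idx, dist, checks = exact_nearest(query, database, f=4)
brute = min(euclid(x, query) for x in database)
print(abs(dist - brute) < 1e-9, checks)
```

Because a vector is discarded only when its lower bound already exceeds the best verified distance, the result agrees with an exhaustive scan, while usually fewer full-dimensional comparisons are needed.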
ACKNOWLEDGEMENTS
This work was supported by Fundação para a Ciência
e Tecnologia (FCT): PTDC/EIA-CCO/119722/2010.
ICEIS 2012 - 14th International Conference on Enterprise Information Systems