search – SVS, to dynamically index a stream of high-
dimensional vectors and facilitate similarity search.
SVS is continuous in the sense that it does not depend
on having the full set of vectors available beforehand,
but adapts to the vector stream.
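The stream-adaptive behavior can be illustrated with a toy sketch. This is not the paper's SVS algorithm: the class name, the fixed-size eviction window, and the brute-force linear scan are assumptions made purely for illustration (a real system would use an ANN structure such as IVFADC or HNSW). The point it shows is that vectors are indexed as they arrive and the oldest are evicted once they become obsolete, with no need for the full vector set up front.

```python
import math
from collections import OrderedDict


class ToyStreamIndex:
    """Toy k-NN over a sliding window of a vector stream.

    Illustrative only: uses a brute-force scan, not an ANN index.
    """

    def __init__(self, window_size):
        self.window_size = window_size   # vectors older than this are dropped
        self.vectors = OrderedDict()     # insertion-ordered: id -> vector

    def insert(self, vec_id, vec):
        """Index a newly arrived vector, evicting the oldest if full."""
        self.vectors[vec_id] = vec
        if len(self.vectors) > self.window_size:
            self.vectors.popitem(last=False)  # drop the oldest entry

    def search(self, query, k):
        """Return the ids of the k nearest vectors (Euclidean distance)."""
        ranked = sorted(self.vectors.items(),
                        key=lambda kv: math.dist(query, kv[1]))
        return [vec_id for vec_id, _ in ranked[:k]]
```

For example, with `window_size=3`, inserting a fourth vector evicts the first, so queries only ever see the three most recent vectors of the stream.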
This family of algorithms provides an elegant solu-
tion to the vector stream similarity search problem that
does not depend on updating the underlying vector in-
dexing method, which is usually expensive, as pointed
out in the related work section. Indeed, the original
contribution of the paper stems from the observation
that a stream of vectors that become obsolete over
time requires an approach different from static vector
indexing methods or from continuously updating such
data structures.
The paper discussed two sets of experiments to
assess the performance of SVS. The first set of ex-
periments used an IVFADC implementation and the
same setup as in (Jégou et al., 2011), and the second
set adopted an HNSW implementation over real data.
These experiments suggested that the SVS implementations
do not incur significant overhead and achieve reasonable
search quality. Moreover, unlike static indexing methods,
SVS can support unbounded vector streams.
The paper concluded with a brief description of a
proof-of-concept implementation of a classified ad re-
trieval tool, based on Jina and Redis with HNSW. The
tool allows testing different configurations by varying
the embedding dimensions, the index types, the distance
metrics, and some optimization parameters.
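As a rough sketch of how such configurations can be expressed, the snippet below builds the argument list for Redis Stack's `FT.CREATE` command with an HNSW vector field. The index name, key prefix, and field name are hypothetical, and the defaults shown are assumptions; the attribute names (`TYPE`, `DIM`, `DISTANCE_METRIC`, `M`, `EF_CONSTRUCTION`) follow the RediSearch vector-field syntax.

```python
def make_hnsw_create_cmd(index_name, prefix, dim, metric="COSINE",
                         m=16, ef_construction=200, dtype="FLOAT32"):
    """Build the FT.CREATE argument list for a Redis HNSW vector index.

    Varying dim, metric, m, and ef_construction corresponds to the
    configuration knobs mentioned above (dimensions, distance metric,
    optimization parameters).
    """
    attrs = ["TYPE", dtype, "DIM", str(dim),
             "DISTANCE_METRIC", metric,
             "M", str(m), "EF_CONSTRUCTION", str(ef_construction)]
    # The token after HNSW is the count of attribute tokens that follow.
    return ["FT.CREATE", index_name, "ON", "HASH",
            "PREFIX", "1", prefix,
            "SCHEMA", "embedding", "VECTOR", "HNSW",
            str(len(attrs)), *attrs]


# Hypothetical usage: a 768-dimensional index over keys prefixed "ad:".
cmd = make_hnsw_create_cmd("ads", "ad:", 768)
```

With a live Redis Stack instance, the command could be sent via redis-py's `r.execute_command(*cmd)`; here only the argument list is constructed.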
As future work, we plan to conduct further experi-
ments with the proof-of-concept retrieval tool, using
much larger datasets collected from the classified ad
platform and larger sets of realistic queries.
REFERENCES
Bengio, Y., Courville, A., and Vincent, P. (2013). Repre-
sentation learning: A review and new perspectives.
IEEE Transactions on Pattern Analysis and Machine
Intelligence, 35(8):1798–1828.
Beyer, K., Goldstein, J., Ramakrishnan, R., and Shaft, U.
(1999). When is “nearest neighbor” meaningful? In
International Conference on Database Theory, pages
217–235. Springer.
Cai, Y., Ji, R., and Li, S. (2016). Dynamic programming
based optimized product quantization for approximate
nearest neighbor search. Neurocomputing, 217:110–
118.
Costa Pereira, J., Coviello, E., Doyle, G., Rasiwasia, N.,
Lanckriet, G., Levy, R., and Vasconcelos, N. (2014).
On the role of correlation and abstraction in cross-
modal multimedia retrieval. Transactions of Pattern
Analysis and Machine Intelligence, 36(3):521–535.
Datar, M., Immorlica, N., Indyk, P., and Mirrokni, V. S.
(2004). Locality-sensitive hashing scheme based on
p-stable distributions. In Proceedings of the Twentieth
Annual Symposium on Computational Geometry, pages
253–262.
Fu, C., Xiang, C., Wang, C., and Cai, D. (2019). Fast
approximate nearest neighbor search with the nav-
igating spreading-out graph. Proc. VLDB Endow.,
12(5):461–474.
Ge, T., He, K., Ke, Q., and Sun, J. (2013). Optimized product
quantization for approximate nearest neighbor search.
In 2013 IEEE Conference on Computer Vision and
Pattern Recognition, pages 2946–2953.
Gionis, A., Indyk, P., Motwani, R., et al. (1999). Similarity
search in high dimensions via hashing. In Proc. 25th
International Conference on Very Large Data Bases,
pages 518–529, San Francisco, CA, USA. Morgan
Kaufmann Publishers Inc.
Guo, R., Sun, P., Lindgren, E., Geng, Q., Simcha, D., Chern,
F., and Kumar, S. (2020). Accelerating large-scale
inference with anisotropic vector quantization. In Proc.
37th International Conference on Machine Learning,
ICML’20, page 10. JMLR.org.
Hameed, I. M., Abdulhussain, S. H., and Mahmmod, B. M.
(2021). Content-based image retrieval: A review of
recent trends. Cogent Engineering, 8(1):1927469.
Jégou, H., Douze, M., and Schmid, C. (2008). Hamming
embedding and weak geometric consistency for large
scale image search. In Computer Vision – ECCV 2008,
pages 304–317.
Jégou, H., Douze, M., and Schmid, C. (2011). Product
quantization for nearest neighbor search. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence,
33(1):117–128.
Johnson, J., Douze, M., and Jégou, H. (2021). Billion-scale
similarity search with GPUs. IEEE Transactions on Big
Data, 7(3):535–547.
Li, X., Yang, J., and Ma, J. (2021). Recent developments of
content-based image retrieval (CBIR). Neurocomputing,
452:675–689.
Liu, C., Lian, D., Nie, M., and Hu, X. (2020). Online
optimized product quantization. In 2020 IEEE Inter-
national Conference on Data Mining (ICDM), pages
362–371.
Malkov, Y. A. and Yashunin, D. A. (2020). Efficient and
robust approximate nearest neighbor search using hi-
erarchical navigable small world graphs. IEEE Trans.
Pattern Anal. Mach. Intell., 42(4):824–836.
Muja, M. and Lowe, D. G. (2009). Fast approximate near-
est neighbors with automatic algorithm configuration.
In VISAPP (1), pages 331–340.
Xu, D., Tsang, I. W., and Zhang, Y. (2018). Online product
quantization. IEEE Transactions on Knowledge and
Data Engineering, 30(11):2185–2198.
Yang, W., Li, T., Fang, G., and Wei, H. (2020). PASE:
PostgreSQL ultra-high-dimensional approximate near-
est neighbor search extension. In Proc. 2020 ACM
SIGMOD International Conference on Management of
Data, pages 2241–2253.
ICEIS 2023 - 25th International Conference on Enterprise Information Systems