Large-scale Image Retrieval based on the Vocabulary Tree
Bo Cheng, Li Zhuo, Pei Zhang
and Jing Zhang
Signal & Information Processing Laboratory, Beijing University of Technology, Beijing, China
Keywords: Vocabulary Tree, Large-scale Image Retrieval, Optimized SIFT, Local Fisher Discriminant Analysis.
Abstract: In this paper, a vocabulary tree based large-scale image retrieval scheme is proposed that can achieve higher
accuracy and speed. The novelty of this paper can be summarized as follows. First, because traditional
Scale Invariant Feature Transform (SIFT) descriptors tend to be excessively concentrated in some regions of
an image, the extraction process of SIFT features is optimized to reduce their number. Then, combined with
the optimized SIFT, a color histogram in the Hue, Saturation, Value (HSV) color space is extracted as an additional
image feature. Moreover, Local Fisher Discriminant Analysis (LFDA) is applied to reduce the dimensionality of
the SIFT and color features, which helps to shorten the feature-clustering time. Finally, the dimension-reduced
features are used to generate vocabulary trees, which are then used for large-scale image retrieval. The
experimental results on several image datasets show that the proposed method can achieve satisfying
retrieval precision.
1 INTRODUCTION
Image retrieval has been an active research topic in
recent years due to its potentially large impact on
both image utilization and organization. Researchers
seek methods that offer greater speed and accuracy.
Content-Based Image Retrieval (CBIR) is
currently considered the mainstream approach
because of its desirable processing speed and
objectivity. It automatically detects and extracts
visual features of an image (e.g., global features and
local features) by means of image processing and
computer vision algorithms. In most cases, a
retrieval system takes the visual features of a query
image given by a user, and then compares these
features with the features stored in a database. As
a result, the user receives images whose features are
similar to those of the query image.
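The comparison step described above can be sketched as a nearest-neighbour search over feature vectors. The following toy example (an illustrative sketch, not the paper's implementation; the 4-D vectors merely stand in for real image features) ranks database images by Euclidean distance to the query:

```python
import numpy as np

def retrieve(query_feature, database_features, top_k=3):
    """Return indices of the top_k database images whose feature
    vectors are closest (Euclidean distance) to the query."""
    dists = np.linalg.norm(database_features - query_feature, axis=1)
    return np.argsort(dists)[:top_k]

# Toy database of 5 images, each described by a 4-D feature vector.
db = np.array([[0.9, 0.1, 0.0, 0.0],
               [0.0, 1.0, 0.0, 0.0],
               [0.8, 0.2, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.0],
               [0.0, 0.0, 0.0, 1.0]])
query = np.array([1.0, 0.0, 0.0, 0.0])

print(retrieve(query, db))  # indices of the closest images, best first
```

An exhaustive linear scan like this becomes prohibitively slow for large-scale databases, which motivates the indexing structures discussed below.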
CBIR mainly comprises two key parts: feature
extraction and similarity comparison. Features
can usually be divided into two kinds: global
features and local features. The most commonly
used local features include Scale Invariant Feature
Transform (SIFT, Lowe D. G., 2004), Principal
Component Analysis-SIFT (PCA-SIFT, Ke Y. et al.,
2004), Speeded Up Robust Features (SURF, Bay
H. et al., 2008), and Gradient Location-Orientation
Histogram (GLOH, Mikolajczyk K. et al., 2005).
Relying on the grey-level information of images, SIFT
features, which are adopted for accurate and
fast image retrieval from large-scale databases,
are invariant to changes in image scale, rotation,
illumination, and more. PCA-SIFT performs well
under image rotation, blur, and illumination
changes, but not under scaling and affine
transformations. Moreover, the projection matrix of
PCA-SIFT requires a series of typical images, and is
therefore only appropriate for a specific image type. SURF runs
three times faster than SIFT in terms of
computational complexity. It is also more robust
than SIFT when blurred images are processed.
Nonetheless, SURF does not perform as well as SIFT
on images affected by scaling, rotation,
and illumination changes. As an extension of SIFT,
GLOH improves the robustness and discriminative
power of the descriptors.
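To make concrete how such local descriptors are used in practice, the sketch below matches descriptors between two images with Lowe's ratio test, which keeps only unambiguous nearest-neighbour matches. The 4-D vectors are toy stand-ins for real 128-D SIFT descriptors; this is an illustrative sketch, not the paper's pipeline:

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbour in
    desc_b, keeping only matches that pass Lowe's ratio test."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        nearest, second = dists[order[0]], dists[order[1]]
        if nearest < ratio * second:   # keep unambiguous matches only
            matches.append((i, int(order[0])))
    return matches

# Toy 4-D "descriptors" standing in for 128-D SIFT vectors.
a = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
b = np.array([[0.9, 0.1, 0.0, 0.0],   # close to a[0]
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])

print(match_descriptors(a, b))  # only a[0] finds a clear match
```

The ratio test discards descriptors whose nearest and second-nearest candidates are similarly distant, which is exactly the ambiguity that arises when descriptors cluster densely in some image regions.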
Establishing an effective index mechanism is
another critical aspect of achieving fast retrieval in
large-scale image databases. Currently, there are
three main kinds of methods: the K-D tree (Böhm C. et al.,
2001), LSH (Gionis A. et al., 1999), and the vocabulary
tree (Nistér D. et al., 2006). The K-D tree uses
nearest-neighbour search to build the index of the
images. Its search accuracy is high in low-dimensional
spaces, but the performance of the K-D
tree drops rapidly as the dimensionality increases.
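As an illustration of K-D tree indexing, the sketch below uses SciPy's `cKDTree` (assumed available; this is not the paper's implementation) to build an index over low-dimensional feature points and answer a nearest-neighbour query without a linear scan:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
db = rng.random((1000, 3))          # 1000 features in a low-dimensional space
tree = cKDTree(db)                  # build the index once

query = db[42] + 0.001              # a point very near database item 42
dist, idx = tree.query(query, k=1)  # fast nearest-neighbour lookup
print(idx)
```

In such low-dimensional spaces the tree prunes most of the database at query time; in the high-dimensional spaces typical of image descriptors, however, the pruning breaks down and the search degenerates towards the linear scan it was meant to avoid.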
LSH can be used to reduce dimensions with multiple
Cheng B., Zhuo L., Zhang P. and Zhang J.
Large-scale Image Retrieval based on the Vocabulary Tree.
DOI: 10.5220/0004661802990304
In Proceedings of the 9th International Conference on Computer Vision Theory and Applications (VISAPP-2014), pages 299-304
ISBN: 978-989-758-004-8
Copyright © 2014 SCITEPRESS (Science and Technology Publications, Lda.)