A NEW DESCRIPTOR BASED ON 2D DCT
FOR IMAGE RETRIEVAL
Cong Bai, Kidiyo Kpalma and Joseph Ronsin
Université Européenne de Bretagne, Rennes, France
INSA de Rennes, IETR, UMR 6164, F-35708, Rennes, France
Keywords: CBIR, DCT, Face Recognition, Texture Retrieval.
Abstract: Content-based image retrieval relies on feature comparison between images, so the choice of the feature vector is important. As many images are compressed by transforms, constructing the feature vector directly in the transform domain is a very popular topic. We propose a new feature vector in the DCT domain. Our method selects part of the DCT coefficients inside each block to construct an AC-Pattern, and uses the DC coefficients of neighbouring blocks to construct a DC-Pattern. Two histograms are formed and parts of them are combined into a descriptor vector used for image retrieval. Experiments are done both on face image databases and on a texture image database. Compared to other methods, the results show that the proposed method achieves better performance on both kinds of databases.
1 INTRODUCTION
The rapid growth of digital image collections brings more and more information. However, exploiting this information efficiently becomes increasingly difficult unless we can browse, search and retrieve it easily. Content-based image retrieval (CBIR) has been an active research field in pattern recognition and computer vision for decades. As most images are stored in a compressed format produced by various transforms, image retrieval in the transform domain has been widely studied.
The Discrete Cosine Transform (DCT) is used in the JPEG compression standard. DCT is also an efficient tool to extract features for image retrieval. Consequently, over the last decades, several studies on DCT-based image retrieval have appeared. Composing coefficients into a feature vector that represents the image leads to different solutions. In (Tsai et al., 2006), the upper-left DCT coefficients are categorized into four groups: one contains the DC coefficients and the other three contain the coefficients carrying vertical, horizontal and diagonal information; these four groups compose the feature vectors. In (Zhong and Defée, 2005), two DCT patterns are generated from the DCT blocks, and their histograms are then constructed and combined to do retrieval. In (Bai et al., 2011), an improved histogram descriptor is obtained by applying a zig-zag scan and observing adjacent patterns of coefficients.
The wide use of DCT in image compression and image retrieval comes from its energy compaction capability: most of the energy lies in the low-frequency coefficients, so that the high frequencies can be discarded without visible distortion. In other words, a reduced subset of DCT coefficients can efficiently represent the image contents. Compared to using all of the coefficients, this reduces the complexity and redundancy of the feature vectors applied to image retrieval. The method proposed in this paper is inspired by this consideration and by previous works (Tsai et al., 2006), (Zhong and Defée, 2005) and (Bai et al., 2011).
In this paper, we present a simple but effective approach to construct a descriptor from DCT coefficients for image retrieval. Experimental results show that our method applies both to face databases and to texture databases, which correspond to different structures of image contents, and moreover achieves better performance than many classical methods and state-of-the-art approaches.
The rest of the paper is organized as follows. The construction of the patterns and of the descriptor vector is described in Section 2. Section 3 presents the analysis of experimental results on both face and texture databases, and finally a conclusion is given in Section 4.
2 DESCRIPTION OF METHOD
2.1 General Description
In this study, we use a 4×4 block DCT transform, so we get 1 DC coefficient and 15 AC coefficients for each block. For each block, we select 9 AC coefficients to construct the AC-Pattern, and use the DC coefficient of the block itself together with the DC coefficients of its 8 neighbouring blocks to build the DC-Pattern. We generate the histogram of each pattern as the number of appearances of the patterns in the image. Finally, we use the concatenation of the AC-Pattern histogram and the DC-Pattern histogram as the descriptor of the image for retrieval.
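For illustration, this block decomposition can be sketched in Python as follows (a minimal sketch assuming SciPy's dctn; the function name and the returned layout are ours, not part of the method's specification):

import numpy as np
from scipy.fft import dctn

def block_dct_4x4(image):
    """Split a grayscale image into non-overlapping 4x4 blocks and
    return the DC coefficients as a grid plus the full coefficient
    block (1 DC + 15 AC) for each position."""
    h, w = image.shape
    n_rows, n_cols = h // 4, w // 4
    dc_grid = np.empty((n_rows, n_cols))
    blocks = []
    for bi in range(n_rows):
        for bj in range(n_cols):
            # 2D DCT-II of one 4x4 block (orthonormal normalization)
            c = dctn(image[4*bi:4*bi+4, 4*bj:4*bj+4].astype(float),
                     norm='ortho')
            dc_grid[bi, bj] = c[0, 0]  # DC coefficient of the block
            blocks.append(c)           # kept for AC-Pattern extraction
    return dc_grid, blocks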
2.2 Pre-Processing
We adopt the luminance normalization method presented in (Zhong and Defée, 2005) as a pre-processing step to eliminate the effect of luminance variations. The histograms are then built from the occurrences of patterns in these pre-processed coefficients.
2.3 AC-Pattern and Its Histogram
As mentioned before, subsets of coefficients can represent the image content. So we select at most 9 coefficients in each block to construct the AC-Pattern. This selection gathers these 9 coefficients into 3 groups: horizontal, vertical and diagonal, as shown in Fig. 1.

Figure 1: AC-Pattern.

This selection is retained because of its ability to represent the local structure of a block's content. For efficiency, we calculate the sum of 2 or 3 coefficients in each group and use these 3 sums to form the AC-Pattern. We use the parameter Nc to represent the number of coefficients used in each group, thus Nc=2 or Nc=3. According to our experimental results, we use Nc=2. Thus only 6 coefficients are used to construct the AC-Pattern, and the dimension of this AC feature vector is 3. Compared with the methods presented in (Zhong and Defée, 2005) and (Bai et al., 2011), this selection clearly reduces the complexity of the feature vector.
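As an illustration of the grouping, the sketch below builds the 3-dimensional AC-Pattern for Nc=2. The exact positions of the selected coefficients are defined by Figure 1, which is not reproduced here, so the group membership used below (first coefficients of the first row, first column and main diagonal of the coefficient block) is an assumption for illustration only:

# Assumed group membership (the actual selection is the one of Figure 1):
# first Nc=2 AC coefficients of the first row (horizontal group), first
# column (vertical group) and main diagonal (diagonal group).
GROUPS_NC2 = (
    ((0, 1), (0, 2)),  # horizontal
    ((1, 0), (2, 0)),  # vertical
    ((1, 1), (2, 2)),  # diagonal
)

def ac_pattern(block):
    """3-component AC-Pattern of one 4x4 DCT coefficient block (Nc=2):
    each component is the sum of the 2 coefficients of one group."""
    return tuple(sum(block[r, c] for r, c in group)
                 for group in GROUPS_NC2)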
From the original histogram of AC-Patterns, we can make two observations. The first is that only a small part of the AC-Patterns appears in large quantities, while a large number of AC-Patterns appear rarely (Zhong and Defée, 2005). So, for the sake of runtime and efficiency, we select only the AC-Patterns with the highest frequencies to construct the histogram. We use the parameter ACbins to represent the number of selected AC-Patterns. To construct the AC-Pattern histogram of an image, we simply count the number of appearances of these AC-Patterns in the image, which gives the AC-Pattern histogram H_AC. The second observation is that the first AC-Pattern in the histogram is very dominant. This AC-Pattern mainly corresponds to blocks of image background, so we eliminate it from the AC-Pattern histogram.

Following these two observations, the histogram of AC-Patterns used for retrieval is as shown in Fig. 2: we keep the ACbins=70 most frequent AC-Patterns after eliminating the dominant background pattern.
Figure 2: Histogram of selected AC-Pattern.
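The construction of H_AC can be sketched as follows. The paper does not state whether the dominant patterns are selected per image or over the whole database; since the histogram bins must correspond to the same patterns across images, the sketch assumes a codebook built from database-wide statistics, and it assumes the pattern components are quantized beforehand (the QPAC parameter of Section 3):

from collections import Counter
import numpy as np

def build_codebook(all_patterns, ac_bins=70):
    """Rank all (quantized) AC-Patterns of the database by frequency,
    drop the single most frequent one (assumed to be the background
    pattern) and keep the next ac_bins patterns as histogram bins."""
    ranked = Counter(all_patterns).most_common(ac_bins + 1)
    return [pattern for pattern, _ in ranked[1:]]

def ac_histogram(patterns, codebook):
    """AC-Pattern histogram H_AC of one image over the fixed codebook."""
    counts = Counter(patterns)
    return np.array([counts[p] for p in codebook], dtype=float)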
2.4 DC-Pattern and Its Histogram
In complement to the previous features that describe the local structure inside each block, we observe, for each block and its neighbours, more global structural features by using gradients between blocks. To do so, the DC-DirecVec (Zhong and Defée, 2005) is defined and used as the feature for DC-Patterns. As the same observations can be made on the AC-Pattern histogram, we select only the dominant DC-Patterns to construct the DC-Pattern histogram.
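The exact DC-DirecVec construction is the one of (Zhong and Defée, 2005); as a rough illustration only, the sketch below assumes the DC-Pattern is built from the quantized gradients between a block's DC coefficient and those of its 8 neighbours, with an assumed quantization step qp_dc:

def dc_pattern(dc_grid, bi, bj, qp_dc=1.0):
    """Illustrative DC-Pattern of the block at grid position (bi, bj):
    quantized DC gradients towards the 8 neighbouring blocks (border
    blocks are skipped by the caller).  This is an assumption, not the
    exact DC-DirecVec of (Zhong and Defee, 2005)."""
    center = dc_grid[bi, bj]
    grads = [dc_grid[bi + di, bj + dj] - center
             for di in (-1, 0, 1) for dj in (-1, 0, 1)
             if (di, dj) != (0, 0)]
    return tuple(int(round(g / qp_dc)) for g in grads)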
2.5 Feature Vector Descriptor and Similarity Measurement
We use the concatenation of the AC-Pattern and DC-Pattern histograms to do image retrieval. In this context, the descriptor is defined as follows:

$D = [(1-\alpha) \times H_{AC},\ \alpha \times H_{DC}]$   (1)

where $\alpha$ is a weight parameter that controls the relative impact of the AC-Pattern and DC-Pattern histograms.
To measure the similarity between two descriptors we use the Manhattan distance:

$Dis_{i,j} = \sum_{k=1}^{m} |D_i(k) - D_j(k)|$   (2)

where $k$ indexes the components of the descriptors, $i$ and $j$ identify the two descriptors being compared, and $m$ is the total number of components, i.e. the dimension of the descriptor.
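Equations (1) and (2) translate directly into code; a minimal sketch (the helper names are ours):

import numpy as np

def descriptor(h_ac, h_dc, alpha=0.5):
    """Weighted concatenation of the two histograms, Eq. (1)."""
    return np.concatenate([(1 - alpha) * np.asarray(h_ac, dtype=float),
                           alpha * np.asarray(h_dc, dtype=float)])

def manhattan_distance(d_i, d_j):
    """Manhattan (L1) distance between two descriptors, Eq. (2)."""
    return float(np.abs(d_i - d_j).sum())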
3 EXPERIMENTAL RESULTS
We perform retrieval experiments on the ORL face database (AT&T Laboratories Cambridge) and on the VisTex texture database (Media Laboratory, MIT).
3.1 Experiments on Face Database
The ORL database includes 10 different images of each of 40 people. For the tests, we use the first 6 images of each person as the image database and the remaining 4 as query images. Therefore, the total number of images in the database is 240 and the number of query images is 160.
To evaluate the performance we use the Equal Error Rate (EER) (Bolle et al., 2000). If a value expresses the similarity between a query image and the images in the database, then for a given threshold an input image of a certain class A may be falsely recognized as class B. The ratio of images of class A that are recognized as another class is called the FRR (False Rejection Rate), while the ratio of images of other classes that are recognized as class A is called the FAR (False Acceptance Rate). When both rates take equal values, the equal error rate (EER) is reached. The lower the EER, the better the system's performance, as the total error rate is the sum of FAR and FRR.
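As an illustration, the EER can be estimated from the sets of matching distances by sweeping the acceptance threshold; a sketch (the function name and the sweep strategy are ours):

import numpy as np

def equal_error_rate(genuine, impostor):
    """Estimate the EER given `genuine` (distances between queries and
    same-class images) and `impostor` (distances to other-class images):
    sweep the acceptance threshold and return the error rate at the
    point where FRR and FAR are closest."""
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    frr = np.array([np.mean(genuine > t) for t in thresholds])
    far = np.array([np.mean(impostor <= t) for t in thresholds])
    k = np.argmin(np.abs(frr - far))  # threshold where FRR ~ FAR
    return (frr[k] + far[k]) / 2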
We now show the results of the experiment in which we use the concatenation of the AC-Pattern histogram and the DC-Pattern histogram to do image retrieval. In all the experiments, the DC-Pattern histogram is the same but the approach to construct the AC-Pattern differs. We name the method presented in (Zhong and Defée, 2005) 'linear scan' and the method presented in (Bai et al., 2011) 'adjacent zig-zag'. For both methods, we tested different sets of parameters to find the one ensuring the best performance. For our proposal, we use Nc=2, ACbins=70 and QPAC=30, and we vary the weight parameter α to get a global comparison of the performance. The resulting curves are shown in Fig. 3.
Figure 3: Performance (EER as a function of the weight parameter α) of the different methods: linear scan, adjacent zig-zag and our proposal, with best EER values of 0.0864, 0.05 and 0.0439 respectively.
It can be observed that the retrieval performance is improved by using the proposed approach to construct the descriptor.

We further compare the performance of the proposed method with those of other methods: Principal Component Analysis (Naz et al., 2006), 2D Principal Component Analysis (Xu et al., 2009) and Linear Discriminant Analysis (Goudelis et al., 2007). Table 1 shows the lowest EER reported for these methods and the lowest EER of our method. We can conclude that our proposal outperforms all the other referenced methods.
Table 1: Comparison of EER with other methods.
Method PCA 2D PCA LDA Proposal
EER 0.095 0.165 0.11 0.0439
3.2 Experiments on Texture Database
As we want to extend our solution to a wider application field, we work with a selection of images from the popular MIT Vision Texture database (VisTex), consisting of 40 textures that have already been extensively used in the texture image retrieval literature (Do et al., 2002) (Kokare, 2005) (Huajing et al., 2008) (Kwitt et al., 2010). The 512×512-pixel colour versions of the textures are divided into 16 non-overlapping 128×128-pixel subimages and converted to grayscale, thus creating a database of 640 images belonging to 40 texture classes, each class with 16 different samples.
In the retrieval experiments, each of the 640 images in the database is used in turn as a query. The relevant images for each query are all the subimages from the same original texture. We use the average retrieval rate (ARR) to evaluate the performance. For a given query image, the retrieval rate is defined as the percentage of retrieved images belonging to the same class as the query among the total number of retrieved images. For comparison purposes, we retrieve 16 images per query. Using every subimage of the database as a query, we finally compute the average retrieval rate.
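The ARR computation can be sketched as follows (whether the query itself counts among the 16 retrieved images is not stated; the sketch keeps it, as the query belongs to the database):

import numpy as np

def average_retrieval_rate(descriptors, labels, n_retrieved=16):
    """ARR over the database: each image serves as a query, its
    n_retrieved nearest neighbours (Manhattan distance) are retrieved,
    and the retrieval rate is the fraction sharing the query's class."""
    D = np.asarray(descriptors, dtype=float)
    labels = np.asarray(labels)
    rates = []
    for q in range(len(D)):
        dists = np.abs(D - D[q]).sum(axis=1)       # Eq. (2) to all images
        nearest = np.argsort(dists)[:n_retrieved]  # includes the query itself
        rates.append(np.mean(labels[nearest] == labels[q]))
    return float(np.mean(rates))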
Table 2 provides a quantitative comparison. RCWF denotes the Rotated Complex Wavelet Filters method proposed in (Kokare, 2005). CWT represents the Complex Wavelet Transform method presented in (Kingsbury et al., 1999). CWT+RCWF is also presented in (Kokare, 2005). PTR (Kwitt et al., 2010) is a probabilistic texture retrieval method based on the dual-tree complex wavelet transform. It can be observed that our proposal outperforms the other methods.
Table 2: Comparison of ARR with other methods.
Method RCWF CWT PTR CWT+RCWF Proposal
ARR(%) 75.78 80.78 81.73 82.34 83.64
4 CONCLUSIONS
In this paper we have presented a simple and effective approach for constructing a descriptor from 2D DCT coefficients intended for image retrieval. Unlike other CBIR methods that usually focus on one kind of image database, our approach is suitable for different kinds of image databases. We evaluated our method both on a widely used face database and on a texture database. In terms of recognition rate and average retrieval rate, the experimental results show higher performance compared to classical and state-of-the-art methods.
REFERENCES
AT&T Laboratories Cambridge. ORL Database. http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
Bai Cong, Kpalma Kidiyo and Ronsin Joseph, 2011. Analysis of histogram descriptor for image retrieval in DCT domain. Proceedings of the 4th International Conference on Intelligent Interactive Multimedia Systems and Services, Vol. 11, 227-235.
Bolle R. M., Pankanti S. and Ratha N. K., 2000. Evaluation techniques for biometrics-based authentication systems (FRR). Proc. International Conf. on Pattern Recognition, Vol. 2, 831-837.
Do M. N. and Vetterli M., 2002. Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance. IEEE Trans. Image Process., Vol. 11, No. 2, 146-158.
Goudelis G., Zafeiriou S., Tefas A. and Pitas I., 2007. Class-Specific Kernel-Discriminant Analysis for Face Verification. IEEE Transactions on Information Forensics and Security, Vol. 2, 570-587.
Kim Yong-Ho, Lee Seok-Han, Lee Sangkeun, Kim Tae-Eun and Choi Jong-Soo, 2010. A Novel Image Retrieval Scheme Using DCT Filter-Bank of Weighted Color Components. 2010 International Conference on Information Science and Applications, 1-8.
Kokare M., Biswas P. K. and Chatterji, B. N., 2005.
Texture image retrieval using new rotated complex
wavelet filters. IEEE Transactions on Systems, Man,
and Cybernetics, Part B: Cybernetics, Vol. 35, Issue:
6, 1168 - 1178.
Kwitt R. and Uhl A., 2010. Lightweight Probabilistic Texture Retrieval. IEEE Transactions on Image Processing, Vol. 19, No. 1, 241-253.
Media Laboratory, MIT. VisTex Database of textures. http://vismod.media.mit.edu/vismod/imagery/VisionTexture/
Naz, E., Farooq, U., and Naz, T., 2006. Analysis of
Principal Component Analysis-Based and Fisher
Discriminant Analysis-Based Face Recognition
Algorithms. 2006 International Conference on
Emerging Technologies, 121 – 127.
Tsai Tienwei, Huang Yo-Ping, and Chiang Tei-Wei, 2006.
Image Retrieval Based on Dominant Texture Features.
2006 IEEE International Symposium on Industrial
Electronics, 441-446.
Xu Zhijie, Zhang Jianqin, and Dai Xiwu, 2009. Boosting
for Learning a Similarity Measure in 2DPCA Based
Face Recognition. 2009 World Congress on Computer
Science and Information Engineering, Vol. 7, 130 –
134.
Zhong D., Defée I. 2005. DCT histogram optimization for
image database retrieval. Pattern Recognition Letters,
Vol. 26, 2272-2281.