et al., 2000) based on performance statistics reported
as Cumulative Match Scores (CMS), which are
plotted on a graph. The horizontal axis of the graph
is retrieval rank and the vertical axis is the
probability of identification (PI) (or percentage of
correct matches). Put simply, a higher curve reflects
better performance.
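As an illustration of how a CMS curve can be obtained,
the following is a minimal sketch, assuming a precomputed
probe-to-gallery distance matrix and identity labels for
both sets. The function name `cumulative_match_scores` and
its arguments are hypothetical and are not part of the
FERET tools.

```python
import numpy as np

def cumulative_match_scores(dist, probe_ids, gallery_ids):
    """Return an array whose entry k-1 is the fraction of probes
    whose correct gallery match appears within the top k ranks."""
    dist = np.asarray(dist, dtype=float)
    probe_ids = np.asarray(probe_ids)
    gallery_ids = np.asarray(gallery_ids)

    # For each probe, order gallery entries by increasing distance.
    order = np.argsort(dist, axis=1, kind="stable")
    ranked_ids = gallery_ids[order]

    # Locate the first rank at which the correct identity appears.
    hits = ranked_ids == probe_ids[:, None]
    has_match = hits.any(axis=1)
    first_hit = np.argmax(hits, axis=1)

    # Count probes per first-hit rank, then accumulate over ranks.
    # Probes with no matching gallery identity count as misses.
    counts = np.bincount(first_hit[has_match], minlength=dist.shape[1])
    return np.cumsum(counts) / len(probe_ids)
```

Plotting the returned array against rank 1, 2, ... gives
the curve described above; its first entry is the Rank-1
score.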
The FERET database provides tools for preprocessing
the face images, and we used some of them in the
preprocessing stage of our evaluation. First, the
images were cropped to a common size that roughly
covers the face area. They were then aligned and
their illumination was normalized. No mask was
applied to the images.
6.2 Training and Retrieval Process
Our image database retrieval problem is formulated
as follows. Each probe image in probe set FB has a
corresponding image in gallery set FA. We use the
feature vector histograms of the images and the
similarity measure defined above to find the image
in FA at minimum distance from the probe image. If
the retrieved gallery image shows the same person
as the probe image, the retrieval is counted as
correct.
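The retrieval rule can be sketched as a nearest-neighbour
search over histograms. The sketch below uses the L1
distance purely as a stand-in, since the similarity
measure is defined in an earlier section; the function
names and the array layout (one histogram per row) are
assumptions made for illustration.

```python
import numpy as np

def l1_distance_matrix(probe_hists, gallery_hists):
    """Pairwise L1 distances, shape (n_probes, n_gallery).
    L1 is an assumed stand-in for the paper's similarity measure."""
    p = np.asarray(probe_hists, dtype=float)
    g = np.asarray(gallery_hists, dtype=float)
    return np.abs(p[:, None, :] - g[None, :, :]).sum(axis=2)

def rank1_accuracy(probe_hists, probe_ids, gallery_hists, gallery_ids):
    """Fraction of probes whose minimum-distance gallery image
    carries the same identity label as the probe."""
    dist = l1_distance_matrix(probe_hists, gallery_hists)
    nearest = np.argmin(dist, axis=1)
    retrieved = np.asarray(gallery_ids)[nearest]
    return float(np.mean(retrieved == np.asarray(probe_ids)))
```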
Before this can be done, however, the parameters
used to compute the histograms and the similarity
measure must be determined from a training set,
which can be a small subset of the database.
Knowing the correct responses for the training set
allows us to tune the parameters for the best
retrieval results. The parameters found during
training are the quantization scalar and the length
of the histogram. The optimal parameter set is the
one that maximizes retrieval performance over the
training set. The resulting optimal parameter set
is then applied to the whole database to evaluate
the actual system performance.
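One simple way to carry out such tuning is an exhaustive
search over candidate values. The sketch below assumes a
caller-supplied function `evaluate(q, n_bins)` that
returns the retrieval accuracy on the training set for a
given quantization scalar and histogram length; both that
function and the candidate grids are hypothetical, and the
search strategy actually used is not stated in the text.

```python
from itertools import product

def select_parameters(evaluate, quant_scalars, hist_lengths):
    """Return ((q, n_bins), score) maximizing evaluate(q, n_bins)
    over all combinations of the candidate values."""
    best_params, best_score = None, float("-inf")
    for q, n_bins in product(quant_scalars, hist_lengths):
        score = evaluate(q, n_bins)
        # Keep the first combination that achieves the best score.
        if score > best_score:
            best_params, best_score = (q, n_bins), score
    return best_params, best_score
```

The selected pair would then be fixed and applied
unchanged to the images outside the training set.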
Figure 3: Training process based on five different small
sets.
To show that the choice of training set has an
insignificant impact on the final performance, the
retrieval process is repeated five times, each time
with a different training set of 50 images; the
remaining 942 images form the testing set. The
final CMS curve is the average of the five CMS
curves obtained from these five training sets. This
process is shown in Figure 3.
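The repetition and averaging can be sketched as follows.
The callable `run_split(train_idx, test_idx)` is a
hypothetical placeholder for "tune on the training images,
then return the CMS curve on the testing images". Drawing
each training set by an independent random permutation is
also an assumption, since the text does not say how the
five sets were chosen.

```python
import numpy as np

def average_cms_over_splits(n_items, train_size, n_repeats, run_split, seed=0):
    """Average the CMS curves returned by run_split over several
    random train/test partitions of range(n_items)."""
    rng = np.random.default_rng(seed)
    curves = []
    for _ in range(n_repeats):
        perm = rng.permutation(n_items)
        # First train_size indices train; the rest are for testing.
        train_idx, test_idx = perm[:train_size], perm[train_size:]
        curves.append(np.asarray(run_split(train_idx, test_idx), dtype=float))
    # Every curve has the same length because the test size is fixed.
    return np.mean(curves, axis=0)
```

With the numbers given in the text, the call would use
`n_items=992`, `train_size=50` and `n_repeats=5`.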
6.3 Experiments and Results
We conducted three retrieval tests, A, B and C,
defined below. Within each test, the performance of
histograms based on DC-TFV, AC-TFV and their
combination is evaluated separately.
Test-A: Histograms are generated from the whole
image.
Test-B: 512 subimages are randomly defined,
covering all parts of the image. Their
sizes vary widely. Only one of them is
used to generate the histograms.
Test-C: Two of the above 512 subimages are used
to generate the histograms. The total
number of tested combinations is 216. The
two subimages come from two different
areas among the eyes, nose and mouth; in
other words, they are non-overlapping.
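The subimage definitions in Test-B and Test-C can be
illustrated with the sketch below, which draws random
rectangles of varying size and checks whether two of them
overlap. The rectangle format `(top, left, height, width)`,
the minimum size, and the uniform sampling scheme are all
assumptions; the text does not specify how the 512
subimages were generated.

```python
import numpy as np

def random_subimages(height, width, count, min_size, seed=0):
    """Draw `count` random rectangles (top, left, h, w) that lie
    fully inside an image of the given height and width."""
    rng = np.random.default_rng(seed)
    rects = []
    for _ in range(count):
        h = int(rng.integers(min_size, height + 1))
        w = int(rng.integers(min_size, width + 1))
        top = int(rng.integers(0, height - h + 1))
        left = int(rng.integers(0, width - w + 1))
        rects.append((top, left, h, w))
    return rects

def overlaps(a, b):
    """True if rectangles a and b share at least one pixel."""
    at, al, ah, aw = a
    bt, bl, bh, bw = b
    return at < bt + bh and bt < at + ah and al < bl + bw and bl < al + aw
```

A Test-C style pair would be any two rectangles for which
`overlaps` returns `False`.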
The result of Test-A serves as the reference for
evaluating the performance of Test-B and Test-C.
The corresponding CMS results are shown in Table 1.
The Rank-1 CMS (i.e., the CMS at the first rank) is
used here to represent the retrieval accuracy. One
should notice that the performance of DC-TFV has
already reached saturation, so its improvement is
relatively small, while a significant improvement
can be found for AC-TFV.
Since the subimages are randomly selected, we
report the mean performance over all subimages or
combinations to avoid any bias due to the use of a
specific subimage. One can see that, although the
subimages cover less area than the whole image, the
performance improves. The reason is that dividing
the image emphasizes key areas containing
information critical for retrieval. In addition,
the block transform, TFV and subimage together
organize the local visual information efficiently
in a three-layer hierarchical system. The histogram
represents statistical information while retaining
a certain amount of structural information, which
finally leads to good performance.