Table 1: Comparison of recognition performance (AP) on
VOC 2007 using the best-performing filter and the original images.
The fourth column gives the difference in AP between filtered
and original images.
class      filter  original  diff  filter name
aeroplane  64.9    64.4      +0.5  bilateral
bicycle    56.2    52.9      +4    wls
bird       43      37        +6    bilateral
boat       55.5    52.5      +3    colorboost
bottle     19      14.3      +4.7  bilateral
bus        43.4    43.1      +0.3  colorboost
car        69.4    68        +1.4  bilateral
cat        45.4    46.4      -1    colorboost
chair      42.4    41.6      +0.8  bilateral
cow        23.9    21.8      +2    wls
table      31.9    29.5      +2.4  bilateral
dog        35.8    36.1      -0.3  colorboost
horse      64.6    65.2      -0.7  colorboost
motorbike  52.6    49        +3.6  wls
person     78.7    77.8      +0.9  bilateral
plant      22.6    18.6      +4    bilateral
sheep      26.6    28        -1.4  bilateral
sofa       33.7    32.6      +1.1  blur
train      64.3    63.2      +1.1  bilateral
tv         39.9    39.2      +0.7  colorboost
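The per-class numbers in Table 1 can be summarized with a short script. This is only an illustrative sketch: the AP values are transcribed from the table, while the names `TABLE1` and `summarize` are ours and not part of the original experiments. Differences recomputed from the two AP columns are not guaranteed to match the printed diff column digit for digit.

```python
# (class, AP with best filter, AP on original images, best filter),
# transcribed from Table 1.
TABLE1 = [
    ("aeroplane", 64.9, 64.4, "bilateral"),
    ("bicycle", 56.2, 52.9, "wls"),
    ("bird", 43.0, 37.0, "bilateral"),
    ("boat", 55.5, 52.5, "colorboost"),
    ("bottle", 19.0, 14.3, "bilateral"),
    ("bus", 43.4, 43.1, "colorboost"),
    ("car", 69.4, 68.0, "bilateral"),
    ("cat", 45.4, 46.4, "colorboost"),
    ("chair", 42.4, 41.6, "bilateral"),
    ("cow", 23.9, 21.8, "wls"),
    ("table", 31.9, 29.5, "bilateral"),
    ("dog", 35.8, 36.1, "colorboost"),
    ("horse", 64.6, 65.2, "colorboost"),
    ("motorbike", 52.6, 49.0, "wls"),
    ("person", 78.7, 77.8, "bilateral"),
    ("plant", 22.6, 18.6, "bilateral"),
    ("sheep", 26.6, 28.0, "bilateral"),
    ("sofa", 33.7, 32.6, "blur"),
    ("train", 64.3, 63.2, "bilateral"),
    ("tv", 39.9, 39.2, "colorboost"),
]


def summarize(rows):
    """Count improved classes, find the largest gain, and average the APs."""
    # Gain per class, rounded to one decimal to avoid floating-point noise.
    gains = {name: round(filtered - original, 1)
             for name, filtered, original, _ in rows}
    best_class = max(gains, key=gains.get)
    return {
        "n_improved": sum(1 for g in gains.values() if g > 0),
        "best_class": best_class,
        "best_gain": gains[best_class],
        "map_filtered": sum(r[1] for r in rows) / len(rows),
        "map_original": sum(r[2] for r in rows) / len(rows),
    }


summary = summarize(TABLE1)
print(summary)
```

With the values as transcribed, 16 classes improve, the largest gain is for "bird", and the mean AP over the 20 classes rises from roughly 44.1 to roughly 45.7 when the best filter is chosen per class.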
For 16 out of 20 classes in Tab. 1, filtered images
produce better results than the original ones. Gradient
suppression (e.g. bilateral or wls filters) in particular
improves the AP by up to 6 percentage points (class “bird”).
This can be explained by the elimination of weak, noisy
gradients by abstraction filters such as bilateral filtering.
For instance, many of the images in the class “bird” were
captured against backgrounds such as vegetation and nature,
which contain many finely detailed gradients that are
irrelevant for the classification. Focusing the descriptors
on dominant gradients (e.g. the stems of trees and not the
leaves, the bird shape and not the feathers) helps to
discriminate these images. Again, we note that an automatic
choice of the best-performing filter would be required for
practical applications. However, in this experiment we are
more interested in the quantitative performance differences,
which indicate how much mAP can be gained by a good choice
of image filtering for preprocessing. The filter parameters
were chosen manually prior to all experiments, not to
increase the performance but purely on visual appearance,
to achieve clearly visible filtering effects.
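The gradient-suppression effect described above can be illustrated on a one-dimensional intensity profile. The following is a minimal sketch of a bilateral filter, not the implementation used in the experiments: the function names, the synthetic profile, and all parameter values (`radius`, `sigma_space`, `sigma_range`) are assumptions chosen only to make the effect visible.

```python
import math


def bilateral_1d(signal, radius=3, sigma_space=2.0, sigma_range=10.0):
    """Bilateral filter on a 1D profile (parameter values are illustrative)."""
    out = []
    for i, center in enumerate(signal):
        weighted_sum = 0.0
        weight_total = 0.0
        lo = max(0, i - radius)
        hi = min(len(signal), i + radius + 1)
        for j in range(lo, hi):
            # Spatial weight: nearby samples count more.
            spatial = math.exp(-((i - j) ** 2) / (2 * sigma_space ** 2))
            # Range weight: samples with similar intensity count more,
            # so samples across a strong edge are almost ignored.
            tonal = math.exp(-((signal[j] - center) ** 2)
                             / (2 * sigma_range ** 2))
            weight = spatial * tonal
            weighted_sum += weight * signal[j]
            weight_total += weight
        out.append(weighted_sum / weight_total)
    return out


def max_abs_gradient(values):
    """Largest absolute difference between neighbouring samples."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))


# Synthetic profile: a step edge of height 100 plus weak alternating texture.
profile = [(0 if i < 10 else 100) + (3 if i % 2 else -3) for i in range(20)]
smoothed = bilateral_1d(profile)

print("texture gradient before:", max_abs_gradient(profile[:10]))
print("texture gradient after: ", max_abs_gradient(smoothed[:10]))
print("edge height after:      ", smoothed[10] - smoothed[9])
```

In this toy setting the weak texture gradients on either side of the step are strongly attenuated while the step itself stays sharp, which is the behaviour the explanation above relies on.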
4.2 Image Retrieval
For the reasons mentioned above, we collect our own
benchmark dataset with images that present a particular
challenge to the descriptors due to various rendering
methods (e.g. a logo painted on a wall or carved
out of metal), which introduce more appearance
variations (see Fig. 1 and Fig. 2 for some examples).
In such cases, image filtering is especially expected
to aid the matching process. The dataset consists
of 30 randomly chosen logo classes from well-known brands
(e.g. Coca Cola). For each logo, 10 random images
were drawn from 1000 images downloaded from
www.flickr.com using the logo name as the search
query. For all 300 images of the dataset, the
occurrences of the logos are labeled. The retrieval task is
to use each labeled logo as a query and retrieve all the
other ones with the same label. We use the same protocol for
the generation of the index and the evaluation of the
retrieval performance as in (Sivic and Zisserman, 2003).
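The retrieval scheme of Sivic and Zisserman quantizes local descriptors into visual words and ranks images by the similarity of their tf-idf weighted word vectors. A minimal sketch of that ranking step is given below, assuming the quantization has already been done. All function names and the toy data are hypothetical, and the sketch is not the index implementation used in the experiments.

```python
import math
from collections import Counter, defaultdict


def tfidf_vector(words, idf):
    """L2-normalized tf-idf vector of one image (dict: word -> weight)."""
    counts = Counter(words)
    vec = {w: (c / len(words)) * idf.get(w, 0.0) for w, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm > 0 else vec


def build_index(docs):
    """docs: {image_id: list of visual-word ids} -> (idf, vectors, inverted)."""
    n_docs = len(docs)
    # Document frequency: in how many images each visual word occurs.
    doc_freq = Counter(w for words in docs.values() for w in set(words))
    idf = {w: math.log(n_docs / df) for w, df in doc_freq.items()}
    vectors = {}
    inverted = defaultdict(set)
    for image_id, words in docs.items():
        vectors[image_id] = tfidf_vector(words, idf)
        for w in set(words):
            inverted[w].add(image_id)
    return idf, vectors, inverted


def query(words, idf, vectors, inverted):
    """Rank indexed images by cosine similarity to the query words."""
    q = tfidf_vector(words, idf)
    # Only images sharing at least one visual word with the query are scored.
    candidates = set().union(*(inverted.get(w, set()) for w in q))
    scores = {c: sum(q[w] * vectors[c].get(w, 0.0) for w in q)
              for c in candidates}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: three images described by hypothetical visual-word ids.
docs = {
    "logo_a_1": [1, 1, 2, 3],
    "logo_a_2": [1, 2, 2, 4],
    "logo_b_1": [5, 6, 6, 7],
}
idf, vectors, inverted = build_index(docs)
ranking = query([1, 1, 2], idf, vectors, inverted)
print(ranking)
```

The inverted index keeps the query cost proportional to the number of images that share words with the query rather than to the size of the whole collection.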
Similarly to the evaluation of scene classification, all
filter settings were constant for all images and were
chosen prior to running the experiments. In Tab. 2
the summarized mean average precision (mAP) values
are listed separately for the two interest point
detectors (Harris-Laplace and Hessian-Laplace) used in
the experiment. For each query image an AP value
(Everingham et al., 2010) is generated, which is then
averaged (mAP value) across all queries belonging to
the same logo label. We further average these mAP
values over all logo labels to generate a single score
for each filter. We can observe that gradient suppression
filters, in particular median and wls, improve the
retrieval by up to 8%. The performance gain depends
on the type of interest point detector, but the general
tendency is the same. It is important to note that
the overall performance of mAP ≈ 45% is not very
high compared to systems with geometric verification
or query expansion (Philbin, 2010). However,
in this experiment we are interested in the relative
performance differences between filtered and unaltered
images. Although the overall performance across a
collection of 30 very different logos consistently
improves with wls filtering, we noticed that certain
logo types benefit more than others. Car logos
(e.g. Porsche), which do not vary as much in their
rendering form (e.g. car logos are usually printed on
badges and not on other materials like T-shirts), improve
by 58.1 percentage points (mAP for the “Porsche” logo is
36.2% using original images and 94.3% using wls filtering).
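The two-level averaging behind the single score per filter (AP per query, mAP per logo label, then the mean over all labels) can be sketched as follows. The sketch uses the plain non-interpolated AP, which may differ in detail from the definition in (Everingham et al., 2010). The function names, the `n_relevant` parameter, and the toy rankings are assumptions for illustration only.

```python
def average_precision(ranked_labels, query_label, n_relevant):
    """Non-interpolated AP of one ranked list (the query itself excluded)."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / n_relevant if n_relevant else 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def filter_score(results, n_relevant):
    """results: {logo_label: [ranked label list for each query]}.

    First average AP over the queries of each logo (per-logo mAP),
    then average the per-logo values into one score for the filter.
    """
    per_logo_map = {
        logo: mean(average_precision(r, logo, n_relevant) for r in ranked_lists)
        for logo, ranked_lists in results.items()
    }
    return mean(per_logo_map.values()), per_logo_map


# Toy rankings: each inner list holds the labels of the retrieved images.
results = {
    "porsche": [["porsche", "other", "porsche"],
                ["other", "porsche", "porsche"]],
    "cocacola": [["cocacola", "cocacola", "other"]],
}
score, per_logo = filter_score(results, n_relevant=2)
print(per_logo, score)
```

Averaging per logo first gives every logo label the same weight in the final score, regardless of how many queries it contributes.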
5 CONCLUSIONS
The results from the evaluation indicate that image
filtering significantly improves the matching and
classification performance. Furthermore, the amount of
improvement and the type of best-performing filter