[Figure 5: Contribution of words and phrases. Plot of Mean Average Precision (MAP) against the α value.]
words in the similarity matching (α = 0) are taken into consideration. However, the combination of both yields better results than using words or phrases separately.
The explanation is that some images, such as pictures of human faces, stop signs, or umbrellas, are not texture-rich, which leads to the detection of only a small number of interest points. From this study, we conclude that visual phrases alone cannot capture all the similarity information between images; visual word similarity is still required.
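The blending described above, where a weight α interpolates between visual-word similarity (α = 0) and visual-phrase similarity, can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the sparse-histogram representation and cosine similarity are standard in bag-of-visual-words retrieval, and the function names are assumptions.

```python
import math


def cosine_similarity(h1, h2):
    """Cosine similarity between two sparse histograms (dicts: term -> count)."""
    dot = sum(v * h2.get(k, 0.0) for k, v in h1.items())
    n1 = math.sqrt(sum(v * v for v in h1.values()))
    n2 = math.sqrt(sum(v * v for v in h2.values()))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0
    return dot / (n1 * n2)


def combined_similarity(words_q, phrases_q, words_d, phrases_d, alpha):
    """Blend word- and phrase-level similarity between a query and a document.

    Assumed convention matching the text: alpha = 0 uses visual words only,
    alpha = 1 uses visual phrases only.
    """
    sim_w = cosine_similarity(words_q, words_d)
    sim_p = cosine_similarity(phrases_q, phrases_d)
    return (1.0 - alpha) * sim_w + alpha * sim_p
```

With this convention, an image that matches the query on words but has no detected phrases (e.g. a face or stop-sign picture with few interest points) still scores well at intermediate α, which is consistent with the observation that the combination outperforms either representation alone.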
5 CONCLUSIONS
A new spatial weighting technique has been devel-
oped which enhances the basic bag-of-visual-words
approach by using spatial relations. We also devised
methods to construct visual phrases based on the association rule technique. Our experimental studies showed that a combined use of words and phrases can perform better than using them separately. Our method also performed well when compared with similar recent approaches.
In future work, we will further study the interrelationships between different visual words in order to investigate higher-level representations. This should improve the discriminative power of the visual words.
USING ASSOCIATION RULES AND SPATIAL WEIGHTING FOR AN EFFECTIVE CONTENT BASED-IMAGE
RETRIEVAL