BRINGING ORDER IN THE BAG OF WORDS
Shihong Zhang, Rahat Khan, Damien Muselet and Alain Trémeau
Université de Lyon, F-42023, Saint-Étienne, France
CNRS, UMR 5516, Laboratoire Hubert Curien, F-42000, Saint-Étienne, France
Université de Saint-Étienne, Jean-Monnet, F-42000, Saint-Étienne, France
Keywords:
Bag-of-words, Object Categorization, Spatial Information.
Abstract:
This paper presents a method to infuse spatial information into the bag-of-words (BOW) framework for object
categorization. The main idea is to account for the local spatial distribution of the visual words. Rather than
finding rigid local patterns, we consider the visual words in close spatial proximity as a pouch of words and
we represent the image as a bag of word-pouches. For this purpose, sub-windows are extracted from the
images and characterized by local bags of words. Then a clustering step is applied in the local bag-of-words
space to construct the word-pouches. We show that this representation is complementary to the classical BOW.
Thus a concatenation of these two representations is used as the final descriptor. Experiments are conducted
on two well-known image datasets.
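The pipeline summarized above (sub-windows, local bags of words, clustering, concatenation) can be illustrated with a minimal sketch. It is not the authors' implementation: the sliding-window scheme, the plain k-means clustering, the histogram normalization, and all function and parameter names (`local_bow`, `extract_windows`, `kmeans`, `image_descriptor`, `size`, `step`) are assumptions made for illustration. Keypoints are assumed to be already quantized into visual-word indices.

```python
import random
from collections import Counter

def local_bow(words, num_words):
    """Normalized histogram of visual-word indices (a bag of words)."""
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts.get(w, 0) / total for w in range(num_words)]

def sq_dist(a, b):
    """Squared Euclidean distance between two histograms."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def extract_windows(keypoints, size, step, width, height):
    """Yield the word indices falling inside each sliding sub-window.

    keypoints is a list of (x, y, word_index); empty windows are skipped.
    The square sliding window is an assumption of this sketch.
    """
    for top in range(0, max(height - size, 0) + 1, step):
        for left in range(0, max(width - size, 0) + 1, step):
            inside = [w for x, y, w in keypoints
                      if left <= x < left + size and top <= y < top + size]
            if inside:
                yield inside

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed clustering); returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def image_descriptor(keypoints, pouches, num_words, size, step, width, height):
    """Concatenate the classical BOW with a bag of word-pouches."""
    global_bow = local_bow([w for _, _, w in keypoints], num_words)
    pouch_counts = [0] * len(pouches)
    for window in extract_windows(keypoints, size, step, width, height):
        hist = local_bow(window, num_words)
        nearest = min(range(len(pouches)), key=lambda j: sq_dist(hist, pouches[j]))
        pouch_counts[nearest] += 1
    total = max(sum(pouch_counts), 1)
    return global_bow + [c / total for c in pouch_counts]

if __name__ == "__main__":
    # Toy image: six keypoints quantized into a 3-word vocabulary.
    kps = [(1, 1, 0), (2, 2, 1), (8, 8, 2), (9, 9, 2), (1, 8, 0), (8, 1, 1)]
    local_hists = [local_bow(w, 3) for w in extract_windows(kps, 5, 5, 10, 10)]
    pouches = kmeans(local_hists, k=2)
    print(image_descriptor(kps, pouches, 3, 5, 5, 10, 10))
```

In this sketch the word-pouches are simply the k-means centroids of the local histograms, so the pouch part of the descriptor has one bin per cluster regardless of how many words fall in each sub-window.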
1 INTRODUCTION
In this paper, we deal with the problem of category-level classification in images. This is a
challenging problem in computer vision, and one of the successful solutions is the Bag-of-Words
(BOW) approach (Csurka et al., 2004), which employs the histogram of particular image patterns
(the visual words) in a given image. However, one major limitation of the BOW model is that it
does not retain the spatial relationships among the visual words. Different methods have been
proposed to take advantage of the spatial distribution of visual words to improve classification
accuracy. For example, Lazebnik et al. incorporated the pyramid match kernel proposed by
(Grauman and Darrell, 2005) into the BOW framework to account for the global distribution of the
visual words over the image and achieved very high classification accuracy (Lazebnik et al., 2006).
Among local approaches, Zhang and Mayo (Zhang and Mayo, 2008) improved the classification
performance of the BOW model by discovering intermediate representations for each object class.
Specifically, their approach includes the spatial relationships between all the frequent and
informative image keypoints in the smaller regions of the image. A group of works intends to
model the co-occurrence patterns of visual words. Among them, (Sivic et al., 2005) extended the
BOW model using spatial information. The spatial information, which they term "doublets", is
formed from spatially neighboring word pairs. In (Bhatti and Hanbury, 2010), Bhatti and Hanbury
introduced pair-wise relations between image features. In their work, the image is represented by
a concatenation of independent visual words with pair-wise visual words. Yuan et al. (Yuan et al.,
2007) defined co-occurrence patterns occurring in local proximity as visual phrases and used this
information for classification.
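To make the pair-based representations surveyed above concrete, the following is a minimal sketch of counting spatially neighboring visual-word pairs. It does not reproduce the exact formulation of any cited work: the fixed neighborhood `radius`, the unordered-pair convention, and the function names `word_pair_histogram` and `num_pair_bins` are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def word_pair_histogram(keypoints, radius):
    """Count unordered visual-word pairs whose keypoints are close together.

    keypoints is a list of (x, y, word_index). Two keypoints are treated as
    neighbors when their Euclidean distance is at most `radius` (an
    assumption of this sketch).
    """
    pairs = Counter()
    for (x1, y1, w1), (x2, y2, w2) in combinations(keypoints, 2):
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= radius ** 2:
            pairs[tuple(sorted((w1, w2)))] += 1
    return pairs

def num_pair_bins(num_words):
    """Number of unordered word pairs (a word may pair with itself)."""
    return num_words * (num_words + 1) // 2

if __name__ == "__main__":
    kps = [(0, 0, 3), (1, 0, 1), (10, 10, 3)]
    print(word_pair_histogram(kps, radius=2))
    print(num_pair_bins(1000))
```

With a vocabulary of 1000 visual words, such a pair histogram already has 500500 possible bins, which shows how quickly descriptors built from word tuples grow with the vocabulary size.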
Most of the local approaches only consider pairs of visual words, and we argue that the number of
words accounted for in the local neighborhoods should not be restricted. Unfortunately, increasing
the number of words considered in each neighborhood tends to increase the dimension of the final
descriptor. Hence, we propose an alternative that considers the visual words in close spatial
proximity as a pouch of words and represents the image as a bag of word-pouches. The originality
of this approach is that it applies a clustering step in the BOW space in order to extract the
most representative pouches. The bag of word-pouches is also an orderless representation but,
interestingly, it encodes some spatial information because each pouch is representative of a
group of words which reside close to each other in the image space. Unlike the classical methods
that introduce spatial information in the BOW, our approach accounts for the spatial distribution
of the visual words without increasing the dimension of the final descriptor. Furthermore, our
method, detailed in the next section, is complementary
Zhang S., Khan R., Muselet D. and Trémeau A.
BRINGING ORDER IN THE BAG OF WORDS.
DOI: 10.5220/0003859307230726
In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP-2012), pages 723-726
ISBN: 978-989-8565-03-7
Copyright © 2012 SCITEPRESS (Science and Technology Publications, Lda.)