tering of descriptors should be jointly based on two
criteria: distribution density in feature space and distribution across images. We utilize co-clustering, which clusters optimally on the joint distribution of these criteria, to discover topics from words. Some of the earliest work on co-clustering can be found in (Hartigan,
1972). It was further developed in bioinformatics for gene sequence clustering (Cheng and Church, 2000), introduced as a tool for data mining in (Dhillon, 2001) and (Dhillon et al., 2003), and applied to computer vision in (Liu and Shah, 2007).
Our main contributions in this work are:
• We propose a novel approach to discovering vi-
sual topics from visual words using information
theoretic co-clustering.
• We thoroughly evaluate our approach against the
standard BoW approach and show consistent and
statistically significant improvement in classifica-
tion performance.
• We analyze the effect of codebook size and determine an optimal size for the datasets considered.
• We explore the relation between our approach and different types of visual categories, discovering that it is especially applicable to a specific type of category.
Figure 2: Visual Categorization System: data acquisition,
feature extraction, clustering, learning a word codebook,
co-clustering, learning a topic codebook, and classification.
The blue box contains the modules of the BoW approach; the red box contains the modules of our approach using co-clustering.
2 APPROACH
We first state the motivation for our novel approach
and then describe the system and the algorithm. We
have formulated our approach based on these insights:
• BoW builds its codebook entirely from the descriptor density distribution in feature space and ignores the distribution of a codebook element across images.
• Unlike a visual word (a particular instance of a category part), a visual topic will almost always occur in a positive sample of the category.
• Feature vectors sourced from a category part are not scattered arbitrarily but form cliques in feature space.
• Visual words from the same category have similar
occurrence distribution statistics across images.
Combining these insights, an algorithm that simultaneously considers both the distribution in feature space and the distribution across images when clustering feature vectors should yield the codebook of topics we intend to build. We arrange these two distributions as an image-word data matrix, whose rows are images and whose columns are words. Co-clustering optimally and simultaneously clusters the rows and columns of the data matrix, so we employ it to discover
topics. The modules of a typical visual categorization system are shown in Figure 2. The modules in the blue box are used in the traditional approach, while the red box contains the modules implementing our approach. The remaining modules are common to both approaches, which allows an effective comparison between the two. The steps in our algorithm are:
1. Over-partition feature space: build a huge set of
words using an unsupervised clustering technique
based purely on descriptor density distribution in
feature space.
2. Compute the occurrence histogram of codebook elements, i.e., their orderless distribution within each image, for all images in the dataset.
3. Analyze the joint distribution of images and
words. We utilize information theoretic co-
clustering to optimally cluster both images and
words. This translates to creation of blocks in the
image-word data matrix. The blocks tell us which
words are clustered together.
4. Combine the clustered words into topics and cre-
ate the topic codebook.
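The four steps above can be sketched end-to-end as follows. This is a minimal sketch on synthetic descriptors: scikit-learn's SpectralCoclustering is used as a stand-in for the information-theoretic co-clustering of (Dhillon et al., 2003), and the sizes (number of images, words, and topics) are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans, SpectralCoclustering

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(2000, 16))   # stand-in for local feature vectors
image_ids = rng.integers(0, 50, size=2000)  # which image each descriptor came from

# Step 1: over-partition feature space into a large word codebook,
# using unsupervised clustering on descriptor density alone.
n_words = 100
words = KMeans(n_clusters=n_words, n_init=10, random_state=0).fit_predict(descriptors)

# Step 2: occurrence histogram of words per image -> image-word data matrix.
n_images = 50
M = np.zeros((n_images, n_words))
np.add.at(M, (image_ids, words), 1)

# Step 3: co-cluster rows (images) and columns (words) simultaneously,
# forming blocks in the image-word matrix.
n_topics = 10
cc = SpectralCoclustering(n_clusters=n_topics, random_state=0).fit(M + 1e-9)

# Step 4: merge co-clustered words into topics to obtain the topic codebook;
# word_to_topic maps each word to its topic.
word_to_topic = cc.column_labels_
topic_hist = np.zeros((n_images, n_topics))
for w, t in enumerate(word_to_topic):
    topic_hist[:, t] += M[:, w]
```

`topic_hist` then plays the role of the per-image histogram over the topic codebook, replacing the much larger word histogram `M` as the representation passed to the classifier.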
3 LEARNING CODEBOOK
In this section we discuss how a codebook is learned
by the BoW approach and by our approach.
3.1 Codebook-of-Words
The mathematical formulation of the BoW model is as follows: it encodes visual data in a high-dimensional space $\Xi \subseteq \mathbb{R}^d$ by a set of code-words $Q = \{\psi_1, \psi_2, \ldots, \psi_N\}$. These code-words are also called visual-words or simply words, $\psi_i \in \Xi$, $i = 1, \ldots, N$. The visual data is a set of feature descriptor vectors $\{\upsilon_1, \upsilon_2, \ldots, \upsilon_M\}$, where