$$
\underset{A,\Gamma}{\operatorname{minimize}} \;\; \| X - \Phi A \Gamma \|_F^2
\quad \text{subject to} \quad
\forall i \;\; \| \Gamma_i \|_0^0 \le t, \qquad
\forall j \;\; \| a_j \|_0^0 \le p, \;\; \| \Phi a_j \|_2 = 1.
\tag{2}
$$
In this expression, the columns of $\Gamma$ are the sparse K-SVD representations of the corresponding columns of the dataset $X$, and the function $\|\cdot\|_0^0$ counts the non-zero entries of a vector. The problem is solved by alternating sparse-coding and dictionary-update steps for a fixed number of iterations. The sparse-coding step can be implemented efficiently using orthogonal matching pursuit (OMP) (Davis et al., 1997). The sparse dictionary model strikes a balance between complexity (via the choice of the base dictionary $\Phi$) and adaptability (via the training of the sparse dictionary $A$). Moreover, training the sparse dictionary is less time-consuming, less prone to noise and instability, and more computationally efficient than training an explicit dictionary. This is what makes the model appealing for pattern recognition tasks.
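To make the sparse-coding step concrete, the following Python sketch (our illustration, not the original authors' implementation) codes a batch of signals against the effective dictionary $\Phi A$ using scikit-learn's OMP solver; the matrices Phi and A and the sparsity target t are placeholders.

import numpy as np
from sklearn.linear_model import orthogonal_mp

def sparse_code(X, Phi, A, t):
    """Sparse-coding step of sparse K-SVD (illustrative sketch).

    X   : (n, m) data matrix, one signal per column
    Phi : (n, d) base dictionary
    A   : (d, k) sparse dictionary whose columns a_j are sparse over Phi
    t   : target sparsity, i.e. ||Gamma_i||_0 <= t for every column i
    Returns Gamma, the (k, m) sparse representation matrix.
    """
    D = Phi @ A                          # effective dictionary Phi * A
    D = D / np.linalg.norm(D, axis=0)    # keep atoms unit-norm (||Phi a_j||_2 = 1)
    Gamma = orthogonal_mp(D, X, n_nonzero_coefs=t)
    return Gamma

In the full algorithm this step alternates with a dictionary-update step that refines the columns of A while keeping them sparse over the base dictionary.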
2.2 Varma-Zisserman Texture Classifier
Varma and Zisserman introduced a texture classifier
based on image patch exemplars (Varma and Zisser-
man, 2009). This approach can be summarized as fol-
lows. Firstly, all images are made zero-mean and unit-
variance. Secondly, image patches of $N \times N$ window size are taken and reordered in the $N^2$-dimensional feature space. Thirdly, image patches are contrast-normalized using Weber's law
$$
F(x) \leftarrow F(x)\,\frac{\log\!\left(1 + L(x)/0.03\right)}{L(x)}
\tag{3}
$$
where $L(x) = \|F(x)\|_2$ is the magnitude of the patch vector at that pixel $x$. Fourthly, all of the image
patches from the selected training images in a tex-
ture class are aggregated and clustered using the k-
means algorithm. The set of the cluster centres from
all classes forms the texton dictionary. Fifthly, training
(and testing) images are modelled by the histogram of
texton frequencies. Finally, novel image classification
is achieved by nearest-neighbour matching using the
$\chi^2$
statistic. This classifier is known as the joint clas-
sifier. One important variant of the joint classifier is
the MRF classifier which explicitly models the joint
distribution of the central pixels and their neighbours.
Refer to (Varma and Zisserman, 2009) for further de-
tails.
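As a sketch of these steps, the snippet below (our own Python illustration with assumed array shapes, not the authors' code) contrast-normalizes patch vectors with Eq. (3), models an image by its histogram of texton frequencies, and matches it to the nearest class model with the $\chi^2$ statistic.

import numpy as np

def weber_normalize(patch, eps=1e-10):
    """Contrast-normalize a patch vector F(x) with Weber's law, Eq. (3)."""
    L = np.linalg.norm(patch)
    return patch * np.log(1.0 + L / 0.03) / (L + eps)

def texton_histogram(patches, textons):
    """Histogram of texton frequencies for one image.

    patches : (P, d) contrast-normalized patch vectors
    textons : (K, d) texton dictionary (cluster centres from all classes)
    """
    d2 = ((patches[:, None, :] - textons[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)                       # nearest texton per patch
    hist = np.bincount(labels, minlength=len(textons)).astype(float)
    return hist / hist.sum()

def chi2(h1, h2, eps=1e-10):
    """Chi-squared statistic between two normalized histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def classify(query_hist, model_hists, model_labels):
    """Nearest-neighbour matching over the training models."""
    d = [chi2(query_hist, m) for m in model_hists]
    return model_labels[int(np.argmin(d))]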
3 SPARSE K-SVD
TEXTON-BASED TEXTURE
CLASSIFIER
Our goal is to build a texture classification system us-
ing a combination of sparse-coding and texton tech-
niques. A cross-functional flowchart of the system is
shown in Figure 1. The flowchart consists of the fol-
lowing stages:
3.1 Parameter Setup
Several parameters need to be specified for the sparse
K-SVD algorithm. These parameters include the
block (or window) size, the base dictionary size and
model, the sparse dictionary size, the sparsity of the
sparse dictionary with respect to the base dictionary,
the sparsity of the image patch K-SVD representa-
tion with respect to the sparse dictionary, the sparse-
coding criterion (sparsity-based or error-based), and the number of iterations for estimating the sparse dictio-
nary. Refer to (Rubinstein et al., 2010) for further
information on these parameters. As well, for classi-
fication with textons, we need to specify the window
size (which must be the same as the K-SVD block
size), the size of the texton dictionary, and the number of bins for the MRF variant of the Varma-Zisserman
classifier.
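For concreteness, these settings could be gathered in a single configuration structure; the Python sketch below uses illustrative names and values that are our assumptions, not the parameters used in our experiments.

# Illustrative parameter setup (names and values are assumptions).
params = {
    # sparse K-SVD parameters
    "block_size": 8,                 # N x N window (must match the texton window size)
    "base_dict_size": 256,           # number of atoms in the base dictionary Phi
    "base_dict_model": "odct",       # e.g. overcomplete DCT, wavelet, or data-driven
    "sparse_dict_size": 128,         # number of atoms in A
    "atom_sparsity": 6,              # sparsity p of each atom a_j over Phi
    "signal_sparsity": 4,            # sparsity t of each patch over Phi * A
    "coding_criterion": "sparsity",  # "sparsity" (fixed t) or "error" (fixed residual)
    "n_iterations": 20,              # sparse K-SVD training iterations
    # texton classification parameters
    "texton_dict_size": 40,          # size of the texton dictionary per class
    "mrf_bins": 20,                  # number of bins for the MRF variant
}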
3.2 Building the Base Dictionary
In this work, we focus on separable base dictionaries
since they have a compact representation, sub-quadratic
implementation, and memory-efficient computation.
A separable dictionary is the Kronecker product of
several one-dimensional dictionaries. We consider
here two types of these dictionaries: analytic and
data-driven dictionaries. Analytic dictionaries are de-
fined using standard signal bases like the DCT, over-
complete DCT, and wavelet dictionaries. Data-driven
dictionaries are generated from the given image data
using a clustering scheme such as k-means clustering or median-based clustering. Specifically, for
each category c, 1 ≤ c ≤ C, where C is the number
of texture classes, we sample image blocks from all
of the training images, convert the blocks into columns,
apply Weber’s law to contrast-normalize the columns
(Varma and Zisserman, 2009), and then we apply the
clustering scheme to find the class-specific base dic-
tionary $\Phi_c$. Then, we concatenate the class-specific base dictionaries to get the overall base dictionary $\Phi$:
$$
\Phi = [\,\Phi_1 \,|\, \Phi_2 \,|\, \cdots \,|\, \Phi_C\,].
\tag{4}
$$
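The Python sketch below illustrates both options under assumed shapes; the overcomplete-DCT construction and the use of scikit-learn's KMeans as the clustering scheme are our illustrative choices, not necessarily those of the final system.

import numpy as np
from sklearn.cluster import KMeans

def odct_1d(n, k):
    """1-D overcomplete DCT dictionary: n-sample atoms, k atoms."""
    D = np.cos(np.outer(np.arange(n), np.pi * np.arange(k) / k))
    D[:, 1:] -= D[:, 1:].mean(axis=0)   # remove the DC component of non-constant atoms
    return D / np.linalg.norm(D, axis=0)

def separable_odct(n, k):
    """Analytic separable base dictionary: Kronecker product of 1-D dictionaries."""
    D1 = odct_1d(n, k)
    return np.kron(D1, D1)              # acts on vectorized n x n blocks

def class_base_dictionary(blocks, n_atoms):
    """Data-driven class-specific dictionary Phi_c from Weber-normalized blocks (M, n*n)."""
    centres = KMeans(n_clusters=n_atoms, n_init=10).fit(blocks).cluster_centers_
    centres /= np.linalg.norm(centres, axis=1, keepdims=True)
    return centres.T                    # columns are unit-norm atoms

# Overall base dictionary, Eq. (4): Phi = [Phi_1 | Phi_2 | ... | Phi_C]
# Phi = np.hstack([class_base_dictionary(b, n_atoms) for b in blocks_per_class])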