OBJECT DETECTION USING PICTORIAL STRUCTURE OF
GABOR TEMPLATE
Babak Saleh and Mohammad Rastegari
Computer Vision Group, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
Keywords:
Gabor wavelet, Deformable model, Spanning tree, Dynamic programming, Pictorial structure.
Abstract:
Object detection methods fall into two main branches. In the global approach one extracts low-level features and uses machine learning techniques. In the part-based approach one uses deformable templates. We present a hybrid approach for constructing a deformable template for modeling and detection. First, Gabor wavelet filters are applied to extract low-level features, and graphs resembling shock graphs are constructed. A minimum spanning tree (MST), called the pictorial graph, is then extracted and used for matching. The pictorial graph is suitable for preserving the visual appearance of the shape of the object and for accommodating shape variances. This hybrid approach maintains the generality of the global approach and the efficiency of part-based approaches. Our algorithm has been applied to a set of test cases, and the results show improved performance compared to standard object detection methods that do not rely on human intervention.
1 INTRODUCTION
There are two approaches to object detection, namely the global and the part-based ones. In the global approach, features are extracted from raw images and machine learning is used to build models for particular objects (Zhang et al., 2008; Wu et al., 2009; Amira and Farrell, 2005). In the part-based approach (Ramanan and Sminchisescu, 2006; Ioffe and Forsyth, 2001; Kohandani et al., 2006; E. Yen, 2005; Lowe, 1991) an object is represented as a set of parts, modeled as a graph whose edges connect the various parts.
In the global approach Gabor wavelets are applied globally to extract a template from a class of objects (e.g., bicycles, cars, etc.). For object detection the templates are applied to the given image. An attractive feature of the global approach is that it lends itself well to learning algorithms. A difficulty with this approach is that it is very rigid and cannot accommodate shape variances. The part-based approach improves the performance and alleviates the rigidity problem of the global method, but its templates have to be designed separately and rely strongly on human intervention. Computational speed is another issue in the global approach to object detection. The graph models in the part-based approach are usually simple trees, which saves significant computational time compared to the global approach. Dynamic programming and belief propagation algorithms can be applied to accelerate the computation in the part-based method (Kohandani et al., 2006).
In our hybrid approach, we initially apply Gabor wavelets to extract templates as in the global approach. A key new feature is that we modify the template by first assigning a graph to it and then extracting a minimum spanning tree (MST) from the graph. With the MST we can apply a method similar to that of the part-based approach for the detection of an object in a given image. The tree structure extracted from the template allows for the incorporation of shape variances within a given class and ameliorates the corresponding problem that one normally encounters in the global method. Our hybrid approach is computationally efficient since we are essentially dealing with a tree structure, as in the part-based method. Furthermore, the learning processes in the proposed algorithm do not suffer from the shortcomings of the part-based method and are as easy to implement as in the global approach.
Section 2 describes the template extraction method that we used, which is based on (Wu et al., 2009). The extraction of the MST corresponding to a given class is given in Section 3. The object identification process is described in Section 4, and the empirical results are presented in Section 5.
Saleh B. and Rastegari M. (2010).
OBJECT DETECTION USING PICTORIAL STRUCTURE OF GABOR TEMPLATE.
In Proceedings of the International Conference on Computer Vision Theory and Applications, pages 396-400
DOI: 10.5220/0002834203960400
Copyright © SciTePress
2 TEMPLATE EXTRACTION
We extract a generative template consisting of small components. These small elements are a special type of wavelet. Mathematically, a Gabor function is one of the form:
$$G(x, y) \propto \exp\left\{-\left[\left(\frac{x}{\sigma_x}\right)^2 + \left(\frac{y}{\sigma_y}\right)^2\right]\right\} e^{ix} \tag{1}$$
The general form of Gabor wavelets is obtained by translation, rotation and dilation of G. A rotation and translation can be represented in the form:
$$\tilde{x} = (x' - x)\cos\alpha - (y' - y)\sin\alpha,$$
$$\tilde{y} = (x' - x)\sin\alpha + (y' - y)\cos\alpha.$$
Then the general Gabor wavelet with (x, y) as the central position, s the scale parameter and α the orientation is given by:
$$B_{(x,y,s,\alpha)} = \frac{G(\tilde{x}/s, \tilde{y}/s)}{s^2}$$
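As a rough illustration of these definitions, the following Python sketch samples a Gabor wavelet on a pixel grid. It assumes the usual rotation convention (one minus and one plus sign), takes the proportionality constant in Eq. (1) to be 1, and uses default values of σ_x = σ_y = 1; the function names and the grid size are hypothetical choices made for this example, not part of the original method.

```python
import numpy as np

def gabor_function(x, y, sigma_x=1.0, sigma_y=1.0):
    """Complex Gabor function of Eq. (1), with the constant factor assumed to be 1."""
    envelope = np.exp(-((x / sigma_x) ** 2 + (y / sigma_y) ** 2))
    return envelope * np.exp(1j * x)

def gabor_wavelet(xp, yp, x, y, s, alpha, sigma_x=1.0, sigma_y=1.0):
    """Wavelet centered at (x, y) with scale s and orientation alpha,
    evaluated at the point(s) (xp, yp)."""
    # Translate to the center, then rotate by alpha (assumed sign convention).
    xt = (xp - x) * np.cos(alpha) - (yp - y) * np.sin(alpha)
    yt = (xp - x) * np.sin(alpha) + (yp - y) * np.cos(alpha)
    # Dilate by s and normalize by s^2.
    return gabor_function(xt / s, yt / s, sigma_x, sigma_y) / s ** 2

def gabor_kernel(half_size, s, alpha, **kw):
    """Sample the wavelet on a (2*half_size + 1)^2 grid centered at the origin."""
    r = np.arange(-half_size, half_size + 1)
    xp, yp = np.meshgrid(r, r)
    return gabor_wavelet(xp, yp, 0.0, 0.0, s, alpha, **kw)

# Hypothetical example values: a 17x17 kernel at scale 2 and orientation pi/4.
kernel = gabor_kernel(half_size=8, s=2.0, alpha=np.pi / 4)
```

The real and imaginary parts of `kernel` give the even and odd components of the wavelet; its magnitude peaks at the center of the grid.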
In Figure 1 an artificial image and its Gabor transform are shown.
Figure 1: An artificial image and its Gabor transform. (a) An artificial input image. (b) The output field resulting from applying the Gabor function.
For template extraction we first fix a category consisting of one class of objects, e.g. passenger cars. The samples share the same parameters, such as pose, size, orientation, etc. Then we apply Gabor filters at every possible combination of location and orientation, with a fixed scale, on the training image set. The output is a set of coefficients, which are computed in the familiar way as inner products. A template is selected by a voting algorithm, more precisely known as the Shared Sketch Algorithm, applied to the outputs of the Gabor filters as in (Wu et al., 2009). A selected output is referred to as a component of the generative template.
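The filtering step can be sketched in Python as follows. This is only a minimal illustration of computing inner products at every location and orientation; it does not implement the Shared Sketch Algorithm. The helper names, the kernel size, the synthetic step-edge image and the choice σ_x = σ_y = 1 are assumptions made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def oriented_kernel(half_size, alpha, s=2.0):
    """Gabor-like kernel at orientation alpha and fixed scale s (sigma assumed 1)."""
    r = np.arange(-half_size, half_size + 1)
    xp, yp = np.meshgrid(r, r)
    xt = xp * np.cos(alpha) - yp * np.sin(alpha)
    yt = xp * np.sin(alpha) + yp * np.cos(alpha)
    return np.exp(-((xt / s) ** 2 + (yt / s) ** 2)) * np.exp(1j * xt / s) / s ** 2

def filter_responses(image, kernels):
    """Return |<patch, kernel>| for every kernel and every valid location.

    image: 2-D float array; kernels: list of equally sized square arrays.
    Output shape: (len(kernels), H - k + 1, W - k + 1).
    """
    k = kernels[0].shape[0]
    patches = sliding_window_view(image, (k, k))  # shape (H-k+1, W-k+1, k, k)
    out = [np.abs(np.einsum("ijkl,kl->ij", patches, np.conj(ker))) for ker in kernels]
    return np.stack(out)

# Hypothetical example: 8 orientations, one scale, a vertical step edge.
orientations = [i * np.pi / 8 for i in range(8)]
kernels = [oriented_kernel(4, a) for a in orientations]
image = np.zeros((32, 32))
image[:, 16:] = 1.0
resp = filter_responses(image, kernels)
```

Each slice `resp[k]` is the response map of one orientation; a selection step such as the voting algorithm mentioned above would then operate on these maps.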
3 EXTRACTING MST STRUCTURE
The template graph is a weighted complete graph with nodes (or vertices) corresponding to the wavelet coefficients. The weight assigned to an edge is the Euclidean distance between the centers of the wavelets whose coefficients correspond to the vertices of the edge. Since this is a very complex graph, we extract a minimum spanning tree (MST), by which we mean a spanning tree such that the sum of the edge weights is a minimum. In general, specifying an MST of the complete graph has O(n^2) complexity, where n is the number of nodes. To reduce the amount of calculation we consider the Delaunay triangulation of the plane with the given nodes. This is no longer a complete graph, and we retain the weights for the surviving edges. The triangulation has only O(n) edges and is known to contain a Euclidean MST of the nodes, so no optimal edge is lost. By a standard method (Kruskal, 1956) we extract an MST from this graph, as shown in Figure 2. The complexity of this operation is O(n log n).
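The MST extraction can be sketched in Python with Kruskal's algorithm and a union-find structure. This is a minimal sketch, not the authors' implementation: by default it uses the complete graph, and a reduced edge list (for instance one derived from a Delaunay triangulation, which a library such as SciPy can compute) could be passed through the hypothetical `edges` argument. The sample coordinates are invented for illustration.

```python
import itertools
import math

def kruskal_mst(points, edges=None):
    """Kruskal's algorithm on Euclidean-weighted edges.

    points: list of (x, y) wavelet centers.
    edges: iterable of index pairs (i, j); defaults to the complete graph.
    Returns the MST as a list of (weight, i, j) tuples.
    """
    if edges is None:
        edges = itertools.combinations(range(len(points)), 2)
    weighted = sorted((math.dist(points[i], points[j]), i, j) for i, j in edges)

    parent = list(range(len(points)))

    def find(a):
        # Find the component representative, with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    for w, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so keep it
            parent[ri] = rj
            tree.append((w, i, j))
    return tree

# Hypothetical wavelet centers.
centers = [(0, 0), (1, 0), (2, 0), (0, 3)]
mst = kruskal_mst(centers)
```

Sorting the edges dominates the running time, which is why restricting the input to the O(n) edges of a triangulation gives the O(n log n) bound quoted above.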
Having modified the active template structure to
that of an MST we tested the algorithm on several
shape categories to ensure that it provides a faithful
representation of the class of objects in the category.
Five examples are shown in Figure 2. Notice that the MSTs are “good” representations of the objects.
4 MATCHING ALGORITHM
In this section we explain how to find the best match of the extracted template (or MST) to an object in a given image. To find this best match we modify a well-known method (P.F. Felzenszwalb, 2000) for the efficient matching of pictorial structures. A pictorial structure (Fischler and Elschlager, 1986) is a template given by a number of parts with connections between them (Figure 3(a)). Each connection is elastic, like a spring, and shrinks or dilates according to the movements of the parts (Figure 3(b)). This form of connection allows the parts to change their locations relative to each other smoothly under a deformation.
To quantify the “goodness” of a match we introduce several quantities. Let l_i be the 4-vector describing the location, scale and orientation of a wavelet coefficient. Given an image I, we want to define a cost function (the Global Matching Function) for a given pictorial structure, which is the sum of the cost of fitting this pictorial structure to an object in the image and a deformation cost with respect to the parts' positions, distances, rotations, and scales (Felzenszwalb and Huttenlocher, 2005). In other words, the cost function measures both how well each element fits the image data and how well the relative positions of the elements agree with the model. Mathematically, the global matching function, or cost, on an image I is expressed as
Figure 2: Extracted templates and the corresponding MSTs. The first row shows input templates consisting of Gabor wavelets. The second row shows the result of extracting the MST from these templates.
Figure 3: (a) Pictorial structure of a human body; the different parts and their connections are illustrated. This picture was designed by (Borgefors, 1984). (b) Spring connections in a pictorial structure, showing that in the pictorial structure of a face different parts can slightly change their positions. This picture was designed by (Felzenszwalb and Huttenlocher, 2005).
$$L(I) = \min \left( \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j) + \sum_{v_i \in V} m_i(I, l_i) \right), \tag{2}$$
where E is the edge set of the MST, V is its vertex set, d_ij(l_i, l_j) is the Euclidean distance between l_i and l_j, m_i(I, l_i) represents the distance between the wavelet l_i and the wavelet transform of the corresponding point in the image I, and the min is taken over all possible matchings.
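To make the cost function concrete, the following Python sketch evaluates it for one configuration and then minimizes it by exhaustive search. It is a simplified illustration under stated assumptions: each location is reduced to an (x, y) position rather than the full 4-vector, the matching costs are supplied as hypothetical lookup tables, and all numeric values are invented.

```python
import itertools
import math

def configuration_cost(edges, locations, match_cost):
    """Cost of one configuration: deformation term plus matching term."""
    deform = sum(math.dist(locations[i], locations[j]) for i, j in edges)
    match = sum(match_cost[i][locations[i]] for i in range(len(locations)))
    return deform + match

def exhaustive_min(edges, candidates, match_cost):
    """Try every assignment of candidate locations to parts (h^n configurations)."""
    n = len(match_cost)
    return min(
        (configuration_cost(edges, cfg, match_cost), cfg)
        for cfg in itertools.product(candidates, repeat=n)
    )

# Hypothetical two-part template with a single edge and three candidate locations.
edges = [(0, 1)]
candidates = [(0, 0), (0, 1), (3, 0)]
match_cost = [
    {(0, 0): 0.0, (0, 1): 2.0, (3, 0): 4.0},  # m_0(I, l)
    {(0, 0): 5.0, (0, 1): 0.5, (3, 0): 0.0},  # m_1(I, l)
]
best_cost, best_cfg = exhaustive_min(edges, candidates, match_cost)
```

The search visits every combination of locations, which is what makes the naive approach expensive as the number of parts and candidate locations grows.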
The naive exhaustive search computation of L is very inefficient, and its complexity is of the order O(h^4), where h is the number of possible locations. In (P.F. Felzenszwalb, 2000) the authors showed that if the given pictorial structure is a tree, then one can apply a dynamic programming algorithm which reduces the complexity to as low as O(h). The tree structure helps represent the deformation cost of each element as a function of its children's costs. This results in a recursive algorithm for the minimization problem, as follows. For every leaf node of the tree it is sufficient to find the best location, i.e. the one with the minimum matching cost. For every other node one finds the best place constrained by the children's locations, which have been determined in the preceding iteration. This recursive algorithm is summarized by Equation (3):
$$B_j(l_i) = \min_{l_j} \left( d_{ij}(l_i, l_j) + m_j(I, l_j) + \sum_{v_c \in C_j} B_c(l_j) \right) \tag{3}$$
where C_j is the set of children of part j (represented by the vertex v_j), B_c(l_j) is the cost of the best location for child c given the location l_j of j, and B_j(l_i) is the cost of the best location of the j-th part given its parent's location l_i. This recursive algorithm reduces the complexity to O(h^2). To achieve a lower complexity we make use of the generalized distance transform (GDT) (P.F. Felzenszwalb, 2004) on the tree. The GDT is a weighted version of the usual distance transform (Borgefors, 1984; Borgefors, 1986). We used an expanded GDT in four dimensions: the (x, y) components of the center of the Gabor wavelet, the scale parameter s and the orientation r. Using the naive approach, finding the best configuration has complexity O(h^n), and the dynamic programming approach (A. Amini, 1990) reduces it to O(h^2 n), where h is the number of possible locations for each part and n is the number of parts. In our application h is large (and n is relatively small), which makes even this computation infeasible. The state-of-the-art generalized distance transform, however, reduces the complexity to O(hn), or O(h) since n is a small number independent of h.
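The recursion and the role of the distance transform can be sketched in Python on a toy problem. This is a simplified illustration, not the four-dimensional transform used in the paper: locations are assumed to be integers on a one-dimensional grid, the deformation cost is assumed to be c·|l_i − l_j| with a hypothetical weight c, and the matching costs are random numbers. The inner minimization over the child location is computed either by direct O(h^2) search or by a two-pass distance transform in O(h), and both are checked against exhaustive search.

```python
import itertools
import numpy as np

def min_conv_brute(f, c):
    """D[p] = min_q (f[q] + c*|p - q|) by direct O(h^2) search."""
    f = np.asarray(f, dtype=float)
    idx = np.arange(len(f))
    return (f[None, :] + c * np.abs(idx[:, None] - idx[None, :])).min(axis=1)

def min_conv_dt(f, c):
    """The same D computed by a two-pass distance transform in O(h)."""
    d = np.array(f, dtype=float)
    for p in range(1, len(d)):           # forward pass
        d[p] = min(d[p], d[p - 1] + c)
    for p in range(len(d) - 2, -1, -1):  # backward pass
        d[p] = min(d[p], d[p + 1] + c)
    return d

def tree_match_cost(children, root, m, c, min_conv):
    """Minimum total cost on a tree, following the recursion of Eq. (3)."""
    def subtree(j):
        # Cost of part j plus all its descendants, as a function of l_j.
        cost = np.array(m[j], dtype=float)
        for ch in children.get(j, []):
            cost += min_conv(subtree(ch), c)  # B_ch(l_j)
        return cost
    return float(subtree(root).min())

def exhaustive_cost(children, m, c):
    """Reference answer: try all h^n configurations."""
    n, h = len(m), len(m[0])
    edges = [(i, j) for i, kids in children.items() for j in kids]
    return min(
        sum(m[i][l[i]] for i in range(n))
        + sum(c * abs(l[i] - l[j]) for i, j in edges)
        for l in itertools.product(range(h), repeat=n)
    )

# Hypothetical tree with 4 parts and 6 candidate locations per part.
rng = np.random.default_rng(0)
children = {0: [1, 2], 2: [3]}
m = rng.uniform(0, 10, size=(4, 6))
cost_dp = tree_match_cost(children, 0, m, 1.5, min_conv_brute)
cost_dt = tree_match_cost(children, 0, m, 1.5, min_conv_dt)
cost_ref = exhaustive_cost(children, m, 1.5)
```

All three values agree; the only difference is the amount of work per edge of the tree, which drops from h^2 to h when the distance transform replaces the direct search.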
5 RESULTS
In this section we present some experimental results of our approach and compare it with the Active Basis method (Wu et al., 2009). We implemented our method in MATLAB on a PC with a 1.8 GHz CPU and 512 MB of RAM and selected 60 Gabor
VISAPP 2010 - International Conference on Computer Vision Theory and Applications
Figure 4: Qualitative results. The first and fourth rows show input test images for different categories, taken from (Wu et al.,
2009), the second and fifth rows are the results of the proposed method and the third and sixth rows show the results attained
by (Wu et al., 2009).
wavelets for the template. We used Gabor bases at 8 orientations and 10 scales. Our experiments were carried out on the database provided by (Wu et al., 2009). Figure 4 shows a qualitative comparison of the results obtained by the proposed method and the Active Basis method (Wu et al., 2009) on four test cases. As one can see, our method (the second and fifth rows) performs qualitatively better than the Active Basis method (the third and sixth rows). Figure 5 gives a quantitative comparison: the ROC curves of three methods are shown for two test cases. The ROC curves of the proposed method show a significant improvement relative to the Active Basis and Texture Mask methods (Wu et al., 2008). We did not compare with the part-based method because that approach requires human intervention.
6 CONCLUSIONS
A new method for object detection was presented. The proposed algorithm is a hybrid method that combines a generative template, similar to that of the global approach to object detection, with a fast algorithm for fitting this template to images in order to achieve object detection. The matching algorithm is based on dynamic programming.
The novelty and advantages of the proposed method are:
1. It has the flexibility of accommodating visual
variances as in the part-based approach.
2. It lends itself easily to learning algorithms as in
the global approach.
3. It does not require human intervention, unlike the part-based approach.
4. Its computational complexity is low as in the part-
based approach.
ACKNOWLEDGEMENTS
The authors would like to thank Prof. Mehrdad Mir-
shams Shahshahani for his great support, valuable
discussions and supervision.
Figure 5: (a) ROC curves for Car samples in the front viewpoint. (b) ROC curves corresponding to Motorbike samples.
REFERENCES
A. Amini, T. Weymouth, R. J. (1990). Using dynamic programming for solving variational problems in vision. PAMI, 12(9):855–867.
Amira, A. and Farrell, P. (2005). An automatic face recognition system based on wavelet transforms. In ISCAS (6), pages 6252–6255.
Borgefors, G. (1984). Distance transformations in arbitrary dimensions. CVGIP, 27(3):321–345.
Borgefors, G. (1986). Distance transformations in digital images. CVGIP, 34(3):344–371.
E. Yen, A. S. (2005). Image recognition via deformable template. Statistical Methodology, pages 213–225.
Felzenszwalb, P. F. and Huttenlocher, D. P. (2005). Pictorial structures for object recognition. International Journal of Computer Vision, 61(1):55–79.
Fischler, M. and Elschlager, R. (1986). The representation and matching of pictorial structures. IEEE Trans. On Computers, 22(1):67–92.
Ioffe, S. and Forsyth, D. A. (2001). Mixtures of trees for object recognition. In CVPR (2), pages 180–185.
Kohandani, A., Basir, O. A., and Kamel, M. S. (2006). A fast algorithm for template matching. In ICIAR (2), pages 398–409.
Kruskal, J. (1956). On the shortest spanning subtree of a graph and the traveling salesman problem. In Proceedings of the American Mathematical Society.
Lowe, D. G. (1991). Fitting parameterized three-dimensional models to images. IEEE Trans. Pattern Anal. Mach. Intell., 13(5):441–450.
P.F. Felzenszwalb, D. H. (2000). Efficient matching of pictorial structures. In IEEE Conference on Computer Vision and Pattern Recognition.
P.F. Felzenszwalb, D. H. (2004). Distance transforms of sampled functions. Cornell Computing and Information Science Technical Report TR2004-1963.
Ramanan, D. and Sminchisescu, C. (2006). In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, volume 1.
Wu, Y., Si, Z., Gong, H., and Zhu, S. (2009). Learning active basis model for object detection and recognition. IJCV.
Wu, Y. N., Si, Z., Gong, H., and Zhu, S.-C. (2008). Active basis model, shared sketch algorithm, and sum-max maps. IJCV.
Zhang, L., Gong, H., Wu, T., and Dong, J. (2008). Deformable template combining alignable and non-alignable sketches. In ICPR, pages 1–4.