2 TEMPLATE EXTRACTION
We extract a generative template consisting of small
components. These components are a special type of
wavelet, namely Gabor wavelets. Mathematically, a Gabor
function is one of the form:
$$G(x,y) \propto \exp\left\{-\left[\left(\frac{x}{\sigma_x}\right)^2 + \left(\frac{y}{\sigma_y}\right)^2\right]\right\} e^{ix} \tag{1}$$
The general form of a Gabor wavelet is obtained
by translating, rotating and dilating G. A rotation
combined with a translation can be written in the form:
$$\tilde{x} = (x_0 - x)\cos\alpha - (y_0 - y)\sin\alpha,$$
$$\tilde{y} = (x_0 - x)\sin\alpha + (y_0 - y)\cos\alpha.$$
Then the general Gabor wavelet with central position
(x, y), scale parameter s and orientation α
is given by:
$$B_{(x,y,s,\alpha)} = \frac{G(\tilde{x}/s,\ \tilde{y}/s)}{s^2}$$
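As an illustration, the two definitions above can be evaluated pointwise with a few lines of Python. This is a minimal sketch, not the authors' implementation: the function names, the default values of `sigma_x` and `sigma_y`, and the omission of the proportionality constant in Eq. (1) are assumptions made for the example. The rotation is written in its standard form, with a plus sign in front of the cosine term of the second coordinate.

```python
import cmath
import math


def gabor(x, y, sigma_x=1.0, sigma_y=2.0):
    """Unnormalised Gabor function G(x, y) of Eq. (1).

    sigma_x and sigma_y are illustrative defaults, not values from the paper.
    """
    envelope = math.exp(-((x / sigma_x) ** 2 + (y / sigma_y) ** 2))
    return envelope * cmath.exp(1j * x)


def gabor_wavelet(x0, y0, x, y, s, alpha, sigma_x=1.0, sigma_y=2.0):
    """Value at the point (x0, y0) of the wavelet B centred at (x, y),
    with scale s and orientation alpha."""
    dx, dy = x0 - x, y0 - y
    # Standard rotation of the offset from the centre (assumed form).
    x_t = dx * math.cos(alpha) - dy * math.sin(alpha)
    y_t = dx * math.sin(alpha) + dy * math.cos(alpha)
    return gabor(x_t / s, y_t / s, sigma_x, sigma_y) / s ** 2
```

At its own centre the wavelet reduces to G(0, 0)/s^2, so `gabor_wavelet(3, 4, 3, 4, 2.0, 0.7)` evaluates to 0.25 whatever the orientation.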
Figure 1 shows an artificial image and its Gabor
transform.
Figure 1: An artificial image and its Gabor transform. (a) An
artificial input image. (b) Output field obtained by applying
the Gabor function.
For template extraction we first fix a category consisting
of one class of objects, e.g. passenger cars.
The training samples share the same parameters, such as
pose, size and orientation. We then apply Gabor filters at
every possible combination of location and orientation,
with a fixed scale, to the training image set. The
output is a set of coefficients, each computed in
the familiar way as an inner product of the image with a
wavelet. A template is selected by a voting algorithm,
known more precisely as the Shared Sketch Algorithm,
applied to the outputs of the Gabor filters as in
(Wu et al., 2009). A selected output is referred to as a
component of the generative template.
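The coefficient computation described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of (Wu et al., 2009): the image is a plain list of rows indexed as `image[y][x]`, the inner product is truncated to a square window of hypothetical size `radius`, the envelope widths are arbitrary, and the voting step that selects the template is not shown.

```python
import cmath
import math


def gabor_wavelet(px, py, cx, cy, s, alpha, sigma_x=1.0, sigma_y=2.0):
    """Wavelet centred at (cx, cy), evaluated at pixel (px, py).
    The sigma values are illustrative assumptions."""
    dx, dy = px - cx, py - cy
    x_t = (dx * math.cos(alpha) - dy * math.sin(alpha)) / s
    y_t = (dx * math.sin(alpha) + dy * math.cos(alpha)) / s
    envelope = math.exp(-((x_t / sigma_x) ** 2 + (y_t / sigma_y) ** 2))
    return envelope * cmath.exp(1j * x_t) / s ** 2


def response(image, cx, cy, s, alpha, radius=4):
    """Inner product of the image with the wavelet at (cx, cy, s, alpha),
    summed over a (2 * radius + 1)-pixel square window clipped to the image."""
    height, width = len(image), len(image[0])
    total = 0j
    for py in range(max(0, cy - radius), min(height, cy + radius + 1)):
        for px in range(max(0, cx - radius), min(width, cx + radius + 1)):
            wavelet = gabor_wavelet(px, py, cx, cy, s, alpha)
            total += image[py][px] * wavelet.conjugate()
    return total


def all_responses(image, s, orientations):
    """Coefficient magnitude for every location and orientation at fixed scale s."""
    return {
        (cx, cy, k): abs(response(image, cx, cy, s, alpha))
        for cy in range(len(image))
        for cx in range(len(image[0]))
        for k, alpha in enumerate(orientations)
    }
```

For an image of vertical stripes, i.e. one that varies only along x, the magnitude at orientation 0 exceeds the magnitude at orientation π/2, because the carrier e^{ix} of the unrotated wavelet oscillates along x.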
3 EXTRACTING MST STRUCTURE
The template graph is a weighted complete graph whose
nodes (or vertices) correspond to the wavelet coefficients.
The weight assigned to an edge is the Euclidean distance
between the centers of the two wavelets whose coefficients
correspond to the endpoints of the edge. Because this graph
is very complex, we extract from it a minimum spanning tree
(MST), by which we mean a spanning tree whose edge weights
have minimum sum. In general, specifying an MST has O(n^2)
complexity, since the complete graph on n nodes has O(n^2)
edges. To reduce the amount of calculation we consider the
Delaunay triangulation of the plane with the given nodes.
This is no longer a complete graph, and we retain the
weights of the surviving edges; no MST edge is lost, because
the Euclidean MST is a subgraph of the Delaunay
triangulation. By a standard method (Kruskal, 1956) we
extract an MST from this graph, as shown in Figure 2. The
complexity of this operation is O(n log n).
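The MST step can be illustrated with a short version of Kruskal's algorithm using a union-find structure. This is a sketch under assumptions: the function name and the `(i, j)` edge representation are chosen for the example, and the candidate edge list is passed in by the caller. In the pipeline described above the edges would come from a Delaunay triangulation (for instance via `scipy.spatial.Delaunay`); the usage example below uses the complete graph on four points only to stay dependency-free.

```python
import math
from itertools import combinations


def kruskal_mst(points, edges):
    """Return the MST edges (i, j) of the graph on `points` whose candidate
    edges are `edges`; the weight of an edge is the Euclidean distance."""
    parent = list(range(len(points)))

    def find(i):
        # Find the representative of i's component, halving paths on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Sorting dominates the cost: O(E log E), i.e. O(n log n) when E = O(n).
    weighted = sorted((math.dist(points[i], points[j]), i, j) for i, j in edges)
    tree = []
    for _, i, j in weighted:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two components, so keep it
            parent[root_i] = root_j
            tree.append((i, j))
    return tree


# Usage: four wavelet centres and, for this small demo, all possible edges.
centres = [(0, 0), (1, 0), (3, 0), (3, 1)]
candidate_edges = list(combinations(range(len(centres)), 2))
mst = kruskal_mst(centres, candidate_edges)
```

A tree on n nodes always has n - 1 edges, so `mst` contains three edges here.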
Having modified the active template structure to
that of an MST, we tested the algorithm on several
shape categories to check that it provides a faithful
representation of the class of objects in each category.
Five examples are shown in Figure 2. Notice that the
MSTs are “good” representations of the objects.
4 MATCHING ALGORITHM
In this section we explain how to find the best match
of the extracted template (or MST) to an object in a
given image. To find this best match we modify a
well-known method (P.F. Felzenszwalb, 2000) for the
efficient matching of pictorial structures. A pictorial
structure (Fischler and Elschlager, 1986) is a template
given by a number of parts with connections between them
(Figure 3(a)). Each connection is elastic, like a spring,
and shrinks or stretches as the parts move (Figure 3(b)).
This form of connection allows the parts to change their
locations relative to one another smoothly under a
deformation.
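The spring analogy can be made concrete with a small sketch. The quadratic penalty, the `stiffness` parameter and the function names below are assumptions chosen for illustration and are not taken from the paper; only the positions of the parts are modelled here, not their rotations or scales.

```python
def spring_cost(loc_i, loc_j, rest_offset, stiffness=1.0):
    """Quadratic penalty for stretching the connection between two parts.

    loc_i, loc_j: (x, y) positions of the connected parts in the image.
    rest_offset:  (dx, dy) offset from part i to part j in the template.
    """
    dx = (loc_j[0] - loc_i[0]) - rest_offset[0]
    dy = (loc_j[1] - loc_i[1]) - rest_offset[1]
    return stiffness * (dx * dx + dy * dy)


def total_deformation(locations, tree_edges, rest_offsets, stiffness=1.0):
    """Sum of the spring costs over the edges of the template tree."""
    return sum(
        spring_cost(locations[i], locations[j], rest_offsets[(i, j)], stiffness)
        for i, j in tree_edges
    )


# Usage: a three-part template whose parts are connected in a chain.
template = [(0, 0), (2, 0), (2, 3)]
tree_edges = [(0, 1), (1, 2)]
rest_offsets = {
    (i, j): (template[j][0] - template[i][0], template[j][1] - template[i][1])
    for i, j in tree_edges
}
shifted = [(x + 5, y - 1) for x, y in template]   # rigid translation
deformed = [(0, 0), (2, 0), (3, 3)]               # last part moved by one pixel
```

With this choice a rigid translation of the whole template costs nothing, while moving a single part is penalised by the square of its displacement relative to its neighbour.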
To quantify the “goodness” of a match we introduce
several quantities. Let l_i be the 4-vector describing
the location, scale and orientation of the i-th wavelet
coefficient. Given an image I, we want to define a cost
function (the Global Matching Function) for a given
pictorial structure. It is the sum of the cost of fitting
the pictorial structure to an object in the image and a
deformation cost that depends on the parts' positions,
distances, rotations and scales (Felzenszwalb and
Huttenlocher, 2005). In other words, the cost function
measures both how well each element fits the image data
and how well the relative positions of the elements agree
with the model. Mathematically, the global matching
function, or cost, on an image I is expressed as