the descriptor-based category, the rough scheme is to
first determine discriminative “high level” features,
extract from these feature points surrounding discrim-
inative descriptors, and to establish correspondence
between model and search image by classifying the
descriptors. The main advantage of this scheme is that
the runtime of the algorithm is independent of the
dimensionality of the geometric search space. Recent prominent
examples, which fall into this category, are (Belongie
et al., 2002; Lowe, 2004; Berg et al., 2005; Pilet et al.,
2005; Bay et al., 2006). While showing outstanding
performance in several scenarios, they fail if the object
has only highly repetitive texture or only sparse
edge information. In these cases, the feature descriptors
overlap in feature space and are no longer discriminative.
In the template matching category, we subsume algo-
rithms that perform an explicit search. Here, a simi-
larity measure that is either based on intensities (like
SAD, SSD, NCC and mutual information) or gradi-
ent features is evaluated. Using intensities is popular
in optical flow estimation and medical image registra-
tion, where a rough overlap of source and target image
is assumed (Horn and Schunck, 1981; Modersitzki,
2004). However, the evaluation of intensity-based
metrics is computationally expensive. Additionally,
they are typically not invariant to nonlinear
illumination changes, clutter, or occlusion.
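The intensity-based measures named above can be illustrated with a minimal sketch. The patch values and the two illumination changes below are made up for illustration; the functions are plain textbook definitions and are not taken from any of the cited works.

```python
import math

def sad(a, b):
    """Sum of absolute differences (lower means more similar)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def ssd(a, b):
    """Sum of squared differences (lower means more similar)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ncc(a, b):
    """Normalized cross-correlation in [-1, 1] (higher means more similar)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom > 0 else 0.0

# Hypothetical 3x3 template, flattened row by row.
template = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# The same patch under a linear illumination change (gain 2, offset 5).
brighter = [2 * v + 5 for v in template]
# The same patch under a made-up nonlinear, non-monotonic change.
nonlinear = [(v - 50) ** 2 for v in template]

for name, patch in (("linear", brighter), ("nonlinear", nonlinear)):
    print(name, sad(template, patch), ssd(template, patch), ncc(template, patch))
```

In this toy setting, SAD and SSD already report a large difference under the linear change, whereas NCC compensates for gain and offset. None of the three measures compensates for the nonlinear change.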
In feature-based template matching, only a sparse
set of features is compared between template and search
image. While extremely fast and robust if the object
undergoes only rigid transformations,
these methods become intractable for a large
number of degrees of freedom, e.g. when an ob-
ject is allowed to deform perspectively or arbitrar-
ily. Nevertheless, one approach for feature-based de-
formable template matching is presented in (Gavrila
and Philomin, 1999), where the final template is cho-
sen from a learning set while the match metric is eval-
uated. Because obtaining a learning set and applying
a learning step is problematic for many scenarios, we
prefer not to rely on training data except for the
original template. Another approach is to use a template
as in (Felzenszwalb, 2003) or (Zhang et al., 2004),
where an adaptive triangulated polygon model
represents the outer contour. Unlike this representation,
our model is a set of edge points, which allows us to
express arbitrarily shaped objects, e.g. curved or
composite objects. In (Jain et al., 1996) and
(Gonzales-Linares et al., 2003), a deformable template model is
adapted while tracking object hypotheses down the
image pyramid. Here, for each match candidate a
global deformation field represented by trigonometric
basis functions is optimized. Unfortunately, this rep-
resentation of the deformations is global, so that small
adaptations in one patch of the model propagate to all
areas, even where the object remains rigid. In contrast,
we preserve local neighborhoods and therefore
do not encounter this problem. However, we note that
these works are the closest approaches to ours and in-
spired us in several ways.
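The global nature of a trigonometric deformation basis can be made concrete with a small sketch. The basis below (products of sines and cosines on the unit square) is an assumption chosen for illustration and is not claimed to be the exact parameterization of the cited works; `displacement_x`, `coeffs`, and the grid are hypothetical names.

```python
import math

def displacement_x(x, y, coeffs):
    """x-displacement at (x, y) in the unit square as a weighted sum of
    trigonometric basis functions. coeffs maps (m, n) -> weight."""
    return sum(w * 2.0 * math.sin(math.pi * n * x) * math.cos(math.pi * m * y)
               for (m, n), w in coeffs.items())

# Interior sample points of the unit square (9 x 9 grid).
grid = [(i / 10, j / 10) for i in range(1, 10) for j in range(1, 10)]

# A single nonzero weight, e.g. set to fit a deformation in one patch only.
coeffs = {(1, 1): 0.05}

# Points whose displacement changes as a consequence of that single weight.
moved = [p for p in grid if abs(displacement_x(p[0], p[1], coeffs)) > 1e-6]
print(len(moved), "of", len(grid), "grid points are displaced")
```

Because every basis function has support over the whole domain, adjusting one weight displaces points across most of the grid, including regions where the object may in fact be rigid.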
1.2 Main Contributions
This paper makes the following contributions: The
first contribution is a deformable match metric that al-
lows for local deformations, while preserving robust-
ness to illumination changes, partial occlusion and
clutter. While a match metric based on normalized
directed edge points has been used for rigid object
detection in (Olson and Huttenlocher, 1997; Steger, 2002)
and for articulated object detection in (Ulrich et al.,
2002), its adaptation to deformable object detection
is new.
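For orientation, a match metric over normalized directed edge points can be sketched for the rigid, translation-only case as the mean normalized dot product between model direction vectors and image gradient vectors. This is a simplified sketch under stated assumptions, not the deformable metric contributed by this paper; the function name, the data layout, and the toy values are hypothetical.

```python
import math

def edge_direction_score(model_points, image_gradient, offset):
    """Mean normalized dot product between model directions and image gradients.

    model_points:   list of ((px, py), (dx, dy)) edge points with directions
    image_gradient: dict mapping (x, y) -> (gx, gy); missing pixels count as zero
    offset:         (qx, qy) translation of the model in the image
    """
    total = 0.0
    for (px, py), (dx, dy) in model_points:
        gx, gy = image_gradient.get((px + offset[0], py + offset[1]), (0.0, 0.0))
        norm = math.hypot(dx, dy) * math.hypot(gx, gy)
        if norm > 0:
            total += (dx * gx + dy * gy) / norm
    return total / len(model_points)

# Toy model: four edge points with unit direction vectors.
model = [((0, 0), (1.0, 0.0)), ((1, 0), (0.0, 1.0)),
         ((1, 1), (-1.0, 0.0)), ((0, 1), (0.0, -1.0))]

# Toy image: the same structure at offset (5, 3) with three times the contrast.
image = {(5, 3): (3.0, 0.0), (6, 3): (0.0, 3.0),
         (6, 4): (-3.0, 0.0), (5, 4): (0.0, -3.0)}

# The same image with one model point occluded (no gradient at that pixel).
occluded = {k: v for k, v in image.items() if k != (6, 4)}

full_score = edge_direction_score(model, image, (5, 3))
occluded_score = edge_direction_score(model, occluded, (5, 3))
print(full_score, occluded_score)
```

Normalizing each vector removes the dependence on contrast, and a missing point lowers the score only in proportion to its share of the model, which is consistent with the robustness to illumination changes and partial occlusion described above.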
The second contribution is an efficient deformation
model, allowing a dense unwarping, even though the
template contains only a sparse set of points. To this
end, we first propagate the deformation into regions
between the points and then back-propagate these
deformations into the original model. Hence, we
obtain a reprojected smooth displacement field from
the original deformation. The proposed forward-backward
harmonic inpainting does not suffer from the
folding problems typically encountered with the
popular thin-plate splines (TPS) (Bookstein, 1989).
Additionally, the cost of manipulating our model
depends only on the size of the enclosing rectangle, but not
on the number of model points. To the best of our
knowledge these appealing properties have not yet
been exploited in the field of deformable object de-
tection.
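The forward propagation step can be sketched as a harmonic interpolation of sparse displacement samples on the grid of the enclosing rectangle. The sketch below covers only this step, for one displacement component, and assumes a simple Gauss-Seidel relaxation of the discrete Laplace equation; the solver, the boundary handling, and all names are illustrative assumptions, and the back-propagation step is not shown.

```python
def harmonic_inpaint(width, height, known, iterations=500):
    """Fill a width x height grid so that every unknown cell approaches the
    average of its in-bounds 4-neighbors (discrete Laplace equation), while
    the known samples stay fixed. known maps (x, y) -> value."""
    field = [[0.0] * width for _ in range(height)]
    for (x, y), value in known.items():
        field[y][x] = value
    for _ in range(iterations):
        for y in range(height):
            for x in range(width):
                if (x, y) in known:
                    continue  # sparse model points act as fixed constraints
                neighbors = [field[ny][nx]
                             for nx, ny in ((x - 1, y), (x + 1, y),
                                            (x, y - 1), (x, y + 1))
                             if 0 <= nx < width and 0 <= ny < height]
                field[y][x] = sum(neighbors) / len(neighbors)
    return field

# Hypothetical sparse x-displacements (in pixels) at three model points
# inside an 8 x 8 enclosing rectangle.
known = {(0, 0): 0.0, (7, 7): 2.0, (0, 7): 1.0}
dense = harmonic_inpaint(8, 8, known)
print(dense[3][3])
```

Each sweep visits every cell of the rectangle once, so the work per sweep is proportional to width times height regardless of how many model points are given. Since every unknown value is an average of its neighbors, the interpolated field never overshoots the range of the samples.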
2 DEFORMABLE SHAPE-BASED MATCHING
In the following, we detail the deformable shape-
based model generation and matching algorithm. The
problem that this algorithm solves is particularly
difficult because, in contrast to optical flow, tracking, or
medical registration, we assume neither temporal nor
local coherence. While the location of deformable
objects is determined with the robustness of a template
matching method, we avoid the need to expand
the full search space, much like a descriptor-based
method.
VISAPP 2008 - International Conference on Computer Vision Theory and Applications