parameter (translation) fails to correspond to the location of the face (delineated by the black dot in the figure), see fig. (1.b). Second, many local minima
may be found. Even if a gradient descent algorithm
begins close to the correct solution, the occurrence of
local minima is likely to divert convergence from the
desired solution.
The aim of this paper is to explore the use of a new
technique, Filtered Component Analysis (FCA). FCA
learns a multiband representation of the image that re-
duces the number of local minima and improves gen-
eralization relative to PCA. Fig. (1.c) shows the main
goal of the paper. By building a multiband representa-
tion with FCA, we are able to locate the minimum in
the right location (black dot) and eliminate most local
minima close to the optimal one.
2 PREVIOUS WORK
This section reviews work on subspace tracking and
the role of representation in subspace analysis.
2.1 Subspace Detection and Tracking
Subspace trackers build the object’s appearance/shape representation from the PCA of a set of training samples. Let $\mathbf{d}_i \in \Re^{d \times 1}$ (see notation¹) be the $i$-th sample of a training set $\mathbf{D} \in \Re^{d \times n}$ and $\mathbf{B} \in \Re^{d \times k}$ the first $k$ principal components. $\mathbf{B}$ contains the directions of maximum variation of the data. The principal components maximize $\max_{\mathbf{B}} \sum_{i=1}^{n} ||\mathbf{B}^T \mathbf{d}_i||_2^2 = ||\mathbf{B}^T \mathbf{D}||_F^2$, subject to the constraint $\mathbf{B}^T \mathbf{B} = \mathbf{I}$. The columns of $\mathbf{B}$ form an orthonormal basis that spans the principal subspace. If the effective rank of $\mathbf{D}$ is much less than $d$, we can approximate the column space of $\mathbf{D}$ with $k \ll d$ principal components. The data $\mathbf{d}_i$ can be approximated as a linear combination of the principal components as $\mathbf{d}_i \approx \mathbf{B}\mathbf{c}_i$, where $\mathbf{c}_i = \mathbf{B}^T \mathbf{d}_i$ are the coefficients obtained by projecting the training data onto the principal subspace.
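As a concrete illustration of the construction above, the sketch below computes $\mathbf{B}$ and the coefficients $\mathbf{c}_i$ with NumPy via the SVD; the random matrix standing in for the training set $\mathbf{D}$, and the sizes $d$, $n$, $k$, are assumptions made for the example.

```python
import numpy as np

# Synthetic stand-in for the training set D (d x n): n samples of dimension d.
rng = np.random.default_rng(0)
d, n, k = 100, 40, 5
D = rng.standard_normal((d, n))

# The first k principal components B (d x k) are the top-k left singular
# vectors of D: they maximize sum_i ||B^T d_i||_2^2 subject to B^T B = I.
U, S, Vt = np.linalg.svd(D, full_matrices=False)
B = U[:, :k]

# Coefficients c_i = B^T d_i, and the rank-k approximation d_i ≈ B c_i.
C = B.T @ D            # k x n coefficient matrix
D_approx = B @ C       # best rank-k approximation of D in Frobenius norm
```

Note that this sketch follows the uncentered formulation above; in practice the mean sample is usually subtracted from each $\mathbf{d}_i$ before computing the PCA.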
Once the model has been learned (i.e. B is
known), tracking is achieved by finding the para-
meters a of the geometric transformation f(x, a) that
¹Bold capital letters denote a matrix $\mathbf{D}$, bold lower-case letters a column vector $\mathbf{d}$. $\mathbf{d}_j$ represents the $j$-th column of the matrix $\mathbf{D}$. $d_{ij}$ denotes the scalar in row $i$ and column $j$ of the matrix $\mathbf{D}$, and $d_i$ the $i$-th scalar element of a column vector $\mathbf{d}_j$. All non-bold letters represent scalar variables. $||\mathbf{x}||_2 = \sqrt{\mathbf{x}^T \mathbf{x}}$ designates the Euclidean norm of $\mathbf{x}$. The $\text{vec}(\mathbf{D})$ operator transforms $\mathbf{D} \in \Re^{d \times n}$ into a $dn$-dimensional vector by stacking the columns. $\circ$ denotes the Hadamard or point-wise product. $\otimes$ denotes convolution. $\mathbf{1}_k \in \Re^{k \times 1}$ is a vector of ones. $\mathbf{I}_k \in \Re^{k \times k}$ is the identity matrix.
aligns the data w.r.t. the subspace. Given an image $\mathbf{d}_i$, subspace trackers or detectors find $\mathbf{a}$ and $\mathbf{c}_i$ that minimize $\min_{\mathbf{c}_i, \mathbf{a}} ||\mathbf{d}_i(\mathbf{f}(\mathbf{x}, \mathbf{a})) - \mathbf{B}\mathbf{c}_i||_2^2$ (or some normalized error). In the case of an affine transformation,
$$\mathbf{f}(\mathbf{x}, \mathbf{a}) = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} + \begin{bmatrix} a_3 & a_4 \\ a_5 & a_6 \end{bmatrix} \begin{bmatrix} x - x_c \\ y - y_c \end{bmatrix}$$
where $\mathbf{a} = (a_1, a_2, a_3, a_4, a_5, a_6)$ are the affine parameters and $\mathbf{x} = (x_1, y_1, \cdots, x_n, y_n)$ is a vector containing the coordinates of the pixels to track. If $\mathbf{a} = (a_1, a_2)$ is just a translation, the search can be done efficiently over the whole image using the Fast Fourier Transform (FFT). For $a_3 = a_6$ and $a_4 = -a_5$, that is, for a similarity transformation, the search can also be done efficiently in the log-polar representation of the image with the FFT.
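For the pure-translation case, the exhaustive search mentioned above amounts to a cross-correlation computed with the FFT. The following sketch, with a synthetic image and template as stand-ins, recovers a known offset from the peak of the correlation surface:

```python
import numpy as np

# Synthetic image with a known patch pasted at (row, col) = (30, 50);
# the FFT-based search below should recover this translation.
rng = np.random.default_rng(1)
image = rng.standard_normal((128, 128)) * 0.1
template = rng.standard_normal((16, 16))
image[30:46, 50:66] += template

# Cross-correlation over all translations at once via the FFT:
# corr = IFFT( FFT(image) * conj(FFT(template zero-padded to image size)) ).
T = np.zeros_like(image)
T[:16, :16] = template
corr = np.fft.ifft2(np.fft.fft2(image) * np.conj(np.fft.fft2(T))).real

# The peak of the correlation surface gives the best translation (a1, a2).
a1, a2 = np.unravel_index(np.argmax(corr), corr.shape)
print(a1, a2)  # recovers the offset where the patch was pasted
```

This evaluates the match score at every integer translation in $O(d \log d)$ time rather than $O(d^2)$ for an exhaustive spatial search.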
2.2 Representation in Subspace
Analysis
Most work on AM uses some sort of normalized
graylevel to build the representation. However, re-
gions of graylevel values can suffer from large am-
biguities, camera noise, and changes in illumination.
More robust representations can be achieved by local
combination of pixels through filtering. Filtering of
the visual array is a key element of the primate visual
system (Rao and Ballard, 1995).
Representations for subspace recognition were ex-
plored by Bischof et al. (Bischof et al., 2004). In
the training stage, they built a subspace by filter-
ing the PCA-graylevel basis with steerable filters. In
the recognition phase, they filtered the test images
and performed robust matching, obtaining improved
recognition performance over graylevel. Cootes et al.
(Cootes and Taylor, 2001a) found that a non-linear
representation of edge structure could improve the
performance of model subspace matching and recog-
nition. De la Torre et al. (de la Torre et al., 2000)
found that subspace tracking was improved by using
a multiband representation created by filtering the images with a set of Gaussian filters and their derivatives.
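A multiband representation of this kind can be sketched with a fixed Gaussian-derivative filter bank using SciPy; the scale and choice of bands below are illustrative assumptions, not the filters that FCA learns:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Toy image standing in for a face region.
rng = np.random.default_rng(2)
image = rng.standard_normal((64, 64))

# Multiband representation: the Gaussian-smoothed image plus its first
# derivatives in x and y at scale sigma. The `order` argument selects the
# derivative order along each axis; each filtered image becomes one band.
sigma = 2.0
bands = np.stack([
    gaussian_filter(image, sigma, order=(0, 0)),  # smoothed graylevel
    gaussian_filter(image, sigma, order=(0, 1)),  # derivative along columns
    gaussian_filter(image, sigma, order=(1, 0)),  # derivative along rows
])
print(bands.shape)  # (3, 64, 64): three bands per pixel
```

Stacking the bands turns each pixel into a small feature vector, which is what makes the representation less ambiguous than raw graylevel.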
Our work differs in several aspects from previous
work. First, we explicitly learn an optimal set of spa-
tial filters adapted to the object of interest rather than
using hand-picked ones. Once the filters are learned,
we build a multiband representation of the image that
has improved error surfaces with which to fit AM. We quantitatively evaluate the properties of the error surfaces and show that FCA outperforms current methods in appearance-based detection.
VISAPP 2007 - International Conference on Computer Vision Theory and Applications