be precomputed (Hager and Belhumeur, 1998), (Dellaert and Collins, 1999). One may also get rid of the inverse Hessian computation by switching the roles of the template and the image (Baker and Matthews, 2004). Nevertheless, the image gradients always have to be computed, and in the general case so do the Jacobian and the inverse Hessian of the warp. An alternative for template tracking is offered by regression-based methods (Jurie and Dhome, 2002), (Zimmermann et al., 2009a). They avoid computing the image gradient, the Jacobian and the inverse Hessian by learning a regression matrix from training examples. Once learned, they estimate the tracking parameters directly from the image intensities. If the regression function is linear, it is called a linear predictor. The training phase is the biggest disadvantage of linear predictors, because tracking cannot start immediately. Nevertheless, the regression matrix (function) can be estimated from a single image in a short time (a few seconds). The training examples are generated by randomly warping the object template and collecting the image intensities. The regression matrix may be updated with additional training examples during tracking (Hinterstoisser et al., 2008).
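To make the training procedure concrete, the following is a minimal sketch of learning a linear predictor from random warps of a single template. It assumes a 1-D toy setting in which the warp is a pure translation, the template is a hypothetical smooth function `image`, and the regression matrix reduces to a row vector `h` with `shift ≈ h · d`, where `d` holds intensity differences at a few support points. The small ridge term is our addition for numerical stability; none of the names come from the cited works.

```python
import math
import random


def image(x):
    """Hypothetical smooth 1-D template intensity profile."""
    return math.sin(x) + 0.5 * math.sin(2.3 * x + 1.0)


def solve(a, b):
    """Solve a @ x = b (a square) by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def intensity_differences(support, shift):
    """Template intensities at the shifted support points minus the reference intensities."""
    return [image(x + shift) - image(x) for x in support]


def train_linear_predictor(support, max_shift, n_samples=200, ridge=1e-3, seed=0):
    """Ridge least squares on randomly warped copies of the single template."""
    rng = random.Random(seed)
    n = len(support)
    gram = [[ridge if i == j else 0.0 for j in range(n)] for i in range(n)]  # sum d d^T + ridge*I
    cross = [0.0] * n  # sum of shift * d
    for _ in range(n_samples):
        t = rng.uniform(-max_shift, max_shift)  # random warp (here: a translation)
        d = intensity_differences(support, t)
        for i in range(n):
            cross[i] += t * d[i]
            for j in range(n):
                gram[i][j] += d[i] * d[j]
    return solve(gram, cross)  # normal equations give the regression vector h


def predict(h, d):
    """Estimated shift from observed intensity differences: one dot product."""
    return sum(hi * di for hi, di in zip(h, d))


support = [0.5 * k for k in range(5)]  # support points x = 0, 0.5, ..., 2
h = train_linear_predictor(support, max_shift=0.3)
estimate = predict(h, intensity_differences(support, 0.1))  # close to the true shift 0.1
```

All training pairs are generated from the template alone, in line with the single-image training described above; at run time the prediction costs only a dot product.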
Recently, it has been shown (Özuysal et al., 2010), (Hinterstoisser et al., 2008) that taking advantage of the learning phase greatly improves the tracking speed and makes the tracker more robust to large perspective deformations. A learned tracker runs on a fraction of the processing power and estimates the object position even in complicated or previously unseen poses. However, once the tracker gets lost, it may not be able to recover the object position.
To fulfill the real-time requirements, we propose a combination of a robust detector and a very efficient tracker. Both the detector and the tracker are trained from image data, and the tracker is updated during tracking. Because the tracker is extremely fast, tracking runs faster than real time, which leaves room for tracking multiple objects.
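The division of labour between the two components can be sketched as a small control loop: the detector runs only while the object is lost, and the tracker takes over after a positive detection. This is a hypothetical sketch; `DetectThenTrack`, the callable signatures and the choice to re-run the detector in the same frame in which tracking fails are assumptions, and the stub detector and tracker only stand in for the trained fern detector and linear predictor.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple

Pose = Tuple[float, float]  # hypothetical stand-in for an affine or homography pose


@dataclass
class DetectThenTrack:
    """Run the expensive detector only while lost; otherwise run the cheap tracker."""

    detect: Callable[[Any], Optional[Pose]]        # full-frame detection, None if not found
    track: Callable[[Any, Pose], Optional[Pose]]   # refine last pose, None if tracking failed
    pose: Optional[Pose] = None

    def step(self, frame: Any) -> Tuple[str, Optional[Pose]]:
        if self.pose is not None:
            self.pose = self.track(frame, self.pose)
            if self.pose is not None:
                return "track", self.pose
        # Not tracking, or the tracker has just lost the object: fall back to detection.
        self.pose = self.detect(frame)
        return "detect", self.pose


# Toy frames: the object's x position, or None when it is not visible.
frames = [None, 10.0, 11.0, 12.0, 40.0, 41.0]


def fake_detect(frame):
    return None if frame is None else (frame, 0.0)


def fake_track(frame, pose):
    # A learned predictor only converges for limited motion; emulate that with a 5-unit limit.
    if frame is None or abs(frame - pose[0]) > 5.0:
        return None
    return (frame, 0.0)


loop = DetectThenTrack(fake_detect, fake_track)
log = [loop.step(f)[0] for f in frames]
print(log)  # ['detect', 'detect', 'track', 'track', 'detect', 'track']
```

In this toy run the detector is invoked in only three of six frames; the jump from 12 to 40 exceeds the tracker's range, so the loop falls back to detection and then resumes tracking.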
1.1 Related Work
We use an approach similar to that of (Hinterstoisser et al., 2008), who also use a fern object detector and a linear predictor with incremental learning for homography estimation. The detector is used for object localization and also for a rough estimation of the patch transformation. The initial transformation is then refined by the linear predictor, which predicts a full 2D homography. The precision of the method is validated by inverse warping of the object patch and correlation-based verification against the initial patch. The detector is run in every frame of a sequence of 0.3 Mpx images, processing 10 frames per second (fps). Since a 12 Mpx image has 40 times as many pixels, this approach would not be able to run in real time on 12 Mpx images. We use the fern detector to determine tentative correspondences, and we run RANSAC on the detected points to estimate the affine transformation. After a positive detection, we apply the learned predictor in order to track the object for as many frames as possible. (Hinterstoisser et al., 2008) use an iterative version of the linear predictor similar to the one proposed by (Jurie and Dhome, 2002), while we use the SLLiP version. The SLLiP proved (Zimmermann et al., 2009a) to be faster than the iterative version while keeping the estimation highly precise. Our tracker is incrementally updated during tracking (Hinterstoisser et al., 2008; Li et al., 2010). We validate the tracking by the updated tracker itself (see Section 2.2), which is more precise than correlation-based verification with a single template when the object appearance varies.
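The affine estimation from tentative correspondences can be illustrated with a plain RANSAC loop. The sketch below rests on assumptions: the correspondences are synthetic rather than fern matches, and the number of iterations, the 2-pixel inlier threshold and the final least-squares refit on all inliers are our choices, not parameters taken from the paper.

```python
import random


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve a 3x3 system by Cramer's rule; None if the matrix is (nearly) singular."""
    d = det3(m)
    if abs(d) < 1e-9:  # absolute tolerance, adequate for pixel-scale coordinates
        return None
    sol = []
    for k in range(3):
        mk = [row[:] for row in m]
        for r in range(3):
            mk[r][k] = rhs[r]
        sol.append(det3(mk) / d)
    return sol


def fit_affine(pairs):
    """Least-squares affine map [a, b, c, d, e, f]: u = a*x + b*y + c, v = d*x + e*y + f."""
    ata = [[0.0] * 3 for _ in range(3)]
    atu, atv = [0.0] * 3, [0.0] * 3
    for (x, y), (u, v) in pairs:
        row = (x, y, 1.0)
        for i in range(3):
            atu[i] += row[i] * u
            atv[i] += row[i] * v
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    p, q = solve3(ata, atu), solve3(ata, atv)
    return None if p is None or q is None else p + q


def transfer_error(model, src, dst):
    a, b, c, d, e, f = model
    (x, y), (u, v) = src, dst
    return ((a * x + b * y + c - u) ** 2 + (d * x + e * y + f - v) ** 2) ** 0.5


def ransac_affine(pairs, iterations=200, threshold=2.0, seed=0):
    """Hypothesize from random 3-point samples, keep the model with most inliers, refit."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = fit_affine(rng.sample(pairs, 3))
        if model is None:  # collinear (degenerate) sample
            continue
        inliers = [pq for pq in pairs if transfer_error(model, *pq) < threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    if len(best_inliers) >= 3:
        refit = fit_affine(best_inliers)
        if refit is not None:
            best_model = refit
    return best_model, best_inliers


# Synthetic tentative correspondences: 30 noisy inliers of a known map plus 10 wrong matches.
true_model = (1.1, 0.1, 5.0, -0.2, 0.9, -3.0)
gen = random.Random(1)
pairs = []
for i in range(40):
    x, y = gen.uniform(0, 100), gen.uniform(0, 100)
    if i < 30:
        a, b, c, d, e, f = true_model
        u = a * x + b * y + c + gen.gauss(0, 0.3)
        v = d * x + e * y + f + gen.gauss(0, 0.3)
    else:
        u, v = gen.uniform(0, 100), gen.uniform(0, 100)
    pairs.append(((x, y), (u, v)))

model, inliers = ransac_affine(pairs)
```

Three correspondences are the minimal sample for an affine map (six unknowns, two equations per point), which keeps the chance of drawing an outlier-free sample high even with many wrong matches.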
Recently, (Holzer et al., 2010) used adaptive linear predictors for real-time tracking. Adaptation is done by growing or reducing the tracked patch during tracking and updating the regression matrices. However, this approach is not suitable for our task, because the large matrix of training examples, which is needed to compute the template reduction and growth, has to be kept in memory. This training matrix grows with the additional training examples collected for on-line learning, which is undesirable for long-term tracking.
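The memory argument can be illustrated with a generic alternative for a fixed support set: if only the sums of d dᵀ and t dᵀ over all training examples are accumulated, new examples can be added during tracking without storing them, and memory stays constant. This is a minimal sketch under that assumption (ridge-regularized least squares, hypothetical class `IncrementalLinearPredictor`); it is neither the update rule of (Holzer et al., 2010) nor necessarily the one used by the authors, and it does not support changing the support set, which is what template growth and reduction require.

```python
import random


def solve(a, b):
    """Solve a @ x = b (a square) by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


class IncrementalLinearPredictor:
    """Linear predictor t ~= H d that absorbs new training examples without storing them.

    Only sum(d d^T) (n x n) and sum(t d^T) (p x n) are kept, so memory is
    O(n^2 + p*n) no matter how many examples are added during tracking.
    """

    def __init__(self, n_support, n_params, ridge=1e-6):
        # Start the Gram matrix at ridge * I so the normal equations stay solvable.
        self.gram = [[ridge if i == j else 0.0 for j in range(n_support)]
                     for i in range(n_support)]
        self.cross = [[0.0] * n_support for _ in range(n_params)]
        self.count = 0

    def add_example(self, d, t):
        """d: intensity differences at the support points; t: motion parameters."""
        n = len(d)
        for i in range(n):
            for j in range(n):
                self.gram[i][j] += d[i] * d[j]
        for k, tk in enumerate(t):
            for i in range(n):
                self.cross[k][i] += tk * d[i]
        self.count += 1

    def regression_matrix(self):
        """Row k of H solves gram @ h_k = cross[k] (ridge-regularized least squares)."""
        return [solve(self.gram, row) for row in self.cross]


# Noise-free synthetic data generated by a known 2 x 3 regression matrix.
H_true = [[1.0, -2.0, 0.5], [0.3, 0.0, 1.0]]
rng = random.Random(0)
predictor = IncrementalLinearPredictor(n_support=3, n_params=2)
for _ in range(500):  # e.g. examples arriving one by one during tracking
    d = [rng.gauss(0.0, 1.0) for _ in range(3)]
    t = [sum(h * di for h, di in zip(row, d)) for row in H_true]
    predictor.add_example(d, t)
H = predictor.regression_matrix()
```

The state after 500 examples has exactly the same size as after one, while the recovered matrix converges to the one that generated the data.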
(Li et al., 2010) use linear predictors in the form of locally weighted projection regressors (LWPR) as part of a self-tuning particle filtering (ISPF) framework. They approximate a non-linear regression by piece-wise linear models. In comparison, we use a sequence of learnable linear predictors (SLLiP) similar to (Zimmermann et al., 2009b), in which each predictor uses the result of the previous predictor in the sequence as its starting point. In (Li et al., 2010), partial least squares is used for data dimensionality reduction. We use a subset of template pixels spread over the object in a regular grid, which proved to be sufficient for dimensionality reduction while keeping the tracking highly precise and robust.
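The sequential scheme can be sketched as follows: several linear predictors, each trained for a smaller range of motions, are applied in turn, each starting from the estimate of the previous one, and the intensities are sampled on a regular grid of support points. This is a simplified illustration assuming a 2-D translation-only warp, a hypothetical smooth `template` function, a 4x4 grid and hand-picked shrinking training ranges rather than an optimized sequence as in the cited work, so it is not the authors' implementation.

```python
import math
import random


def template(x, y):
    """Hypothetical smooth template intensity standing in for real image data."""
    return math.sin(0.6 * x + 0.25 * y) + math.cos(0.35 * x - 0.7 * y)


def regular_grid(x0, y0, x1, y1, per_side):
    """Support set: per_side x per_side points spread evenly over the template box."""
    sx = (x1 - x0) / (per_side - 1)
    sy = (y1 - y0) / (per_side - 1)
    return [(x0 + i * sx, y0 + j * sy) for j in range(per_side) for i in range(per_side)]


def solve(a, b):
    """Solve a @ x = b (a square) by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def train_predictor(support, max_shift, n_samples=300, seed=0):
    """Learn a 2 x n matrix mapping intensity differences to the residual 2-D shift."""
    rng = random.Random(seed)
    n = len(support)
    gram = [[1e-6 * n_samples if i == j else 0.0 for j in range(n)] for i in range(n)]
    cross = [[0.0] * n for _ in range(2)]
    for _ in range(n_samples):
        r = (rng.uniform(-max_shift, max_shift), rng.uniform(-max_shift, max_shift))
        # Object displaced by r relative to the current estimate.
        d = [template(x - r[0], y - r[1]) - template(x, y) for x, y in support]
        for i in range(n):
            for j in range(n):
                gram[i][j] += d[i] * d[j]
            cross[0][i] += r[0] * d[i]
            cross[1][i] += r[1] * d[i]
    return [solve(gram, row) for row in cross]


def track_sllip(frame, support, predictors, start):
    """Apply the predictors in turn; each one starts from the previous estimate."""
    t = list(start)
    for h in predictors:
        d = [frame(x + t[0], y + t[1]) - template(x, y) for x, y in support]
        t[0] += sum(a * b for a, b in zip(h[0], d))
        t[1] += sum(a * b for a, b in zip(h[1], d))
    return tuple(t)


support = regular_grid(0.0, 0.0, 4.0, 4.0, per_side=4)
ranges = [0.8, 0.3, 0.1]  # hypothetical training ranges, coarse to fine
predictors = [train_predictor(support, r, seed=k) for k, r in enumerate(ranges)]
true_shift = (0.45, -0.35)


def frame(x, y):
    """Current image: the template moved by true_shift."""
    return template(x - true_shift[0], y - true_shift[1])


estimate = track_sllip(frame, support, predictors, (0.0, 0.0))
```

The coarse predictor handles large motions with limited precision, and the finer ones only have to correct the small residual it leaves; only the 16 grid samples, not the full template, enter each prediction.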
The rest of this paper is organized as follows. Section 2 gives formal descriptions of the ferns detector and the sequential predictor tracker used, and Section 2.3 outlines our algorithm. In Section 3, we present the general evaluation of our algorithm. Detailed evaluations of the detector and the tracker are given in Sections 3.1 and 3.2. In the last two sections, we discuss the computational times of the algorithm and conclude the paper.
VISAPP 2011 - International Conference on Computer Vision Theory and Applications