ues for all of the parameters used in the implementa-
tion are presented in the Appendix.
2 TEMPLATE MATCHING
Template matching is used to determine the location
of the target in each frame of the video sequence. The
main assumptions made with this technique are that
there are no significant occlusions of the tracked tar-
get and the target’s appearance changes gradually.
The objects in the video change appearance over
time. The initial template selected by the operator
in the first frame will become inaccurate after a pe-
riod of time. A simple approach to try and overcome
this problem is to update the template after a certain
amount of frames have been processed. The main dif-
ficulty here is to find the proper update rate. If it is too
high, the template is being updated too often, then the
error will accumulate quickly resulting in a drift off of
the object originally selected by the operator. On the
other hand when the update rate is too slow, the tar-
get will be lost due to dissimilarities between its cur-
rent appearance and the template. There were several
techniques proposed in order to determine when and
how to update the template (Matthews et al., 2004).
Here the template is simply replaced every p frames
by the best matched region. The update rate,
1
p
, is de-
termined experimentally.
Let I(x, y, t) denote a pixel intensity at location
(x, y) in the video frame of size N × M pixels at time
t, where x ∈ [0, N − 1] and y ∈ [0, M − 1]. The tem-
plate is initialized by the operator at time t = 0. Let
S(i, j, l) be a pixel intensity at location (i, j) of a tem-
plate extrated from (lp)
th
frame, I(x, y, lp), where l
and p are nonnegative integers. The size of the tem-
plate S(i, j,l) is K × K pixels, so that i, j ∈ [0, K − 1].
To eliminate any ambiguities in determining the cen-
ter of S(i, j, l), K is picked to be an odd positive inte-
ger.
Using Sum of Squared Difference (SSD) error
measure, the error between the template S(i, j, l) and
the image I(x, y,t) at point (a, b) can be written as fol-
lows:
e =
K−1
∑
i=0
K−1
∑
j=0
(S(i, j, l) − I(a−
K
2
+ i, b−
K
2
+ j,t))
2
(1)
where
K
2
is rounded down to the nearest integer.
By computing the error in equation (1) for every
pixel of the frame at time t, and finding the minimum
error, the best matched region could be determined.
This approach would result in a longer computation
time and poor localization of the target due to false
positive matches. One solution is to define a region
R
s
of size W ×W pixels, centered at the most proba-
ble location of the target. Then the search for the best
match would be carried out only within R
s
. In the im-
plementation W ranges betweenW
min
and W
max
and is
determined by how fast the target moves in the image.
3 KALMAN FILTER
In order to approximate the next location of the target
in the video a Kalman filter is employed (Welch and
Bishop, 2004). A basic Kalman filter was found to
be sufficient for this application. When the operator
initializes the template in the first frame, by select-
ing a point inside the target region, a Kalman filter
is also initialized. The point, originally selected by
the operator, will be tracked through the video. The
discrete-time state-space representation of the linear
process, the state of which is being estimated, can be
expressed as:
x
k
= Ax
k−1
+ Bu
k−1
+ w
k−1
(2)
z
k
= Hx
k
+ v
k
(3)
In equation (2) the state vector, x ∈ R
4
, contains the
position and the velocity of a 2D point being tracked.
State transition matrix, A ∈ R
4×4
, relates the previ-
ous state, x
k−1
, to the current state, x
k
, where, k, is a
nonnegative integer. Control input matrix, B ∈ R
4×2
,
provides coupling between the control input vector,
u ∈ R
2×1
, and the state. The control input is the 2D
global displacement of the image. In equation (3),
z ∈ R
2
, is the measurement vector. The measurement
is a 2D position of the target’s point. The observation
matrix, H ∈ R
2×4
, relates the state to the measure-
ment. The random variable, w, represents the process
noise and the random variable, v, represents the mea-
surement noise. They are independent, white and nor-
mally distributed:
w ∼ N(0, Q) (4)
v ∼ N(0, R), (5)
where Q ∈ R
4×4
is the process noise covariance ma-
trix and R ∈ R
2×2
is the measurement noise covari-
ance matrix. Both Q and R are assumed to be con-
stant.
A Kalman filter consists of two stages: the pre-
diction and the correction. In the prediction stage an
estimate of the current state, ˆx
k
, and estimate of the
current error covariance matrix,
ˆ
P
k
, are made:
ˆx
k
= Ax
k−1
+ Bu
k−1
(6)
ˆ
P
k
= AP
k−1
A
T
+ Q, (7)