3.2 The Canonical Motions
The central step in our algorithm is the approximation of the optical flow field by a combination of three canonical camera motion flows p, t and z. These fields are the distinctive optical flows that would ideally result from imaging a static scene with a camera undergoing pan, tilt, or zoom motion, respectively, at a specific speed.
The canonical pan flow p, by definition, has p(u) = (1, 0) at every point u of the image's domain. Similarly, the canonical tilt flow t and zoom flow z are defined, respectively, as t(u) = (0, 1) and z(u) = 2u for all u (see figure 2). We use an image coordinate system whose origin is at the center
of each frame, with the x axis pointing right and y
pointing up. The unit of measurement is such that the
image domain D is the rectangle [−0.500, +0.500] ×
[−0.375,+0.375].
Figure 2: The canonical camera motion flows: pan p(u), tilt
t(u), zoom z(u), sampled at a 7×5 grid of points.
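The definitions above can be sampled directly. The following NumPy sketch builds a 7×5 grid over D, as in Figure 2, and evaluates p, t and z at those points. The function names `sample_grid` and `canonical_flows` are hypothetical, and placing the grid points on the borders of D is an assumption: the text does not say exactly where the samples of Figure 2 lie.

```python
import numpy as np

# Image domain D = [-0.5, +0.5] x [-0.375, +0.375], origin at the frame center.
X_HALF, Y_HALF = 0.500, 0.375


def sample_grid(nx=7, ny=5):
    """Evenly spaced sample points u_1..u_n covering D, as an (n, 2) array.

    Assumption: the grid includes the borders of D.
    """
    xs = np.linspace(-X_HALF, X_HALF, nx)
    ys = np.linspace(-Y_HALF, Y_HALF, ny)
    gx, gy = np.meshgrid(xs, ys)
    return np.column_stack([gx.ravel(), gy.ravel()])


def canonical_flows(u):
    """Return the canonical pan, tilt and zoom flows sampled at the points u."""
    n = u.shape[0]
    p = np.tile([1.0, 0.0], (n, 1))  # p(u) = (1, 0) everywhere
    t = np.tile([0.0, 1.0], (n, 1))  # t(u) = (0, 1) everywhere
    z = 2.0 * u                      # z(u) = 2u, pointing away from the center
    return p, t, z


u = sample_grid()
p, t, z = canonical_flows(u)
print(u.shape)  # (35, 2)
```

Note that the zoom vectors reach length 1 only at the left and right borders (x = ±0.5), matching the unit-speed pan and tilt flows there.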
The canonical pan flow corresponds to a rotation of the camera around the local vertical axis that causes the aim point to sweep horizontally from right to left, just fast enough to completely replace the field of view from one frame to the next, assuming a fairly small angular field of view.
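Under the same small-angle assumption, fitted coefficients can be turned into approximate physical rates. In the sketch below, the horizontal field of view `hfov_deg` and frame rate `fps` are hypothetical inputs not given in the text; one flow unit is taken to span the full image width (so it corresponds to roughly `hfov_deg` degrees in either direction, since D uses the same unit vertically), and the zoom factor follows from z(u) = 2u mapping u to (1 + 2Z)u in one frame.

```python
def motion_rates(P, T, Z, hfov_deg, fps):
    """Convert fitted coefficients into approximate physical camera motion.

    Assumptions (not from the text): hfov_deg is the horizontal angular field
    of view, flow is measured in the units of D (image width 1), and the field
    of view is small enough that angle is proportional to image displacement.
    Positive P, T, Z mean pan-left, tilt-down and zoom-in, respectively.
    """
    deg_per_unit = hfov_deg  # one unit of displacement spans the image width
    pan_deg_s = P * deg_per_unit * fps
    tilt_deg_s = T * deg_per_unit * fps  # same unit is used vertically in D
    # Z * z(u) = 2Z u moves the point u to (1 + 2Z) u in one frame.
    zoom_factor_per_frame = 1.0 + 2.0 * Z
    return pan_deg_s, tilt_deg_s, zoom_factor_per_frame


# A pan coefficient of 0.01 with a 40-degree field of view at 30 frames/s
# is about 12 degrees per second of leftward pan.
print(motion_rates(0.01, 0.0, 0.0, hfov_deg=40.0, fps=30.0))
```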
3.3 Analyzing the Optical Flow
The next step in our algorithm is to approximate the optical flow f between the two given frames by a linear combination $\tilde{f}$ of the canonical flows, namely

$$\tilde{f}(u) = P\,p(u) + T\,t(u) + Z\,z(u) \qquad (2)$$

for every u ∈ D.
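Once P, T and Z are known, equation (2) is simple to evaluate. A minimal NumPy sketch, using the hypothetical name `model_flow`:

```python
import numpy as np


def model_flow(P, T, Z, u):
    """Evaluate f~(u) = P p(u) + T t(u) + Z z(u) of equation (2).

    u is an (n, 2) array of points of D; the result is an (n, 2) array of flows.
    """
    u = np.asarray(u, dtype=float)
    pan = np.array([P, 0.0])   # P * p(u), with p(u) = (1, 0)
    tilt = np.array([0.0, T])  # T * t(u), with t(u) = (0, 1)
    return pan + tilt + Z * 2.0 * u  # Z * z(u), with z(u) = 2u


# A pure zoom-in with Z = 0.1 moves the top-right corner by (0.1, 0.075).
print(model_flow(0.0, 0.0, 0.1, [[0.5, 0.375]]))
```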
The coefficients P, T and Z, to be determined, will
indicate the amount of pan, tilt and zoom, respec-
tively, that seem to have occurred between two con-
secutive frames. Note that a negative value for a coef-
ficient means that the apparent motion is opposite to
the corresponding canonical movement (that is, a pan
to the right, a tilt-up, or a zoom-out, respectively).
We compute the coefficients P, T, Z by a straightforward weighted least-squares procedure. For that purpose, we define the scalar product of two flows a and b, with a weight function w, as

$$\langle a \mid b \rangle = \frac{\int_D w(u)\, a(u) \cdot b(u)\, du}{\int_D w(u)\, du} \qquad (3)$$
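As a worked check, assume a uniform weight w ≡ 1 (the text does not prescribe this; it is chosen only to make the integrals easy) and write u = (x, y), with |D| the area of D. Because D is symmetric about the origin, the scalar products (3) of the canonical flows are

$$
\begin{aligned}
\langle p \mid p \rangle &= \langle t \mid t \rangle = 1, \qquad \langle p \mid t \rangle = 0,\\
\langle p \mid z \rangle &= \frac{1}{|D|}\int_D 2x\,du = 0, \qquad \langle t \mid z \rangle = \frac{1}{|D|}\int_D 2y\,du = 0,\\
\langle z \mid z \rangle &= \frac{4}{|D|}\int_D (x^2 + y^2)\,du = \frac{4}{3}\left(0.5^2 + 0.375^2\right) = \frac{25}{48}.
\end{aligned}
$$

So, under this assumption, the continuous analogue of system (5) is diagonal and gives $P = \langle f \mid p \rangle$, $T = \langle f \mid t \rangle$, $Z = \tfrac{48}{25}\langle f \mid z \rangle$. With non-uniform weights the off-diagonal terms generally do not vanish.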
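The discrete version of this formula, assuming that the images are sampled at points $u_1, u_2, \ldots, u_n$, is

$$\langle\langle a \mid b \rangle\rangle = \frac{\sum_{i=1}^{n} w_i\, a_i \cdot b_i}{\sum_{i=1}^{n} w_i} \qquad (4)$$

where $a_i = a(u_i)$ and $b_i = b(u_i)$.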
As usual, we also define the norm of a (sampled) flow f as $\|f\| = \sqrt{\langle\langle f \mid f \rangle\rangle}$. Formulas (3) and (4) obviously satisfy the definitions of scalar product and norm, as long as the weights $w_i$ are all positive.
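The discrete scalar product (4) and its norm translate directly into array code. A minimal sketch, with the hypothetical names `scalar_product` and `flow_norm`, storing a sampled flow as an (n, 2) array:

```python
import numpy as np


def scalar_product(a, b, w):
    """Discrete weighted scalar product <<a|b>> of equation (4).

    a, b: (n, 2) arrays of flow vectors at the sample points u_1..u_n.
    w: (n,) array of positive weights w_i.
    """
    a, b, w = np.asarray(a, float), np.asarray(b, float), np.asarray(w, float)
    per_sample = np.sum(a * b, axis=1)  # a_i . b_i for each sample i
    return np.sum(w * per_sample) / np.sum(w)


def flow_norm(f, w):
    """Norm ||f|| = sqrt(<<f|f>>) induced by the scalar product above."""
    return np.sqrt(scalar_product(f, f, w))


a = [[1.0, 0.0], [0.0, 1.0]]
b = [[2.0, 0.0], [0.0, 3.0]]
print(scalar_product(a, b, [1.0, 3.0]))  # (1*2 + 3*3) / (1 + 3) = 2.75
```

Dividing by the sum of the weights makes the product a weighted average, so scaling all weights by a constant leaves it unchanged.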
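We seek P, T and Z that minimize the discrepancy between the given flow f and the ideal flow $\tilde{f}$ of equation (2). The discrepancy is the flow $d = f - \tilde{f}$, and its overall magnitude can be measured by the square error $Q(P, T, Z) = \|d\|^2 = \langle\langle f - \tilde{f} \mid f - \tilde{f} \rangle\rangle$. As in standard least-squares fitting, the values of P, T, Z that minimize Q are found by solving the system of linear equations

$$
\begin{pmatrix}
\langle\langle p \mid p \rangle\rangle & \langle\langle p \mid t \rangle\rangle & \langle\langle p \mid z \rangle\rangle \\
\langle\langle t \mid p \rangle\rangle & \langle\langle t \mid t \rangle\rangle & \langle\langle t \mid z \rangle\rangle \\
\langle\langle z \mid p \rangle\rangle & \langle\langle z \mid t \rangle\rangle & \langle\langle z \mid z \rangle\rangle
\end{pmatrix}
\begin{pmatrix} P \\ T \\ Z \end{pmatrix}
=
\begin{pmatrix}
\langle\langle f \mid p \rangle\rangle \\
\langle\langle f \mid t \rangle\rangle \\
\langle\langle f \mid z \rangle\rangle
\end{pmatrix}
\qquad (5)
$$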
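The normal equations (5) can be assembled and solved in a few lines. The sketch below is an illustration under our own naming (`fit_ptz` is hypothetical), not the authors' implementation; it builds the 3×3 matrix from the sampled canonical flows and solves it with `numpy.linalg.solve`, then checks that a flow synthesized from known coefficients is recovered.

```python
import numpy as np


def scalar_product(a, b, w):
    """<<a|b>> of equation (4): weighted mean of the dot products a_i . b_i."""
    return np.sum(w * np.sum(a * b, axis=1)) / np.sum(w)


def fit_ptz(f, u, w):
    """Solve the 3x3 normal equations (5) for the coefficients (P, T, Z).

    f: (n, 2) observed flow vectors at the points u (n, 2); w: (n,) positive weights.
    """
    f, u, w = (np.asarray(x, dtype=float) for x in (f, u, w))
    n = u.shape[0]
    basis = [np.tile([1.0, 0.0], (n, 1)),  # p
             np.tile([0.0, 1.0], (n, 1)),  # t
             2.0 * u]                      # z
    A = np.array([[scalar_product(a, b, w) for b in basis] for a in basis])
    rhs = np.array([scalar_product(f, a, w) for a in basis])
    return np.linalg.solve(A, rhs)


# Synthetic check on a 7x5 grid over D: P = 0.02, T = -0.01, Z = 0.05.
xs, ys = np.meshgrid(np.linspace(-0.5, 0.5, 7), np.linspace(-0.375, 0.375, 5))
u = np.column_stack([xs.ravel(), ys.ravel()])
f = np.array([0.02, -0.01]) + 0.05 * 2.0 * u
print(fit_ptz(f, u, np.ones(len(u))))  # approximately [0.02, -0.01, 0.05]
```

Because the synthetic flow lies exactly in the span of p, t and z, any choice of positive weights recovers the same coefficients; weights matter only when f contains components the model cannot represent.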
3.4 Weight Adjustment for Vectors
The least-squares method (5) works well if the scene is stationary. Moving objects change the optical flow, and therefore introduce errors in the fitted parameters P, T, and Z, since the least-squares procedure yields some average of the camera-induced flow and the flow of the moving objects. This is not a significant problem if the moving objects cover a small fraction of the image and/or their speed is small compared to the camera motion flow. However, if the scene contains fast-moving objects, their flow may easily dominate the fitted flow $\tilde{f}$.
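A small hypothetical example shows the size of the effect. With uniform weights on a grid symmetric about the center, $\langle\langle p \mid t \rangle\rangle$, $\langle\langle p \mid z \rangle\rangle$ and $\langle\langle t \mid z \rangle\rangle$ vanish and $\langle\langle p \mid p \rangle\rangle = 1$, so system (5) decouples and P is simply the mean horizontal flow. The scene below (stationary camera, one fast object on 5 of 35 samples) is invented for illustration.

```python
import numpy as np

# Hypothetical scene on a 7x5 grid (35 samples): the camera is stationary and
# a fast object covering 5 samples moves 0.5 units per frame to the right.
f = np.zeros((35, 2))
f[:5] = [0.5, 0.0]

# Uniform weights on a centrally symmetric grid: system (5) is diagonal with
# <<p|p>> = 1, so P = <<f|p>> = mean of the horizontal flow components.
P_uniform = f[:, 0].mean()
print(P_uniform)  # about 0.071: a spurious pan, although the camera did not move
```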
In order to alleviate this problem, we define the weights $w_i$ as being the reliability weights $\omega_i$ provided by the optical tracking procedure, divided by the length of the corresponding flow vectors $f_i$, that is

$$w_i = \frac{\omega_i}{\sqrt{|f_i|^2 + \varepsilon^2}} \qquad (6)$$
where ε is a small constant bias, introduced to avoid
division by zero or very small numbers.
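Equation (6) vectorizes directly. In this sketch the name `flow_weights` is hypothetical and the default `eps = 1e-3` is only an illustrative value; the text says only that ε is a small constant.

```python
import numpy as np


def flow_weights(f, omega, eps=1e-3):
    """Weights w_i = omega_i / sqrt(|f_i|^2 + eps^2) of equation (6).

    f: (n, 2) flow vectors; omega: (n,) tracker reliability weights.
    eps = 1e-3 is an illustrative choice, not a value from the text.
    """
    f = np.asarray(f, dtype=float)
    omega = np.asarray(omega, dtype=float)
    return omega / np.sqrt(np.sum(f * f, axis=1) + eps ** 2)


# A zero vector gets weight 1/eps = 1000; a vector of length 0.5 gets about 2.
print(flow_weights([[0.0, 0.0], [0.3, 0.4]], [1.0, 1.0]))
```

For vectors much longer than ε, the weight is essentially $\omega_i / |f_i|$; ε only caps the weight of near-zero vectors at $\omega_i / \varepsilon$.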
Note that this formula increases the relative
weight of small flow vectors, while reducing that of
large vectors. The justification for this correction is
that small flow vectors are indeed more significant,
statistically, than large ones. If the sampled optical flow f contains a significant number of very small vectors mixed with some large ones, the most likely explanation is that the camera is stationary and that the set K of points with small vectors is part of the background.
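The stationary-camera case described above can be checked numerically. The sketch below reuses the hypothetical scene of a fast object on 5 of 35 grid samples, assumes all tracker reliabilities $\omega_i = 1$ and an illustrative ε = 0.001, and compares the fit of system (5) under uniform weights with the fit under the weights of equation (6).

```python
import numpy as np


def scalar_product(a, b, w):
    """<<a|b>> of equation (4)."""
    return np.sum(w * np.sum(a * b, axis=1)) / np.sum(w)


def fit_ptz(f, u, w):
    """Solve system (5) for (P, T, Z) with the canonical flows sampled at u."""
    n = len(u)
    basis = [np.tile([1.0, 0.0], (n, 1)), np.tile([0.0, 1.0], (n, 1)), 2.0 * u]
    A = np.array([[scalar_product(a, b, w) for b in basis] for a in basis])
    rhs = np.array([scalar_product(f, a, w) for a in basis])
    return np.linalg.solve(A, rhs)


xs, ys = np.meshgrid(np.linspace(-0.5, 0.5, 7), np.linspace(-0.375, 0.375, 5))
u = np.column_stack([xs.ravel(), ys.ravel()])
f = np.zeros_like(u)
f[:5] = [0.5, 0.0]        # hypothetical fast object; camera and background still
omega = np.ones(len(u))   # assume the tracker trusts every vector equally
eps = 1e-3                # illustrative bias constant

uniform = fit_ptz(f, u, omega)
reweighted = fit_ptz(f, u, omega / np.sqrt(np.sum(f * f, axis=1) + eps ** 2))
print(uniform)     # P is about 0.071: the object drags the fit
print(reweighted)  # P, T, Z all much closer to zero
```

The static samples receive weight 1/ε = 1000 each while the moving ones receive about 2, so the fit follows the background, as the argument above predicts.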