The core of the algorithm is the tracking method
proposed by Lucas and Kanade. The main element of
this technique is the definition of a comparison
measure between characteristic, fixed-size windows
in the two frames to be compared, such as the
squared difference of the window intensities.
Different points in a window can behave differently:
they can undergo different intensity variations or,
when the window lies on an edge, move at different
speeds or appear and disappear. Transformations of
the window from one image to the next therefore
happen very often. We represent this situation with
the affine motion field:
$$
\delta = AX + D \quad \text{with} \quad A = \begin{bmatrix} a_{xx} & a_{xy} \\ a_{yx} & a_{yy} \end{bmatrix} \tag{1}
$$
where δ is in general the displacement, A is the
deformation matrix and D is the translation vector
(Shi and Tomasi, 1994).
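The affine motion field can be illustrated with a minimal sketch that evaluates the displacement of a few window points. The function name `affine_displacement` and the numeric values of A and D are assumptions chosen only for this example.

```python
import numpy as np

def affine_displacement(points, A, D):
    """Return delta = A X + D for each row X of `points` (shape N x 2)."""
    points = np.asarray(points, dtype=float)
    A = np.asarray(A, dtype=float)
    D = np.asarray(D, dtype=float)
    return points @ A.T + D

# Hypothetical values: a slight deformation plus a translation of (2, -1) pixels.
A = np.array([[0.01, 0.00],
              [0.00, 0.02]])
D = np.array([2.0, -1.0])
window_points = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])

delta = affine_displacement(window_points, A, D)
```

With A set to zero, every point in the window receives the same displacement D, which is the pure-translation case.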
To verify that we are always tracking the same
window, we check the dissimilarity at every step; it
is defined as follows:
$$
\epsilon = \int_W \left[ I(X) - J(AX + D) \right]^2 w \, dX \tag{2}
$$
where I(X) and J(X) are the image functions of the
two frames under examination, W is the window and w
is a weighting function. If the dissimilarity is too
high, the window is discarded. The result is the
displacement that minimizes this dissimilarity.
If the frames are so close in time that the shift
is very small, the deformation matrix is also very
small and can be considered null. Estimating its
parameters under these conditions can introduce
errors in the displacement. If the aim is to
determine the shift, it is therefore better to
estimate only the translational components of the
movement. The result is also simpler and faster to
compute.
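For the pure-translation case, the dissimilarity reduces to a weighted sum of squared differences between a window in I and the shifted window in J. The following is a minimal sketch, assuming integer shifts and uniform weights by default; the function name and the threshold `MAX_DISSIMILARITY` are hypothetical and not taken from the original method.

```python
import numpy as np

MAX_DISSIMILARITY = 1e-6  # hypothetical threshold, chosen only for this example

def dissimilarity(I, J, top_left, size, d, weights=None):
    """Discrete dissimilarity for a pure integer translation d = (dx, dy)."""
    r, c = top_left
    dx, dy = d
    win_I = I[r:r + size, c:c + size].astype(float)
    win_J = J[r + dy:r + dy + size, c + dx:c + dx + size].astype(float)
    if weights is None:
        weights = np.ones_like(win_I)  # uniform weighting over the window
    return float(np.sum(weights * (win_I - win_J) ** 2))

rng = np.random.default_rng(0)
I = rng.random((32, 32))
# Synthetic second frame: content moves 1 row down and 2 columns right.
J = np.roll(I, shift=(1, 2), axis=(0, 1))

eps_good = dissimilarity(I, J, (10, 10), 7, d=(2, 1))  # correct shift
eps_bad = dissimilarity(I, J, (10, 10), 7, d=(0, 0))   # wrong shift
keep_window = eps_good <= MAX_DISSIMILARITY
```

In a real tracker the threshold would have to be tuned to the image noise and window size.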
The affine model is useful for feature monitoring.
Mistakes can be made during feature selection, so
monitoring is important to avoid contradictory
results.
For small shifts we linearize the image intensity
with a Taylor series expansion, which allows the
Newton-Raphson method to be used for the
minimization.
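As a sketch of this linearization, assume the pure-translation case, in which the position in the second frame reduces to X + D, and write g for the spatial gradient of J (a symbol introduced here only for illustration). A first-order Taylor expansion and the stationarity condition on the dissimilarity then give:

```latex
\begin{aligned}
J(X + D) &\approx J(X) + g^{T} D \\
\epsilon &\approx \int_W \left[ I(X) - J(X) - g^{T} D \right]^2 w \, dX \\
\frac{\partial \epsilon}{\partial D} = 0
  \;&\Rightarrow\;
  \left( \int_W g \, g^{T} \, w \, dX \right) D
  = \int_W \left[ I(X) - J(X) \right] g \, w \, dX
\end{aligned}
```

The last line is a 2 × 2 linear system in D; solving it and linearizing again around the new estimate gives one Newton-Raphson iteration.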
The feature window can be approximated by a simple
translation of the same window in the previous
frame. For the same reason, the intensity of the
translated window can be written as the original
intensity plus a dissimilarity term that depends
almost linearly on the displacement vector. Starting
from this base solution, a few iterations are enough
for the system to converge.
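The iteration described above can be sketched for the translation-only case as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: gradients are taken once on the first frame, the second frame is sampled with bilinear interpolation, the weights are uniform, and all function names and parameter defaults are hypothetical.

```python
import numpy as np

def bilinear(img, ys, xs):
    """Sample img at float coordinates (ys, xs) using bilinear interpolation."""
    y0 = np.clip(np.floor(ys).astype(int), 0, img.shape[0] - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, img.shape[1] - 2)
    fy, fx = ys - y0, xs - x0
    return ((1 - fy) * (1 - fx) * img[y0, x0]
            + (1 - fy) * fx * img[y0, x0 + 1]
            + fy * (1 - fx) * img[y0 + 1, x0]
            + fy * fx * img[y0 + 1, x0 + 1])

def track_translation(I, J, center, half=7, iters=20, tol=1e-4):
    """Estimate d = (dx, dy) with J(X + d) ~ I(X) around center = (row, col)."""
    I = I.astype(float)
    J = J.astype(float)
    r, c = center
    rows = slice(r - half, r + half + 1)
    cols = slice(c - half, c + half + 1)
    gy, gx = np.gradient(I)  # derivatives along rows (y) and columns (x)
    gx, gy = gx[rows, cols], gy[rows, cols]
    # 2x2 gradient matrix of the window, computed once on the first frame
    G = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    ys, xs = np.mgrid[rows, cols].astype(float)
    d = np.zeros(2)  # base solution: no displacement
    for _ in range(iters):
        residual = I[rows, cols] - bilinear(J, ys + d[1], xs + d[0])
        e = np.array([np.sum(residual * gx), np.sum(residual * gy)])
        step = np.linalg.solve(G, e)  # Newton-Raphson update
        d += step
        if np.linalg.norm(step) < tol:
            break
    return d

# Synthetic smooth test pattern shifted by a known sub-pixel amount.
yy, xx = np.mgrid[0:64, 0:64].astype(float)

def pattern(y, x):
    return np.sin(0.3 * x) * np.cos(0.2 * y) + 0.5 * np.sin(0.15 * (x + y))

I = pattern(yy, xx)
J = pattern(yy + 0.5, xx - 0.8)  # true shift: dx = 0.8, dy = -0.5
d_est = track_translation(I, J, center=(32, 32))
```

Because the shift is small compared with the scale of the image structure, the linearization holds and the estimate should settle close to the true shift after a few iterations.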
The Lucas-Kanade algorithm is well known for being
both accurate and robust. These two requirements
conflict when choosing the size of the tracked
window: to increase accuracy, the window should be
as small as possible so that not too many details
are lost. On the other hand, to remain robust to
changes in lighting and in the size of the
displacement, in particular for wide movements, a
larger window is needed.
In this paper, we use the pyramidal representation
(Bouguet, 2000). It makes it easier to follow wide
pixel movements that, at the level of the original
image, are larger than the integration window but,
at a lower-resolution level, fit inside it. The
pyramidal representation halves the image resolution
at each level. The algorithm starts the analysis
from the highest level, with small images and few
details, and moves down level by level so that the
accuracy increases. This technique has the advantage
of following wide movements while the displacement
to be estimated at each level remains very small, so
the hypothesis behind the Taylor series is not
violated.
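The halving of resolution can be sketched as follows. This example assumes simple 2 × 2 block averaging as the reduction step, which is a stand-in and not necessarily the filter used in the cited pyramidal implementation; `build_pyramid` is a hypothetical helper name.

```python
import numpy as np

def build_pyramid(img, levels):
    """Return [level 0, ..., level `levels`], halving the resolution each time."""
    pyramid = [img.astype(float)]
    for _ in range(levels):
        prev = pyramid[-1]
        h, w = prev.shape[0] // 2, prev.shape[1] // 2
        # Average each 2x2 block (assumed stand-in for low-pass filtering
        # followed by subsampling)
        blocks = prev[:2 * h, :2 * w].reshape(h, 2, w, 2)
        pyramid.append(blocks.mean(axis=(1, 3)))
    return pyramid

image = np.zeros((64, 48))
pyramid = build_pyramid(image, levels=3)
shapes = [level.shape for level in pyramid]

# A 12-pixel displacement at level 0 shrinks by a factor of 2 per level.
displacement_per_level = [12.0 / 2 ** L for L in range(len(pyramid))]
```

A displacement that exceeds the integration window at level 0 can thus fall inside it at the top level. The estimate found there is doubled and used as the initial guess one level below.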
The convergence range can be improved by
suppressing the high spatial frequencies, that is,
by smoothing the image. Smoothing reduces the
details and therefore the accuracy: if the smoothing
window is bigger than the subject to be tracked, the
subject is erased completely. Smoothing is therefore
applied to the higher-level, low-resolution image,
so that the information is not lost.
For feature selection, we note that not all parts
of an image contain complete information about the
movement.
This is the aperture problem: along a straight
border, we can determine only the component of the
motion orthogonal to the border. This is a typical
problem of the Lucas-Kanade algorithm; it derives
from the lack of constraints in the main optical
flow equation. The general strategy to overcome
this problem is to use only regions with
sufficiently rich texture. In this paper we follow
the approach of Tomasi and Kanade (Tomasi and
Kanade, 1991), who use a selection criterion that
optimizes tracking performance. In other words, a
feature is defined as “good” if it can be tracked
“well”. We therefore consider only features whose
gradient matrix has sufficiently large eigenvalues.
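The selection criterion can be illustrated with a minimal sketch that computes the smaller eigenvalue of the 2 × 2 gradient matrix of a window. The helper name `min_eigenvalue`, the window size and the two synthetic test images are assumptions made for this example.

```python
import numpy as np

def min_eigenvalue(img, center, half=3):
    """Smaller eigenvalue of the gradient matrix of the window around center."""
    gy, gx = np.gradient(img.astype(float))
    r, c = center
    wx = gx[r - half:r + half + 1, c - half:c + half + 1]
    wy = gy[r - half:r + half + 1, c - half:c + half + 1]
    G = np.array([[np.sum(wx * wx), np.sum(wx * wy)],
                  [np.sum(wx * wy), np.sum(wy * wy)]])
    return float(np.linalg.eigvalsh(G)[0])  # eigvalsh returns ascending order

# Straight vertical edge: only the horizontal gradient is non-zero
# (aperture problem).
edge = np.zeros((20, 20))
edge[:, 10:] = 1.0

# Corner: gradients in both directions inside the window.
corner = np.zeros((20, 20))
corner[10:, 10:] = 1.0

lam_edge = min_eigenvalue(edge, (10, 10))
lam_corner = min_eigenvalue(corner, (10, 10))
```

A window would be accepted as a feature only when this value exceeds a chosen threshold, so the edge window is rejected while the corner window can be kept.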
The tracking algorithm presented stops when a
feature is declared “lost”. This can happen because
the system can no longer recognize it well enough.
The feature must then be recovered. The idea is to
SIGMAP 2008 - International Conference on Signal Processing and Multimedia Applications