the movements of the camera system in the opposite direction. Such a hardware solution exceeds the scope of this article. Nevertheless, the hardware computing unit for digital image stabilization is described in the third section. This unit is composed of two DSP processors with attached FPGAs and SDRAMs. Its purpose is to provide sufficient computational power for the digital image stabilization algorithm, which is introduced in the following section.
2 IMAGE STABILIZATION ALGORITHMS
Image stabilization algorithms mostly consist of three main parts: motion estimation, motion smoothing, and motion compensation. The main task of the first block is to estimate several local motion vectors and, on the basis of these local estimates, to calculate a global motion vector. The second block filters (integrates) the estimated global motion vector; its main purpose is to smooth the calculated value and to prevent large, undesirable differences between the motion vectors calculated in the past. The last block shifts the acquired image in the inverse direction according to the global motion vector. This block can also take more sophisticated transformations, such as rotation or warping, into account.
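The three-stage pipeline above can be sketched in a few lines. The smoothing constant ALPHA, the function names, and the nested-list frame representation are illustrative assumptions, not the paper's implementation; the motion-estimation block is assumed to be supplied separately.

```python
ALPHA = 0.8  # smoothing factor for the second block (assumed value)

def smooth(prev_smoothed, gmv):
    """Motion smoothing: low-pass filter the global motion vector
    to suppress large jumps between consecutive estimates."""
    return tuple(ALPHA * p + (1 - ALPHA) * g
                 for p, g in zip(prev_smoothed, gmv))

def compensate(frame, vec):
    """Motion compensation: shift the frame in the inverse direction
    of the (rounded) global motion vector; uncovered pixels become 0."""
    dx, dy = int(round(vec[0])), int(round(vec[1]))
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y + dy, x + dx  # sample from the displaced position
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = frame[sy][sx]
    return out
```

A per-frame loop would then chain estimation, `smooth`, and `compensate` in that order.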
Many different approaches exist nowadays. The main differences lie in the resulting accuracy of the global motion vector and in the algorithm used for estimating the local motion vectors. We distinguish between pixel and sub-pixel resolution. The sub-pixel approach is, however, more complicated and more time-consuming than the pixel one, because an interpolation method is needed, so it is rarely used in real-time applications. Some algorithms consider rotation or more complex warping in addition to translation. Table 1 summarizes several basic algorithms and their parameters. Other algorithms are described, e.g., in (Sachs et al., 2007).
Table 1: Some variants of stabilization algorithms.

Algorithm                     | Accuracy  | Transformation        | Ref.
------------------------------|-----------|-----------------------|---------------------
Parametric Block Matching     | sub-pixel | translation, rotation | (Vella et al., 2002)
Gray-Coded Bit Plane Matching | pixel     | translation           | (Ko et al., 1999)
Block Matching                | pixel     | translation           | (Brooks, 2003)
We will concentrate only on algorithms that use translation with pixel accuracy. The following section describes a simple plain matching algorithm and some basic ideas of stabilization. The next section is devoted to one promising algorithm, a modification of which we used.
2.1 Plain Matching Algorithm
As stated above, the algorithms that deal with stabilization are based on the estimation of a motion represented by a motion vector. The straightforward solution leads to the use of discrete correlation (cross-correlation) (Brooks, 2003). The discrete correlation produces a matrix of elements. Elements with a high correlation value correspond to locations where a chosen pattern and the image match well. In other words, the value of an element is a measure of similarity at the relevant point, and we can thus find locations in an image that are similar to the pattern.
The input is formed by a pattern F (in the form of an image) and an input image I. It is not necessary to search the whole image, so a smaller area (a search window of size N×N) is defined. At the same time, this area specifies the maximal shift in the vertical and horizontal directions and is chosen accordingly.
Eq. (1) represents a discrete 2D correlation function $D_{FI}(x, y)$:

$$D_{FI}(x, y) = \sum_{j=-N}^{N} \sum_{i=-N}^{N} F(i, j)\, I(x+i, y+j) \qquad (1)$$
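The double sum in Eq. (1) can be evaluated directly. The sketch below is a literal, unoptimized transcription; the convention that the pattern F(i, j) is stored with an offset of N (so that F[j+N][i+N] holds F(i, j)) is an assumption about indexing, not something the paper specifies.

```python
def discrete_correlation(F, I, x, y, N):
    """D_FI(x, y) from Eq. (1): correlate pattern F, of size
    (2N+1) x (2N+1), with image I at position (x, y).
    F[j + N][i + N] is assumed to store F(i, j)."""
    total = 0
    for j in range(-N, N + 1):
        for i in range(-N, N + 1):
            total += F[j + N][i + N] * I[y + j][x + i]
    return total
```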
Note that matching according to this first definition is problematic: the correlation can also be high in locations where the image intensity is high, even if the pattern does not match well there. Better performance can be achieved with a normalized correlation (Brooks, 2003):
$$\tilde{D}_{FI}(x, y) = \frac{\displaystyle\sum_{j=-N}^{N} \sum_{i=-N}^{N} F(i, j)\, I(x+i, y+j)}{\sqrt{\displaystyle\sum_{j=-N}^{N} \sum_{i=-N}^{N} I^{2}(x+i, y+j) \sum_{j=-N}^{N} \sum_{i=-N}^{N} F^{2}(i, j)}} \qquad (2)$$
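A minimal sketch of Eq. (2), under the same assumed indexing convention as before (F[j+N][i+N] holds F(i, j)); dividing by the energies of both the image patch and the pattern keeps bright regions from dominating the score:

```python
from math import sqrt

def normalized_correlation(F, I, x, y, N):
    """Normalized correlation from Eq. (2): the plain correlation
    divided by the square root of the product of the patch energy
    and the pattern energy. The result lies in [-1, 1]."""
    num = patch_sq = patt_sq = 0.0
    for j in range(-N, N + 1):
        for i in range(-N, N + 1):
            f = F[j + N][i + N]
            p = I[y + j][x + i]
            num += f * p
            patch_sq += p * p  # sum of I^2 over the patch
            patt_sq += f * f   # sum of F^2 over the pattern
    denom = sqrt(patch_sq * patt_sq)
    return num / denom if denom else 0.0
```

A perfect match of a pattern against an identically valued patch yields 1.0 regardless of how bright the patch is.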
Figure 2 shows an input image, the pattern (small red area), and the correlation matrix obtained by the normalized correlation. We defined the search window (large green area), and the pattern is searched within this window. The resulting matrix has the same dimensions as the search window (M×M). The pixel with the maximum value determines the position of the pattern in the search window. Hence, it determines
VISAPP 2009 - International Conference on Computer Vision Theory and Applications