in real time remains a very challenging task. The
approaches can be classified according to the
complexity of the model. Moreover, we can
distinguish local vs global methods and direct vs
feature-based approaches. Regarding model
complexity, Bhat et al. (Bhat 00) use a simple translational motion model for motion segmentation
with a PTZ camera. However, this assumption is
only fulfilled for small tilt angles. More complex
motion models are thus generally proposed, such as
rigid, affine (Szeliski 97, Brown 03) or general
projective models (Bevilacqua 05). In addition, most
cameras deviate from a real pin-hole model due to
radial distortion which becomes more prominent for
shorter focal lengths, and some approaches (Sinha
04) propose to compensate for it.
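To make this hierarchy of motion models concrete, the sketch below (plain NumPy; the parameter names are our own) builds the 3 × 3 matrices corresponding to the translational, affine and general projective models mentioned above. Radial distortion would require an additional non-linear term and is omitted.

```python
import numpy as np

def translation(tx, ty):
    # 2-parameter translational model, valid only for small tilt angles (cf. Bhat 00)
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def affine(a11, a12, a21, a22, tx, ty):
    # 6-parameter affine model (rotation, scale, shear + translation), cf. (Szeliski 97, Brown 03)
    return np.array([[a11, a12, tx],
                     [a21, a22, ty],
                     [0.0, 0.0, 1.0]])

def projective(h11, h12, h13, h21, h22, h23, h31, h32):
    # 8-parameter general projective model (homography), cf. (Bevilacqua 05);
    # the last entry is fixed to 1 to remove the overall scale ambiguity
    return np.array([[h11, h12, h13],
                     [h21, h22, h23],
                     [h31, h32, 1.0]])
```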
Local approaches aim at determining the model parameters for each pair of successive frames, i.e. they perform frame-to-frame (or pairwise) registration. They are computationally efficient, but this strategy lets small alignment errors accumulate. In particular, these errors become more
evident when a video sequence returns to a
previously captured location (a problem known as the
"looping path"). Global approaches (Szeliski 97,
Brown 03) formulate the registration problem in
order to solve for all of the camera parameters
jointly, i.e. by requiring that the ends of a panorama
should join up. Such exact optimization schemes are generally not compatible with real-time constraints, which makes global methods suitable mainly for batch processing.
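As an illustration of the local strategy, the sketch below (NumPy; the frame-to-frame homographies are assumed to be already estimated) composes pairwise transforms to map each frame into the reference frame. Every small estimation error is propagated to all subsequent frames, which is the drift that becomes visible on looping paths and that global schemes avoid by optimizing all parameters jointly.

```python
import numpy as np

def chain_to_reference(pairwise):
    """Local (pairwise) registration: compose frame-to-frame homographies
    H_{k-1,k} into frame-to-reference transforms H_{0,k}.  Small errors in
    each factor accumulate along the chain (the 'looping path' problem)."""
    H_0k = np.eye(3)
    frame_to_ref = [H_0k.copy()]
    for H in pairwise:                 # H maps frame k into frame k-1
        H_0k = H_0k @ H                # accumulate: H_{0,k} = H_{0,k-1} H_{k-1,k}
        H_0k /= H_0k[2, 2]             # keep a normalized homography
        frame_to_ref.append(H_0k.copy())
    return frame_to_ref
```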
Direct (or intensity-based) methods (Szeliski 97,
Sinha 04) attempt to iteratively estimate the camera
parameters by minimizing an error function based
on the intensity difference in the area of overlap.
This can be achieved by computing the sum of squared differences (SSD) or its zero-mean variant (ZSSD), the correlation coefficient (CC), the mutual information (MI) or the correlation ratio (CR). Szeliski and Shum
(Szeliski 97) propose to estimate the registration
homography by iteratively updating a correction
matrix using the SSD. They use an affine model, but
claim that their general strategy can be followed to
obtain the motion parameters associated with any other motion model (perspective, or even one including
radial distortion). In addition, they apply global
alignment to the whole sequence of images, which
results in an optimal image mosaic. Direct methods
have the advantage that they use all of the available
data and hence can provide very accurate
registration, but they depend on the fragile "brightness constancy" assumption and, being iterative, they require initialization.
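A minimal illustration of the direct strategy, assuming for simplicity a pure integer translation and a panorama that already provides a margin of `search` pixels around the frame's nominal position; an actual direct method such as (Szeliski 97) would instead iteratively refine a full parametric model from image gradients.

```python
import numpy as np

def ssd(a, b):
    # Sum of squared intensity differences over the overlap
    d = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(d * d))

def best_translation(frame, panorama, search=8):
    """Brute-force SSD minimization over integer translations (illustrative only).
    `panorama` is assumed to extend `search` pixels beyond the frame's nominal
    position on every side, so every candidate shift yields a valid overlap."""
    h, w = frame.shape
    best_score, best_shift = np.inf, (0, 0)
    for ty in range(-search, search + 1):
        for tx in range(-search, search + 1):
            patch = panorama[search + ty: search + ty + h,
                             search + tx: search + tx + w]
            score = ssd(frame, patch)
            if score < best_score:
                best_score, best_shift = score, (tx, ty)
    return best_shift
```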
Feature-based methods (Bevilacqua 05, Brown 03) start by establishing correspondences between points, lines or other geometrical entities in order to estimate the camera parameters. For example, Bevilacqua et al. (Bevilacqua 05) propose to match current-frame
features (corners) to the background mosaic using
the KLT tracker. They make use of a generic
projective model, and propose to overcome the
"looping path" problem with a feedback registration
correction compatible with real-time requirements.
Their approach requires no a priori information regarding the camera parameters or signals (pan/tilt angular movements). In addition, they use a histogram
specification technique (Azzari 06) to manage
automatic camera exposure adjustments (e.g. AGC)
and environmental illumination changes (e.g.
daytime changes). Brown and Lowe (Brown 03)
propose to match SIFT features between all of the
input images to form the panorama. They make use
of an affine transformation model that they justify
by the partial invariance of SIFT descriptors under affine changes. They use a RANSAC algorithm together with a probabilistic model for image match verification, in order to discard outliers from the parameter estimation. Finally, they use bundle adjustment
(Triggs 00) as a global registration scheme to solve
for all of the camera parameters jointly. Although the approach is efficient and able to automatically recognize the images belonging to the mosaic, the panorama
computation requires 83 seconds on a 2GHz PC.
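A minimal OpenCV sketch in the spirit of this feature-based pipeline (SIFT matching, Lowe's ratio test, then RANSAC to discard outliers before the homography is estimated). The function shown is our own simplification, the thresholds are illustrative, and bundle adjustment is not included.

```python
import cv2
import numpy as np

def register_to_panorama(frame, panorama):
    """Estimate the homography mapping `frame` onto `panorama` from SIFT
    correspondences, with RANSAC rejecting outliers (cf. Brown 03)."""
    sift = cv2.SIFT_create()
    kp_f, des_f = sift.detectAndCompute(frame, None)
    kp_p, des_p = sift.detectAndCompute(panorama, None)

    # Lowe's ratio test keeps only distinctive matches
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(des_f, des_p, k=2)
    good = [m for m, n in knn if m.distance < 0.75 * n.distance]

    src = np.float32([kp_f[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_p[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # RANSAC discards outlier correspondences before estimating H
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H, inlier_mask
```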
2.2 Registration Problem Formulation
Mapping the current frame into a common reference
coordinate system consists in determining the
transformation between the acquired image I and the
previously built panorama P, i.e. finding the
homography between I and P. A homography is defined as a transformation between two projective planes. An exhaustive review of projective transforms is beyond the scope of this paper; the reader may refer to (Faugeras 93).
Projection Model. Using homogeneous coordinates, the homography corresponds to a linear transform that can be represented as a multiplication by a 3 × 3 matrix H. Denoting X = (u, v, 1)^T the coordinates of a point P_t in the current image I, the homography H maps P_t to P'_t ∈ P, whose coordinates are X' = (u', v', w')^T:
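Written out in matrix form (the coefficients h_ij of H are our notation), the mapping reads:

$$
X' = H X, \qquad
\begin{pmatrix} u' \\ v' \\ w' \end{pmatrix} =
\begin{pmatrix}
h_{11} & h_{12} & h_{13} \\
h_{21} & h_{22} & h_{23} \\
h_{31} & h_{32} & h_{33}
\end{pmatrix}
\begin{pmatrix} u \\ v \\ 1 \end{pmatrix},
$$

and the Euclidean coordinates of P'_t in the panorama P are recovered by normalizing with the homogeneous scale, i.e. as (u'/w', v'/w').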