the discontinuities of the boundaries. The distortion
is then modelled by a piecewise set of planes,
similar to a faceted surface.
When using a network of CPs, it may not be
necessary to warp all the triangle areas with the
same number of iterations. For instance, it could
intuitively be expected that the ocean areas in
Figure 2, which are smooth and contain little detail,
would require fewer warping iterations than the
building regions.
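As a rough illustration of this piecewise-planar warping, the sketch below uses scikit-image's PiecewiseAffineTransform, which fits one affine plane per triangle of the control-point network. The control points here are placeholders, and the paper's own warping procedure may differ in detail.

```python
import numpy as np
from skimage.transform import PiecewiseAffineTransform, warp

# Placeholder control points (CPs): src holds coordinates in the output
# image, dst the matching coordinates in the distorted input frame.
src = np.array([[0, 0], [0, 255], [255, 0], [255, 255], [128, 128]])
dst = np.array([[2, 3], [1, 253], [254, 4], [252, 250], [126, 131]])

tform = PiecewiseAffineTransform()
tform.estimate(src, dst)  # one affine plane per Delaunay triangle

frame = np.random.rand(256, 256)  # stand-in for a video frame
# warp() uses the transform as the output -> input (inverse) mapping;
# order=0 selects nearest-neighbour resampling, discussed next.
corrected = warp(frame, tform, order=0)
```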
Since the (x, y) pixel coordinates of the warped
image will no longer be integer values, pixel values
on the integer grid must be estimated by an
interpolation process. Many interpolation methods
exist, the most common being nearest neighbour,
bilinear, cubic convolution and spline techniques.
The spatial interpolation technique considered
here was nearest neighbour interpolation. Although
bilinear interpolation and cubic convolution may
yield more visually pleasing results, the nearest
neighbour approach is generally preferred when
radiometric fidelity is at a premium (Russ, 2007).
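A minimal sketch of this resampling step, assuming the warp has produced arrays of fractional row and column coordinates; scipy's map_coordinates with order=0 performs the nearest neighbour look-up, so output pixels keep the original grey values unchanged:

```python
import numpy as np
from scipy import ndimage

def resample_nearest(image, rows, cols):
    """Sample `image` at non-integer (rows, cols) positions.

    order=0 is nearest-neighbour interpolation: each output pixel takes
    the value of the closest input pixel, preserving radiometry at the
    cost of blockier edges than bilinear (order=1) or cubic (order=3).
    """
    coords = np.vstack([rows.ravel(), cols.ravel()])
    values = ndimage.map_coordinates(image, coords, order=0, mode="nearest")
    return values.reshape(rows.shape)
```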
Once all the low-resolution images have been
processed and warped to a common orientation using
the methodology described above, the next step is to
register, or match, all the low-resolution images to a
common reference frame and thereby determine the
sub-pixel shifts, if any, that exist among them.
3 IMAGE REGISTRATION
In an idealised scene registration, two different
images of the same object are assumed to be
essentially identical except for an x and y shift. In
practice, with distorted and multi-temporal video
frames, the two images will generally exhibit
substantial differences beyond this assumption.
These differences can be classified as:
• Intensity differences, e.g. the images are taken
at different times or under different lighting
conditions;
• Structural differences, e.g. the common objects
may have changed between the taking of the two
images; and
• Geometric differences, e.g. the motion of the
camera may introduce rotation, aspect and scale
differences in the object.
Even though two images may both be of the
same scene, these differences of intensity, structure
and geometry will often be sufficient to produce
erroneous registrations. If the differences in intensity
and structure are very small, the reference image
can be thought of as an exact map of the object
scene, and scene matching can be characterised as
map matching.
On the other hand, when the differences between
two images are large, the reference information may
no longer be a map but somewhat like directions
given to a lost tourist: ‘Turn left at the set of lights,
follow the road past the church and then turn right
after the park’. With this knowledge, the tourist can
effectively perform the scene-matching function and
find his way to his intended destination.
Analogously, when there is a considerable
difference between two images, simple matching
algorithms will not work and so some iterative
warping of one image relative to the other must take
place before the images can appear similar and be
combined into a higher resolution composite.
The registration problem can be stated as finding
the transformation T (the warping transform) that,
when applied to one image F(x, y), will ultimately
bring the object detail into registration with the
corresponding detail in another image G(r, s), such
that:
T * F(x, y) = G(r, s)    (3)

where the symbol = denotes equivalence of object
detail.
Applying the warping transformation T to all the
low-resolution images provides a preliminary
alignment. This alignment assumes that there is only
a global translation among the images and, as a
preliminary step, is carried out to integer-pixel
precision. This initial step is referred to as the pixel
shift estimator and is based on normalised
cross-correlation techniques.
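A minimal sketch of such a pixel shift estimator, here as a global FFT-based form of normalised cross-correlation (the estimator actually used may differ, e.g. in windowing or in restricting the search range):

```python
import numpy as np

def integer_shift(ref, img):
    """Estimate the integer (row, col) shift of `img` relative to `ref`
    from the peak of a (globally) normalised cross-correlation surface."""
    ref = ref - ref.mean()
    img = img - img.mean()
    # Cross-correlation via the Fourier correlation theorem.
    xc = np.fft.ifft2(np.fft.fft2(img) * np.conj(np.fft.fft2(ref))).real
    xc /= np.linalg.norm(ref) * np.linalg.norm(img) + 1e-12
    peak = np.unravel_index(np.argmax(xc), xc.shape)
    # Peaks past the midpoint wrap around and encode negative shifts.
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, xc.shape))
```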
After the low-resolution images are aligned to
within a pixel, the second step is to compute the
remaining fractional shift between each pair of
images. The method for estimating this fractional
shift is based on a Taylor series expansion and can
achieve sub-pixel accuracies of approximately 0.1
pixels. The reader is referred to Pilgrim (1991) for
the theory and formulation behind this methodology.
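As a sketch of the general idea (the standard first-order, gradient-based variant, not necessarily Pilgrim's exact formulation): expanding the shifted image in a Taylor series about the reference, img(y, x) ≈ ref(y, x) + dy·∂ref/∂y + dx·∂ref/∂x, turns the shift estimate into a linear least-squares problem:

```python
import numpy as np

def fractional_shift(ref, img):
    """First-order Taylor estimate of the sub-pixel shift (dy, dx) such
    that ref sampled at (y + dy, x + dx) best matches img, assuming the
    images are already aligned to within one pixel."""
    gy, gx = np.gradient(ref.astype(float))          # image gradients
    A = np.column_stack([gy.ravel(), gx.ravel()])    # linearised model
    b = (img - ref).astype(float).ravel()            # intensity residual
    (dy, dx), *_ = np.linalg.lstsq(A, b, rcond=None)
    return dy, dx
```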
For correct detection of the shifts or offsets
between two images, the images must contain
features that make it possible to register or match
them.
Very sharp edges and small details are the most
affected by aliasing, so they are not reliable for
estimating these shifts. Uniform areas are likewise
useless, since they are translation invariant.
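As an illustration only (the threshold is arbitrary and not from the paper), suitability for registration can be screened with a simple gradient-energy test that rejects near-uniform regions:

```python
import numpy as np

def has_registrable_detail(region, threshold=5.0):
    """Reject near-uniform regions: their gradients are close to zero,
    so the correlation surface is flat and any shift estimate would be
    meaningless. The threshold is illustrative only."""
    gy, gx = np.gradient(region.astype(float))
    return float(np.hypot(gy, gx).mean()) > threshold
```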