variant) detectors (and descriptors) as in (Lowe, 2004; Tuytelaars, 2000). We define square blocks $B_1$ on image $I_i$ around the detected feature points. These blocks are matched with blocks $B_2$ from image $I_{i+1}$ using the weighted zero-mean normalized cross-correlation (CC):
$$CC = \frac{\sum_i w_i \,(B_{1,i} - \bar{B}_1)(B_{2,i} - \bar{B}_2)}{\sqrt{\sum_i w_i \,(B_{1,i} - \bar{B}_1)^2 \,\sum_i w_i \,(B_{2,i} - \bar{B}_2)^2}} \quad (3)$$
where $\bar{B}_1$ and $\bar{B}_2$ denote the mean values of blocks $B_1$ and $B_2$, respectively. This correlation measure is illumination invariant, i.e. blocks with a biased illumination change yield the same correlation as blocks without such a bias. The weights $w_i$ are chosen to favour the central part of the window (for example with a Gaussian function).
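As an illustration, Eq. (3) with Gaussian weights can be sketched as follows; the block size and the Gaussian $\sigma$ are arbitrary choices for illustration, not values from the text:

```python
import numpy as np

def weighted_zncc(b1, b2, sigma=2.0):
    """Weighted zero-mean normalized cross-correlation (Eq. 3) of two
    equally sized square blocks, with Gaussian weights favouring the
    centre of the window."""
    n = b1.shape[0]
    y, x = np.mgrid[0:n, 0:n] - (n - 1) / 2.0
    w = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))  # weights w_i
    w /= w.sum()

    d1 = b1 - np.average(b1, weights=w)  # B_{1,i} - mean of B_1
    d2 = b2 - np.average(b2, weights=w)  # B_{2,i} - mean of B_2
    num = np.sum(w * d1 * d2)
    den = np.sqrt(np.sum(w * d1**2) * np.sum(w * d2**2))
    return num / den if den > 0 else 0.0
```

Note that a block with a gain and offset change in its intensities correlates perfectly with the original, which illustrates the illumination invariance claimed above.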
Higher (subpixel) accuracy is obtained by fitting the neighbourhood of the highest correlation coefficient to a second-degree polynomial model.
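The refinement step can be illustrated in one dimension: fitting a parabola (a second-degree polynomial) through the peak correlation value and its two neighbours gives a closed-form subpixel offset. This is a standard formula, assumed here as one possible instance of the fitting described above:

```python
def subpixel_peak_1d(c_m1, c_0, c_p1):
    """Refine a correlation maximum along one axis by fitting a
    parabola through the peak sample c_0 and its neighbours at
    offsets -1 and +1; returns the subpixel offset of the vertex."""
    denom = c_m1 - 2.0 * c_0 + c_p1
    if denom == 0.0:
        return 0.0  # flat neighbourhood, no refinement possible
    return 0.5 * (c_m1 - c_p1) / denom
```

Applying the formula separately along rows and columns of the correlation surface yields a 2D subpixel position.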
In the next step, we have to estimate the parame-
ters of our transformation model m using the matched
pairs. The influence of the worst matches (outliers)
should be minimized. A robust estimate of these
parameters can be achieved with Hough transforms,
RANSAC, LMeds, M-estimation, bootstrap methods,
etc. (Rousseeuw and Leroy, 1987). Based on the generalized maximum likelihood and least squares formulation, we will use M-estimators. In particular, the M-estimate of $a$ is
$$\hat{a} = \arg\min_{a} \sum_i \rho(r_{i,a}) \quad (4)$$
where $\rho$ is a robust loss function and $r_{i,a}$ is the scale-normalized residual. A good (robust) initialization is crucial for the success of M-estimation; otherwise it would yield poor results due to its low breakdown point (Stewart, 1999). A robust initialization is achieved using a coarse-to-fine multiresolution framework. At the coarsest level, we can use temporal information from the registration between $I_{i-1}$ and $I_i$, which additionally reduces the computation
time. Using Kalman or particle filtering could result
in a better prediction (Doucet et al., 2000). But in this
case, we keep it simple: we use the previous estima-
tion as the new prediction. Solving this robust regres-
sion problem leads to W-estimators and the iterative
reweighted least squares (IRLS) algorithm (Stewart,
1999). In each iteration, the weights of each pair are
adapted in function of their residuals and a weighted
least squares (WLS) algorithm is applied until convergence is reached. To recover numerically stable
parameters, singular value decomposition is used to
solve the linear system in the WLS algorithm. We
initialize the weights of the IRLS algorithm with CC
information: if a matched pair has a high correlation
(hence is more reliable), then it should have more in-
fluence on the parameter estimation. After applying
IRLS, we not only have an estimate of the parameters $\theta_{i+1}$, but also the final output weights, which represent the importance of each contributing pair. With this information we can exclude badly registered regions (typically caused by moving objects) at all levels of the hierarchical framework.
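The IRLS loop described above might look as follows. The Huber weight function and the MAD scale estimate are common choices assumed here for illustration (the text does not name a specific $\rho$), and `np.linalg.lstsq` solves each WLS step via singular value decomposition:

```python
import numpy as np

def irls(A, b, w0, n_iter=20, k=1.345):
    """Iteratively reweighted least squares, A x ~ b being the
    linearized registration system. w0 are the initial weights taken
    from the correlation scores (higher CC -> more influence).
    Returns the parameter estimate and the final per-pair weights."""
    w = w0.copy()
    x = None
    for _ in range(n_iter):
        sw = np.sqrt(w)
        # weighted least squares step, solved via SVD for stability
        x, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)
        r = b - A @ x                                # residuals
        s = 1.4826 * np.median(np.abs(r)) + 1e-12    # robust MAD scale
        u = np.abs(r) / s                            # scale-normalized residuals
        w = w0 * np.where(u <= k, 1.0, k / u)        # Huber weight function
    return x, w
```

A pair drawn to a moving object produces a large residual, so its weight collapses towards zero while the clean pairs keep driving the fit.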
The combination of the transformation parameters $\theta_{i+1}$, obtained from the registration between subsequent images, and the parameters $\theta_i$, obtained between the previous image $I_i$ and the panorama $P$, forms a good initial estimate of the parameters $\theta_{i+1}$ for the registration between $I_{i+1}$ and $P$. We correct the parameters $\theta_{i+1}$ using the same algorithm described above and update the provisional mosaic with image $I_{i+1}$. Since the next image $I_{i+2}$ shares the most similar features with image $I_{i+1}$ (taking the spotlight into account), more weight is assigned to the last image when blending it into the provisional mosaic using an averaging scheme. The whole process is then repeated for image $I_{i+2}$.
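Assuming the transformation model can be expressed as a homogeneous 3x3 matrix, the initial estimate is simply the composition of the two known transforms; the helper below is hypothetical and only sketches this idea:

```python
import numpy as np

def predict_panorama_transform(T_i_to_P, T_ip1_to_i):
    """Initial estimate for registering image I_{i+1} to the panorama P:
    compose the transform of I_i to P with the inter-image transform
    of I_{i+1} to I_i (both as homogeneous 3x3 matrices)."""
    return T_i_to_P @ T_ip1_to_i
```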
3.2 Robust Image Fusion
After transformation and resampling of the images $I_i$ (using the 8-point windowed Blackman-Harris sinc function), we have a vector of candidates for each pixel of the panorama $P$. Simple averaging would create severe artifacts due to non-uniform illumination conditions, moving objects and possible misregistration. We tackle the illumination problem by assigning weights to each candidate pixel proportional to the weights $W(x_i)$. Since we are interested in a panorama under good lighting conditions, the weights $W(x_i)$ for dark regions tend to zero.
Moving objects can be modeled as a non-zero-mean Gaussian distribution, and we classify misregistration as part of the noise $N$. With these considerations, each candidate vector is observed as a weighted mixture of Gaussians. Since we are only interested in the single Gaussian density which represents the background, we want to suppress the influence of the other densities by lowering the weights of the candidates which are part of the moving objects. Similar to background subtraction techniques (Radke et al., 2005), we calculate the (weighted) average of all candidates. Afterwards we compare this average to all candidates: if the absolute difference exceeds a certain threshold (typically a number of standard deviations from the mean background model), then the candidate most likely belongs to an object and its weight is set to zero.
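The fusion rule can be sketched for a single panorama pixel as follows; the 2.5-sigma threshold is an illustrative choice, not a value from the text:

```python
import numpy as np

def fuse_candidates(c, w, n_sigma=2.5):
    """Robust fusion of the candidate values c for one panorama pixel,
    starting from the illumination weights w (i.e. W(x_i)). Candidates
    deviating from the weighted mean by more than n_sigma standard
    deviations are treated as moving objects and get zero weight."""
    c = np.asarray(c, dtype=float)
    w = np.asarray(w, dtype=float)
    mu = np.average(c, weights=w)                 # weighted background mean
    var = np.average((c - mu) ** 2, weights=w)    # weighted variance
    keep = np.abs(c - mu) <= n_sigma * np.sqrt(var) + 1e-12
    w = np.where(keep, w, 0.0)                    # suppress object candidates
    return np.average(c, weights=w)               # new weighted average
```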
The new weighted average is a good first estimate,
VISAPP 2006 - IMAGE ANALYSIS