ACCURATE IMAGE REGISTRATION BY COMBINING FEATURE-BASED MATCHING AND GLS-BASED MOTION ESTIMATION ∗

Raul Montoliu and Filiberto Pla

Computer Vision Group. Jaume I University. 12071 Castellon. Spain

Keywords:

Image Registration, Motion Estimation, Generalized Least-Squares Estimation, SIFT.

Abstract:

In this paper, an accurate Image Registration method is presented. It combines a feature-based method, which allows large motions between images to be recovered, with a Generalized Least-Squares (GLS) motion estimation technique, which estimates the motion parameters accurately. The feature-based method gives an initial estimate of the motion parameters, which is then refined using the GLS motion estimator. Our approach has been tested on challenging real images using both affine and projective motion models.

1 INTRODUCTION

Image registration (Brown, 1992) is a key problem in many applications in computer vision and image processing. We refer to Image Registration as the process of finding the correspondence between all the pixels of two images of the same scene captured at different times, with different sensors or from different viewpoints. When more than two images are involved, this problem is closely related to the creation of panoramic images (Szeliski, 2004).

Two main research directions in Image Registration can be found in the computer vision and image processing literature: feature-based and optimization-based.

The main limitation of feature-based methods is their strong dependence on how the features are detected and extracted from the images. This can affect the accuracy of the registration when interest point detectors with a low repeatability rate are used. However, important advances have been made in this area. Many researchers have developed interest point detectors and descriptors that are invariant to large rotations, changes of scale and illumination changes, and even partially invariant to affine changes. See (Mikolajczyk et al., 2005) and (Mikolajczyk and Schmid, 2005) for comparative studies of scale and affine invariant interest point detectors and of local descriptors, respectively. Szeliski (2004) maintains that if the features are well distributed over the image and the descriptors are reasonably designed for repeatability, enough correspondences to permit image registration can usually be found. This is the case when using the feature detectors and descriptors reported in (Mikolajczyk et al., 2005; Mikolajczyk and Schmid, 2005), which allow images with large deformations to be registered.

∗ This paper has been partially supported by project ESP2005-07724-C05-05 from Spanish CICYT.

On the other hand, optimization methods, which directly use the gray levels of all pixels, are based on estimating a vector of parameters that minimizes (or maximizes) an objective function. The main advantage of optimization methods is their estimation accuracy: the huge volume of data means that parameter estimation for image registration is heavily over-constrained. Therefore, methods based on optimization techniques can be very accurate, since a small number of parameters (6 for the affine motion model) is estimated using a large number of constraints. However, they suffer from initialization problems due to their iterative nature: the initial parameters must not be very far from the solution in order to avoid falling into a local minimum. A well-known technique to cope with this initialization problem is the use of a hierarchical (coarse-to-fine) scheme. However, even using hierarchical techniques, optimization methods are not able to cope with very large motion.

Montoliu R. and Pla F. (2007). ACCURATE IMAGE REGISTRATION BY COMBINING FEATURE-BASED MATCHING AND GLS-BASED MOTION ESTIMATION. In Proceedings of the Second International Conference on Computer Vision Theory and Applications - IFP/IA, pages 386-389. SciTePress

Given these two approaches to image registration, which is preferable? A possible solution to this problem is to combine both methods, feature-based and optimization-based, to form an accurate image registration technique able to cope with large deformations. This is the strategy that our approach follows. First, a feature-based method is used to obtain good initial motion parameters that are not very far from the true solution. Using this initialization, in the second step, an optimization-based algorithm is applied, which refines the estimate of the motion parameters until the accuracy level desired by the user is reached.

In the first step, to cope with changes of scale, rotations, illumination changes and partial affine changes, the SIFT technique (Lowe, 2004) has been used to detect and describe interest points, due to its excellent performance (Mikolajczyk and Schmid, 2005). As the main contribution of this paper, we propose to use a Generalized Least-Squares (GLS) motion estimation method in the second step. As will be shown, the GLS estimator is an effective way of solving regression problems, allowing accurate estimates of the parameters to be obtained in Image Registration.

The rest of the document is organized as follows. The next section explains the GLS estimator for general problems and particularly for motion estimation. Section 3 describes the proposed algorithm in detail. Section 4 shows the experiments performed using our approach and, finally, the last section presents the main conclusions drawn from this paper.

2 GLS MOTION ESTIMATION

In general, the GLS estimation problem can be expressed as follows:

$$\text{minimize } [\Theta = \upsilon^{t}\upsilon] \quad \text{subject to } F_i(\chi, L_i) = 0, \ \forall L_i \qquad (1)$$

where $\upsilon$ is the vector of residuals of the observations, $\chi = (\chi^1, \ldots, \chi^p)$ is a vector of $p$ parameters, each $L_i$ is an observation vector with $n$ components, $L_i = (L_i^1, \ldots, L_i^n)$, and $F_i$ is a set of $f$ functions that depend on the common vector of parameters $\chi$ and on an observation vector $L_i$, with $i = 1 \ldots r$.

Briefly summarizing, in the GLS method the iterative optimization is started with an initial guess of the parameters $\chi_0$. At each iteration $j$, the algorithm estimates $\Delta\chi$ to update the parameters as follows: $\chi_j = \chi_{j-1} + \Delta\chi$. The increment $\Delta\chi$ is calculated (Danuser and Stricker, 1998) based on the partial derivatives of the functions $F_i$ with respect to the parameters, $\chi$, and the observation vectors $L_i$, using the following expressions:

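$$\Delta\chi = \left( \sum_{i=1 \ldots r} N_i \right)^{-1} \left( \sum_{i=1 \ldots r} R_i \right) \qquad (2)$$

where $N_i = B_i^{t} (A_i A_i^{t})^{-1} B_i$ and $R_i = B_i^{t} (A_i A_i^{t})^{-1} E_i$, with

$$A_i = \begin{pmatrix} \dfrac{\partial F_i^1(\chi_{j-1}, L_i)}{\partial L_i^1} & \cdots & \dfrac{\partial F_i^1(\chi_{j-1}, L_i)}{\partial L_i^n} \\ \vdots & & \vdots \\ \dfrac{\partial F_i^f(\chi_{j-1}, L_i)}{\partial L_i^1} & \cdots & \dfrac{\partial F_i^f(\chi_{j-1}, L_i)}{\partial L_i^n} \end{pmatrix}_{(f \times n)}$$

$$B_i = \begin{pmatrix} \dfrac{\partial F_i^1(\chi_{j-1}, L_i)}{\partial \chi^1} & \cdots & \dfrac{\partial F_i^1(\chi_{j-1}, L_i)}{\partial \chi^p} \\ \vdots & & \vdots \\ \dfrac{\partial F_i^f(\chi_{j-1}, L_i)}{\partial \chi^1} & \cdots & \dfrac{\partial F_i^f(\chi_{j-1}, L_i)}{\partial \chi^p} \end{pmatrix}_{(f \times p)}$$

$$E_i = \begin{pmatrix} -F_i^1(\chi_{j-1}, L_i) \\ \vdots \\ -F_i^f(\chi_{j-1}, L_i) \end{pmatrix}_{(f \times 1)} \qquad (3)$$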
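The increment of equation (2) can be illustrated with a minimal sketch, assuming the single-function case ($f = 1$) used later for motion estimation, so that $A_i A_i^{t}$ is a scalar. The names `solve_linear` and `gls_increment` are hypothetical helpers introduced only for this illustration, and plain Python lists are used to keep the example dependency-free; it is not the authors' implementation.

```python
from typing import List, Sequence


def solve_linear(m: List[List[float]], v: Sequence[float]) -> List[float]:
    """Solve m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[k]] for k, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def gls_increment(a_rows, b_rows, e_vals) -> List[float]:
    """Return delta_chi of equation (2) for the case f = 1.

    a_rows[i]: A_i, the n derivatives of F with respect to the observation L_i
    b_rows[i]: B_i, the p derivatives of F with respect to the parameters
    e_vals[i]: E_i = -F(chi_{j-1}, L_i)
    """
    p = len(b_rows[0])
    n_sum = [[0.0] * p for _ in range(p)]
    r_sum = [0.0] * p
    for a, b, e in zip(a_rows, b_rows, e_vals):
        w = 1.0 / sum(x * x for x in a)  # (A_i A_i^t)^{-1}, a scalar when f = 1
        for k in range(p):
            r_sum[k] += b[k] * w * e  # accumulates R_i = B_i^t (A_i A_i^t)^{-1} E_i
            for l in range(p):
                n_sum[k][l] += b[k] * w * b[l]  # accumulates N_i
    return solve_linear(n_sum, r_sum)
```

As a sanity check, for a straight-line model $F = y - (m x + c)$ with observations $(x, y)$ and a zero initial guess, the derivative with respect to $x$ vanishes, every weight is 1, and the increment coincides with the ordinary least-squares fit.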

In motion estimation problems, the objective function is usually based on the assumption that the gray level of all the pixels of a region ℜ remains constant between two consecutive images in a sequence, i.e. the Brightness Constancy Assumption (BCA), which assumes that the changes in gray levels between the reference image and the test image are due only to motion.

In order to use the BCA directly, instead of its linearized version (i.e. the optical flow equation), a non-linear estimator should be used. The GLS estimator can be used in this context. In our formulation of the motion estimation problem, the function $F_i$ is expressed as follows (note that in this case the number of functions $f$ is 1):

$$F_i(\chi, L_i) = I_1(x_i, y_i) - I_2(x'_i, y'_i), \qquad (4)$$

where $I_1(x_i, y_i)$ is the gray level of the first image in the sequence (reference image) at the point $[x_i, y_i]$, and $I_2(x'_i, y'_i)$ is the gray level of the second image in the sequence (test image) at the transformed point $[x'_i, y'_i]$. In this case, each observation $L_i$ is related to one pixel $[x_i, y_i]$, with $r$ being the number of pixels. ℜ is the area of interest.
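The evaluation of the BCA function of equation (4) for one pixel can be sketched as follows. The sketch assumes an affine warp written as $x' = a_1 x + b_1 y + c_1$, $y' = a_2 x + b_2 y + c_2$ and bilinear interpolation of the test image at the transformed point; both choices, as well as the helper names `bilinear`, `affine_warp` and `bca_residual`, are assumptions made for illustration. Images are plain lists of rows, indexed as `image[row][column]`.

```python
import math


def bilinear(image, x, y):
    """Gray level of `image` at the real-valued point (x, y).

    Points whose 2x2 interpolation neighbourhood leaves the image
    are rejected.
    """
    x0, y0 = math.floor(x), math.floor(y)
    x1, y1 = x0 + 1, y0 + 1
    if x0 < 0 or y0 < 0 or x1 >= len(image[0]) or y1 >= len(image):
        raise ValueError("point falls outside the image")
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom


def affine_warp(chi, x, y):
    """Transformed point (x', y') under chi = (a1, b1, c1, a2, b2, c2)."""
    a1, b1, c1, a2, b2, c2 = chi
    return a1 * x + b1 * y + c1, a2 * x + b2 * y + c2


def bca_residual(ref, test, chi, x, y):
    """F_i = I1(x_i, y_i) - I2(x'_i, y'_i) for the reference pixel (x, y)."""
    xp, yp = affine_warp(chi, x, y)
    return ref[y][x] - bilinear(test, xp, yp)
```

When the motion parameters are correct and the BCA holds, the residual is zero for every pixel of the area of interest; the estimator drives these residuals towards zero.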

Let us consider the test image ($I_2$) as the data model to match, and the reference image ($I_1$) as observation data. For each pixel $i$, let us define the observation vector as $L_i = (x_i, y_i, I_1(x_i, y_i))$, which has three elements ($n = 3$): column, row (pixel coordinates) and gray level of the reference image at these coordinates. The gray level of the reference image has been selected as an element of the observation vector since it is the observation that we want to match with the given gray level in the test image using the BCA. The spatial coordinates have also been selected as part of the observations, since their measurement can be inaccurate because of the image acquisition process.

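In order to calculate the matrices $A_i$, $B_i$ and $E_i$ (see equation 3), the partial derivatives of the function $F_i$ with respect to the parameters and with respect to the observations must be worked out.

For instance, using affine motion, with the model written as $x'_i = a_1 x_i + b_1 y_i + c_1$, $y'_i = a_2 x_i + b_2 y_i + c_2$, the terms $A_i$, $B_i$ and $E_i$ are expressed as follows:

$$A_i = \left( I_x^1 - a_1 I_x^2 - a_2 I_y^2, \ \ I_y^1 - b_1 I_x^2 - b_2 I_y^2, \ \ 1.0 \right)_{(1 \times 3)}$$

$$B_i = \left( -x_i I_x^2, \ -y_i I_x^2, \ -I_x^2, \ -x_i I_y^2, \ -y_i I_y^2, \ -I_y^2 \right)_{(1 \times 6)}$$

$$E_i = -\left( I_1(x_i, y_i) - I_2(x'_i, y'_i) \right)_{(1 \times 1)} \qquad (5)$$

where $I_x^1$, $I_y^1$, $I_x^2$ and $I_y^2$ have been introduced to simplify notation as $I_x^1 = I_x^1(x_i, y_i)$, $I_y^1 = I_y^1(x_i, y_i)$, $I_x^2 = I_x^2(x'_i, y'_i)$ and $I_y^2 = I_y^2(x'_i, y'_i)$, with $I_x^1(x_i, y_i)$ and $I_y^1(x_i, y_i)$ being the gradients of the reference image and $I_x^2(x'_i, y'_i)$ and $I_y^2(x'_i, y'_i)$ the gradients of the test image.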

3 ALGORITHM PROPOSED

Our proposed algorithm can be summarized in these

four sequential steps:

1. Detection and description of interest points: The SIFT technique is applied to detect and describe the points of interest in both images.

2. Matching of interest points: For each interest point belonging to the first image, a K-NN search strategy is performed to find the k closest interest points in the second image. At the end of this process, a set of point pairs is obtained.

3. Estimation of a first approximation using random sampling: A random sampling technique is used to determine a good initial solution for the motion parameters.

4. Final motion estimation using GLS: The GLS motion estimator is applied, using as observations all the pixels in the overlapped area, in order to move to a more accurate solution. The process is finished when ∆χ is close to 0, which is usually fulfilled in a few iterations.
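Step 3 does not specify which random sampling technique is used. As one possible illustration, the sketch below assumes a RANSAC-style consensus scheme for the affine model: three point pairs are drawn at random, the exact affine transformation through them is computed, and the hypothesis that agrees with the largest number of point pairs is kept. The function names, the parameter ordering $(a_1, b_1, c_1, a_2, b_2, c_2)$, and the values of `trials` and `tol` are assumptions, not settings taken from the paper.

```python
import random


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def affine_from_three(pairs):
    """Exact affine parameters from three ((x, y), (x', y')) pairs.

    Returns None if the three source points are (nearly) collinear.
    """
    m = [[x, y, 1.0] for (x, y), _ in pairs]
    d = det3(m)
    if abs(d) < 1e-9:
        return None
    chi = []
    for coord in (0, 1):  # 0: equations for x', 1: equations for y'
        rhs = [dst[coord] for _, dst in pairs]
        for col in range(3):  # Cramer's rule, one unknown per column
            mc = [row[:] for row in m]
            for r in range(3):
                mc[r][col] = rhs[r]
            chi.append(det3(mc) / d)
    return tuple(chi)


def random_sampling_affine(pairs, trials=200, tol=2.0, seed=0):
    """Return (chi, count): the hypothesis supported by the most pairs."""
    rng = random.Random(seed)
    best, best_count = None, -1
    for _ in range(trials):
        chi = affine_from_three(rng.sample(pairs, 3))
        if chi is None:
            continue
        a1, b1, c1, a2, b2, c2 = chi
        count = sum(
            1 for (x, y), (xp, yp) in pairs
            if (a1 * x + b1 * y + c1 - xp) ** 2
            + (a2 * x + b2 * y + c2 - yp) ** 2 <= tol ** 2
        )
        if count > best_count:
            best, best_count = chi, count
    return best, best_count
```

The parameters returned by such a procedure only need to be close enough to the true solution for the subsequent GLS refinement to converge; their accuracy is limited by the localization accuracy of the matched interest points.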

4 EXPERIMENTS

In order to test our approach on Image Registration problems, a collection of challenging sets of image pairs has been selected. They can be downloaded from Oxford's Visual Geometry Group web page. They present five types of changes between images in 8 different sets of images: blur (bikes and tree sets), illumination (leuven set), JPEG compression (ubc set), zoom+rotation (bark and boat sets) and viewpoint (graf and wall sets). To check the accuracy of the registration, the normalized correlation coefficient (NCC) similarity measure has been calculated using the pixels of the overlapped area of both images. The NCC gives values from −1.0 (low similarity) to 1.0 (high similarity). The NCC is expressed as follows, with $\mu_1$, $\mu_2$ being the average gray levels of both images and ℜ the overlapped area:

$$NCC(I_1, I_2) = \frac{\sum_{(x_i, y_i) \in \Re} \left[ (I_1 - \mu_1)(I_2 - \mu_2) \right]}{\sqrt{\sum_{(x_i, y_i) \in \Re} (I_1 - \mu_1)^2 \ \sum_{(x_i, y_i) \in \Re} (I_2 - \mu_2)^2}} \qquad (6)$$

where $I_1$ and $I_2$ have been introduced to simplify notation as $I_1 = I_1(x_i, y_i)$, $I_2 = I_2(x'_i, y'_i)$.
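The NCC of equation (6) can be computed with a few lines of code. The sketch below assumes that the gray levels of the overlapped area have already been collected into two equally long lists, the first holding $I_1(x_i, y_i)$ and the second the corresponding $I_2(x'_i, y'_i)$; the function name `ncc` is hypothetical.

```python
import math


def ncc(values1, values2):
    """Normalized correlation coefficient of two gray-level lists."""
    if len(values1) != len(values2) or not values1:
        raise ValueError("inputs must be non-empty and of equal length")
    mu1 = sum(values1) / len(values1)
    mu2 = sum(values2) / len(values2)
    num = sum((v1 - mu1) * (v2 - mu2) for v1, v2 in zip(values1, values2))
    den = math.sqrt(sum((v1 - mu1) ** 2 for v1 in values1)
                    * sum((v2 - mu2) ** 2 for v2 in values2))
    if den == 0.0:
        raise ValueError("NCC is undefined for a constant image region")
    return num / den
```

Because both means are subtracted and the result is normalized, the measure is insensitive to a gain or offset change in gray level between the two images, which makes it a convenient accuracy check for sets with illumination changes.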

We have focused on showing the results for challenging situations such as the zoom+rotation and viewpoint sets of image pairs, in particular the boat, bark, graf and wall sets. The affine motion model has been used for images from the bark and boat sets, since there is no viewpoint change. The main difficulty of these sets is the presence of large rotations and changes of scale. The presence of moderate and large viewpoint changes makes it necessary to use the projective motion model instead of the affine one for images from the graf and wall sets.

Table 1 shows the average NCC calculated for the experiments performed with the images belonging to each set, after the initial estimation and after the final GLS estimation. In general, the feature-based technique provides a good but not excellent (in terms of accuracy) initial estimate of the motion parameters, which is clearly improved by the GLS estimation.

Figures 1 and 2 show the results of the registration process obtained for some of the most difficult pairs from the four studied sets. The dashed white line marks the boundary of the reference image (i.e. the first image of the pair).

In general, the proposed method is able to register all the images from the bark and boat sets, but suffers under strong viewpoint transformations, as when registering the last images from the wall and graf sets, for instance images 1 and 5 from the graf set. The main problem in those cases is that, for the initial registration, there are not enough good feature point matches, owing to the limitations of SIFT in the presence of strong viewpoint changes.

Table 1: Average NCC obtained for each set.

                           bark   boat   graf   wall
After initial estimation   0.84   0.81   0.64   0.85
After GLS estimation       0.95   0.91   0.88   0.92

Figure 1: Registration results for images from boat (1,4) set.

Figure 2: Registration results for images from graf (1,3) set.

http://www.robots.ox.ac.uk/~vgg/research/affine/index.html

VISAPP 2007 - International Conference on Computer Vision Theory and Applications

The registration of images from graf has an additional difficulty: the car changed its position between the two image captures (see the bottom-right corner of the images). Therefore, the NCC obtained is not as good as that obtained for the other images (see Table 1), since those pixels are also included in the calculation of the NCC. However, those pixels do not affect the accurate estimation of the motion parameters.

5 CONCLUSIONS

In this paper an image registration approach has been presented. It uses a feature-based method, which copes with large changes in scale, rotation and viewpoint, combined with an accurate Generalized Least-Squares motion estimation technique, which uses the result obtained by the feature-based method as initialization and refines the estimate of the motion parameters.

The proposed approach has been successfully tested using challenging real pairs of images with illumination changes, different blur levels, different JPEG compression, large changes of scale, large rotations and moderate viewpoint changes, obtaining high accuracy in the estimation of the motion parameters. However, some problems arise in the presence of very large viewpoint changes. This is due to the use of an interest point detector that is not viewpoint invariant. As part of our future work, we plan to introduce a viewpoint invariant point detector to overcome this shortcoming.

REFERENCES

Brown, L. G. (1992). A survey of image registration techniques. ACM Computing Surveys, 24(4):325–376.

Danuser, G. and Stricker, M. (1998). Parametric model-fitting: From inlier characterization to outlier detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3):263–280.

Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110.

Mikolajczyk, K. and Schmid, C. (2005). A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10).

Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., and Gool, L. V. (2005). A comparison of affine region detectors. International Journal of Computer Vision, 65(1/2).

Szeliski, R. (2004). Image alignment and stitching: A tutorial. Technical Report MSR-TR-2004-92, Microsoft Research.
