FAST AND ROBUST IMAGE MATCHING USING CONTEXTUAL
INFORMATION AND RELAXATION
Desire Sidibe, Philippe Montesinos and Stefan Janaqi
LGI2P/EMA - Ales School of Mines, Parc scientifique G. Besse, 30035 Nimes Cedex 1, France
Keywords:
Relaxation, Image matching, Point matching, Scale invariant features.
Abstract:
This paper tackles the difficult but fundamental problem of image matching under projective transformation.
Recently, several algorithms capable of handling large changes of viewpoint as well as large scale changes have
been proposed. They are based on the comparison of local, invariant descriptors which are robust to these
transformations. However, since no image descriptor is robust enough to avoid mismatches, an additional
outlier rejection step is often needed, and its accuracy strongly depends on the number of mismatches. In
this paper, we show that the matching process can be made robust, yielding very few mismatches, by using
a relaxation labeling technique. The main contribution of this work is an efficient and fast implementation
of a relaxation method which can deal with large sets of features. Furthermore, we show how the contextual
information can be obtained and used in this robust and fast algorithm. Experiments with real data and
comparisons with other matching methods clearly show the improvements in the matching results.
1 INTRODUCTION
The problem of finding correspondences between image features is fundamental in many computer vision applications such as stereo-vision, image retrieval, image registration, robot localization and object recognition. Recently, local and invariant features have proven to be very successful in establishing image-to-image correspondences. Their local character yields robustness to occlusion and varying background, and their invariance makes them robust to scale and viewpoint changes. Interest points are one of the most widely used local features. In many applications, one aims to obtain a set of corresponding points between two images. Therefore, the extracted points have to be characterized by a descriptor and then matched using a similarity measure.
Different methods for detecting invariant features have been proposed (Baumberg, 2000; Mikolajczyk and Schmid, 2002; Tuytelaars and Van Gool, 2004; Lowe, 1999; Schaffalitzky and Zisserman, 2002; Matas et al., 2002). Among them, it is worth mentioning those based on interest points. (Mikolajczyk and Schmid, 2002; Mikolajczyk and Schmid, 2004) propose a scale and affine invariant interest point detector using a scale-space representation of the image. First, points are detected at multiple scales using the Harris detector. Then points at which a local measure of variation is maximal over scales are selected. Finally, an iterative algorithm modifies the location, scale and local shape of each point and converges to affine invariant points. A scale-space representation is also used by (Lowe, 1999), who uses local extrema of Difference-of-Gaussian (DoG) filters as key-points. Similar ideas are used by other authors (Baumberg, 2000; Schaffalitzky and Zisserman, 2002). For a more detailed review of affine invariant feature detection, please refer to (Mikolajczyk et al., 2005).
Once the points are detected, the region around each of them is used to compute a descriptor. Invariance to affine transformations is provided by the fact that each point is characterized by a specific scale which defines the size of its region and that each region has a specific shape. Many different techniques for describing local image regions have been developed and it has been shown that the SIFT (Scale-Invariant Feature Transform) descriptor performs better than others (Mikolajczyk and Schmid, 2005). This descriptor is based on the gradient distribution in the detected regions around the points and is represented by a 3D histogram of gradient locations and orientations (Lowe, 1999).

Sidibe D., Montesinos P. and Janaqi S. (2007). FAST AND ROBUST IMAGE MATCHING USING CONTEXTUAL INFORMATION AND RELAXATION. In Proceedings of the Second International Conference on Computer Vision Theory and Applications - IU/MTSV, pages 68-75. Copyright © SciTePress.
Affine invariant points combined with a distinctive descriptor such as SIFT lead to very good results in the presence of significant transformations. However, while much effort in the aforementioned works goes into computing distinctive descriptors, less attention is paid to the matching strategy. A simple comparison of the descriptors, for example using the Euclidean or Mahalanobis distance, followed by matching to the nearest neighbour will always give some mismatches. This is because no image descriptor is robust enough to be perfectly discriminant and avoid mismatches. Thus, an additional outlier rejection step is often needed. One approach is to estimate the geometric transformation between the pair of images and use this information to reject inconsistent matches (Zhang et al., 1995). This can, of course, be done only in stereo-vision or when matching images containing planar structures, for which the epipolar constraint or a plane homography can be estimated. The accuracy of the estimation depends on the number of mismatches. This number can be reduced by considering the ratio between the distances to the first and second nearest neighbours, i.e. matching a point to its nearest neighbour only if the latter is much closer than the second nearest neighbour (Zhang et al., 1995; Lowe, 1999). By taking into account a kind of ambiguity measure, this strategy reduces the number of mismatches. Unfortunately, it reduces the number of correct matches as well.
Moreover, when the ambiguity is high as it is in
the presence of repetitive patterns, see Figure 1, the
previous methods fail to find correct matches. That is
because, in these cases, all the points have almost the
same SIFT descriptor. So, matching to nearest neigh-
bour gives a lot of mismatches. Taking some addi-
tional information into account during the matching
process could reduce the ambiguity. This is the main
idea of the widely used relaxation labeling technique.
However, most of the existing algorithms (Rosenfeld
et al., 1976; Faugeras and Berthod, 1981) have pro-
hibitive complexity and are therefore limited to the
assignment of a small number of labels.
In this paper, we present a matching method based on relaxation which can handle large point sets and yields very few mismatches under significant transformations. This work is based on an algorithm presented by (Faugeras and Berthod, 1981) and our main contribution is a fast and efficient implementation of this algorithm. Furthermore, we show how the contextual information can be obtained and used in this robust and fast algorithm. The remainder of the paper is organized as follows. In Section 2, we describe the relaxation labeling techniques and show their limits. Our efficient implementation is then given in Section 3. Experimental results showing the improvements of the method over other existing techniques are presented in Section 4. Finally, concluding remarks are given in Section 5.
Figure 1: A difficult case of matching. Matching to nearest
neighbour fails because of repetitive patterns.
2 RELAXATION MATCHING
2.1 Relaxation Labeling Techniques
The relaxation labeling technique was first introduced by (Rosenfeld et al., 1976) to deal with ambiguity and noise in vision systems. Let $u = \{u_1, \ldots, u_n\}$ and $v = \{v_1, \ldots, v_m\}$ be two sets of points from two images. Each point is characterized by a descriptor. The principal idea of relaxation is to use the information provided by the neighbourhood of each point to improve consistency and reduce ambiguity. More precisely, let us define for each point $u_i$ a set of initial probabilities $p_i^0(k)$, $k = 1, \ldots, m$, where $p_i^0(k)$ is the probability that point $u_i$ is matched with point $v_k$. An iterative process is designed to update the probabilities until a consistent distribution is reached. The update is based on a support, or compatibility, function $q_i$ defined in the neighbourhood $V_i$ of the point $u_i$. This support function measures the likelihood that a point $u_i$ is matched with a point $v_k$, given the configuration of its neighbours. Many probabilistic relaxation schemes have been proposed and they essentially differ in the definition of the support function and the updating rule. For example, one standard updating rule is defined by (Hummel and Zucker, 1983) as:
$$p_i^{t+1}(k) = \frac{p_i^t(k)\, q_i^t(k)}{\sum_{k'} p_i^t(k')\, q_i^t(k')} \qquad (1)$$
where
$$q_i^t(k) = \sum_j w_{ij} \left[ \sum_l p_{ij}(k,l)\, p_j^t(l) \right] \qquad (2)$$
and $p_{ij}(k,l)$ is the probability that point $u_i$ is matched with point $v_k$ under the condition that point $u_j$ is matched with $v_l$. $p_{ij}(k,l)$ is the contextual information that helps improve consistency. The scalars $w_{ij}$ are weights that indicate the influence of point $u_j$ on point $u_i$. They are normalized so that $\sum_j w_{ij} = 1$.
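As an illustration of Equations 1 and 2, the following is a minimal NumPy sketch of one update step. The array layout (`p` of shape n x m, `w` of shape n x n, `cond` of shape n x n x m x m holding $p_{ij}(k,l)$) and the function names are assumptions made for this example; the dense four-dimensional array is only practical for very small problems.

```python
import numpy as np

def support(p, w, cond):
    """Support q[i, k] = sum_j w[i, j] * sum_l cond[i, j, k, l] * p[j, l] (Equation 2)."""
    return np.einsum("ij,ijkl,jl->ik", w, cond, p)

def update(p, w, cond):
    """One iteration of Equation 1: weight by the support, then renormalize each row."""
    unnormalized = p * support(p, w, cond)
    return unnormalized / unnormalized.sum(axis=1, keepdims=True)

# Tiny random example (hypothetical sizes): n = 4 points, m = 3 labels.
rng = np.random.default_rng(0)
n, m = 4, 3
p = rng.random((n, m))
p /= p.sum(axis=1, keepdims=True)      # each row is a probability vector
w = rng.random((n, n))
w /= w.sum(axis=1, keepdims=True)      # weights normalized so that sum_j w_ij = 1
cond = rng.random((n, n, m, m))        # stand-in for p_ij(k, l)
p_next = update(p, w, cond)
```

Each row of `p_next` is again a probability vector, which is what the denominator of Equation 1 guarantees.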
(Faugeras and Berthod, 1981) propose a relaxation scheme based on an optimization approach. They define a global criterion to be minimized that considers both consistency and ambiguity:
$$C = \alpha C_1 + (1 - \alpha) C_2 \qquad (3)$$
where the consistency measure is:
$$C_1 = \frac{1}{2n} \sum_{i=1}^{n} \| p_i - q_i \|^2 \qquad (4)$$
and the ambiguity measure is:
$$C_2 = \frac{m}{m-1} \left[ 1 - \frac{1}{n} \sum_{i=1}^{n} \| p_i \|^2 \right] \qquad (5)$$
Let $x$ be the vector obtained by concatenating the vectors $p_i$, i.e. $x = [p_1, \ldots, p_n]^T$. Then, the problem of finding a set of corresponding points comes down to minimizing $C(x)$ subject to the linear constraints:
$$\sum_{k=1}^{m} x_i(k) = 1, \quad i = 1, \ldots, n$$
$$x_i(k) \geq 0, \quad i = 1, \ldots, n, \quad k = 1, \ldots, m \qquad (6)$$
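The criterion of Equations 3 to 5 can be evaluated directly from the probability and support vectors. The sketch below assumes `p` and `q` are stored as n x m arrays; the function name is hypothetical.

```python
import numpy as np

def criterion(p, q, alpha=0.5):
    """Return (C, C1, C2) of Equations 3-5 for p and q of shape (n, m), m >= 2."""
    n, m = p.shape
    c1 = np.sum((p - q) ** 2) / (2.0 * n)               # consistency, Equation 4
    c2 = (m / (m - 1.0)) * (1.0 - np.sum(p ** 2) / n)   # ambiguity, Equation 5
    return alpha * c1 + (1.0 - alpha) * c2, c1, c2

# An unambiguous labeling (one-hot rows) that agrees with its support gives C = 0.
one_hot = np.eye(3)
c_best, _, _ = criterion(one_hot, one_hot)

# A uniform labeling is maximally ambiguous: C2 = 1.
uniform = np.full((3, 3), 1.0 / 3.0)
_, _, c2_uniform = criterion(uniform, uniform)
```

The two examples show the extremes of the ambiguity term: it vanishes for one-hot probability vectors and equals 1 when every label is equally likely.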
The optimization problem is solved by a projected gradient method and, for each point $u_i$, the point $v_k$ with the highest final probability is retained as its correspondent. This approach seems better since the final set of matches will be more consistent and less ambiguous. However, it is limited in practice by its high complexity.
2.2 Drawbacks of the Original Method
The main limitation of the optimization approach (Faugeras and Berthod, 1981) and of the nonlinear approaches (Rosenfeld et al., 1976; Hummel and Zucker, 1983) is their high complexity. The former is in fact an $O(n m^2 V)$ algorithm, where $V$ is the size of $V_i$ for $i = 1, \ldots, n$. Thus, these algorithms are appropriate for applications such as image segmentation or classification, where one needs to assign a small number of labels, i.e. $m$ on the order of $10^2$. For applications such as image matching, where one needs to assign a large number of points from one image to the other, with $m$ on the order of $10^4$, the methods become impractical. This is mainly because the compatibility function $q_i$ (see Equation 2) has to be re-estimated at each iteration.
Another limitation is the fact that the final prob-
abilities critically depend on the initial and the con-
ditional probabilities (Hummel and Zucker, 1983;
Price, 1985). If these quantities are not correctly es-
timated, then the final probabilities will provide a lot
of mismatches.
In the next section we address these two problems
and we show how the complexity can be considerably
reduced in order to handle large point sets. We also
provide a way of computing the conditional probabil-
ities.
3 FAST AND EFFICIENT
MATCHING ALGORITHM
3.1 Reducing the Complexity
In order to reduce the complexity of the optimization approach, we show that the criterion $C$ of Equation 3 can be written in the following form:
$$C(x) = \frac{1}{2} x^T H x + \text{cte} \qquad (7)$$
i.e.
$$C([x_1, \ldots, x_n]^T) = \frac{1}{2} \sum_{t=1}^{n} \sum_{p=1}^{n} x_t^T H_{tp} x_p + \text{cte} \qquad (8)$$
where
$$H = \begin{pmatrix} H_{11} & \cdots & H_{1n} \\ \vdots & H_{ij} & \vdots \\ H_{n1} & \cdots & H_{nn} \end{pmatrix}$$
and each matrix $H_{ij}$ contains the conditional probabilities $p_{ij}(k,l)$, i.e. the contextual information needed to compute the support function $q_i$. See the appendix for details about obtaining the matrices $H_{ij}$.
Firstly, if we consider in the definition of the support function (Equation 2) only points $u_j$ which are in the neighbourhood $V_i$ of point $u_i$, then it is clear that some of the matrices $H_{ij}$ are equal to zero. In particular, it is easy to show that for $i = 1, \ldots, n$ and $j = 1, \ldots, n$:
$$H_{ij} \neq 0 \quad \text{if} \quad \begin{cases} i = j, \text{ or} \\ u_j \in V_i, \text{ or} \\ \exists k \text{ such that } (u_i, u_j) \in V_k \times V_k \end{cases} \qquad (9)$$
Therefore, using a sparse matrix representation for $H$, we reduce the complexity of the method. To reduce the complexity further, we restrict the set of potential matches for each point $u_i$ to its $K$ nearest neighbours under a similarity measure. Thus, each matrix $H_{ij}$ is of size $K \times K$ instead of $m \times m$. With $K \ll m$, this reduces the memory requirement of the algorithm.
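To give a sense of scale for this reduction, the following sketch counts the entries of one dense block before and after restricting the candidates. The value m = 10^4 is the order of magnitude quoted in Section 2.2 and K = 5 is the setting used in Section 4; the helper name is hypothetical, and the number of non-zero blocks is deliberately left out of the count.

```python
def block_entries(num_labels):
    """Number of entries in one dense square block H_ij with num_labels labels."""
    return num_labels * num_labels

m, K = 10_000, 5
full = block_entries(m)       # entries of an m x m block
reduced = block_entries(K)    # entries of a K x K block
ratio = full // reduced       # reduction factor per block
print(full, reduced, ratio)
```

Under these assumed values, each block shrinks from 10^8 entries to 25.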
Secondly, the matrix $H$ is computed only once, which makes the algorithm faster. At each iteration the gradient of the criterion is obtained by the following equation:
$$\frac{\partial C}{\partial x} = \frac{1}{2} (H + H^T)\, x \qquad (10)$$
In general, $H$ is not a symmetric matrix. When it is, the gradient is given by the classical equation:
$$\frac{\partial C}{\partial x} = H x \qquad (11)$$
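The gradient of Equation 10 can be combined with a projection onto the constraint set of Equation 6 to obtain a projected gradient iteration. The sketch below is one possible realization, not necessarily the one used in this paper: it assumes a fixed step size and a Euclidean projection of each block of $x$ onto the probability simplex (the standard sort-based algorithm). The function names and the values of `step` and `iters` are illustrative.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def projected_gradient(H, x0, n, step=0.1, iters=100):
    """Minimize 0.5 * x^T H x with each of the n blocks of x kept on the simplex."""
    G = 0.5 * (H + H.T)                  # gradient operator of Equation 10, formed once
    x = x0.copy()
    for _ in range(iters):
        x = x - step * (G @ x)           # gradient step
        blocks = x.reshape(n, -1)        # one block of probabilities per point u_i
        x = np.concatenate([project_simplex(b) for b in blocks])
    return x
```

Because $H$ does not change between iterations, the matrix `G` is formed once outside the loop, which mirrors the remark that $H$ is computed only once.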
3.2 Initial and Conditional Probabilities
We mentioned already, see Section 2.2, that the re-
sults, i.e. the final probabilities, of a relaxation
scheme critically depend on the initial probabilities
and the conditional probabilities. So, estimation of
these quantities is of great interest.
Initial probabilities are computed based on the Euclidean distance between descriptors. We use SIFT, as it is considered to be the best local descriptor (Mikolajczyk and Schmid, 2005), and we choose the $K$ nearest neighbours $v_k$ as the potential matches for each point $u_i$. Then, the initial probabilities are given by the following equation:
$$p_i^0(k) = \frac{1/d_{ik}}{\sum_{k'=1}^{K} 1/d_{ik'}}, \quad i = 1, \ldots, n, \quad k = 1, \ldots, K \qquad (12)$$
where $d_{ik}$ is the Euclidean distance between the descriptors of points $u_i$ and $v_k$.
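Equation 12 can be sketched as follows. The descriptors are assumed to be rows of two NumPy arrays; the function name and the small constant `eps`, which guards against a zero distance, are additions for this example. The brute-force distance matrix is only suitable for small point sets.

```python
import numpy as np

def initial_probabilities(desc_u, desc_v, K=5, eps=1e-12):
    """Return (idx, p0): the K nearest candidates of each u_i and Equation 12 over them."""
    # Pairwise Euclidean distances between descriptors, shape (n, m).
    d = np.linalg.norm(desc_u[:, None, :] - desc_v[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :K]            # indices of the K nearest v_k
    d_k = np.take_along_axis(d, idx, axis=1)      # their distances d_ik
    inv = 1.0 / (d_k + eps)                       # eps is an assumption, not in Equation 12
    return idx, inv / inv.sum(axis=1, keepdims=True)

# One point with candidates at distances 1, 3 and 10; keep the K = 2 nearest.
desc_u = np.array([[0.0, 0.0]])
desc_v = np.array([[1.0, 0.0], [3.0, 0.0], [10.0, 0.0]])
idx, p0 = initial_probabilities(desc_u, desc_v, K=2)
```

In this toy case the inverse distances are 1 and 1/3, so the two candidates receive probabilities 0.75 and 0.25.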
For each interest point $u_i$, the compatibility function $q_i$ indicates how consistent a match assigned to point $u_i$ is with those of its neighbours in $V_i$. Thus, $q_i$ can be seen as an estimate of $p_i$ given the prior knowledge represented by the $p_{ij}(k,l)$ for points $u_j$ in $V_i$. The $p_{ij}(k,l)$ can be estimated using geometric and photometric information about the scene. Geometric semi-local constraints are used by (Schmid and Mohr, 1997; Montesinos et al., 2000; Tuytelaars and Van Gool, 2004). In (Pelillo and Refice, 1994) the compatibility coefficients are learned from training examples. We base the estimation of our contextual information on photometric information because, in the case of large viewpoint changes, geometry is badly preserved. Moreover, as the SIFT descriptor gives a geometric description of a point's neighbourhood, it makes sense to use complementary photometric information for matching.
For each point $u_i$ and each of its neighbours $u_j \in V_i$, we define a rectangular patch $M_{ij}$ of length $l_{ij}$ and width $l_{ij}/2$, $l_{ij}$ being the distance between $u_i$ and $u_j$. Note that in order to discard very small patches, we consider in $V_i$ only points $u_j$ which are at a distance from $u_i$ greater than $5\sigma_i$, $\sigma_i$ being the specific scale of point $u_i$:
$$u_j \in V_i \quad \text{if} \quad d_{ij} > 5\sigma_i \qquad (13)$$
This is because the detector can find two or more points that are at the same location but with different orientations. Each patch is normalized to a unit square, and conditional probabilities are computed as the normalized cross-correlation between patches in both images.
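The normalized cross-correlation used above can be sketched as follows, assuming the two patches have already been resampled to the same grid. The function name and the guard `eps` for constant patches are assumptions. The score lies in [-1, 1]; how negative values are mapped to probabilities is not specified here, so the sketch returns the raw score.

```python
import numpy as np

def ncc(patch_a, patch_b, eps=1e-12):
    """Zero-mean normalized cross-correlation of two equally sized patches."""
    a = np.asarray(patch_a, dtype=float).ravel()
    b = np.asarray(patch_b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom < eps:           # constant patch: correlation undefined, return 0 by convention
        return 0.0
    return float(a @ b / denom)

patch = np.arange(16.0).reshape(4, 4)
same = ncc(patch, 2.0 * patch + 3.0)    # invariant to affine intensity changes
opposite = ncc(patch, -patch)
```

The first example illustrates why this measure suits photometric comparison: a gain and offset change in intensity leaves the score unchanged.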
3.3 Matching Strategy
In many applications, one-to-one correspondence is desired. But in general, because of occlusions, varying background, and scale and viewpoint changes, not all points in $u$ will have a correspondence in $v$. To solve this problem, one adds a nil point, $v_{nil}$, to the set of potential matches of each point $u_i$. Thus, for each point $u_i \in u$, the set of potential matches is:
$$PM_i = \{ v_{i_1}, \ldots, v_{i_K}, v_{nil} \} \qquad (14)$$
where $v_{i_1}, \ldots, v_{i_K}$ are the $K$ nearest neighbours $v_k$ of $u_i$ based on the Euclidean distance between SIFT descriptors, as described in Section 3.2. The matrices $H_{ij}$ are of size $(K+1) \times (K+1)$ and can be written as follows:
$$H_{ij} = \begin{pmatrix} p_{ij}(1,1) & \cdots & p_{ij}(1,K) & p^{**} \\ \vdots & \ddots & \vdots & \vdots \\ p_{ij}(K,1) & \cdots & p_{ij}(K,K) & p^{**} \\ p^{**} & \cdots & p^{**} & p^{**} \end{pmatrix} \qquad (15)$$
where $p^{**}$ is a constant value defining the conditional probabilities for $v_{nil}$. Initial probabilities for $v_{nil}$ are also set to a constant value:
$$p_i^0(v_{nil}) = p^{*}, \quad i = 1, \ldots, n \qquad (16)$$
Once the matrices $H_{ij}$ are obtained, the matrix $H$ is computed (see Section 3.1) and the optimization problem is solved by a projected gradient method. The algorithm converges to a local minimum after a small number of iterations and, for each point $u_i$, the potential match with the highest final probability is retained as its correspondent. Points in one image which have no correspondent in the other image are expected to match with $v_{nil}$.
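The nil point handling can be sketched as follows. The first helper pads a K x K block of conditional probabilities with the constant $p^{**}$ as in Equation 15, and the second picks, for each point, the candidate with the highest final probability, returning `None` when the nil point wins. The function names, the array layout (nil stored in the last column) and the default value 0.1 (the setting reported in Section 4) are assumptions of this example.

```python
import numpy as np

def augment_with_nil(P, p_nil_cond=0.1):
    """Pad a K x K block with a last row and column equal to p** (Equation 15)."""
    K = P.shape[0]
    H = np.full((K + 1, K + 1), p_nil_cond)
    H[:K, :K] = P
    return H

def select_matches(p_final, candidates):
    """p_final: (n, K+1), nil in the last column; candidates: (n, K) indices into v."""
    K = candidates.shape[1]
    best = np.argmax(p_final, axis=1)
    return [int(candidates[i, b]) if b < K else None for i, b in enumerate(best)]

block = augment_with_nil(np.array([[0.9, 0.2], [0.3, 0.8]]))
p_final = np.array([[0.7, 0.2, 0.1],     # first point: candidate 0 wins
                    [0.1, 0.2, 0.7]])    # second point: nil wins
candidates = np.array([[12, 40], [7, 3]])
matches = select_matches(p_final, candidates)
```

Returning `None` for the nil label keeps unmatched points explicit instead of forcing them onto a real candidate.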
Figure 2: Examples of images used for wide baseline matching. Top: first, third and fifth frame of the Graf. sequence.
Bottom: first, third and fifth frame of the Boat sequence.
4 EXPERIMENTAL RESULTS
In this section we report some experiments carried out on real images to evaluate the performance of the algorithm. First, we have conducted experiments on wide baseline matching using some images from (Mikolajczyk and Schmid, 2005) (see footnote 1) and the images presented in Figure 1. Secondly, we apply the method to feature-based object recognition. In all our experiments we set $V = K = 5$, i.e. each point has 5 neighbours and 5 potential matches. We set the constant $\alpha$ of the criterion $C$ (Equation 3) to 0.5, i.e. consistency and ambiguity measures are given the same importance. The initial and conditional probabilities for the nil point are both set to 0.1, i.e. $p^{*} = p^{**} = 0.1$.
We compare our method, named ORELAX for optimization approach, with the three following techniques:
• CRELAX: relaxation technique using the classical updating rule defined in Equation 1 (Hummel and Zucker, 1983);
• NNDR: nearest neighbour distance ratio (Lowe, 1999). That is, a point is matched to its nearest neighbour if the latter is much closer than the second nearest neighbour:
$$d_{ik} = \min(D_i) < 0.6 \, \min(D_i \setminus \{ d_{ik} \})$$
where $D_i = \{ d_{il},\ l = 1, \ldots, m \}$;
• SVD: an SVD-based method using SIFT features (Delponte et al., 2006). A proximity matrix $G$ is computed using the SIFT descriptors of the points, and matches are found based on a singular value decomposition of $G$:
$$G_{ij} = e^{-d_{ij}^2 / 2\sigma^2}$$

1 Images are available at http://www.robots.ox.ac.uk/~vgg/research/affine/
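The NNDR baseline described in the list above can be sketched in a few lines. The input is assumed to be an n x m matrix of descriptor distances with m >= 2; the function name is hypothetical and the threshold 0.6 is the value quoted in the text.

```python
import numpy as np

def nndr_matches(d, ratio=0.6):
    """Return (i, k) pairs where the nearest distance is below ratio * second nearest."""
    order = np.argsort(d, axis=1)
    rows = np.arange(d.shape[0])
    first = d[rows, order[:, 0]]      # distance to the nearest neighbour
    second = d[rows, order[:, 1]]     # distance to the second nearest neighbour
    keep = first < ratio * second
    return [(int(i), int(order[i, 0])) for i in np.nonzero(keep)[0]]

d = np.array([[1.0, 2.0, 5.0],     # 1.0 < 0.6 * 2.0: accepted
              [1.0, 1.2, 3.0]])    # 1.0 >= 0.6 * 1.2: rejected as ambiguous
pairs = nndr_matches(d)
```

The second row shows the behaviour discussed in the introduction: an ambiguous point is dropped even if its nearest neighbour happens to be the correct match.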
4.1 Wide Baseline Matching
For these experiments the homographies between the different views are available and we can compute the matching rate (MR) of a matching method as the ratio between the number of correct matches and the number of detected matches:
$$MR = \frac{\#\ \text{of correct matches}}{\#\ \text{of detected matches}} \qquad (17)$$
A pair of corresponding points $(p, p')$ is said to be a correct match if:
$$\| p' - H p \| < 5 \qquad (18)$$
where $H$ is the homography between the two images.
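Equations 17 and 18 can be sketched as follows. Points are assumed to be given as N x 2 arrays of image coordinates, with the threshold expressed in the same units as the coordinates; the function names are hypothetical and at least one detected match is assumed.

```python
import numpy as np

def apply_homography(H, pts):
    """Map N x 2 points through the 3 x 3 homography H using homogeneous coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    proj = pts_h @ H.T
    return proj[:, :2] / proj[:, 2:3]

def matching_rate(H, pts1, pts2, tol=5.0):
    """Return (number of correct matches, MR) following Equations 17 and 18."""
    err = np.linalg.norm(pts2 - apply_homography(H, pts1), axis=1)
    correct = int(np.sum(err < tol))
    return correct, correct / len(pts1)

# Toy homography: a pure translation by 10 along x.
H = np.array([[1.0, 0.0, 10.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
pts1 = np.array([[0.0, 0.0], [5.0, 5.0]])
pts2 = np.array([[10.0, 0.0], [30.0, 5.0]])   # second match is off by 15
correct, mr = matching_rate(H, pts1, pts2)
```

In this toy case one of the two detected matches satisfies Equation 18, so the matching rate is 0.5.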
We use two sequences, the Graf and the Boat sequences, for evaluation. The former is a six-frame sequence with a significant viewpoint change between the first and the following frames, and the latter is a six-frame sequence with rotation and scale change. Some of these images are shown in Figure 2.
We present an example of matching results ob-
tained by our method for the Graf sequence in Fig-
ure 3. There are respectively 1401 and 1279 interest
points detected in the first and fourth frames. The al-
gorithm finds 82 matches between these two frames
and 71 of them are correct.
Figure 3: Example of matching results with the Graf sequence:
71 correct matches are found between the first and
fourth frames.
Table 1 shows comparative results obtained with the different methods. It can be seen that for this difficult case, ORELAX gives considerably more matches than NNDR while maintaining a high MR (matching rate). SVD and CRELAX also provide a large number of matches but with a very poor MR, less than 0.5. For SVD, this is mainly because the SVD decomposition algorithm has stability problems when dealing with large matrices. We used the algorithm implemented in MATLAB for our experiments. Therefore, instead of improving results, the SVD-based method degrades the results obtained by a simple comparison of descriptors, as with NNDR. Moreover, ORELAX is much faster than SVD. It is slower than NNDR since the latter simply compares Euclidean distances between descriptors. ORELAX is faster than CRELAX because the optimization step leads to fewer iterations for the former.
Table 1: Comparison of different algorithms using the first and fourth frames of the Graf sequence.
Methods   # of matches   # of correct matches   MR     time (s)
ORELAX    82             71                     0.86   8.56
NNDR      35             26                     0.74   2.9
SVD       118            58                     0.49   47.1
CRELAX    84             34                     0.40   11.41
Additional results for the whole Graf sequence are shown in Table 3. We see that the relaxation method with optimization gives better results for varying viewpoints. ORELAX achieves the highest MR on every frame while still returning a large number of matches. NNDR gives the second best performance but, on most frames, it returns about half as many matches as ORELAX. SVD and CRELAX have a poor MR for the last three frames. The number of matches drops significantly when the viewpoint change becomes large. For example, the viewpoint change between the first and fifth frames of the Graf sequence is greater than 50 degrees. The SIFT descriptor cannot cope with such a large viewpoint change, as reported in (Mikolajczyk and Schmid, 2005; Delponte et al., 2006).
Results obtained for the Boat sequence are presented in Table 4. They are similar to those obtained with the Graf sequence but there are more correct matches for the last frames. Moreover, the MR of ORELAX and NNDR is almost always equal to 1 except for the last frame. This means that the SIFT descriptor is better suited to rotation and scale changes than to large viewpoint changes.
In the case of repetitive patterns, as in Figure 1, the relaxation method with optimization gives better results. Results presented in Table 2 show that a simple comparison of descriptors fails to find enough correct matches. The number of matches provided by NNDR is too small and the proportion of outliers obtained by SVD is too high. Therefore, an estimation of the geometric transformation by a method such as RANSAC will fail. On the contrary, ORELAX gives about six times as many matches as NNDR with a high MR. It is important to emphasize that in this difficult case, one should consider a higher number of potential matches for each point to reduce ambiguity. We set $K = 7$. See Section 4.3 for a discussion of the influence of the algorithm's parameters.
Table 2: Comparison of different algorithms using the images in Figure 1.
Methods   # of matches   # of correct matches   MR
ORELAX    38             25                     0.66
NNDR      6              3                      0.50
SVD       60             21                     0.35
CRELAX    120            26                     0.22
4.2 Object Recognition
We also compare the different algorithms in a case of feature-based object recognition. The results are shown in Figure 4. For the first experiment, a book is placed on a desktop such that it is partially occluded and has its scale, orientation and viewpoint changed. There are respectively 193 and 2541 interest points detected on the object, shown in the top of Figure 4a, and on the entire scene, shown in the bottom of Figure 4a. ORELAX finds 20 matches, all of which are correct, while NNDR finds only 7. CRELAX finds 8 correct matches and SVD gives 12 correct matches out of 18 detected matches.
For the more difficult case shown in Figure 4b, ORELAX finds 5 correct matches out of 8 detected matches, while the other methods fail to find any correct matches. Note that the images are shown at their actual relative scale.
4.3 Influence of Algorithm’s Parameters
Results depend on the values of the algorithm’s pa-
rameters. It is clear that the greater V and K are,
the more accurate the method will be at the cost of
a longer processing time.
To measure the influence of the parameter $\alpha$, we use the pair of images of Figure 1. The results presented in Table 5 show that if more importance is given to the consistency term of the criterion, i.e. $\alpha > 0.5$, then the MR of the method increases but the number of matches decreases. On the contrary, if more importance is given to the ambiguity term, i.e. $\alpha < 0.5$, then the number of matches increases while the MR of the method decreases.

Table 3: Comparison of different algorithms using the Graf sequence.
Frame    ORELAX              NNDR                SVD                 CRELAX
number   # of matches  MR    # of matches  MR    # of matches  MR    # of matches  MR
2        530           0.98  261           0.97  363           0.89  384           0.94
3        180           0.93  91            0.70  222           0.67  198           0.58
4        82            0.86  35            0.74  118           0.49  84            0.40
5        11            0.72  3             0.67  60            0.13  35            0.03
6        5             0.6   9             0     40            0.1   27            0

Table 4: Comparison of different algorithms using the Boat sequence.
Frame    ORELAX              NNDR                SVD                 CRELAX
number   # of matches  MR    # of matches  MR    # of matches  MR    # of matches  MR
2        620           0.99  294           0.98  427           0.91  490           0.93
3        488           0.99  183           0.99  365           0.87  289           0.94
4        127           0.99  58            0.99  125           0.84  97            0.45
5        75            0.99  46            0.99  76            0.83  79            0.3
6        8             0.75  6             0.5   47            0.45  46            0.08

Figure 4: Examples of recognition results. a): ORELAX finds 20 matches, all of which are correct. b): ORELAX finds 8 matches and 5 of them are correct.
A balance has to be found between the matching rate and the number of detected matches. We have found that the values $V = K = 5$ ($K = 7$ in the case of repetitive patterns) were sufficient for our experiments. The results presented in the previous sections are obtained with $\alpha = 0.5$, i.e. consistency and ambiguity measures are given the same importance.
Table 5: Influence of the parameter α using the images of Figure 1.
α     # of matches   # of correct matches   MR
0.3   88             42                     0.48
0.5   38             25                     0.66
0.7   23             17                     0.74
0.9   15             13                     0.87
5 CONCLUSION
In this paper a fast and robust image matching method is proposed. The method is based on a relaxation labeling technique and optimization. We showed that by writing the criterion to minimize in a convenient way and using a distinctive descriptor such as SIFT, the complexity of the algorithm can be considerably reduced. Furthermore, we showed how the necessary contextual information can be obtained in order to improve matching results and reduce the number of mismatches. Experimental results for wide baseline matching and for object recognition show that this approach gives superior results compared with other matching methods. Roughly speaking, we gain at least 30% on the number of matches and on the number of correct matches as well. In most of the experiments we have done, we obtain a very small error rate, which allows us to avoid an additional outlier rejection step based on estimating the geometric transformation between the pair of images.
In the future we are going to investigate the use of several image features in a single matching process and the matching of non-rigid objects and non-planar scenes.
REFERENCES
Baumberg, A. (2000). Reliable feature matching across widely separated views. In Proc. Conf. Computer Vision and Pattern Recognition, pages 774–781.
Delponte, E., Isgro, F., Odone, F., and Verri, A. (2006). SVD-matching using SIFT features. Graphical Models, 68:415–431.
Faugeras, O. D. and Berthod, M. (1981). Improving consistency and reducing ambiguity in stochastic labeling: An optimization approach. IEEE PAMI, 3(4):412–424.
Hummel, R. A. and Zucker, S. W. (1983). On the foundations of relaxation labeling processes. IEEE PAMI, 5(3):267–287.
Lowe, D. G. (1999). Object recognition from local scale-invariant features. In International Conference on Computer Vision, pages 1150–1157. Corfu, Greece.
Matas, J., Chum, O., Urban, M., and Pajdla, T. (2002). Robust wide baseline stereo from maximally stable extremal regions. In Proc. 13th British Machine Vision Conference, pages 384–393.
Mikolajczyk, K. and Schmid, C. (2002). An affine invariant interest point detector. In European Conference on Computer Vision (ECCV'2002). Copenhagen, Denmark.
Mikolajczyk, K. and Schmid, C. (2004). Scale & affine invariant interest point detectors. International Journal of Computer Vision, 60(1):63–86.
Mikolajczyk, K. and Schmid, C. (2005). A performance evaluation of local descriptors. IEEE Trans on PAMI, 27(10):1615–1630.
Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., and Gool, L. V. (2005). A comparison of affine region detectors. International Journal of Computer Vision, 65(1/2):43–72.
Montesinos, P., Gouet, V., Deriche, R., and Pele, D. (2000). Matching color uncalibrated images using differential invariants. Image and Vision Computing, 18:659–671.
Pelillo, M. and Refice, M. (1994). Learning compatibility coefficients for relaxation labeling processes. IEEE PAMI, 16:933–945.
Price, K. E. (1985). Relaxation matching techniques - a comparison. IEEE PAMI, 7(5):617–623.
Rosenfeld, A., Hummel, R., and Zucker, S. (1976). Scene labeling by relaxation operations. IEEE Trans. Systems. Man Cybernetics, 6:420–433.
Schaffalitzky, F. and Zisserman, A. (2002). Multi-view matching for unordered image sets. In Proc. 7th European Conference on Computer Vision, pages 414–431.
Schmid, C. and Mohr, R. (1997). Local grayvalue invariants for image retrieval. PAMI, 19(5):530–534.
Tuytelaars, T. and Van Gool, L. (2004). Matching widely separated views based on affine invariant regions. International Journal of Computer Vision, 59(1):61–85.
Zhang, Z., Deriche, R., Faugeras, O., and Luong, Q.-T. (1995). A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry. AI Journal, 78:87–119.
APPENDIX
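Re-writing the Criterion with Matrices
The criterion to be minimized can be written:
$$\begin{aligned}
C(x) &= \alpha C_1(x) + (1-\alpha) C_2(x) \\
&= \frac{\alpha}{2n} \sum_{i=1}^{n} \| p_i - q_i \|^2 + \frac{(1-\alpha)m}{m-1} \left[ 1 - \frac{1}{n} \sum_{i=1}^{n} \| p_i \|^2 \right] \\
&= c_1 \sum_{i=1}^{n} \| p_i - q_i \|^2 - c_2 \sum_{i=1}^{n} \| p_i \|^2 + c_3
\end{aligned}$$
with $c_1 = \frac{\alpha}{2n}$, $c_2 = \frac{(1-\alpha)m}{(m-1)n}$ and $c_3 = n c_2$.
One wants to put $C$ in the form:
$$C([x_1, \ldots, x_n]^T) = \frac{1}{2} \sum_{t=1}^{n} \sum_{p=1}^{n} x_t^T H_{tp} x_p + \text{cte}$$
Let us remark that the constant is equal to $c_3$. So one has:
$$\begin{aligned}
C(x) &= \sum_{i=1}^{n} \left( c_1 \| x_i - q_i \|^2 - c_2 \| x_i \|^2 \right) + c_3 \\
&= \sum_{i=1}^{n} \left( c_1 (x_i - q_i)^T (x_i - q_i) - c_2 x_i^T x_i \right) + c_3 \\
&= (c_1 - c_2) \underbrace{\sum_{i=1}^{n} x_i^T x_i}_{A} - 2 c_1 \underbrace{\sum_{i=1}^{n} x_i^T q_i}_{B} + c_1 \underbrace{\sum_{i=1}^{n} q_i^T q_i}_{C} + c_3
\end{aligned}$$
The criterion is the weighted sum of three terms, which one denotes respectively $A$, $B$ and $C$. Let us define the following two symbols:
$$\delta_{tp} = \begin{cases} 1 & \text{if } t = p \\ 0 & \text{otherwise} \end{cases} \qquad \Lambda_{tp} = \begin{cases} 1 & \text{if } u_p \in V_t \\ 0 & \text{otherwise} \end{cases}$$
Then, it is easy to show that:
$$A = \sum_{t=1}^{n} \sum_{p=1}^{n} x_t^T A_{tp} x_p$$
where $\forall t, p \in \{1, \ldots, n\}$, $A_{tp} = \delta_{tp} I_m$;
$$B = \sum_{t=1}^{n} \sum_{p=1}^{n} x_t^T B_{tp} x_p$$
where $\forall t, p \in \{1, \ldots, n\}$, $B_{tp} = \frac{\Lambda_{tp}}{|V_t|} w_{tp} P_{tp}$, $P_{tp}$ is the matrix of size $m \times m$ containing the conditional probabilities $p_{tp}(k,l)$, and $|V_t| = \#\{V_t\}$;
$$C = \sum_{t=1}^{n} \sum_{p=1}^{n} x_t^T C_{tp} x_p$$
where $\forall t, p \in \{1, \ldots, n\}$, $C_{tp} = \sum_{i=1}^{n} (B_{it}^T B_{ip})$.
Finally,
$$C([x_1, \ldots, x_n]^T) = \frac{1}{2} \sum_{t=1}^{n} \sum_{p=1}^{n} x_t^T H_{tp} x_p + c_3$$
with
$$\forall t, p \in \{1, \ldots, n\}, \quad H_{tp} = 2(c_1 - c_2) A_{tp} - 4 c_1 B_{tp} + 2 c_1 C_{tp}$$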
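The identity derived in this appendix can be checked numerically. The sketch below is an illustration under stated assumptions: the blocks $B_{tp}$ are replaced by random numbers stacked into one dense matrix `B`, the support is taken as $q_t = \sum_p B_{tp} x_p$, and the function name is hypothetical. It compares the direct evaluation of the criterion with the quadratic form built from $H_{tp} = 2(c_1 - c_2) A_{tp} - 4 c_1 B_{tp} + 2 c_1 C_{tp}$.

```python
import numpy as np

def check_quadratic_form(n=4, m=3, alpha=0.5, seed=0):
    """Return (direct, quadratic) evaluations of the criterion for random B and x."""
    rng = np.random.default_rng(seed)
    c1 = alpha / (2 * n)
    c2 = (1 - alpha) * m / ((m - 1) * n)
    c3 = n * c2

    B = rng.random((n * m, n * m))   # stand-in for the stacked blocks B_tp
    x = rng.random(n * m)            # any x works: the identity is purely algebraic
    q = B @ x                        # q_t = sum_p B_tp x_p

    direct = c1 * np.sum((x - q) ** 2) - c2 * np.sum(x ** 2) + c3

    A = np.eye(n * m)                # A_tp = delta_tp * I_m
    C = B.T @ B                      # C_tp = sum_i B_it^T B_ip
    H = 2 * (c1 - c2) * A - 4 * c1 * B + 2 * c1 * C
    quadratic = 0.5 * x @ H @ x + c3
    return direct, quadratic

direct, quadratic = check_quadratic_form()
```

The two values agree up to floating-point error, which confirms the factors 2, 4 and 2 in the expression of $H_{tp}$ once the leading 1/2 is taken into account.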