motion directions with high support, in this case from
wrong tentative matches.
Figure 3(a) shows an even more difficult pair since
only 1.4%, i.e. 50, tentative matches are consistent
with the true motion. There are very many wrong ten-
tative matches on bushes where local image features
are all small and green. Thus, many motion directions
get high support from wrong matches. The true mo-
tion has the highest support but its peak is very sharp
and thus difficult to find in limited time.
Figure 4(a) shows a very difficult pair that contains
a large camera rotation and many repetitive features,
which generate wrong tentative matches. In this
case, the motion supported by the largest number of
tentative matches is incorrect. Notice that the peak of
the likelihood of matches in Fig. 4(c) does not correspond
to the direction of the true motion.
All the above examples can be solved correctly by
the technique presented in this paper.
The state-of-the-art technique for finding
relative camera orientations from image matches
first establishes tentative matches by pairing image
points with mutually similar features and then uses
RANSAC (Fischler and Bolles, 1981; Hartley and
Zisserman, 2004; Chum and Matas, 2005) to look for
a large subset of the tentative matches that
is, within a predefined threshold θ, consistent with
an epipolar geometry (Hartley and Zisserman, 2004).
Unfortunately, this strategy does not always recover
the epipolar geometry generated by the actual camera
motion. This has been observed, e.g., in (Li and
Hartley, 2005).
Often, several models are supported
by a large number of matches. Thus the chance that
the correct model, even if it has the largest support,
will be found by a single RANSAC run is small.
Li and Hartley (2005) suggested generating
models by randomized sampling as in RANSAC but
using soft (kernel) voting for a physical parameter,
the radial distortion coefficient in that case, instead
of looking for the maximal support. The best model
is then selected as the one with the parameter closest
to the maximum in the accumulator space. This strategy
works when the correct, or almost correct, models
encountered during sampling provide consistent values of the
parameter while the incorrect models with high support
generate different values of the parameter. Here
we show that this strategy also works when used for
voting in the space of motion directions.
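The selection rule described above can be sketched in a few lines of Python. This is only an illustrative sketch for a scalar parameter, not the implementation of Li and Hartley or of this paper: the Gaussian kernel, the bandwidth, the regular accumulator grid, the use of each model's support as its vote weight, and the function names `kernel_vote` and `select_by_voting` are all assumptions made for the example.

```python
import math

def kernel_vote(values, weights, bandwidth, grid):
    """Accumulate weighted Gaussian-kernel votes on a 1-D grid."""
    acc = [0.0] * len(grid)
    for v, w in zip(values, weights):
        for i, g in enumerate(grid):
            acc[i] += w * math.exp(-0.5 * ((g - v) / bandwidth) ** 2)
    return acc

def select_by_voting(values, weights, bandwidth=0.05, steps=201):
    """Return (peak, index of the model whose parameter is closest to the peak)."""
    lo, hi = min(values), max(values)
    # Regular accumulator grid spanning the observed parameter range.
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    acc = kernel_vote(values, weights, bandwidth, grid)
    peak = grid[max(range(steps), key=acc.__getitem__)]
    best = min(range(len(values)), key=lambda k: abs(values[k] - peak))
    return peak, best

# Hypothetical data: five nearly correct models agree on the parameter,
# three wrong models have higher individual support but disagree.
values = [0.29, 0.30, 0.31, 0.30, 0.32, -0.8, 0.9, 1.5]
weights = [40, 45, 42, 50, 38, 60, 55, 58]
peak, best = select_by_voting(values, weights)
print(peak, values[best], weights[best])
```

In this toy data the model with the largest individual support is one of the wrong ones, while the accumulator peak falls inside the cluster of mutually consistent models, which is the situation the voting strategy is meant to handle.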
It has been demonstrated in (Chum and Matas,
2005) that ordering the tentative matches by their
similarity may help to reduce the number of samples
in RANSAC. That paper made two main
contributions. First, it proposed the PROSAC sampling
strategy, which allows uniform sampling from
the list of tentative matches ordered
ascendingly by the distance of their descriptors.
It starts by drawing promising samples first
and thus often hits a sufficiently large configuration of good
matches early. The second contribution was a
modification of the RANSAC stopping criterion (Hartley
and Zisserman, 2004, p. 119) that can deal
with very long lists of tentative matches without the
need to know their number beforehand.
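To see why reducing the number of samples matters, the standard RANSAC stopping criterion (the unmodified one, not the PROSAC variant) can be evaluated directly: with inlier ratio w, sample size s and required confidence p, the number of samples is N = log(1 - p) / log(1 - w^s). The sketch below is a minimal illustration; the function name `ransac_samples_needed` and the default confidence of 0.99 are assumptions chosen for the example.

```python
import math

def ransac_samples_needed(inlier_ratio, sample_size, confidence=0.99):
    """Samples needed so that, with the given confidence, at least one
    sample consists of inliers only (standard RANSAC criterion)."""
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    p_good = inlier_ratio ** sample_size  # probability of an all-inlier sample
    if p_good >= 1.0:
        return 1
    # log1p keeps precision when p_good is tiny.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p_good))

# Five-point samples with half of the matches correct, and with the
# 1.4% inlier ratio quoted for the difficult pair discussed earlier.
print(ransac_samples_needed(0.5, 5))
print(ransac_samples_needed(0.014, 5))
```

With a 1.4% inlier ratio and five-point samples, the count runs into the billions, which is why plain uniform sampling is impractical for such pairs.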
When working with perspective images, it is gen-
erally accepted (Hartley and Zisserman, 2004) that
the best way to evaluate the quality of an epipolar
geometry is to look at image reprojection errors. This
is, for two images, equivalent to evaluating the dis-
tances of image points to their corresponding epipolar
lines. We compared the image reprojection error with
the residuals evaluated as the angle between rays and
their corresponding epipolar planes, which we refer
to as the angular error here. In our experience, when
cameras are calibrated, the angular error can safely be
used instead of the image reprojection error. To be
absolutely correct, every ray should be accompanied
by a covariance matrix determining its uncertainty.
The matrix depends on (i) the image measurement error
model and (ii) the point position in the image. The
point position determines how the unit circle around
the point maps into the cone around the ray. In this
paper we neglected the variability of the covariance
matrix across the field of view and assumed it to be a
scaled identity.
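The angular error can be illustrated with a short Python sketch. It assumes calibrated rays expressed as 3-vectors in their camera frames, the convention X2 = R X1 + t with essential matrix E = [t]x R, and a one-sided error measured in the second camera only; these conventions and the helper names are assumptions made for the example, not details taken from the paper.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def mat_vec(M, v):
    return [dot(row, v) for row in M]

def essential_from_motion(R, t):
    """E = [t]x R for the assumed convention X2 = R X1 + t."""
    tx, ty, tz = t
    skew = [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]
    cols = list(zip(*R))  # columns of R
    return [[dot(row, col) for col in cols] for row in skew]

def angular_error(E, ray1, ray2):
    """Angle in radians between ray2 and the epipolar plane of ray1."""
    n = mat_vec(E, ray1)  # normal of the epipolar plane in camera 2
    denom = norm(n) * norm(ray2)
    if denom == 0.0:
        raise ValueError("degenerate ray or epipolar plane")
    return math.asin(min(1.0, abs(dot(ray2, n)) / denom))

# Pure sideways translation; a point at depth 5 in front of camera 1.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
E = essential_from_motion(R, [1.0, 0.0, 0.0])
print(angular_error(E, [0.0, 0.0, 1.0], [1.0, 0.0, 5.0]))   # consistent match
print(angular_error(E, [0.0, 0.0, 1.0], [1.0, 0.1, 5.0]))   # perturbed ray
```

A match would then be counted as consistent with a motion when this angle is below the threshold θ. Treating all rays with the same threshold corresponds to the scaled-identity covariance assumption stated above.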
Next we describe how we combine ordered sampling
of tentative matches, soft voting, and the orientation
(cheirality) constraint (Hartley and Zisserman,
2004) on the minimal five-point samples used for computing
camera motions to obtain an algorithm that solves
all of the above camera motions.
2 THE ALGORITHM
Algorithm 1 presents the pseudocode of the algorithm
used to generate the results described in this work. We
now describe its key parts in detail.
2.1 Detecting Tentative Matches and
Computing their Descriptors
MSER (Matas et al., 2004), Harris-Affine and
Hessian-Affine (Mikolajczyk et al., 2005) affine
covariant feature regions are detected in images.
These features are an alternative to the popular SIFT
features (Lowe, 2004) and work comparably in our
situation. Parameters of the detectors are chosen to limit
VISAPP 2008 - International Conference on Computer Vision Theory and Applications