parameter sets contaminated with outliers is an indispensable part of our algorithms, so in the majority of cases, robust methods must be applied and every possibility of speeding up the processing must be considered. Therefore, augmenting plain RANSAC with the $T_{d,d}$ test (with $d = 1$ or $2$) has turned out to be quite useful in our implementation. Also, we must take care of critical motions, since the results obtained during this stage of the reconstruction of a sub-sequence will be used to obtain the camera parameters in the following frames. The following observations have been made:
• If for a large number of frames GRIG(F) > GRIG(H), then either the scene contains some dominant plane(s) or the baseline between the cameras of the two key-frames is not wide enough. In the first case, the linear solution for camera resection will not work ((HarZis2000), pp. 178–180). In certain cases, one can use homography-based reconstruction methods such as camera resection by plane-by-parallax, as proposed in (HarZis2000), chapter 18; see also (Mat2005).
• If the epipole lies inside the image domain, the points close to the epipole should be discarded from triangulation, because their position in at least one direction will be unstable. Another possibility is to keep only the points which satisfy a strict cost criterion such as
$$\sum_{i=1}^{2} \left(\hat{x}^{*}_{i} - \hat{x}_{i}\right)^{2} < s \cdot \exp\left(-\frac{b}{d_{i}^{2}}\right), \qquad x^{*}_{i} = P_{i} X,$$
where $P_1, P_2$ are the camera matrices extracted from the key-frames, $x$ and $X$ are a 2D point and the corresponding 3D point, $d_i$ is the distance from $\hat{x}_i$ to the epipole $\hat{e}_i$, and $s, b$ are positive constants.
• Forward and backward motion usually exhibits both of the negative effects described above: a homography is a suitable model for the positions of the points in the direction of the epipole, and the epipole lies inside the image. In this case, we not only discard the points close to the epipole but also reduce the threshold $s$ by a factor of 2.
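The distance-dependent test from the second observation can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: the function name is hypothetical, the values of `s` and `b` must be chosen by the caller, and, since the criterion carries the index $i$ on both sides, the sketch assumes that the squared reprojection error is summed over both key-frames and compared with the threshold evaluated at the smaller of the two epipole distances.

```python
import numpy as np

def epipole_aware_mask(P1, P2, X, x1, x2, e1, e2, s, b):
    """Boolean mask of points passing sum_i (x*_i - x_i)^2 < s * exp(-b / d^2).

    P1, P2 : (3, 4) camera matrices of the two key-frames
    X      : (N, 4) homogeneous 3D points
    x1, x2 : (N, 2) measured (inhomogeneous) image points
    e1, e2 : (2,) inhomogeneous epipoles in the two key-frames
    s, b   : positive constants
    ASSUMPTION: d is the smaller of the two point-to-epipole distances.
    """
    X = np.asarray(X, dtype=float)
    total_err = np.zeros(len(X))
    min_d2 = np.full(len(X), np.inf)
    for P, x, e in ((P1, x1, e1), (P2, x2, e2)):
        x = np.asarray(x, dtype=float)
        proj = X @ np.asarray(P, dtype=float).T          # x*_i = P_i X, shape (N, 3)
        proj = proj[:, :2] / proj[:, 2:3]                # to inhomogeneous coordinates
        total_err += np.sum((proj - x) ** 2, axis=1)     # squared reprojection error
        d2 = np.sum((x - np.asarray(e, dtype=float)) ** 2, axis=1)
        min_d2 = np.minimum(min_d2, d2)                  # squared distance to the epipole
    # b / d^2, with +inf (threshold 0) for points lying exactly on the epipole
    ratio = np.full(len(X), np.inf)
    np.divide(b, min_d2, out=ratio, where=min_d2 > 0)
    return total_err < s * np.exp(-ratio)
```

Because the threshold tends to zero as a point approaches the epipole and to $s$ far away from it, the same function covers both the discarding of points near the epipole and the ordinary inlier test; for forward or backward motion it would be called with `s / 2`.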
The reconstruction of a sub-sequence continues by extrapolating the previous results to the frames after the second key-frame. We obtain new camera matrices by resection with the already known 3D points (via RANSAC followed by a non-linear error minimization) and new 3D points by triangulation from the known cameras (usually 3–5). The frame in which the number of either triangulation or resection inliers is small marks the end of the sub-sequence. If this infeasible frame has the number $n$, then frame $n-1$ is the last frame of the first sub-sequence and frame $n-2$ is the first key-frame of the next sub-sequence. This is because we cannot trust camera $n$ of the first sub-sequence and, as we will see below, we need at least a double camera overlap. Of course, the second reconstruction will be obtained in a different coordinate system; therefore both reconstructions are "fused" by means of the common cameras $P^{\mathrm{old}}_{n-2}, P^{\mathrm{old}}_{n-1}, P^{\mathrm{new}}_{1}, P^{\mathrm{new}}_{2}$ and the points $X^{\mathrm{new}}, X^{\mathrm{old}}$ seen in both old and new views. The task is to find a 3D homography $H$ which satisfies $P^{\mathrm{old}} = P^{\mathrm{new}} H$ and $X^{\mathrm{old}} = H^{-1} X^{\mathrm{new}}$ (such a homography exists by Theorem 9.10 in (HarZis2000)). The method we propose works as follows:
First of all, the linear solution is calculated: if we consider the camera matrices $P^{\mathrm{old}}$ and $P^{\mathrm{new}} H$ as row vectors with 12 elements, the vector representing the algebraic error from a single camera pair has the entries
$(P^{\mathrm{old}})_{k} (P^{\mathrm{new}} H)_{1} - (P^{\mathrm{old}})_{1} (P^{\mathrm{new}} H)_{k}$ for $k = 2, \ldots, 12$.
Clearly, each pair of projection matrices contributes 11 equations; therefore a double camera overlap, which yields 22 equations, is enough to determine the 16 entries of the homogeneous quantity $H$. In order to refine the initial value for $H$, the squared geometric error
$$\varepsilon = \sum_{j=1}^{\mathrm{overlap}} \left\| P^{\mathrm{new}}_{j} H X^{\mathrm{old}} - \hat{x}_{n-j} \right\|^{2} \qquad (1)$$
is calculated for each 3D point $X^{\mathrm{old}}$ obtained in the first reconstruction and visible in the relevant views. A similar error is obtained for the 3D points in the new coordinate frame. Now, if the error obtained by reprojecting an old 3D point with the new cameras (as in (1)) or vice versa is low, this point is considered to be an inlier. If there are only a few inliers, the initial estimate of $H$ is poor. In this situation (which can happen, for example, if the centers of both cameras coincide), we consider just a single camera overlap $P^{\mathrm{new}}_{1}, P^{\mathrm{old}}_{n-2}$ and the correspondences of reprojected points $X, x$, as pointed out in (Nister2001), pp. 64–65. Four such correspondences are enough to generate a RANSAC hypothesis from which $H$ can be computed. In either case, after an initial estimate of $H$ has been obtained, the error given by (1) is minimized iteratively over all inliers. Given $H$, the new cameras and points can be mapped into the old coordinate frame.
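The linear initialization of $H$ and the error (1) can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions rather than the implementation used here: the function names are hypothetical, the camera matrices are flattened row by row, the first entry of each flattened $P^{\mathrm{old}}$ is assumed to be non-zero (it serves as the pivot in the algebraic error), and the null vector of the stacked system is taken from an SVD.

```python
import numpy as np

def linear_homography_from_cameras(P_old_list, P_new_list):
    """Estimate H (4x4, up to scale) such that P_old ~ P_new @ H for every pair.

    Each pair contributes the 11 equations
        p_k * q_1 - p_1 * q_k = 0,  k = 2..12,
    with p = vec(P_old) and q = vec(P_new @ H), both flattened row by row.
    ASSUMPTION: p_1 (the pivot entry) is non-zero for every pair.
    """
    rows = []
    for P_old, P_new in zip(P_old_list, P_new_list):
        p = np.asarray(P_old, dtype=float).ravel()               # 12 entries of P_old
        M = np.kron(np.asarray(P_new, dtype=float), np.eye(4))   # vec(P_new H) = M @ vec(H)
        rows.append(np.outer(p[1:], M[0]) - p[0] * M[1:])        # 11 x 16 block
    A = np.vstack(rows)                                          # 22 x 16 for a double overlap
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(4, 4)                                  # right singular vector of the smallest singular value

def geometric_error(P_new_list, H, X_old, x_obs_list):
    """Squared reprojection error of one old 3D point, as in (1).

    X_old is a homogeneous 4-vector; x_obs_list holds the measured 2D points
    in the overlapping views, in the same order as P_new_list.
    """
    err = 0.0
    for P_new, x_obs in zip(P_new_list, x_obs_list):
        proj = P_new @ H @ X_old
        err += np.sum((proj[:2] / proj[2] - x_obs) ** 2)
    return float(err)
```

A point would then be counted as an inlier when `geometric_error` falls below a chosen threshold, and the returned $H$ would serve only as the starting value for the iterative minimization over the inliers.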
VISAPP 2008 - International Conference on Computer Vision Theory and Applications