SFM approach, which supplies a good starting point
for global bundle adjustments. In addition, such an incremental
system is itself useful in many applications. When a global bundle
adjustment is not affordable because of efficiency requirements,
an incremental SFM approach that provides a result of almost the
same precision is needed.
We adopt local bundle adjustment to expand the current
reconstruction incrementally. A problem is that the local bundle
adjustment of each expansion step may change the existing
reconstruction. These changes introduce extra errors into the
previous reconstruction. More importantly, as the sequence grows
longer, these relatively small changes can accumulate until they
invalidate the previous reconstruction, and the incremental
algorithm then risks losing the basis for further expansion. To
alleviate this problem, we adopt an update propagation method,
which modifies the entire reconstruction to accommodate the changes
brought by the local bundle adjustment. Compared to pure local
bundle adjustment, our approach is slower because of the extra cost
of update propagation, but it gains a significant improvement in
accuracy.
In this paper, we assume that the intrinsic parameters of the
camera are known and fixed, i.e., we do not deal with the
self-calibration problem.
The rest of this paper is organized as follows. Section 2
introduces the mathematical notation used in this paper and
presents an overview of our approach. Section 3 describes how the
reconstruction is initialized from reference frames. Section 4
introduces our incremental method for reconstruction expansion.
Section 5 presents the results. Section 6 concludes the paper.
2 THE FRAMEWORK
We first introduce the mathematical notation used throughout this
paper. Suppose we are given a sequence of $n$ frames:
$\{I_i\}_{0}^{n-1}$. For each $I_i$, the corresponding camera is
modeled by the intrinsic parameter $K$, which is a $3 \times 3$
upper-triangular matrix (we assume $K$ remains unchanged across
frames), and the extrinsic parameter $[R_i \mid t_i]$, where $R_i$
is a $3 \times 3$ orthonormal rotation matrix and
$t_i = (tx_i, ty_i, tz_i)$ is a translation vector. The projection
matrix of such a camera is $P_i = K \cdot [R_i \mid t_i]$.
To reduce the number of parameters, the rotation of each camera can
be described by its Euler angles:
$\omega_i = (\alpha_i, \beta_i, \gamma_i)$. In this case, the camera
$C_i$ is parameterized by a 6-vector:
$C_i = (\alpha_i, \beta_i, \gamma_i, tx_i, ty_i, tz_i)$. The
projection of a 3D point $X_j = [x_j, y_j, z_j]^T$ onto a 2D image
by $C_i$ can be expressed by a non-linear function $\Theta$:
$\Theta(C_i, X_j)$.
For a sequence to be reconstructed, sparse points are matched
between consecutive frames, i.e., $I_1$ against $I_0$, $I_2$ against
$I_1$, and so on. The point matching is based on feature tracking
techniques (we have tried both SIFT (Lowe, 2004) and KLT points
(Shi and Tomasi, 1994)). The Random Sample Consensus (RANSAC)
algorithm is used to fit the fundamental matrix that encapsulates
the epipolar constraint. Therefore, the track of a point can be
lost in a given frame for two reasons: there is no matched point,
or the match does not conform to the epipolar constraint. For each
track, a 3D point can be constructed if the corresponding camera
parameters are known.
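The second reason for losing a track, a match that violates the epipolar constraint, can be sketched as follows. This is not the RANSAC fitting itself; it only shows how matches might be tested against an already estimated fundamental matrix. The convention $x_2^T F x_1 = 0$, the function names, and the 1-pixel threshold are assumptions made for this sketch.

```python
import math

def epipolar_distance(F, x1, x2):
    """Distance in pixels from x2 to the epipolar line F * x1 in the second image.

    Assumes the convention x2^T * F * x1 = 0 for a perfect match.
    """
    p1 = (x1[0], x1[1], 1.0)
    p2 = (x2[0], x2[1], 1.0)
    # Epipolar line l = F * p1, written as l[0]*x + l[1]*y + l[2] = 0.
    line = [sum(F[r][c] * p1[c] for c in range(3)) for r in range(3)]
    norm = math.hypot(line[0], line[1])
    if norm == 0.0:
        return float("inf")  # degenerate line, treat the match as an outlier
    return abs(sum(line[k] * p2[k] for k in range(3))) / norm

def filter_matches(F, matches, threshold=1.0):
    """Keep only the matches (x1, x2) that lie within `threshold` pixels of their epipolar line."""
    return [m for m in matches if epipolar_distance(F, m[0], m[1]) <= threshold]

# Example: F for a pure horizontal translation, where epipolar lines are image rows.
F = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
matches = [((10.0, 20.0), (15.0, 20.0)),   # same row: consistent
           ((10.0, 20.0), (15.0, 25.0))]   # 5 pixels off the epipolar line
inliers = filter_matches(F, matches)
```

In this example only the first match survives; the track containing the second match would be terminated at that frame.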
If the camera moves slowly, we resample the sequence to select
keyframes that have wider baselines. In addition, since the
information contained in a long track is more reliable than that in
a short one, we only consider tracks that are not shorter than a
minimal track length $T$. $T$ is usually set to 3 keyframes.
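The track-length filter can be sketched as below. The data layout (a dictionary from a track identifier to the list of keyframe indices in which the point is observed) and the function name are hypothetical, chosen only for illustration.

```python
def select_tracks(tracks, min_length=3):
    """Keep tracks observed in at least `min_length` keyframes (T in the text)."""
    return {tid: frames for tid, frames in tracks.items()
            if len(frames) >= min_length}

# Hypothetical tracks: track id -> keyframe indices where the point is visible.
tracks = {
    "a": [0, 1, 2, 3],
    "b": [1, 2],
    "c": [2, 3, 4],
}
reliable = select_tracks(tracks, min_length=3)
```

Here tracks "a" and "c" are kept and track "b", which spans only two keyframes, is discarded.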
The reconstruction process is initialized from the first several
frames. More frames are then added incrementally to expand the
current reconstruction. Each expansion is accomplished in two
steps: 1. First, the local bundle adjustment is performed over
several neighbouring frames around the newly added frames. 2. Then,
the update brought by the local bundle adjustment is propagated to
the entire reconstruction. After all frames are processed, if the
remaining reprojection error is still large, an optional global
bundle adjustment is applied to refine the result. Finally, if the
sequence has been resampled, the cameras of non-keyframes are
calculated based on the reconstructed 3D points.
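The control flow of this framework can be sketched as follows. Every callable passed in (`initialize`, `local_ba`, `propagate`, `mean_reprojection_error`, `global_ba`) is a hypothetical placeholder, and adding one frame per expansion step and the error threshold are assumptions of this sketch; the calculation of non-keyframe cameras is omitted.

```python
def run_incremental_sfm(num_frames, init_size, initialize, local_ba,
                        propagate, mean_reprojection_error, global_ba,
                        error_threshold=1.0):
    """Control-flow sketch only; all callables are hypothetical placeholders."""
    # Initialize the reconstruction from the first `init_size` frames.
    reconstruction = initialize(list(range(init_size)))
    for frame in range(init_size, num_frames):
        # Step 1: local bundle adjustment around the newly added frame.
        update = local_ba(reconstruction, frame)
        # Step 2: propagate the resulting update to the entire reconstruction.
        propagate(reconstruction, update)
    # Optional global refinement if the remaining error is still large.
    if mean_reprojection_error(reconstruction) > error_threshold:
        global_ba(reconstruction)
    return reconstruction
```

Passing the stages as callables keeps the sketch independent of any particular bundle adjustment implementation.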
3 RECONSTRUCTION INITIALIZATION
The first ($0^{th}$) frame is always selected as the reference
frame, such that $R_0 = I$ and $t_0 = 0$. We select the first $s$
($s \geq 3$) frames to initialize the reconstruction. The value of
$s$ is based on how many tracks can be seen from both the $0^{th}$
and the $i^{th}$ ($0 < i < s$) frame. The minimal number of such
tracks is set to 30. For each of the non-reference initialization
frames, its fundamental matrix $F_i$ against the reference frame is
estimated from the inlier matches. With the intrinsic matrix $K$,
the epipolar constraint can be upgraded from the fundamental matrix
to an essential matrix $E_i$:
$$E_i = K^T \cdot F_i \cdot K.$$
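The upgrade $E_i = K^T \cdot F_i \cdot K$ is a pair of $3 \times 3$ matrix products. A minimal sketch in plain Python follows; the helper names and the numeric values of $K$ and $F$ are illustrative assumptions, not values from the experiments.

```python
def matmul3(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def transpose3(A):
    """Transpose of a 3x3 matrix stored as nested lists."""
    return [[A[c][r] for c in range(3)] for r in range(3)]

def essential_from_fundamental(K, F):
    """E = K^T * F * K, assuming both views share the same intrinsic matrix K."""
    return matmul3(matmul3(transpose3(K), F), K)

# Illustrative (hypothetical) values.
K = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
F = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
E = essential_from_fundamental(K, F)
```

The same $K$ appears on both sides because the intrinsic parameters are assumed to be fixed across frames; with different cameras the left factor would use the intrinsic matrix of the second view.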
Then, the essential matrix is decomposed into an
VISAPP 2010 - International Conference on Computer Vision Theory and Applications