2.2 Triangulation Accuracy
The baseline is the line between two camera centers.
The baseline length is typically very small in consec-
utive frames. Long baselines are required for accu-
rate triangulation. The size of a 3D point’s region
of uncertainty increases as the distance between two
frames decreases. Therefore, the frame selection pro-
cess should seek to maximize the baseline between
the camera positions for key frames, subject to the
constraint that a sufficient number of correspondences
are retained.
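The dependence of uncertainty on baseline can be illustrated with a minimal sketch. It assumes an idealized rectified two-view setup, which is a simplification introduced here and not a model stated above: depth is Z = fB/d for focal length f (in pixels), baseline B, and disparity d, so a disparity error produces a depth error of roughly Z^2/(fB) times that error. The function name and all numeric values are hypothetical.

```python
def depth_uncertainty(depth, baseline, focal_px, disparity_err_px=1.0):
    """First-order depth error for an idealized rectified two-view setup.

    Assumes Z = f * B / d, hence |dZ| ~= Z**2 / (f * B) * |dd|.
    """
    if baseline <= 0 or focal_px <= 0:
        raise ValueError("baseline and focal length must be positive")
    return depth ** 2 / (focal_px * baseline) * disparity_err_px


if __name__ == "__main__":
    # Hypothetical values: a point 5 m away seen with an 800 px focal length.
    for baseline in (0.01, 0.1, 1.0):
        err = depth_uncertainty(5.0, baseline, 800.0)
        print(f"baseline = {baseline:4.2f} m, depth error ~ {err:.4f} m")
```

Under these assumptions the depth error is inversely proportional to the baseline, which is why consecutive frames with very short baselines triangulate poorly.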
2.3 Degeneracy Avoidance
Two degenerate cases arise, one from non-general camera motion and one
from non-general position of structure. In these cases the epipolar
geometry is either undefined or cannot be determined uniquely, and
methods based on estimation of the fundamental matrix will fail
(although the frame pair may still be useful for resectioning, in which
we estimate only the camera position from known 3D-2D correspondences):
Motion Degeneracy: If the camera rotates about its
center with no translation, the epipolar geometry
is not defined.
Structure Degeneracy: When all of the 3D points in
view are coplanar, the fundamental matrix cannot
be uniquely determined from image correspon-
dences alone.
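Motion degeneracy can be checked numerically with a small synthetic sketch. It assumes a pinhole camera with a known, hypothetical intrinsic matrix K and uses the standard result that under pure rotation R the two views are related by the homography H = K R K^-1, independent of scene depth. All names and values below are illustrative assumptions.

```python
import numpy as np


def project(K, R, t, X):
    """Project Nx3 world points with camera K[R|t]; returns Nx2 pixel coordinates."""
    x = (K @ (R @ X.T + t.reshape(3, 1))).T
    return x[:, :2] / x[:, 2:3]


def rotation_y(angle_rad):
    """Rotation matrix about the camera's y axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def max_homography_residual(K, R, x1, x2):
    """Largest pixel distance between x2 and H @ x1, with H = K R K^-1."""
    H = K @ R @ np.linalg.inv(K)
    x1h = np.hstack([x1, np.ones((len(x1), 1))])
    mapped = (H @ x1h.T).T
    mapped = mapped[:, :2] / mapped[:, 2:3]
    return float(np.max(np.linalg.norm(mapped - x2, axis=1)))


rng = np.random.default_rng(0)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
# Non-planar points at widely varying depths (2 m to 20 m) in front of the camera.
X = np.column_stack([
    rng.uniform(-1.0, 1.0, 50),
    rng.uniform(-1.0, 1.0, 50),
    rng.uniform(2.0, 20.0, 50),
])
R = rotation_y(np.deg2rad(5.0))
x1 = project(K, np.eye(3), np.zeros(3), X)

# Pure rotation: the homography explains every correspondence.
residual_rotation = max_homography_residual(
    K, R, x1, project(K, R, np.zeros(3), X))
# Rotation plus translation: the same homography no longer fits.
residual_translation = max_homography_residual(
    K, R, x1, project(K, R, np.array([0.5, 0.0, 0.0]), X))
```

In the rotation-only case the residual is at the level of floating-point noise even though the points are far from coplanar, so the correspondences carry no information about translation. Once a translation is added, depth-dependent parallax appears and the residual becomes large.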
3 PREVIOUS WORK
Here we provide an overview of the most relevant recent work in key
frame selection. Seo et al. (2003) consider three factors: (a) the
ratio of the number of point correspondences found to the total number
of point features
found, (b) the homography error, and (c) the spatial
distribution of corresponding points over the frames.
Hartley and Zisserman (2004) state that the homography error is small
when there is little camera motion between frames, so the homography
error serves as a proxy for the baseline distance between two views.
Seo et al. also encourage the use of correspondences distributed evenly
over the entire image when estimating the fundamental matrix. They
derive a score function from the above-mentioned factors and use it to
select key frames: the frame pair with the lowest score is selected.
The authors do not discuss any measure for handling degenerate cases.
Pollefeys and van Gool (2002) select key frames for structure and
motion recovery using a motion model selection mechanism (Torr et al.,
1998): the next key frame is selected only once the epipolar geometry
model explains the relationship between the pair of images better than
the simpler homography model. The choice between the homography and
the fundamental matrix is based on the geometric robust information
criterion (GRIC; Torr, 1998). Frames that fall into degenerate cases
are discarded.
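The model selection step can be sketched as follows. The sketch assumes the commonly quoted form of GRIC, the sum over correspondences of min(e^2/sigma^2, 2(r - d)) plus n d ln(r) + k ln(r n), with data dimension r = 4, model dimension d = 3 and k = 7 parameters for the fundamental matrix, and d = 2 and k = 8 for the homography. These constants and the function names are assumptions for illustration and should be checked against Torr (1998) before use.

```python
import math


def gric(squared_residuals, sigma, d, k, r=4):
    """GRIC score (lower is better) in an assumed, commonly quoted form.

    squared_residuals: squared geometric errors, one per correspondence
    sigma: assumed standard deviation of the measurement noise
    d: dimension of the model manifold, k: number of model parameters
    r: dimension of the data (4 for two-view point correspondences)
    """
    n = len(squared_residuals)
    if n == 0:
        raise ValueError("at least one correspondence is required")
    cap = 2.0 * (r - d)  # robust cap on each residual term
    data_term = sum(min(e2 / sigma ** 2, cap) for e2 in squared_residuals)
    return data_term + n * d * math.log(r) + k * math.log(r * n)


def gric_fundamental(squared_residuals, sigma):
    return gric(squared_residuals, sigma, d=3, k=7)


def gric_homography(squared_residuals, sigma):
    return gric(squared_residuals, sigma, d=2, k=8)


def prefers_epipolar_model(res_f, res_h, sigma):
    """True when the fundamental matrix explains the pair better than H."""
    return gric_fundamental(res_f, sigma) < gric_homography(res_h, sigma)
```

With this form, a pair in which both models fit equally well favors the homography because of its smaller structure penalty, so a frame is accepted only when the homography residuals grow large enough to outweigh that penalty.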
Seo et al. (2008) use the ratio of the number of correspondences to
the total number of features found. If the ratio is close to one, the
images overlap too much and the baseline will be short, so the frame
should not be selected as a key frame. The second measure is the
reprojection error: the pair of frames with minimum reprojection error
is categorized as key frames. As in their earlier work, no measures
are taken for degenerate cases.
4 METHOD
We treat key frame selection as constrained optimiza-
tion. Given the first frame of a video sequence, we
seek to find the successor frame that 1) has a suffi-
ciently long baseline (via a correspondence ratio con-
straint), 2) does not lead to degenerate motion or
structure, and 3) has the best estimated epipolar ge-
ometry. We introduce our methods to achieve these
criteria in this section.
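The three criteria can be read as a filter-then-minimize loop, sketched below. The sketch assumes that "best estimated epipolar geometry" can be expressed as the lowest value of some cost function; the three callables and the function name are hypothetical placeholders and do not reproduce the specific tests used in this paper.

```python
from typing import Callable, Iterable, Optional


def select_next_key_frame(
    candidates: Iterable[int],
    ratio_ok: Callable[[int], bool],
    is_degenerate: Callable[[int], bool],
    geometry_cost: Callable[[int], float],
) -> Optional[int]:
    """Return the feasible candidate frame with the lowest geometry cost.

    A candidate is feasible if it satisfies the correspondence ratio
    constraint (criterion 1) and is not degenerate (criterion 2). Among
    feasible candidates, the lowest cost wins (criterion 3, assumed form).
    Returns None when no candidate is feasible.
    """
    best_frame, best_cost = None, float("inf")
    for frame in candidates:
        if not ratio_ok(frame):
            continue
        if is_degenerate(frame):
            continue
        cost = geometry_cost(frame)
        if cost < best_cost:
            best_frame, best_cost = frame, cost
    return best_frame
```

Applying the cheap constraint checks first means the cost of the epipolar geometry only needs to be evaluated for feasible candidates.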
4.1 Correspondence Ratio Constraint
We use Seo et al.’s (2008) correspondence ratio $R_c$ as a proxy for
baseline length:

$$R_c = \frac{T_c}{T_f}, \tag{1}$$
where $T_c$ is the number of frame-to-frame point features in
correspondence for the frame pair under consideration, and $T_f$ is the
total number of point features considered for correspondence. $R_c$ is
inversely
correlated with camera motion: as the camera moves,
features in view tend to leave the scene, and the ap-
pearance of objects in view tends to change with per-
spective distortion, occlusion, and so on.
Although a long baseline is desirable for triangu-
lation accuracy, if the number of corresponding fea-
tures is too low, camera pose estimation accuracy will
suffer. We therefore constrain candidate key frames to
those having a correspondence ratio $R_c$ between upper and lower
thresholds $T_1$ and $T_2$. Currently, we set these thresholds through
experimentation.
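A minimal sketch of the ratio and its constraint follows. The function names are hypothetical, the threshold values in the usage note are placeholders and not the experimentally chosen ones, and treating the bounds as inclusive is an assumption.

```python
def correspondence_ratio(num_matched, num_features):
    """R_c = T_c / T_f: matched features over features considered."""
    if num_features <= 0:
        raise ValueError("num_features must be positive")
    if not 0 <= num_matched <= num_features:
        raise ValueError("num_matched must lie in [0, num_features]")
    return num_matched / num_features


def satisfies_ratio_constraint(ratio, lower, upper):
    """True when lower <= R_c <= upper (inclusive bounds are an assumption)."""
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    return lower <= ratio <= upper
```

For example, with 450 of 600 features matched, the ratio is 0.75, which passes placeholder thresholds of 0.3 and 0.9. A ratio of 0.95 would be rejected as having too short a baseline, and a ratio of 0.1 as having too few correspondences.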
VISAPP 2010 - International Conference on Computer Vision Theory and Applications