sary. Experimental results on real and synthetic data
sets indeed show an overall improvement in the ac-
curacy of the dense correspondence set after applying
the procedure.
Error detection for dense correspondences has
been done in the past, but either under simplify-
ing assumptions or with respect to ground-truth data.
In (Xiong and Matthies, 1997), matching errors are
identified and corrected, but only one specific scene
type is handled. The algorithm in (Mayoral and Au-
rnhammer, 2004) evaluates matching algorithms by
introducing an error surface from matching errors.
In both cases, the simplifying assumption of search-
ing for disparity along scanlines is made. An ex-
haustive overview and evaluation of dense correspon-
dence algorithms is given in (Scharstein and Szeliski,
2002), though the comparisons are done with respect
to ground-truth values. As for error correction, an algorithm known as optimal triangulation (Hartley and Zisserman, 2004) attempts to correct correspondences based on the pre-computed epipolar geometry between the two views. However, such a correction, while mathematically correct and obtained by minimizing a geometrically meaningful criterion, does not necessarily produce matches that are correct in reality; it also reduces the reprojection error after reconstruction to zero, thus preventing error detection based on that criterion.
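The last point can be checked numerically. The sketch below (Python with NumPy; the cameras and the 1.5-pixel matching error are made-up illustrative values) does not implement optimal triangulation itself. As a simpler stand-in, it corrects a noisy match by replacing it with the reprojections of its linearly triangulated point. Like optimal triangulation, this yields matches that are exact projections of a single 3D point, so the reprojection error after re-triangulation drops to numerical zero and can no longer reveal the original matching error.

```python
import numpy as np


def triangulate_linear(P1, P2, x1, x2):
    """Linear (DLT) triangulation of a single match; returns a Euclidean 3D point."""
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    X = np.linalg.svd(A)[2][-1]  # right singular vector of the smallest singular value
    return X[:3] / X[3]


def project(P, X):
    """Project a Euclidean 3D point with a 3x4 camera matrix to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def reprojection_error(P1, P2, x1, x2):
    """Squared pixel distance between a match and the reprojections of its triangulated point."""
    X = triangulate_linear(P1, P2, x1, x2)
    return float(np.sum((project(P1, X) - x1) ** 2) + np.sum((project(P2, X) - x2) ** 2))


# Illustrative cameras (assumed): shared intrinsics, second camera translated along x,
# so epipolar lines are horizontal.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.2, -0.1, 5.0])
x1 = project(P1, X_true)
x2 = project(P2, X_true) + np.array([0.0, 1.5])  # simulated error across the epipolar line

err_noisy = reprojection_error(P1, P2, x1, x2)

# Stand-in correction (not optimal triangulation): replace the match by the
# reprojections of its triangulated point, which satisfy the epipolar constraint exactly.
X_hat = triangulate_linear(P1, P2, x1, x2)
x1_fixed, x2_fixed = project(P1, X_hat), project(P2, X_hat)
err_fixed = reprojection_error(P1, P2, x1_fixed, x2_fixed)

print(f"before correction: {err_noisy:.3f} px^2, after: {err_fixed:.2e} px^2")
```

The corrected match is "valid" with respect to the two cameras, yet nothing in its residual indicates that the original match was off by 1.5 pixels.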
An initial reconstruction of the scene from the two input views is needed as part of the algorithm, so a brief overview of the relevant literature on this subject is given here. In general, a reconstruction pipeline consists of obtaining matches (correspondences) between the images, then computing the relative camera poses between them, and finally computing the structure of the scene. The matches used for the initial pose estimation can be either sparse features (for example, corners) or dense correspondences, which assign to each source image position a corresponding position in the destination image and can be computed through a variety of methods (Scharstein and Szeliski, 2002).
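One common way to store such a dense field, assumed here rather than taken from any particular correspondence algorithm, is a per-pixel array of sub-pixel displacements. The hypothetical helper below (Python with NumPy) flattens it into the explicit source/destination match lists that the pose and structure steps consume; the optional mask covers pixels for which no match was found.

```python
import numpy as np


def flow_to_matches(flow, valid=None):
    """Turn a dense correspondence field into explicit point matches (hypothetical helper).

    flow  : (H, W, 2) float array; flow[y, x] = (dx, dy) is the assumed sub-pixel
            displacement from source pixel (x, y) to its destination position.
    valid : optional (H, W) boolean mask of source pixels that received a match.
    Returns (src, dst): two (N, 2) arrays of (x, y) positions in row-major order.
    """
    H, W, _ = flow.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src = np.stack([xs, ys], axis=-1).astype(float)  # (H, W, 2) source positions
    dst = src + flow                                   # (H, W, 2) destination positions
    if valid is None:
        valid = np.ones((H, W), dtype=bool)
    return src[valid], dst[valid]


# Toy 2x3 field: every source pixel moves 1.5 px to the right.
flow = np.zeros((2, 3, 2))
flow[..., 0] = 1.5
src, dst = flow_to_matches(flow)
print(src.shape, dst[0], dst[-1])
```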
For two views, the epipolar geometry between them,
encapsulated by the fundamental matrix F (Hartley
and Zisserman, 2004), can be computed from the ini-
tial matches. This matrix can be computed through
direct methods, such as in (Stewénius et al., 2006;
Hartley and Zisserman, 2004) as well as through non-
linear methods (Hartley and Zisserman, 2004). The
RANSAC algorithm can be coupled with these meth-
ods to help obtain more robust estimates for F. Using
the computed epipolar constraints, more matches can
be generated across the images to obtain dense cor-
respondences (details can be found in (Hartley and
Zisserman, 2004)). Again, an issue with such con-
strained correspondences is that the new matches de-
pend directly on the quality of the estimated epipolar
geometry, making them mathematically valid but not
necessarily correct.
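A minimal sketch of one standard direct method, the normalized eight-point algorithm, wrapped in a basic RANSAC loop that scores each hypothesis by the Sampson distance, is given below (Python with NumPy). The 1-pixel threshold, the 500 iterations, the function names and the synthetic scene are illustrative assumptions, not settings taken from this paper.

```python
import numpy as np


def _normalize(pts):
    """Hartley normalization: centroid to the origin, mean distance sqrt(2)."""
    c = pts.mean(axis=0)
    s = np.sqrt(2.0) / np.linalg.norm(pts - c, axis=1).mean()
    T = np.array([[s, 0.0, -s * c[0]], [0.0, s, -s * c[1]], [0.0, 0.0, 1.0]])
    return np.column_stack([pts, np.ones(len(pts))]) @ T.T, T


def eight_point(x1, x2):
    """Normalized eight-point estimate of F with x2^T F x1 = 0 (needs >= 8 matches)."""
    p1, T1 = _normalize(x1)
    p2, T2 = _normalize(x2)
    # Each match gives one linear equation in the nine entries of F (row-major order).
    A = np.column_stack([
        p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
        p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
        p1[:, 0], p1[:, 1], np.ones(len(p1)),
    ])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt  # enforce rank 2
    F = T2.T @ F @ T1                        # undo the normalization
    return F / np.linalg.norm(F)


def sampson_distance(F, x1, x2):
    """First-order approximation of the squared geometric error of each match."""
    h1 = np.column_stack([x1, np.ones(len(x1))])
    h2 = np.column_stack([x2, np.ones(len(x2))])
    Fx1 = h1 @ F.T   # rows are F x1
    Ftx2 = h2 @ F    # rows are F^T x2
    num = np.sum(h2 * Fx1, axis=1) ** 2
    den = Fx1[:, 0] ** 2 + Fx1[:, 1] ** 2 + Ftx2[:, 0] ** 2 + Ftx2[:, 1] ** 2
    return num / den


def ransac_fundamental(x1, x2, iters=500, thresh=1.0, seed=0):
    """Keep the 8-point hypothesis with most matches under `thresh` (squared px), then refit."""
    # Illustrative (assumed) settings: 500 iterations, 1-pixel threshold.
    rng = np.random.default_rng(seed)
    best = np.zeros(len(x1), dtype=bool)
    for _ in range(iters):
        sample = rng.choice(len(x1), size=8, replace=False)
        inliers = sampson_distance(eight_point(x1[sample], x2[sample]), x1, x2) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return eight_point(x1[best], x2[best]), best


def project_all(P, X):
    """Project (N, 3) points with a 3x4 camera matrix to (N, 2) pixels."""
    h = np.column_stack([X, np.ones(len(X))]) @ P.T
    return h[:, :2] / h[:, 2:]


# Synthetic two-view scene: 100 matches, the first 20 turned into gross errors.
rng = np.random.default_rng(1)
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.1  # rotation about the y axis (radians)
R = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-1.0, 0.1, 0.05])
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(100, 3))
x1 = project_all(K @ np.hstack([np.eye(3), np.zeros((3, 1))]), X)
x2 = project_all(K @ np.column_stack([R, t]), X)
x2[:20] += rng.uniform(-40.0, 40.0, size=(20, 2))
F, inliers = ransac_fundamental(x1, x2)
print(f"{inliers.sum()} inliers, {inliers[:20].sum()} of the 20 corrupted matches accepted")
```

A corrupted match that happens to fall within the threshold of its epipolar line is still accepted, which mirrors the limitation of epipolar-constrained matches described above.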
Once matches are available, either sparse or dense,
the relative pose (rotation and translation) between the
cameras viewing the scene can be computed. Several
methods exist, and an overview of different pose es-
timators is given in (Rodehorst et al., 2008). In the
particular case where F is available or has been computed from matches, and the intrinsic parameters of the cameras (such as the focal length, skew and principal point) are assumed known, the essential matrix E = K'^T F K, with K and K' the intrinsic calibration matrices of the first and second camera, can be computed and decomposed into the relative rotation and translation (the latter only up to scale). Finally, the scene's 3D
structure can be obtained using the available sparse
or dense matches. Typically, linear or optimal tri-
angulation (Hartley and Zisserman, 2004) is applied
to each correspondence pair to generate a 3D point of the scene structure. Once pose and structure estimates are available, a common fine-tuning step for both is bundle adjustment, in which the total reprojection error of all computed 3D points in all cameras is minimized using non-linear techniques (Hartley and Zisserman, 2004). Fortunately, the sparse structure of this problem has allowed for great speed-ups in this process (Lourakis and Argyros, 2000).
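The calibrated pose and structure steps above can be sketched as follows (Python with NumPy). Assumptions: both views share one known intrinsic matrix K, so E = K^T F K; the matches are synthetic and noise-free; and all function names are hypothetical. Of the four rotation/translation pairs encoded by E, the sketch keeps the one that places the most linearly triangulated points in front of both cameras, then evaluates the total reprojection error, i.e. the objective a bundle adjustment would go on to minimize. The sparse non-linear minimization itself is not shown.

```python
import numpy as np


def project_all(P, X):
    """Project (N, 3) Euclidean points with a 3x4 camera matrix to (N, 2) pixels."""
    h = np.column_stack([X, np.ones(len(X))]) @ P.T
    return h[:, :2] / h[:, 2:]


def skew(v):
    """Cross-product matrix [v]_x, so that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])


def decompose_essential(E):
    """The four (R, t) candidates encoded by E; t has unit norm (scale is unrecoverable)."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:   # flipping signs only rescales E by -1 and
        U = -U                 # guarantees proper rotations below
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    t = U[:, 2]
    return [(U @ W @ Vt, t), (U @ W @ Vt, -t), (U @ W.T @ Vt, t), (U @ W.T @ Vt, -t)]


def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of every match; returns an (N, 3) array."""
    pts = []
    for a, b in zip(x1, x2):
        A = np.array([a[0] * P1[2] - P1[0], a[1] * P1[2] - P1[1],
                      b[0] * P2[2] - P2[0], b[1] * P2[2] - P2[1]])
        h = np.linalg.svd(A)[2][-1]
        pts.append(h[:3] / h[3])
    return np.array(pts)


def relative_pose(F, K, x1, x2):
    """Assumes one shared K: E = K^T F K; keep the candidate passing the cheirality test best."""
    E = K.T @ F @ K
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    best, best_count = None, -1
    for R, t in decompose_essential(E):
        X = triangulate(P1, K @ np.column_stack([R, t]), x1, x2)
        # Count points with positive depth in both camera frames.
        count = int(np.sum((X[:, 2] > 0) & ((X @ R.T + t)[:, 2] > 0)))
        if count > best_count:
            best, best_count = (R, t, X), count
    return best


def total_reprojection_error(cameras, X, observations):
    """Sum of squared pixel residuals of all points in all cameras (the BA objective)."""
    return sum(float(np.sum((project_all(P, X) - x) ** 2)) for P, x in zip(cameras, observations))


# Noise-free synthetic scene with a known relative pose.
rng = np.random.default_rng(0)
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.1
R_true = np.array([[np.cos(a), 0.0, np.sin(a)], [0.0, 1.0, 0.0], [-np.sin(a), 0.0, np.cos(a)]])
t_true = np.array([-1.0, 0.1, 0.05])
X_true = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(50, 3))
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
x1 = project_all(P1, X_true)
x2 = project_all(K @ np.column_stack([R_true, t_true]), X_true)
K_inv = np.linalg.inv(K)
F_true = K_inv.T @ skew(t_true) @ R_true @ K_inv

R, t, X = relative_pose(F_true, K, x1, x2)
err = total_reprojection_error([P1, K @ np.column_stack([R, t])], X, [x1, x2])
print(np.allclose(R, R_true), np.allclose(t, t_true / np.linalg.norm(t_true)), err)
```

Because the translation is recovered with unit norm, the reconstructed points equal the true ones divided by the true baseline length; with real, noisy matches the reprojection error is no longer zero, which is what bundle adjustment then reduces.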
By using unconstrained dense correspondences in a bundle-adjusted reconstruction pipeline, we obtain a novel mechanism that identifies the most inaccurate dense correspondences and corrects them with an iterative method. The entire procedure is described in detail in Section 2, followed by experimental results (Section 3) and conclusions (Section 4).
2 PROPOSED ALGORITHM
2.1 Pose and Structure Estimation based on Dense Correspondences
The first step in our algorithm is to compute unconstrained dense correspondences between two images. For this we used a direct method with sub-pixel accuracy, as outlined in (Duchaineau et al., 2007), which solves coarse-to-fine on 4-8 mesh image pyramids with a 5×5 local affine motion model. There are several
reasons for starting out with such a general-purpose
dense correspondence algorithm. First of all, our in-
tended applications, such as dense scene reconstruc-
tion and image stitching, call for the use of dense
ITERATIVE DENSE CORRESPONDENCE CORRECTION THROUGH BUNDLE ADJUSTMENT
FEEDBACK-BASED ERROR DETECTION
401