been explored in recent years. In this section, we
discuss the most closely related work in image-based
3D reconstruction.
(Brown and Lowe, 2005) presented an image-
based modelling system which aims to recover
camera parameters, pose estimates and sparse 3D
scene geometry from a sequence of images.
(Snavely et al., 2006) presented the Photo
Tourism (Photosynth) system which is based on the
work of Brown and Lowe, with some significant
modifications to improve scalability and robustness.
(Schaffalitzky and Zisserman, 2002) proposed
another related technique for calibrating unordered
image sets, concentrating on efficiently matching
points of interest between images. Although these
approaches address the same SFM concepts as we
do, their aim is not to reconstruct and visualise 3D
scenes and models from images, but only to allow
easy navigation between images in three dimensions.
(Debevec et al., 1996) introduced the Facade
system for modelling and rendering simple
architectural scenes by combining geometry-based
and image-based techniques. The system requires
only a few images and some known geometric
parameters. It was used to reconstruct compelling
fly-throughs of the Berkeley campus, and it was
employed for the MIT City Scanning Project, which
captured thousands of calibrated images from an
instrumented rig to compute a 3D model of the MIT
campus. While the resulting 3D models are often
impressive, the system requires input images taken
from calibrated cameras.
(Hua et al., 2007) tried to reconstruct a 3D
surface model from a single uncalibrated image. The
3D information is acquired through geometric
attributes such as coplanarity, orthogonality and
parallelism. While this method requires only a single
image, it imposes severe restrictions on the image
content.
(Criminisi et al., 1999) proposed an approach
that computes affine 3D scene structure from a single
perspective view. Geometric information, such as the
vanishing line of a reference plane and vanishing
points for directions not parallel to the plane, is
determined. The affine scene structure is then
estimated without any prior knowledge of the
camera's intrinsic or extrinsic parameters. This
method also requires only one image, but manual
input is necessary.
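As a toy illustration of this kind of single-view geometry, the image of a vanishing point can be computed as the intersection of the images of two parallel scene lines, using homogeneous coordinates. The camera (identity intrinsics at the origin) and all point values below are our own illustration, not taken from the cited work:

```python
# Toy single-view computation in homogeneous coordinates: the images of
# two parallel scene lines intersect in a vanishing point.
import numpy as np

def line_through(p, q):
    """Homogeneous line through two 2D image points."""
    return np.cross(np.append(p, 1.0), np.append(q, 1.0))

def intersect(l1, l2):
    """Inhomogeneous intersection point of two homogeneous lines."""
    v = np.cross(l1, l2)
    return v[:2] / v[2]

# Projections of points on two 3D lines with common direction (1, 0, 1):
# line A through (0, 0, 2), line B through (0, 1, 2), projected as (x/z, y/z).
la = line_through(np.array([0.0, 0.0]), np.array([1.0 / 3.0, 0.0]))
lb = line_through(np.array([0.0, 0.5]), np.array([1.0 / 3.0, 1.0 / 3.0]))
vp = intersect(la, lb)  # the direction (1, 0, 1) projects to (1, 0)
```

The same cross-product formulation underlies vanishing-line and vanishing-point estimation from annotated image edges.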
2.2 Surface Reconstruction
Surface reconstruction from point clouds has been
studied extensively in computer graphics in the past
decade. Delaunay-based algorithms, such as the one
described by (Cazals and Giesen, 2006), typically
generate meshes that interpolate the input points.
However, the resulting models often contain rough
geometry when the input points are noisy. These
methods provide good results only under prescribed
sampling criteria (Amenta and Bern, 1998).
(Edelsbrunner et al., 1994) presented the well-
known α-shape approach. It performs a
parameterised construction that associates a
polyhedral shape with an unorganized set of points.
A drawback of α-shapes is that it becomes difficult
and sometimes impossible to choose α for non-
uniform sampling so as to balance hole-filling
against loss of detail (Amenta et al., 2001).
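The role of the parameter α can be sketched in 2D: a Delaunay triangle is kept in the α-complex only if its circumcircle is small enough. The convention below (circumradius < 1/α) and all names are illustrative; the original formulation is more general:

```python
# 2D sketch of the alpha-complex idea behind alpha-shapes: keep a
# Delaunay triangle only if its circumradius is below 1/alpha.
import numpy as np
from scipy.spatial import Delaunay

def alpha_complex(points, alpha):
    """Return the Delaunay triangles whose circumradius is < 1/alpha."""
    kept = []
    for simplex in Delaunay(points).simplices:
        p, q, r = points[simplex]
        a = np.linalg.norm(q - r)   # side lengths
        b = np.linalg.norm(p - r)
        c = np.linalg.norm(p - q)
        area = 0.5 * abs((q[0] - p[0]) * (r[1] - p[1])
                         - (q[1] - p[1]) * (r[0] - p[0]))
        # circumradius R = abc / (4 * area); skip degenerate triangles
        if area > 0 and a * b * c / (4.0 * area) < 1.0 / alpha:
            kept.append(simplex)
    return kept

# Slightly jittered unit grid: every triangle has circumradius ~0.7,
# so alpha = 1 keeps all triangles while alpha = 2 discards them all.
rng = np.random.default_rng(0)
grid = np.array([[i, j] for i in range(4) for j in range(4)], dtype=float)
pts = grid + 0.01 * rng.normal(size=grid.shape)
```

The jump from "all triangles kept" to "none kept" as α grows is exactly the hole-filling versus loss-of-detail trade-off noted above, which a single global α cannot resolve for non-uniform sampling.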
(Amenta et al., 2001) proposed the power crust
algorithm, which constructs a surface mesh by first
approximating the medial axis transform (MAT) of
the object. The surface mesh is then produced by
using an inverse transform from the MAT.
Approximate surface reconstruction methods mostly
work with implicit surface representations
followed by iso-surfacing. (Hoppe et al., 1992)
presented a clean abstraction of the reconstruction
problem. Their approach approximated the signed
distance function induced by the surface F and
constructed the output surface as a polygonal
approximation of the zero-set of this function.
Kazhdan et al. presented a method which is based on
an implicit function framework. Their solution
computes a 3D indicator function, defined as 1 at
points inside the model and 0 at points outside it.
The surface is then reconstructed by
extracting an appropriate isosurface (Kazhdan et al.,
2006).
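The implicit-function idea common to both approaches can be sketched in 2D: sample a signed distance function on a grid, then extract its zero-set by locating sign changes along grid edges. This is a toy stand-in for true iso-surfacing such as marching cubes; the circle and all names below are our illustration:

```python
# 2D sketch of implicit reconstruction: sample a signed distance
# function on a grid and approximate its zero-set from sign changes.
import numpy as np

def signed_distance_circle(x, y, radius=0.5):
    """Signed distance to a circle of the given radius at the origin."""
    return np.hypot(x, y) - radius

n = 64
xs = np.linspace(-1.0, 1.0, n)
gx, gy = np.meshgrid(xs, xs, indexing="ij")
f = signed_distance_circle(gx, gy)

# Locate zero crossings along x-direction grid edges, with linear
# interpolation -- the 1D analogue of the iso-surfacing step.
crossings = []
for i in range(n - 1):
    for j in range(n):
        f0, f1 = f[i, j], f[i + 1, j]
        if f0 * f1 < 0:  # sign change on this edge
            t = f0 / (f0 - f1)
            crossings.append((xs[i] + t * (xs[i + 1] - xs[i]), xs[j]))
crossings = np.array(crossings)

# Every interpolated point should lie close to the true circle.
radii = np.hypot(crossings[:, 0], crossings[:, 1])
```

The indicator-function formulation of (Kazhdan et al., 2006) differs in how the implicit function is defined and solved for, but the final iso-surface extraction step is conceptually the same.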
3 METHODOLOGY
3.1 Feature Matching
The input for our reconstruction algorithm is a
sequence of images of the same object taken from
different views. The first step is to find feature
points in each image. The accuracy of matched
feature points affects the accuracy of the
fundamental matrix and the computation of 3D
points significantly. Many sophisticated algorithms
have been proposed, such as the Harris feature
extractor (Derpanis, 2004) and the SUSAN
feature extractor (Muyun et al., 2004). We use the
SIFT (Scale Invariant Feature Transform) operator
to detect, extract and describe local feature
descriptors. Feature points extracted by SIFT are
GRAPP 2011 - International Conference on Computer Graphics Theory and Applications