ing over time (Hall-Holt and Rusinkiewicz, 2001;
Rusinkiewicz et al., 2002). These stripe encodings are efficient for structured light scanning, but have a shortcoming: they can only determine one-to-one pixel mappings. While that is acceptable for many 3D scanning purposes, the inability to handle mixtures of pixels can result in artifacts.
Scharstein and Szeliski (2003) projected both
Gray-coded stripes and sine waves of different spatial
frequencies. They noted that binary codes can be difficult to measure in the presence of low scene albedo or a low signal-to-noise ratio, and overcame this by projecting both the binary code and its inverse. In general, though, binary codes are very robust. Methods
based on absolute amplitude measurements are highly
dependent upon accurate radiometric calibration and
consistent scene albedo.
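To make the inverse-code idea concrete, here is a minimal sketch (not Scharstein and Szeliski's implementation; the noise margin and the coarsest-first bit order are assumptions) of decoding Gray-coded stripes by comparing each pattern against its inverse, which removes any dependence on absolute intensity:

```python
import numpy as np

def decode_stripe_bit(img_pattern, img_inverse):
    """Decode one binary stripe bit per camera pixel by comparing a pattern
    image against its inverse; the per-pixel comparison avoids any global
    intensity threshold and is insensitive to scene albedo."""
    diff = img_pattern.astype(np.float64) - img_inverse.astype(np.float64)
    bit = diff > 0
    reliable = np.abs(diff) > 5.0        # assumed noise margin (illustrative only)
    return bit, reliable

def decode_gray_code(pattern_imgs, inverse_imgs):
    """Combine the per-pattern bits (coarsest stripe first) into a Gray code,
    then convert Gray to binary to obtain a stripe index per pixel."""
    code = None
    acc = None
    for p, q in zip(pattern_imgs, inverse_imgs):
        g, _ = decode_stripe_bit(p, q)
        acc = g if acc is None else acc ^ g     # Gray-to-binary: running XOR
        bit = acc.astype(np.int64)
        code = bit if code is None else (code << 1) | bit
    return code
```

Because each bit is a per-pixel comparison of two captures, no radiometric calibration or global threshold is needed.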
Environment Matting estimates the occlusion-free
light transport matrix between a 2D background and
camera image. Zongker et al. (1999) used binary
stripe patterns both horizontally and vertically to ob-
tain correspondences in the form of rectangular axis-
aligned regions on the background for each camera
pixel. This method suffers from ambiguities in cases
where two disjoint regions on the background map to
a single camera pixel. Chuang et al. (2000) proposed a
number of improvements to this algorithm, including
one that generalizes the axis-aligned boxes to oriented
Gaussian regions of influence, and one that resolves
the bimodal distribution ambiguity via additional (po-
tentially redundant) diagonal sweeps.
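As a toy illustration of that ambiguity (not Zongker et al.'s actual estimator), reducing a camera pixel's background support to an axis-aligned box necessarily merges disjoint intervals:

```python
import numpy as np

def axis_aligned_box(col_support, row_support):
    """Given boolean supports of one camera pixel along the background's
    columns and rows, return the covering axis-aligned box (c0, c1, r0, r1)."""
    cols = np.flatnonzero(col_support)
    rows = np.flatnonzero(row_support)
    if cols.size == 0 or rows.size == 0:
        return None                      # this pixel sees no background light
    return cols.min(), cols.max(), rows.min(), rows.max()

# Two disjoint column intervals collapse into one wide box -- the ambiguity
# described above.
support = np.zeros(64, dtype=bool)
support[5:10] = True
support[40:45] = True
print(axis_aligned_box(support, support))    # (5, 44, 5, 44)
```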
For specular correspondences with small spatial support, it is possible to derive algorithms that require significantly fewer images by employing learning approaches (Wexler et al., 2002), or even a single image by using optical flow (Atcheson et al., 2008).
Peers and Dutré (2003) proposed the use of
wavelets as illumination patterns for environment
matting. Their initial algorithm was adaptive, i.e. it
required processing the results of captured images to
decide which patterns to project next, drastically in-
creasing the acquisition time. This disadvantage was
remedied in a later work (Peers and Dutré, 2005), in
which the authors use sparsity priors to project re-
sults obtained with a fixed set of illumination patterns
into a new wavelet representation. While this method
produces excellent results for wide point spread func-
tions, it is less applicable to sharp PSFs.
Zhu and Yang (2004) proposed a temporal frequency-based coding scheme whereby the intensity of each pixel is set according to a 1D signal (a sinusoid). Our intra-tile coding scheme is based on this method but employs a second carrier, ninety degrees out of phase with the primary sinusoid, to double
the information density at no extra cost. The use of
only integral frequencies satisfies the Nyquist ISI cri-
terion and allows for very fast, easy and robust DFT-
based decoding. We choose to uniquely code individ-
ual pixels (within each tile) rather than coding whole
rows and columns of the illuminant. This allows
our method to scale up to higher illuminant resolu-
tions, and to naturally handle PSFs of arbitrary (small)
shape, rather than assuming a parametric form.
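A simplified sketch of this quadrature coding follows (assumed tile size, frame count, and frequency assignment; the actual system's pattern layout and noise handling differ): each illuminant pixel within a tile is modulated at a unique integer frequency on either a cosine or a sine carrier, and a camera pixel's code is read from the DFT of its temporal samples.

```python
import numpy as np

def encode_tile(tile_w, tile_h, n_frames):
    """Assign each tile pixel a unique (integer frequency, carrier) pair and
    emit the per-frame intensity patterns. Integer frequencies keep the codes
    orthogonal under an n_frames-point DFT, so there is no inter-symbol
    interference between pixels."""
    n_pixels = tile_w * tile_h
    assert n_pixels <= 2 * (n_frames // 2 - 1), "not enough distinct codes"
    t = np.arange(n_frames)
    patterns = np.empty((n_frames, tile_h, tile_w))
    for idx in range(n_pixels):
        freq = 1 + idx // 2            # integer frequencies 1, 1, 2, 2, ...
        quad = idx % 2                 # 0: cosine carrier, 1: quadrature (sine)
        phase = 2.0 * np.pi * freq * t / n_frames
        carrier = np.cos(phase) if quad == 0 else np.sin(phase)
        y, x = divmod(idx, tile_w)
        patterns[:, y, x] = 0.5 + 0.5 * carrier   # map [-1, 1] to displayable [0, 1]
    return patterns

def decode_pixel(samples):
    """Recover the dominant (frequency, carrier) code from one camera pixel's
    temporal samples via the DFT; the real/imaginary split separates the two
    quadrature carriers."""
    spectrum = np.fft.rfft(samples - samples.mean())
    freq = int(np.argmax(np.abs(spectrum[1:]))) + 1
    coeff = spectrum[freq]
    quad = 0 if abs(coeff.real) >= abs(coeff.imag) else 1
    return freq, quad
```

A camera pixel whose PSF covers several tile pixels excites several bins at once; keeping every bin above a noise floor, rather than only the strongest, recovers the PSF's support within the tile.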
Light Transport Matrix. Recent papers have fo-
cused on the general problem of estimating the light
transport matrix between illuminant and camera pix-
els. Most employ strategies similar to those used
in environment matting. Sen et al. (2005) proposed a hierarchical decomposition into non-interfering regions. Their adaptive approach requires many images to resolve PSFs that overlap multiple regions.
Garg et al. (2006) note that the light transport ma-
trix is often data-sparse. They exploit this, along with
its symmetry due to Helmholtz reciprocity, in their
adaptive acquisition algorithm that divides the matrix
into blocks and approximates each with a rank-1 fac-
torization. Wang et al. (2009) similarly seek a low-
rank approximation to the full matrix. However, they
do so by densely sampling rows and columns of the
matrix (which requires a complex acquisition setup)
and then using a kernel Nyström method to reconstruct the full matrix. These methods assume the matrix to be data-sparse (compressible).
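For intuition, here is a minimal sketch of the row-and-column reconstruction such methods build on (plain linear Nyström/CUR on a synthetic low-rank matrix; the kernel mapping and the acquisition hardware of Wang et al. are omitted):

```python
import numpy as np

def nystrom_reconstruct(C, R, col_idx):
    """Reconstruct a data-sparse matrix T from sampled columns C = T[:, col_idx]
    and sampled rows R = T[row_idx, :]. With W the row/column intersection,
    T is approximated by C @ pinv(W) @ R, which is exact when rank(W) = rank(T)."""
    W = R[:, col_idx]
    return C @ np.linalg.pinv(W) @ R

# Toy check on a synthetic rank-3 "transport" matrix.
rng = np.random.default_rng(0)
T = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 300))
rows = rng.choice(200, size=10, replace=False)
cols = rng.choice(300, size=10, replace=False)
T_hat = nystrom_reconstruct(T[:, cols], T[rows, :], cols)
print(np.linalg.norm(T - T_hat) / np.linalg.norm(T))   # ~0 for a low-rank T
```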
Methods based on compressed sensing are begin-
ning to appear. Sen and Darabi (2009) and Peers et al.
(2009) both describe promising, non-adaptive methods that transform the light transport into a wavelet
domain in which it is more sparse. While these meth-
ods allow for capturing very complex light trans-
port, they still require on the order of hundreds to
thousands of images at typical resolutions, and many
hours of decoding time to obtain results.
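As a toy illustration of the underlying idea (a 1D compressed-sensing sketch using a DCT as the sparsifying transform instead of wavelets, with an assumed iterative soft-thresholding solver rather than the solvers used in those papers):

```python
import numpy as np
from scipy.fft import idct

def ista_recover(A, y, lam=0.02, n_iter=2000):
    """Recover a signal that is sparse under an orthonormal DCT from a small
    number of linear measurements y = A @ x, via iterative soft thresholding."""
    m, n = A.shape
    Psi = idct(np.eye(n), norm='ortho', axis=0)    # synthesis matrix: x = Psi @ c
    Phi = A @ Psi                                  # sensing operator on coefficients
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2       # 1 / Lipschitz constant
    c = np.zeros(n)
    for _ in range(n_iter):
        c = c - step * (Phi.T @ (Phi @ c - y))                      # gradient step
        c = np.sign(c) * np.maximum(np.abs(c) - step * lam, 0.0)    # soft threshold
    return Psi @ c

# A 256-sample signal with 3 active DCT coefficients, recovered from 64
# random measurements.
rng = np.random.default_rng(1)
n, m = 256, 64
coeffs = np.zeros(n)
coeffs[[3, 17, 40]] = [1.0, -0.7, 0.5]
x = idct(coeffs, norm='ortho')
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_hat = ista_recover(A, A @ x)
print(np.linalg.norm(x - x_hat) / np.linalg.norm(x))   # small for a sparse signal
```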
Our method combines advantages of many of the
aforementioned works in that it is both scalable and
robust, while being conceptually simple and easy to
implement. For typical configurations we require on
the order of a few hundred images that can be ac-
quired non-adaptively in seconds and then processed
in minutes on a standard desktop computer. Unlike
more advanced light transport acquisition methods,
we cannot acquire large, diffuse PSFs (one-to-many
correspondences). But for the case of small, finite
PSFs, those methods require many images to resolve
high frequency detail. In contrast, our method effi-
ciently captures accurate data at much lower cost in
terms of acquisition and processing time.